REAL Database Subsets
A practical way to start exploring REAL Database
REAL Diversity Set
Virtual screening of ultra-large databases can be performed iteratively, starting with a small subset. Such a diverse subset can provide essential data to train AI-based algorithms or may already yield promising hits. REAL Diversity Set contains 43.8 million compounds selected from the entire REAL Database using the MaxMin algorithm. No compound in the set has an analog with a Tanimoto similarity above 0.65 (Morgan 2 fingerprint, 512 bit), either within the set or within the entire Enamine stock screening compound collection. REAL Diversity Set compounds comply with the Ro5 and Veber criteria (MW≤500, SlogP≤5, HBA≤10, HBD≤5, rotatable bonds≤10, and TPSA≤140) and lack PAINS and toxicophores.
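To make the MaxMin idea more concrete, here is a minimal, pure-Python sketch of a greedy MaxMin picker with a Tanimoto cutoff. It is an illustration only, not the procedure actually used to build REAL Diversity Set. It assumes that fingerprints have already been computed elsewhere and are given as sets of on-bit indices; the names `tanimoto`, `maxmin_pick`, and the toy fingerprints are hypothetical.

```python
import math
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either set."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_pick(fps: Sequence[Fingerprint],
                max_sim: float = 0.65,
                seed_index: int = 0) -> List[int]:
    """Greedy MaxMin selection (illustrative sketch).

    At each step, pick the candidate whose most similar already-picked
    compound is least similar to it. Stop when every remaining candidate
    has a picked neighbour with similarity above max_sim.
    """
    if not fps:
        return []
    picked = [seed_index]
    # nearest[i]: highest similarity between compound i and any picked compound
    nearest = [tanimoto(fp, fps[seed_index]) for fp in fps]
    nearest[seed_index] = math.inf  # picked compounds are never candidates again
    while True:
        best = min(range(len(fps)), key=nearest.__getitem__)
        if nearest[best] > max_sim:
            break
        picked.append(best)
        for i, fp in enumerate(fps):
            nearest[i] = max(nearest[i], tanimoto(fp, fps[best]))
        nearest[best] = math.inf
    return picked


# Toy fingerprints (made up for illustration): two pairs of close analogs
# and one unrelated compound.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13}),
    frozenset({20, 21}),
]
selected = maxmin_pick(toy_fps)
print(selected)
```

In this toy run, one member of each analog pair is kept and its close analog is rejected. The sketch holds everything in memory and rescans all candidates after each pick, so it is only suitable for small collections.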
REAL lead-like compounds
The lead-like subset of REAL Database has been obtained by filtering with the following molecular criteria: MW≤460, -4≤SlogP≤4.2, HBA≤9, HBD≤5, rings≤4, rotatable bonds≤10. Within the set, we have charted a “350/3” subset of compounds with the most stringent physicochemical profiles, which offer high potential for optimization: 270≤MW≤350, 14≤heavy atoms≤26, SlogP≤3, and aryl rings≤2. PAINS and toxic compounds have been removed.
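The criteria above can be read as a table of allowed ranges, one per descriptor. The following sketch shows one possible way to apply such a table to precomputed descriptors. The thresholds are copied from the text; the descriptor keys (`mw`, `slogp`, `rot_bonds`, and so on), the helper `passes`, and the example compound are hypothetical, and the descriptor values are assumed to come from a cheminformatics toolkit of your choice.

```python
from typing import Dict, Mapping, Optional, Tuple

# (minimum, maximum); None means the bound is open on that side.
Range = Tuple[Optional[float], Optional[float]]

# Thresholds from the text; key names are illustrative assumptions.
LEAD_LIKE: Dict[str, Range] = {
    "mw": (None, 460),
    "slogp": (-4, 4.2),
    "hba": (None, 9),
    "hbd": (None, 5),
    "rings": (None, 4),
    "rot_bonds": (None, 10),
}
SUBSET_350_3: Dict[str, Range] = {
    "mw": (270, 350),
    "heavy_atoms": (14, 26),
    "slogp": (None, 3),
    "aryl_rings": (None, 2),
}


def passes(desc: Mapping[str, float], rules: Mapping[str, Range]) -> bool:
    """Return True if every descriptor named in rules lies within its range."""
    for key, (low, high) in rules.items():
        value = desc[key]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Made-up descriptor values for a single compound.
compound = {
    "mw": 312.4, "slogp": 2.1, "hba": 4, "hbd": 1, "rings": 3,
    "rot_bonds": 4, "heavy_atoms": 22, "aryl_rings": 2,
}
is_lead_like = passes(compound, LEAD_LIKE)
# The "350/3" subset is charted within the lead-like set, so both must hold.
is_350_3 = is_lead_like and passes(compound, SUBSET_350_3)
print(is_lead_like, is_350_3)
```

Keeping the thresholds in a dictionary rather than in hard-coded comparisons means the same helper can be reused with a different range table. PAINS and toxicophore removal is a substructure-based step and is not covered by this sketch.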
REAL fragments
Enamine has a large fragment collection in stock. REAL Database expands this fragment space, allowing you to find novel compounds for growing and optimizing your hits. We have selected REAL fragments by applying the Ro3 criteria (MW<300, SlogP≤3, HBA≤3, HBD≤3, rotatable bonds≤3, and TPSA≤60) to the entire REAL collection. We have also extracted a single pharmacophore subset that complies with even more stringent molecular selection criteria: 140≤MW≤230, 0≤SlogP≤2, 10≤heavy atoms≤16, rotatable bonds≤3, and chiral centers≤1. PAINS and toxic compounds have been removed.
REAL compounds by chemical classes
Prefiltering REAL Database by distinct structural motifs that appear frequently in virtual screening significantly reduces computational time. We have created a number of REAL Database subsets based on the presence of specific chemical moieties/pharmacophores in compound structures. PAINS and toxic compounds have been removed.
- REAL amino acids, 4.8M cpds, CXSMILES
- REAL carboxylic acids, MW≤400, clogP≤3, 57.7M cpds, CXSMILES
- REAL lead-like aliphatic carboxylic acids, 41.7M cpds, CXSMILES
- REAL lead-like aromatic carboxylic acids, 14M cpds, CXSMILES
- REAL lead-like aliphatic primary amines, 49.9M cpds, CXSMILES
- REAL lead-like aromatic primary amines, 220.8M cpds, CXSMILES
- REAL secondary amines, 8-21 heavy atoms, 59.4M cpds, CXSMILES
- REAL hydroxamates, 282K cpds, CXSMILES
- REAL terminal acetylenes, 151.3M cpds, CXSMILES
REAL natural product-like compounds
We have used the approach published by P. Ertl et al. to predict the natural product-likeness of the REAL compounds. The REAL natural product-like compounds comprise drug-like molecules with positive natural product-likeness scores.