More than a decade ago, we launched an in-house project to generate a virtual compound database based on experimentally validated synthetic accessibility (the so-called REAL Database, where REAL stands for REadily AccessibLe). Further extension of this concept led to the development of REAL Space, a searchable chemical space that is typically not stored as an enumerated database but generated upon query by chemoinformatics software. Until now, the largest REAL Space has comprised 15.5 billion make-on-demand molecules, making it currently the largest offering of commercially available compounds. Recently, the utility of the REAL methodology was confirmed by the discovery of highly potent AmpC β-lactamase inhibitors, D4 dopamine receptor ligands, and Kelch-like ECH-associated protein 1 (KEAP1) inhibitors, published in Nature (see Nature 2019, 566, 224–229 and Nature 2020, 580, 663–668).
In a new paper just published in iScience, a team of scientists from Enamine, ChemSpace, UORSY, and Kyiv University describes the principles behind the generation of the multi-billion REAL Space. A chemical space of nearly 30 billion compounds is readily obtained using only two synthesis approaches based on three-component reaction sequences. Despite its immensely expanded size, REAL Space features the same trusted compounds, which can be delivered within only 3-4 weeks with a synthesis success rate of roughly 80% or higher. The newly elaborated REAL Space contains significant fractions of both drug-like and “beyond rule-of-five” molecules, while more than 22 million of its compounds still meet the strictest Churcher’s lead-likeness criteria. The huge and rapidly expandable size of REAL Space, the diversity of its compounds, and the convenience and ease with which they can be accessed make it resemble DNA-encoded libraries in covering an enormous chemical space and a wide range of physicochemical profiles.
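To give a feel for why three-component reaction sequences scale so quickly, and what "generated upon query" rather than "stored" can look like, here is a toy Python sketch. It is an illustration only, not the method described in the paper: the building-block pools, their sizes, and the names `pools`, `space_size`, and `iter_products` are all hypothetical, and it assumes every building block can be combined with every other, which ignores reagent compatibility rules.

```python
from itertools import islice, product
from math import prod

# Hypothetical building-block pools for one three-component reaction sequence.
# The labels are placeholders, not real reagents or catalogue identifiers.
pools = {
    "A": [f"A{i}" for i in range(1, 4)],  # 3 building blocks
    "B": [f"B{i}" for i in range(1, 5)],  # 4 building blocks
    "C": [f"C{i}" for i in range(1, 3)],  # 2 building blocks
}


def space_size(pools):
    """Count the products without enumerating them (product of pool sizes)."""
    return prod(len(blocks) for blocks in pools.values())


def iter_products(pools):
    """Lazily yield one product label per combination of building blocks."""
    for combo in product(*pools.values()):
        yield "-".join(combo)


size = space_size(pools)                              # 3 * 4 * 2
first_three = list(islice(iter_products(pools), 3))   # only three are built

print(size)
print(first_three)
```

Because the size is a product of the pool sizes, a few thousand building blocks per position already give billions of combinations, while the generator only materialises the members that are actually requested.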
REAL Space described in the paper has not been officially released by Enamine yet but is available
See the full paper for more details.