Chemography: Searching for Hidden Treasures

J. Chem. Inf. Model. 2021, 61 (1), 179-188

DOI: 10.1021/acs.jcim.0c00936

Zabolotna Y.; Lin A.; Horvath D.; Marcou G.; Volochnyuk D.; Varnek A.

The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Today, no one can acquire a complete overview of the more than a billion existing or feasible compounds, among which potential “blockbuster drugs” are well hidden and yet only a few mouse clicks away. To reach these “hidden treasures”, we adapted the generative topographic mapping method to enable efficient navigation through chemical space, from a global overview down to structural pattern detection. For the first time, the maps cover the complete ZINC library of purchasable compounds, compared against 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed, and structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Conversely, 125 000 ZINC-specific compound classes, absent from structure–activity databases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link

