Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17
Abstract

Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
Introduction
Results and Discussion
Enumeration
Figure 1. Enumeration of GDB-17 starting from mathematical graphs.
filter | description | comment |
---|---|---|
H1 | SAV (“smallest atomic volume”): all graphs ≤ C11 are converted to a 3D structure, and the volume of the tetrahedron around each C atom is checked for a minimum value. | 95.2% of graphs ≤ C11 are discarded due to failed 3D conversion (using CORINA or ChemAxon molconverter) or due to distorted (planar, pyramidal) centers. Simple polyhedra such as cubane are preserved. See the SI of ref 22 for details. |
H2 | NA2SR (“no atom shared by two small rings”): removes graphs ≥ C12 containing fused or spiro linkages between 3- or 4-membered rings. | The vast majority of fused small ring systems are highly strained and reactive. 96.7% of C12- and 97.3% of C13-graphs are removed. See also filter H3. |
H3 | NBH3R (“no bridgehead in 3 rings”): graphs ≥ C14 with three or more “nonzero” bridgehead atoms shared by three or more rings are removed. | These multilooped topologies would correspond to molecules of high synthetic complexity. 99.60% of C14-, 99.77% of C15-, 99.95% of C16-, and 99.95% of C17-graphs are removed by filters H2+H3. |
H4 | 1SR (“one small ring”): C15 and C16 graphs are allowed at most one 3- or 4-membered ring. | 71.89% of the C15- and 79.50% of the C16-graphs that passed H2+H3 are removed. |
H5 | 0SR (“no small ring”): C17-graphs are not allowed any small rings. | 96.95% of the C17-graphs that passed H2+H3 are removed. |
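
The size-dependent small-ring caps in H4 and H5 amount to a lookup on the node count. The sketch below is illustrative only: it assumes the ring sizes of a graph have already been perceived elsewhere (for example, as a smallest set of smallest rings), and the function names are hypothetical rather than taken from the GDB-17 code. Filters H1–H3 are not modeled.

```python
SMALL_RING_SIZES = {3, 4}  # "small rings" are 3- or 4-membered


def max_small_rings_allowed(n_nodes):
    """Return the cap on small rings for a graph of n_nodes, or None if H4/H5 do not apply."""
    if n_nodes == 17:
        return 0  # H5 (0SR): no small ring at all
    if n_nodes in (15, 16):
        return 1  # H4 (1SR): at most one small ring
    return None  # smaller graphs are handled by other filters (H1-H3)


def passes_small_ring_count(n_nodes, ring_sizes):
    """Check H4/H5 given the node count and a list of ring sizes (assumed precomputed)."""
    cap = max_small_rings_allowed(n_nodes)
    if cap is None:
        return True
    n_small = sum(1 for size in ring_sizes if size in SMALL_RING_SIZES)
    return n_small <= cap


if __name__ == "__main__":
    print(passes_small_ring_count(16, [4, 6]))  # one small ring in a C16 graph
    print(passes_small_ring_count(17, [3, 6]))  # any small ring in a C17 graph
```

Keeping the cap in a separate function makes it easy to see that the rule tightens as the graph grows: one small ring at 15 and 16 nodes, none at 17.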
filter | description | comment |
---|---|---|
S1 | no allenes (C═C═C) | Although known and sometimes found in bioactive molecules, allenes are usually reactive and quite difficult to prepare but combinatorially extremely frequent. |
S2 | no unsaturations in 3-membered rings | Cyclopropenes are known but quite reactive and difficult to prepare. Cyclopropynes are unstable. |
S3 | at most one sp2-center in 4-membered rings | Cyclobutenes and cyclobutynes are not enumerated, but the skeletons leading to β-lactams and β-lactones are generated. |
S4 | triple bond restrictions | No triple bond in 3- or 4-membered rings, max. one triple bond in rings ≥9, and max. two triple bonds in rings ≥11. Only terminal triple bonds for C17-hydrocarbons (allowing the generation of nitriles). |
S5 | bridgehead double bond restrictions | If a “non-zero” bridgehead carbon is sp2, the ring sizes of the smallest set of smallest rings are checked. At least one ring containing this carbon must be ≥8. If two such bridgeheads are sp2, the ring size must be ≥10. |
filter | description | comment |
---|---|---|
F1 | XCX: only one N or O next to an sp3 carbon, or two oxygens if both oxygens are ring atoms. | Aminals, hemiacetals, gem-diols, and acyclic acetals are not enumerated. Only cyclic acetals are allowed. |
F2 | Maximum one N or O in small rings. | Allows epoxides, oxetanes, aziridines, and azetidines but no cyclic acetals inside 4-membered rings. |
F3 | Anhydrides (O═C)–O–(C═O) are removed. | Most anhydrides are unstable toward hydrolysis. |
F4 | Acetal chains O–Csp3–O–Csp3–O are removed. | Although sometimes found, acetal chains are difficult to plan synthetically. |
F5 | Molecules with a primary amine and a ketone or aldehyde are removed. | This combination often polymerizes. |
F6 | C═N are removed unless the sp2 carbon is connected to a further N or O atom. | Removes imines which are unstable but retains amidines and guanidines. |
F7 | (N/O)–C═N–C═(N/O) are removed. | The corresponding (N/O)═C–N–C═(N/O) tautomer is allowed. |
F8 | Enol/enamine: removes O or N atoms adjacent to a nonaromatic C═C. | Enols, enamines, enol ethers, etc. are almost always unstable toward hydrolysis to the parent carbonyl compound. |
F9 | Acyclic carbonates C–O–(C═O)–O–C are removed. | Acyclic carbonates are rather unstable toward hydrolysis. |
F10 | Carbonic acids (O–CO2H), carbamic acids (N–CO2H), and β-carbonyl carboxylic acids ((C═O)–C–CO2H) are removed. | These functional groups decarboxylate spontaneously. |
F11 | Bridgehead amides: if a “non-zero” bridgehead nitrogen is bound to a nonaromatic sp2 atom, the ring sizes of the smallest set of smallest rings are checked. At least one ring containing the nitrogen must be ≥9. | Such amides are “twisted” and nonconjugated and therefore quite unstable toward hydrolysis. |
F12 | C═C: Molecules of 17 atoms with nonaromatic carbon–carbon unsaturations are removed. | Nonaromatic C═C are highly frequent but often reactive toward polymerization, cycloadditions, isomerizations, oxidation, or nucleophilic addition. |
No heteroatom–heteroatom bonds are generated at all.
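
As a minimal illustration of the rule that no heteroatom–heteroatom bonds are generated at this stage, the following sketch checks a molecule given as an atom list and a bond list. This representation and the function name are assumptions made for illustration and do not reflect the actual enumeration code.

```python
HETEROATOMS = {"N", "O"}  # heteroatoms introduced at the CNO stage


def has_heteroatom_bond(atoms, bonds):
    """Return True if any bond joins two heteroatoms.

    atoms: list of element symbols, indexed by atom number (hypothetical format)
    bonds: list of (i, j) index pairs into atoms
    """
    return any(
        atoms[i] in HETEROATOMS and atoms[j] in HETEROATOMS
        for i, j in bonds
    )


if __name__ == "__main__":
    # N-C-C-O chain: the heteroatoms are separated by carbons
    print(has_heteroatom_bond(["N", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]))
    # C-N-O chain: contains an N-O bond
    print(has_heteroatom_bond(["C", "N", "O"], [(0, 1), (1, 2)]))
```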
step | description | comment |
---|---|---|
P1 | Aromatic C to N: aromatic C atoms adjacent to an aromatic N or O atom are converted to N if valency allows. | Aromatic heterocycles with heteroatom–heteroatom bonds are created, e.g., 1,2-oxazoles from furans and 1,2,3-triazoles from imidazoles. |
P2 (a) | Ketone oximes C═N–OH: ketones are converted to oximes. | Note that alkylated oximes, hydroxamates, hydrazides, and hydrazones are not considered. |
P3 | Aromatic halogens: aromatic OH groups are changed to halogens. | Halogen = F, Cl, Br, I; max. two Br or I per aromatic ring. |
P4 | Trifluoromethyls: tert-butyl groups are changed to CF3. | |
P5 | Aromatic nitro groups: aromatic CO2H are converted to NO2. | Aliphatic nitro groups are not considered. |
P6 | Thiophenes: sulfur was substituted for all heteroaromatic oxygen atoms. | Aliphatic thiols and thioethers are not considered. |
P7 (a) | Sulfones: carbonyl groups (C═O) in ketones, acids, carboxamides, and carbamates are changed to SO2. | Note that C═S and sulfoxides (S═O) are not generated. |
(a) Steps P2 and P7 increase the heavy atom count (hac), generating, for example, some 17-atom molecules with small rings. All molecules with hac > 17 were removed to avoid a combinatorial explosion.
HAC | filters (a) | graphs (b) | hydrocarbons (c) | skeletons (d) | molecules (e) | CPU, h (f) |
---|---|---|---|---|---|---|
1 | SAV, FG | 1 | 1 | 1 | 3 | 0 |
2 | SAV, FG | 1 | 1 | 3 | 6 | 0 |
3 | SAV, FG | 2 | 2 | 4 | 14 | 0 |
4 | SAV, FG | 6 | 4 | 12 | 47 | 0 |
5 | SAV, FG | 20 | 10 | 32 | 219 | 0 |
6 | SAV, FG | 74 | 31 | 119 | 1,091 | 0 |
7 | SAV, FG | 321 | 98 | 448 | 6,029 | 0 |
8 | SAV, FG | 1,663 | 370 | 2,004 | 37,435 | 0 |
9 | SAV, FG | 9,616 | 1,448 | 9,472 | 243,233 | 0 |
10 | SAV, FG | 61,840 | 6,325 | 48,721 | 1,670,163 | 0 |
11 | SAV, FG | 427,135 | 29,496 | 264,321 | 12,219,460 | 3 |
12 | NA2SR | 3,120,002 | 104,165 | 1,188,127 | 72,051,665 | 18 |
13 | NA2SR | 23,722,244 | 651,850 | 7,370,864 | 836,687,200 | 206 |
14 | NBH3R | 186,092,397 | 752,277 | 27,419,837 | 2,921,398,415 | 856 |
15 | 1SR | 1,496,007,875 | 960,415 | 118,977,963 | 15,084,103,347 | 5,378 |
16 | 1SR | 12,176,341,897 | 1,331,875 | 213,259,331 | 38,033,661,355 | 14,415 |
17 | 0SR, C═C | 100,418,784,003 | 1,583,786 | 962,417,271 | 109,481,780,580 | 79,259 |
SUM | | 114,304,569,097 | 5,422,154 | 1,330,958,530 | 166,443,860,262 | 100,134 |
(b) Graphs produced by GENG for planar, connected graphs up to 17 nodes with maximum node valence of four.
(c) Hydrocarbons generated from graphs and passing the filters in Table 1 for limited ring strain and complexity.
(d) Unsaturated hydrocarbons generated from hydrocarbons using filters in Table 2.
(e) Molecules generated from hydrocarbons by adding heteroatoms (Tables 3 and 4), as 2D structures and stored as SMILES.
(f) Computation was parallelized on 360 CPUs.
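
The totals in the enumeration statistics table can be re-derived from its per-size rows. The short script below sums the molecule and hydrocarbon columns and computes the share of molecules with 17 heavy atoms. The wall-clock estimate at the end is an assumption-laden back-of-the-envelope figure: it divides the reported CPU hours evenly over 360 CPUs, which presumes ideal parallel scaling.

```python
# Per-HAC counts copied from the enumeration statistics table (HAC 1 to 17).
molecules = [
    3, 6, 14, 47, 219, 1_091, 6_029, 37_435, 243_233, 1_670_163,
    12_219_460, 72_051_665, 836_687_200, 2_921_398_415,
    15_084_103_347, 38_033_661_355, 109_481_780_580,
]
hydrocarbons = [
    1, 1, 2, 4, 10, 31, 98, 370, 1_448, 6_325, 29_496, 104_165,
    651_850, 752_277, 960_415, 1_331_875, 1_583_786,
]

total_molecules = sum(molecules)
total_hydrocarbons = sum(hydrocarbons)

# Fraction of the database made up of 17-atom molecules (last row).
share_hac17 = molecules[-1] / total_molecules

# Rough wall-clock time, assuming the reported 100,134 CPU hours were
# spread evenly over 360 CPUs (an idealization, not a reported figure).
wall_clock_hours = 100_134 / 360

if __name__ == "__main__":
    print(f"molecules: {total_molecules:,}")
    print(f"hydrocarbons: {total_hydrocarbons:,}")
    print(f"share of 17-atom molecules: {share_hac17:.1%}")
    print(f"idealized wall-clock time: {wall_clock_hours:.0f} h")
```

The sums reproduce the SUM row (166,443,860,262 molecules and 5,422,154 hydrocarbons), and roughly two-thirds of the database consists of 17-atom molecules.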
Comparing GDB-17 with PubChem, ChEMBL, and DrugBank
Figure 2. Size and MW profiles of the enumerated chemical space in GDB and the reference databases PubChem, ChEMBL, and DrugBank. The size of the leadlike subsets of GDB (GDBLL, GDBLLnoSR) is extrapolated from analyzing a 1% random subset of GDB-17.
drug name (a) | elemental formula | no. of isomers |
---|---|---|
Acyclovir | C8H11N5O3 | 8,132,952 |
Aminoglutethimide | C13H16N2O2 | 183,901,628 |
Aminophenazone | C13H17N3O | 97,853,936 |
Dexmedetomidine | C13H16N2 | 9,721,191 |
Diethylcarbamazine | C10H21N3O | 22,409 |
Ethoxzolamide | C9H10N2S2O3 | 4,563,491 |
Felbamate | C11H14N2O4 | 369,751,288 |
Fencamfamine | C15H21N | 53,917,207 |
Guanadrel | C10H19N3O2 | 60,319,220 |
Procaine | C13H20N2O2 | 476,975,898 |
Sulfadiazine | C10H10N4SO2 | 17,003,297 |
Tinidazole | C8H13N3SO4 | 24,575,941 |
Tizanidine | C9H8N5SCl | 109,635 |
Trioxsalen | C14H12O3 | 1,800,849 |
Varenicline | C13H13N3 | 19,676,640 |
(a) See Figure 3 for the structural formulas of the drugs and examples of isomers.
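
Each drug in the table above falls within the 17-heavy-atom limit of GDB-17, which can be checked directly from its elemental formula. The sketch below parses a formula and counts all non-hydrogen atoms. The helper name is hypothetical, and the parser assumes simple formulas without parentheses or charges.

```python
import re


def heavy_atom_count(formula):
    """Count non-hydrogen atoms in a simple elemental formula such as 'C13H16N2O2'."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            total += int(count) if count else 1
    return total


# Formulas taken from the table of drug isomers.
drug_formulas = {
    "Acyclovir": "C8H11N5O3",
    "Diethylcarbamazine": "C10H21N3O",
    "Procaine": "C13H20N2O2",
    "Sulfadiazine": "C10H10N4SO2",
    "Tizanidine": "C9H8N5SCl",
    "Varenicline": "C13H13N3",
}

if __name__ == "__main__":
    for name, formula in drug_formulas.items():
        print(f"{name}: hac = {heavy_atom_count(formula)}")
```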
Figure 3. Drugs and examples of isomers found in GDB-17. All isomers shown have a shape similarity score ROCS > 1.4. None of the isomers shown are known (Scifinder search). Only acyclovir does not occur in GDB-17 because it contains a hemiaminal (N–Csp3–O), a functional group which is excluded from the enumeration.
Figure 4. Molecule topologies and categories in GDB-17 and reference databases. A. Percentage of each reference database compatible with GDB-17 enumeration rules or excluded due to nonenumerated halogen (acyl halide, aliphatic halocarbons) or sulfur (thiols, thioethers), functional groups (acyclic acetals, hemiacetals, aminals, azides, aliphatic nitro groups), element (P, Si, B, Bi, Hg, etc.), skeleton (nonaromatic C═C), or graph (e.g., small rings at 17 atoms). B. Fraction of compounds with small rings. C. Topologies. D. Database contents as a function of molecular categories. Molecules are assigned to one category only, with priority order heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Small Rings, Topology, and Compound Categories
Polarity and Leadlikeness
Figure 5. Polarity features. A. clogP histogram in intervals −5.5 to −4.5, −4.5 to −3.5, etc.; B. Average clogP as a function of hac; C. H-bond donor atom (HBD) histogram; D. Average HBD as a function of hac. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Molecular Shape
Figure 6. Molecular shape analyzed by the principal moments of inertia (41). Occupancy maps are shown in the (P1,P2)-plane, in which P1 and P2 are the normalized ratios of the principal moments of inertia (for details, see the Methods section), and are colored from blue (1 cpd/pixel) to purple (maximum cpd/pixel for each map: GDB-17: 4,691, GDBLL-17: 889, GDBLLnoSR-17: 684, PubChem-17: 6,202, ChEMBL-17: 487, DrugBank-17: 4). The insets show an enlarged view of the lower left edge of each triangle, where occupancy is highest for PubChem-17, ChEMBL-17, and DrugBank-17. GDB-17, GDBLL-17, and GDBLLnoSR-17 were analyzed with a random subset of 16.7 million molecules from GDB-17. For all compounds, a single stereoisomer was analyzed as generated by CORINA.
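
The shape coordinates used in such maps can be sketched as follows. This is an illustrative computation under stated assumptions, not the authors' implementation: it takes P1 = I1/I3 and P2 = I2/I3 with I1 ≤ I2 ≤ I3 (the common convention for normalized principal moments of inertia), treats every atom as a unit point mass, and uses hypothetical function names. The eigenvalues come from the standard closed-form trigonometric solution for symmetric 3×3 matrices.

```python
import math


def inertia_tensor(coords):
    """Inertia tensor of unit point masses about their centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for px, py, pz in coords:
        x, y, z = px - cx, py - cy, pz - cz
        ixx += y * y + z * z
        iyy += x * x + z * z
        izz += x * x + y * y
        ixy -= x * y
        ixz -= x * z
        iyz -= y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]


def symmetric_eigenvalues_3x3(a):
    """Eigenvalues of a symmetric 3x3 matrix, in ascending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted([a[0][0], a[1][1], a[2][2]])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [smallest, middle, largest]


def normalized_pmi_ratios(coords):
    """Return (I1/I3, I2/I3) for a list of (x, y, z) atom positions."""
    i1, i2, i3 = symmetric_eigenvalues_3x3(inertia_tensor(coords))
    if i3 <= 0.0:
        raise ValueError("need at least two distinct atom positions")
    return i1 / i3, i2 / i3


if __name__ == "__main__":
    rod = [(t, t, 0.0) for t in (-1.0, 0.0, 1.0)]
    disc = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    print(normalized_pmi_ratios(rod))
    print(normalized_pmi_ratios(disc))
```

Under this convention the three corners of the triangular map correspond to rod-like (0, 1), disc-like (0.5, 0.5), and sphere-like (1, 1) shapes.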
Figure 7. Histograms of quaternary centers (qv) and bonds in fused rings (bfr) in the different databases. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Stereochemistry
Figure 8. Stereochemistry. A. Number of stereoisomers per compound. B. Average number of stereoisomers per compound as a function of hac. Stereoisomers were generated from SMILES using CORINA. The data for GDB-17, GDBLL-17, and GDBLLnoSR-17 stem from the analysis of a random subset of 16.7 million molecules from GDB-17.
Novelty and Scaffolds
Murcko scaffolds
no. of quat. C | 0 | 1 | 2 | >2 | SUM |
---|---|---|---|---|---|
GDB-17, SR | 19,804 | 56,975 | 65,536 | 42,895 | 185,210 |
GDB-17, 5–7 | 1,736 | 2,561 | 1,195 | 217 | 5,709 |
GDB-17, 8+ | 1,113 | 466 | 44 | 0 | 1,623 |
SUM | 22,653 | 60,002 | 66,775 | 43,112 | 192,542 |
PubChem-17, SR | 1,997 | 1,114 | 405 | 56 | 3,572 |
PubChem-17, 5–7 | 960 | 562 | 121 | 3 | 1,646 |
PubChem-17, 8+ | 307 | 41 | 8 | 0 | 356 |
SUM | 3,264 | 1,717 | 534 | 59 | 5,574 |

Ring systems
no. of quat. C | 0 | 1 | 2 | >2 | SUM |
---|---|---|---|---|---|
GDB-17, SR | 12,607 | 45,419 | 60,720 | 42,293 | 161,039 |
GDB-17, 5–7 | 1,135 | 2,143 | 1,126 | 217 | 4,621 |
GDB-17, 8+ | 978 | 426 | 44 | 0 | 1,448 |
SUM | 14,720 | 47,988 | 61,890 | 42,510 | 167,108 |
PubChem-17, SR | 600 | 521 | 314 | 45 | 1,480 |
PubChem-17, 5–7 | 480 | 375 | 111 | 3 | 969 |
PubChem-17, 8+ | 254 | 36 | 8 | 0 | 298 |
SUM | 1,334 | 932 | 433 | 48 | 2,747 |
Murcko scaffolds are hydrocarbon graphs without any terminal atoms, and ring systems are hydrocarbon graphs without any acyclic bonds. Scaffolds and ring systems are divided into three categories: SR: at least one small (3- or 4-membered) ring; 5–7: containing only 5- to 7-membered rings; 8+: no small ring and at least one 8-membered or larger ring. Rings are analyzed in the smallest set of smallest rings, i.e., bicyclo[2.2.1]heptane (norbornane) contains two 5-membered rings, while its 6-membered ring is not considered.
Figure 9. Examples of yet unknown C17-ring systems from GDB-17. These hydrocarbons do not give any hits in Scifinder using “any atom” types for carbons and “any bond” for bonds, including substructure searches but locking further ring fusions. Stereochemistry is not considered in these searches. The ring systems are shown as one possible stereoisomer.
Conclusion
Methods
General
Enumeration
Graphs
Hydrocarbons
Skeletons
CNO Molecules
Postprocessing for Oximes, Nitro, CF3, Halogens, and Sulfur
Shape Analysis




Stereoisomer Counting
Distribution
Acknowledgment
This work was supported financially by the University of Berne, the Swiss National Science Foundation, the NCCR TransCure, and the NCCR Chemical Biology.
References
This article references 47 other publications.
- 1Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F.; Schenck, R. J.; Trippe, A. J. Structural diversity of organic chemistry. A scaffold analysis of the CAS Registry J. Org. Chem. 2008, 73, 4443– 4451[ACS Full Text
], [CAS], Google Scholar
1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXmtlylu7c%253D&md5=c03682d3a3fab0b0eec7a36ef1ecc778Structural Diversity of Organic Chemistry. A Scaffold Analysis of the CAS RegistryLipkus, Alan H.; Yuan, Qiong; Lucas, Karen A.; Funk, Susan A.; Bartelt, William F., III; Schenck, Roger J.; Trippe, Anthony J.Journal of Organic Chemistry (2008), 73 (12), 4443-4451CODEN: JOCEAH; ISSN:0022-3263. (American Chemical Society)The anal. of chem. diversity has become a topic of considerable interest in recent years. This interest has been stimulated by the challenge of discovering novel small-mol. pharmaceuticals. The development of technologies such as combinatorial synthesis and high-throughput screening has made it possible to explore drug-like regions of chem. space in relatively short times. Chem. space is vast and the problem of selecting which region of that space to explore remains a key issue in drug discovery. By analyzing the scaffold content of the CAS Registry, the authors attempt to characterize in a comprehensive way the structural diversity of org. chem. The scaffold of a mol. is taken to be its framework, defined as all its ring systems and all the linkers that connect them. Framework data from more than 24 million org. compds. are analyzed. The distribution of frameworks among compds. is found to be top-heavy, i.e., a small percentage of frameworks occurs in a large percentage of compds. When frameworks are analyzed at the graph level, an even more top-heavy distribution is found: half of the compds. can be described by only 143 framework shapes. The most significant finding is that the framework distribution conforms almost exactly to a power law. This suggests that the more often a framework has been used as the basis for a compd., the more likely it is to be used in another compd. This may be explained by the cost of synthesis: making a new deriv. 
of a framework is probably less costly if many other derivs. are known. The authors believe that this power law is evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of org. chem. - 2ACS NEWS Chem. Eng. News 2011, 89, 38Google ScholarThere is no corresponding record for this reference.
- 3Bleicher, K. H.; Bohm, H. J.; Muller, K.; Alanine, A. I. Hit and lead generation: Beyond high-throughput screening Nat. Rev. Drug Discovery 2003, 2, 369– 378[Crossref], [PubMed], [CAS], Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3sXjslamtb8%253D&md5=b54e55c1f0d34fbd172c8aa9b996591dA guide to drug discovery: Hit and lead generation: beyond high-throughput screeningBleicher, Konrad H.; Boehm, Hans-Joachim; Mueller, Klaus; Alanine, Alexander I.Nature Reviews Drug Discovery (2003), 2 (5), 369-378CODEN: NRDDAG; ISSN:1474-1776. (Nature Publishing Group)A review. The identification of small-mol. modulators of protein function, and the process of transforming these into high-content lead series, are key activities in modern drug discovery. The decisions taken during this process have far-reaching consequences for success later in lead optimization and even more crucially in clin. development. Recently, there has been an increased focus on these activities due to escalating downstream costs resulting from high clin. failure rates. In addn., the vast emerging opportunities from efforts in functional genomics and proteomics demands a departure from the linear process of identification, evaluation and refinement activities towards a more integrated parallel process. This calls for flexible, fast and cost-effective strategies to meet the demands of producing high-content lead series with improved prospects for clin. success.
- 4Schreiber, S. L. Small molecules: the missing link in the central dogma Nat. Chem. Biol. 2005, 1, 64– 66[Crossref], [PubMed], [CAS], Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXls1Oms7Y%253D&md5=d4a54994c0d933602d9d02747e5e1ddcSmall molecules: the missing link in the central dogmaSchreiber, Stuart L.Nature Chemical Biology (2005), 1 (2), 64-66CODEN: NCBABT; ISSN:1552-4450. (Nature Publishing Group)There is no expanded citation for this reference.
- 5Mayr, L. M.; Bojanic, D. Novel trends in high-throughput screening Curr. Opin. Pharmacol. 2009, 9, 580– 588[Crossref], [PubMed], [CAS], Google Scholar5https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXht1WntLnL&md5=50284fdd9f4d610e81fb442b78b7b69eNovel trends in high-throughput screeningMayr, Lorenz M.; Bojanic, DejanCurrent Opinion in Pharmacology (2009), 9 (5), 580-588CODEN: COPUBK; ISSN:1471-4892. (Elsevier B.V.)A review. High-throughput screening (HTS) is a well-established process for lead discovery in Pharma and Biotech companies and is now also being used for basic and applied research in academia. It comprises the screening of large chem. libraries for activity against biol. targets via the use of automation, miniaturized assays and large-scale data anal. Since its first advent in the early to mid 1990s, the field of HTS has seen not only a continuous change in technol. and processes, but also an adaptation to various needs in lead discovery. HTS has now evolved into a mature discipline that is a crucial source of chem. starting points for drug discovery. Whereas in previous years much emphasis has been put on a steady increase in screening capacity (quant. increase') via automation and miniaturization, the past years have seen a much greater emphasis on content and quality (qual. increase'). Today, many experts in the field see HTS at a crossroad with the need to decide on either higher throughput/more experimentation or a greater focus on assays of greater physiol. relevance, both of which may lead to higher productivity in pharmaceutical R&D. In this paper, we describe the development of HTS over the past decade and point out our own ideas for future directions of HTS in biomedical research. We predict that the trend toward further miniaturization will slow down with the balanced implementation of 384 well, 1536 well, and 384 low vol. well plates. 
Furthermore, we envisage that there will be much more emphasis on rigorous assay and chem. characterization, particularly considering that novel and more difficult target classes will be pursued. In recent years we have witnessed a clear trend in the drug discovery community toward rigorous hit validation by the use of orthogonal readout technologies, label free and biophys. methodologies. We also see a trend toward a more flexible use of the various screening approaches in lead discovery, i.e., the use of both full deck compd. screening as well as the use of focused screening and iterative screening approaches. Moreover, we expect greater usage of target identification strategies downstream of phenotypic screening and the more effective implementation of affinity selection technologies as a result of advances in chem. diversity methodologies. We predict that, ultimately, each hit finding strategy will be much more project-related, tailor-made, and better integrated into the broader drug discovery efforts.
- 6Renner, S.; Popov, M.; Schuffenhauer, A.; Roth, H. J.; Breitenstein, W.; Marzinzik, A.; Lewis, I.; Krastel, P.; Nigsch, F.; Jenkins, J.; Jacoby, E. Recent trends and observations in the design of high-quality screening collections Future Med. Chem 2011, 3, 751– 766[Crossref], [PubMed], [CAS], Google Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXlvFGisLY%253D&md5=f876f165f937c9792c989053492e59f5Recent trends and observations in the design of high-quality screening collectionsRenner, Steffen; Popov, Maxim; Schuffenhauer, Ansgar; Roth, Hans-Joerg; Breitenstein, Werner; Marzinzik, Andreas; Lewis, Ian; Krastel, Philipp; Nigsch, Florian; Jenkins, Jeremy; Jacoby, EdgarFuture Medicinal Chemistry (2011), 3 (6), 751-766CODEN: FMCUA7; ISSN:1756-8919. (Future Science Ltd.)A review. The design of a high-quality screening collection is of utmost importance for the early drug-discovery process and provides, in combination with high-quality assay systems, the foundation of future discoveries. Herein, we review recent trends and observations to successfully expand the access to bioactive chem. space, including the feedback from hit assessment interviews of high-throughput screening campaigns; recent successes with chemogenomics target family approaches, the identification of new relevant target/domain families, diversity-oriented synthesis and new emerging compd. classes, and non-classical approaches, such as fragment-based screening and DNA-encoded chem. libraries. The role of in silico library design approaches are emphasized.
- 7Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discovery 2004, 3, 711– 715[Crossref], [PubMed], [CAS], Google Scholar7https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXmtVOhtLs%253D&md5=f9025c13a1506f607aaf68415570ed01Opinion: Can the pharmaceutical industry reduce attrition rates?Kola, Ismail; Landis, JohnNature Reviews Drug Discovery (2004), 3 (8), 711-716CODEN: NRDDAG; ISSN:1474-1776. (Nature Publishing Group)The pharmaceutical industry faces considerable challenges, both politically and fiscally. Politically, governments around the world are trying to contain costs and, as health care budgets constitute a very significant part of governmental spending, these costs are the subject of intense scrutiny. In the United States, drug costs are also the subject of intense political discourse. This article deals with the fiscal pressures that face the industry from the perspective of R&D. What impinges on productivity How can we improve current reduced R&D productivity.
- 8Hann, M. M. Molecular obesity, potency and other addictions in drug discovery MedChemComm 2011, 2, 349– 355[Crossref], [CAS], Google Scholar8https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXls1Kht7g%253D&md5=954870db4b1d9e846612c3bb256582f7Molecular obesity, potency and other addictions in drug discoveryHann, Michael M.MedChemComm (2011), 2 (5), 349-355CODEN: MCCEAY; ISSN:2040-2503. (Royal Society of Chemistry)A review. Despite the increase in global biol. and chem. knowledge the discovery of effective and safe new drugs seems to become harder rather than easier. Some of this challenge is due to increasing demands for safety and novelty, but some of the risk involved in this should be controllable if we had more effectively learned from our failures. This perspective reflects on some of the learnings of recent years in relation to the causes of attrition. The term Mol. Obesity is introduced to describe our tendency to build potency into mols. by the inappropriate use of lipophilicity which leads to the premature demise of drug candidates.
- 9Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules Nat. Rev. Drug Discovery 2005, 4, 649– 663[Crossref], [PubMed], [CAS], Google Scholar9https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXmvVOqtro%253D&md5=a30dbc58ed81e0b7fe3f7d41a668e9acComputer-based de novo design of drug-like moleculesSchneider, Gisbert; Fechner, UliNature Reviews Drug Discovery (2005), 4 (8), 649-663CODEN: NRDDAG; ISSN:1474-1776. (Nature Publishing Group)A review with refs. Ever since the first automated de novo design techniques were conceived only 15 years ago, the computer-based design of hit and lead structure candidates has emerged as a complementary approach to high-throughput screening. Although many challenges remain, de novo design supports drug discovery projects by generating novel pharmaceutically active agents with desired properties in a cost- and time-efficient manner. In this review, we outline the various design concepts and highlight current developments in computer-based de novo design.
- 10Jorgensen, W. L. Efficient drug lead discovery and optimization Acc. Chem. Res. 2009, 42, 724– 733[ACS Full Text
], [CAS], Google Scholar
10https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXjsFentr4%253D&md5=9641587769701c6541b38e18fa05538aEfficient Drug Lead Discovery and OptimizationJorgensen, William L.Accounts of Chemical Research (2009), 42 (6), 724-733CODEN: ACHRE4; ISSN:0001-4842. (American Chemical Society)A review. During the 1980s, advances in the abilities to perform computer simulations of chem. and biomol. systems and to calc. free energy changes led to the expectation that such methodol. would soon show great utility for guiding mol. design. Important potential applications included design of selective receptors, catalysts, and regulators of biol. function including enzyme inhibitors. This time also saw the rise of high-throughput screening and combinatorial chem. along with complementary computational methods for de novo design and virtual screening including docking. These technologies appeared poised to deliver diverse lead compds. for any biol. target. As with many technol. advances, realization of the expectations required significant addnl. effort and time. However, as summarized here, striking success has now been achieved for computer-aided drug lead generation and optimization. De novo design using both mol. growing and docking are illustrated for lead generation, and lead optimization features free energy perturbation calcns. in conjunction with Monte Carlo statistical mechanics simulations for protein-inhibitor complexes in aq. soln. The specific applications are to the discovery of non-nucleoside inhibitors of HIV reverse transcriptase (HIV-RT) and inhibitors of the binding of the proinflammatory cytokine MIF to its receptor CD74. A std. protocol is presented that includes scans for possible addns. of small substituents to a mol. core, interchange of heterocycles, and focused optimization of substituents at one site. Initial leads with activities at low-micromolar concns. 
have been advanced rapidly to low-nanomolar inhibitors. - 11Reymond, J. L.; Van Deursen, R.; Blum, L. C.; Ruddigkeit, L. Chemical space as a source for new drugs MedChemComm 2010, 1, 30– 38[Crossref], [CAS], Google Scholar11https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXhtVeju7rF&md5=579428edc79f706101ab58c505052c72Chemical space as a source for new drugsReymond, Jean-Louis; van Deursen, Ruud; Blum, Lorenz C.; Ruddigkeit, LarsMedChemComm (2010), 1 (1), 30-38CODEN: MCCEAY; ISSN:2040-2503. (Royal Society of Chemistry)A review. The chem. space is the ensemble of all possible mols., which is believed to contain at least 1060 org. mols. below 500 Da of possible interest for drug discovery. This review summarizes the development of the chem. space concept from enumerating acyclic hydrocarbons in the 1800's to the recent assembly of the chem. universe database GDB. Chem. space travel algorithms can be used to explore defined regions of chem. space by generating focused virtual libraries. Maps of the chem. space are produced from property spaces visualized by principal component anal. or by self-organizing maps, and from structural analyses such as the scaffold-tree or the MQN-system. Virtual screening of virtual chem. space followed by synthesis and testing of the best hits leads to the discovery of new drug mols.
- 12Hartenfeller, M.; Schneider, G. De novo drug design Methods Mol. Biol. 2011, 672, 299– 323[Crossref], [PubMed], [CAS], Google Scholar12https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXhtlejsr7L&md5=3a30147da9187fa4895ba807cefed82aDe novo drug designHartenfeller, Markus; Schneider, GisbertMethods in Molecular Biology (New York, NY, United States) (2011), 672 (Chemoinformatics and Computational Chemical Biology), 299-323CODEN: MMBIED; ISSN:1064-3745. (Springer)A review. Computer-assisted mol. design supports drug discovery by suggesting novel chemotypes and compd. modifications for lead structure optimization. While the aspect of synthetic feasibility of the automatically designed compds. has been neglected for a long time, we are currently witnessing an increased interest in this topic. Here, we review state-of-the-art software for de novo drug design with a special emphasis on fragment-based techniques that generate druglike, synthetically accessible compds. The importance of scoring functions that can be used to predict compd. reactivity and potency is highlighted, and several promising solns. are discussed. Recent practical validation studies are presented that have already demonstrated that rule-based fragment assembly can result in novel synthesizable compds. with druglike properties and a desired biol. activity.
- 13Klebe, G. Virtual ligand screening: strategies, perspectives and limitations Drug Discovery Today 2006, 11, 580– 594[Crossref], [PubMed], [CAS], Google Scholar13https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XlvFGqtLo%253D&md5=5adfd125d48082238a6c5ad0e8343c59Virtual ligand screening: strategies, perspectives and limitationsKlebe, GerhardDrug Discovery Today (2006), 11 (13 & 14), 580-594CODEN: DDTOFS; ISSN:1359-6446. (Elsevier B.V.)A review. In contrast to high-throughput screening, in virtual ligand screening (VS), compds. are selected using computer programs to predict their binding to a target receptor. A key prerequisite is knowledge about the spatial and energetic criteria responsible for protein-ligand binding. The concepts and prerequisites to perform VS are summarized here, and explanations are sought for the enduring limitations of the technol. Target selection, anal. and prepn. are discussed, as well as considerations about the compilation of candidate ligand libraries. The tools and strategies of a VS campaign, and the accuracy of scoring and ranking of the results, are also considered.
- 14. Kolb, P.; Ferreira, R. S.; Irwin, J. J.; Shoichet, B. K. Docking and chemoinformatic screens for new ligands and targets. Curr. Opin. Biotechnol. 2009, 20, 429–436.
- 15. Geppert, H.; Vogt, M.; Bajorath, J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J. Chem. Inf. Model. 2010, 50, 205–216.
- 16. Cayley, E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Chem. Ber. 1875, 8, 1056–1059.
- 17. Lederberg, J.; Sutherland, G. L.; Buchanan, B. G.; Feigenbaum, E. A.; Robertson, A. V.; Duffield, A. M.; Djerassi, C. Applications of artificial intelligence for chemical inference. I. Number of possible organic compounds. Acyclic structures containing carbon, hydrogen, oxygen, and nitrogen. J. Am. Chem. Soc. 1969, 91, 2973–2976.
- 18. Steinbeck, C. Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 2004, 21, 512–518.
- 19. Reymond, J. L.; Ruddigkeit, L.; Blum, L. C.; van Deursen, R. The enumeration of chemical space. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012, 2, 717–733.
- 20. Fink, T.; Bruggesser, H.; Reymond, J. L. Virtual exploration of the small-molecule chemical universe below 160 Da. Angew. Chem., Int. Ed. Engl. 2005, 44, 1504–1508.
- 21. Fink, T.; Reymond, J. L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 2007, 47, 342–353.
- 22. Blum, L. C.; Reymond, J. L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
- 23. Blum, L. C.; van Deursen, R.; Reymond, J. L. Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. J. Comput.-Aided Mol. Des. 2011, 25, 637–647.
- 24. Nguyen, K. T.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Discovery of NMDA glycine site inhibitors from the chemical universe database GDB. ChemMedChem 2008, 3, 1520–1524.
- 25. Nguyen, K. T.; Luethi, E.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. 3-(aminomethyl)piperazine-2,5-dione as a novel NMDA glycine site inhibitor from the chemical universe database GDB. Bioorg. Med. Chem. Lett. 2009, 19, 3832–3835.
- 26. Garcia-Delgado, N.; Bertrand, S.; Nguyen, K. T.; van Deursen, R.; Bertrand, D.; Reymond, J.-L. Exploring α7-nicotinic receptor ligand diversity by scaffold enumeration from the Chemical Universe Database GDB. ACS Med. Chem. Lett. 2010, 1, 422–426.
- 27. Luethi, E.; Nguyen, K. T.; Burzle, M.; Blum, L. C.; Suzuki, Y.; Hediger, M.; Reymond, J. L. Identification of selective norbornane-type aspartate analogue inhibitors of the glutamate transporter 1 (GLT-1) from the chemical universe generated database (GDB). J. Med. Chem. 2010, 53, 7236–7250.
- 28. Blum, L. C.; van Deursen, R.; Bertrand, S.; Mayer, M.; Burgi, J. J.; Bertrand, D.; Reymond, J. L. Discovery of α7-nicotinic receptor ligands by virtual screening of the Chemical Universe Database GDB-13. J. Chem. Inf. Model. 2011, 51, 3105–3112.
- 29. Brethous, L.; Garcia-Delgado, N.; Schwartz, J.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Synthesis and nicotinic receptor activity of chemical space analogues of N-(3R)-1-azabicyclo[2.2.2]oct-3-yl-4-chlorobenzamide (PNU-282,987) and 1,4-diazabicyclo[3.2.2]nonane-4-carboxylic acid 4-bromophenyl ester (SSR180711). J. Med. Chem. 2012, 55, 4605–4618.
- 30. Reymond, J. L.; Awale, M. Exploring chemical space for drug discovery using the Chemical Universe Database. ACS Chem. Neurosci. 2012, 3, 649–657.
- 31. Foloppe, N. The benefits of constructing leads from fragment hits. Future Med. Chem. 2011, 3, 1111–1115.
- 32. Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The design of leadlike combinatorial libraries. Angew. Chem., Int. Ed. Engl. 1999, 38, 3743–3748.
- 33. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37, W623–W633.
- 34. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
- 35. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011, 39, D1035–D1041.
- 36. McKay, B. D. Practical graph isomorphism. Congressus Numerantium 1981, 30, 45–87.
- 37. Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, 382–384.
- 38. Rishton, G. M. Nonleadlikeness and leadlikeness in biochemical screening. Drug Discovery Today 2003, 8, 86–96.
- 39. Rush, T. S., III; Grant, J. A.; Mosyak, L.; Nicholls, A. A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J. Med. Chem. 2005, 48, 1489–1495.
- 40. Nicholls, A.; McGaughey, G. B.; Sheridan, R. P.; Good, A. C.; Warren, G.; Mathieu, M.; Muchmore, S. W.; Brown, S. P.; Grant, J. A.; Haigh, J. A.; Nevins, N.; Jain, A. N.; Kelley, B. Molecular shape and medicinal chemistry: a perspective. J. Med. Chem. 2010, 53, 3862–3886.
- 41. Sauer, W. H.; Schwarz, M. K. Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J. Chem. Inf. Comput. Sci. 2003, 43, 987–1003.
- 42. Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752–6756.
- 43. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; Pickett, S. D. The impact of aromatic ring count on compound developability: further insights by examining carbo- and hetero-aromatic and -aliphatic ring types. Drug Discovery Today 2011, 16, 164–171.
- 44. Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 18787–18792.
- 45. Clemons, P. A.; Wilson, J. A.; Dancik, V.; Muller, S.; Carrinski, H. A.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 6817–6822.
- 46. Sadowski, J.; Gasteiger, J. From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chem. Rev. 1993, 93, 2567–2581.
- 47. Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.
- Narasimharao Mukku, Prabhakara Madivalappa Davanagere, Kaushik Chanda, Barnali Maiti. A Facile Microwave-Assisted Synthesis of Oxazoles and Diastereoselective Oxazolines Using Aryl-Aldehydes, p-Toluenesulfonylmethyl Isocyanide under Controlled Basic Conditions. ACS Omega 2020, 5 (43) , 28239-28248. https://doi.org/10.1021/acsomega.0c04130
- B. Christopher Rinderspacher. Heuristic Global Optimization in Chemical Compound Space. The Journal of Physical Chemistry A 2020, 124 (43) , 9044-9060. https://doi.org/10.1021/acs.jpca.0c05941
Figure 1
Figure 1. Enumeration of GDB-17 starting from mathematical graphs.
Figure 2
Figure 2. Size and MW profiles of the enumerated chemical space in GDB and the reference databases PubChem, ChEMBL, and DrugBank. The size of the leadlike subsets of GDB (GDBLL, GDBLLnoSR) is extrapolated from analyzing a 1% random subset of GDB-17.
Figure 3
Figure 3. Drugs and examples of isomers found in GDB-17. All isomers shown have a shape similarity score ROCS > 1.4. None of the isomers shown are known (Scifinder search). Only acyclovir does not occur in GDB-17 because it contains a hemiaminal (N–Csp3–O), a functional group which is excluded from the enumeration.
Figure 4
Figure 4. Molecule topologies and categories in GDB-17 and reference databases. A. Percentage of each reference database compatible with GDB-17 enumeration rules or excluded due to nonenumerated halogen (acyl halide, aliphatic halocarbons) or sulfur (thiols, thioethers), functional groups (acyclic acetals, hemiacetals, aminals, azides, aliphatic nitro groups), element (P, Si, B, Bi, Hg, etc.), skeleton (nonaromatic C═C), or graph (e.g., small rings at 17 atoms). B. Fraction of compounds with small rings. C. Topologies. D. Database contents as a function of molecular categories. Molecules are assigned to one category only, with priority order heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Figure 5
Figure 5. Polarity features. A. clogP histogram in intervals −5.5 to −4.5, −4.5 to −3.5, etc.; B. Average clogP as a function of hac; C. H-bond donor atom (HBD) histogram; D. Average HBD as a function of hac. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Figure 6
Figure 6. Molecular shape analyzed by the principal moments of inertia. (41) Occupancy maps are shown in the (P1,P2)-plane, in which P1 and P2 are the normalized ratios of the principal moments of inertia (for details see section Methods), and are colored from blue (1 cpd/pixel) to purple (maximum cpd/pixel for each map: GDB-17: 4,691; GDBLL-17: 889; GDBLLnoSR-17: 684; PubChem-17: 6,202; ChEMBL-17: 487; DrugBank-17: 4). The insets show an enlarged view of the lower left edge of each triangle, where occupancy is highest for PubChem-17, ChEMBL-17, and DrugBank-17. The maps for GDB-17, GDBLL-17, and GDBLLnoSR-17 were computed from a random subset of 16.7 million molecules from GDB-17. For all compounds, a single stereoisomer was analyzed as generated by CORINA.
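The Figure 6 caption refers to normalized ratios of the principal moments of inertia without restating their definition, which the article gives in its Methods section. As a rough illustration only, the sketch below assumes the commonly used convention in which the two smaller moments are divided by the largest one (I1/I3 and I2/I3, with I1 ≤ I2 ≤ I3). The function names `inertia_tensor`, `symmetric_eigenvalues`, and `normalized_pmi_ratios` are hypothetical and are not taken from the article.

```python
import math


def inertia_tensor(coords, masses):
    """Return the 3x3 inertia tensor about the centre of mass."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - com[k] for k in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    # The tensor is symmetric: mirror the upper triangle.
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order
    (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([e_min, 3.0 * q - e_max - e_min, e_max])


def normalized_pmi_ratios(coords, masses):
    """Return (I1/I3, I2/I3) under the ASSUMED convention I1 <= I2 <= I3."""
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    if i3 <= 0.0:
        raise ValueError("need at least two atoms at distinct positions")
    return i1 / i3, i2 / i3
```

Under this assumed convention, an ideal rod maps to (0, 1), a flat disc to (0.5, 0.5), and a sphere to (1, 1), which are the three corners of the triangular shape map. A real analysis would use atomic masses and 3D coordinates from a structure generator such as the CORINA output mentioned in the caption.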
Figure 7
Figure 7. Histograms of quaternary centers (qv) and bonds in fused rings (bfr) in the different databases. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.
Figure 8
Figure 8. Stereochemistry. A. Number of stereoisomers per compound. B. Average number of stereoisomers per compound as a function of hac. Stereoisomers were generated from SMILES using CORINA. The data for GDB-17, GDBLL-17, and GDBLLnoSR-17 stem from the analysis of a random 16.7 million subset of GDB-17.
Figure 9
Figure 9. Examples of yet unknown C17-ring systems from GDB-17. These hydrocarbons do not give any hits in Scifinder using ″any atom″ types for carbons and ″any bond″ for bonds, including substructure searches but locking further ring fusions. Stereochemistry is not considered in these searches. The ring systems are shown as one possible stereoisomer.
References
This article references 47 other publications.
- 1. Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F.; Schenck, R. J.; Trippe, A. J. Structural diversity of organic chemistry. A scaffold analysis of the CAS Registry. J. Org. Chem. 2008, 73, 4443–4451.
- 2. ACS NEWS. Chem. Eng. News 2011, 89, 38.
- 3. Bleicher, K. H.; Böhm, H. J.; Müller, K.; Alanine, A. I. Hit and lead generation: Beyond high-throughput screening. Nat. Rev. Drug Discovery 2003, 2, 369–378.
- 4. Schreiber, S. L. Small molecules: the missing link in the central dogma. Nat. Chem. Biol. 2005, 1, 64–66.
- 5. Mayr, L. M.; Bojanic, D. Novel trends in high-throughput screening. Curr. Opin. Pharmacol. 2009, 9, 580–588.
- 6. Renner, S.; Popov, M.; Schuffenhauer, A.; Roth, H. J.; Breitenstein, W.; Marzinzik, A.; Lewis, I.; Krastel, P.; Nigsch, F.; Jenkins, J.; Jacoby, E. Recent trends and observations in the design of high-quality screening collections. Future Med. Chem. 2011, 3, 751–766.
- 7. Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discovery 2004, 3, 711–715.
- 8. Hann, M. M. Molecular obesity, potency and other addictions in drug discovery. MedChemComm 2011, 2, 349–355.
- 9. Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discovery 2005, 4, 649–663.
- 10. Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 2009, 42, 724–733.
- 11. Reymond, J. L.; Van Deursen, R.; Blum, L. C.; Ruddigkeit, L. Chemical space as a source for new drugs. MedChemComm 2010, 1, 30–38.
- 12. Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. Biol. 2011, 672, 299–323.
- 13. Klebe, G. Virtual ligand screening: strategies, perspectives and limitations. Drug Discovery Today 2006, 11, 580–594.
- 14. Kolb, P.; Ferreira, R. S.; Irwin, J. J.; Shoichet, B. K. Docking and chemoinformatic screens for new ligands and targets. Curr. Opin. Biotechnol. 2009, 20, 429–436.
- 15. Geppert, H.; Vogt, M.; Bajorath, J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J. Chem. Inf. Model. 2010, 50, 205–216.
- 16. Cayley, E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Chem. Ber. 1875, 8, 1056–1059.
- 17. Lederberg, J.; Sutherland, G. L.; Buchanan, B. G.; Feigenbaum, E. A.; Robertson, A. V.; Duffield, A. M.; Djerassi, C. Applications of artificial intelligence for chemical inference. I. Number of possible organic compounds. Acyclic structures containing carbon, hydrogen, oxygen, and nitrogen. J. Am. Chem. Soc. 1969, 91, 2973–2976.
- 18. Steinbeck, C. Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 2004, 21, 512–518.
- 19. Reymond, J. L.; Ruddigkeit, L.; Blum, L. C.; Van Deursen, R. The enumeration of chemical space. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012, 2, 717–733.
- 20. Fink, T.; Bruggesser, H.; Reymond, J. L. Virtual exploration of the small-molecule chemical universe below 160 Da. Angew. Chem., Int. Ed. Engl. 2005, 44, 1504–1508.
- 21Fink, T.; Reymond, J. L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery J. Chem. Inf. Model. 2007, 47, 342– 353[ACS Full Text
], [CAS], Google Scholar
- 22. Blum, L. C.; Reymond, J. L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
- 23. Blum, L. C.; van Deursen, R.; Reymond, J. L. Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. J. Comput.-Aided Mol. Des. 2011, 25, 637–647.
- 24. Nguyen, K. T.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Discovery of NMDA glycine site inhibitors from the chemical universe database GDB. ChemMedChem 2008, 3, 1520–1524.
- 25. Nguyen, K. T.; Luethi, E.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. 3-(aminomethyl)piperazine-2,5-dione as a novel NMDA glycine site inhibitor from the chemical universe database GDB. Bioorg. Med. Chem. Lett. 2009, 19, 3832–3835.
- 26. Garcia-Delgado, N.; Bertrand, S.; Nguyen, K. T.; van Deursen, R.; Bertrand, D.; Reymond, J.-L. Exploring α7-nicotinic receptor ligand diversity by scaffold enumeration from the Chemical Universe Database GDB. ACS Med. Chem. Lett. 2010, 1, 422–426.
- 27. Luethi, E.; Nguyen, K. T.; Burzle, M.; Blum, L. C.; Suzuki, Y.; Hediger, M.; Reymond, J. L. Identification of selective norbornane-type aspartate analogue inhibitors of the glutamate transporter 1 (GLT-1) from the chemical universe generated database (GDB). J. Med. Chem. 2010, 53, 7236–7250.
- 28. Blum, L. C.; van Deursen, R.; Bertrand, S.; Mayer, M.; Burgi, J. J.; Bertrand, D.; Reymond, J. L. Discovery of alpha7-nicotinic receptor ligands by virtual screening of the Chemical Universe Database GDB-13. J. Chem. Inf. Model. 2011, 51, 3105–3112.
- 29. Brethous, L.; Garcia-Delgado, N.; Schwartz, J.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Synthesis and nicotinic receptor activity of chemical space analogues of N-(3R)-1-azabicyclo[2.2.2]oct-3-yl-4-chlorobenzamide (PNU-282,987) and 1,4-diazabicyclo[3.2.2]nonane-4-carboxylic acid 4-bromophenyl ester (SSR180711). J. Med. Chem. 2012, 55, 4605–4618.
- 30. Reymond, J. L.; Awale, M. Exploring chemical space for drug discovery using the Chemical Universe Database. ACS Chem. Neurosci. 2012, 3, 649–657.
- 31. Foloppe, N. The benefits of constructing leads from fragment hits. Future Med. Chem. 2011, 3, 1111–1115.
- 32. Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The design of leadlike combinatorial libraries. Angew. Chem., Int. Ed. Engl. 1999, 38, 3743–3748.
- 33. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37, W623–W633.
- 34. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
- 35. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011, 39, D1035–D1041.
- 36. McKay, B. D. Practical graph isomorphism. Congressus Numerantium 1981, 30, 45–87.
- 37. Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, 382–384.
- 38. Rishton, G. M. Nonleadlikeness and leadlikeness in biochemical screening. Drug Discovery Today 2003, 8, 86–96.
- 39. Rush, T. S., III; Grant, J. A.; Mosyak, L.; Nicholls, A. A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J. Med. Chem. 2005, 48, 1489–1495.
- 40. Nicholls, A.; McGaughey, G. B.; Sheridan, R. P.; Good, A. C.; Warren, G.; Mathieu, M.; Muchmore, S. W.; Brown, S. P.; Grant, J. A.; Haigh, J. A.; Nevins, N.; Jain, A. N.; Kelley, B. Molecular shape and medicinal chemistry: a perspective. J. Med. Chem. 2010, 53, 3862–3886.
- 41. Sauer, W. H.; Schwarz, M. K. Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J. Chem. Inf. Comput. Sci. 2003, 43, 987–1003.
- 42. Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752–6756.
- 43. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; Pickett, S. D. The impact of aromatic ring count on compound developability: further insights by examining carbo- and hetero-aromatic and -aliphatic ring types. Drug Discovery Today 2011, 16, 164–171.
- 44. Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 18787–18792.
- 45. Clemons, P. A.; Wilson, J. A.; Dancik, V.; Muller, S.; Carrinski, H. A.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 6817–6822.
- 46. Sadowski, J.; Gasteiger, J. From atoms and bonds to 3-dimensional atomic coordinates: automatic model builders. Chem. Rev. 1993, 93, 2567–2581.
- 47. Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.