
Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17

Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
Biomolecular Screening Facility, NCCR Chemical Biology, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
*Phone: +41 31 631 43 25. Fax: +41 31 631 80 57. E-mail: [email protected]
Cite this: J. Chem. Inf. Model. 2012, 52, 11, 2864–2875
Publication Date (Web):October 22, 2012
https://doi.org/10.1021/ci300415d

Copyright © 2022 American Chemical Society. This publication is licensed under CC-BY.

  • Open Access


Abstract

Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.

Introduction


The cumulative efforts of synthetic chemistry over the last century have produced over 60 million compounds as collected by Chemical Abstracts Service. (1, 2) Since the implementation of combinatorial and parallel synthesis by academic and industrial drug discovery, the number of druglike small molecules (organic compounds of intermediate polarity with MW ≤ 500 Da) has increased even further. (3-5) The combined corporate, academic, and commercial collections worldwide probably total over 100 million different small molecules. (6) Despite these impressive numbers, it has become increasingly difficult to develop new small molecule drugs, largely due to lack of efficacy, side effects, and toxicity issues. (7, 8) De novo drug design (9-12) may help to address this problem by investigating much larger numbers of yet unknown molecules by virtual screening (13-15) in search of innovative structures that might exhibit improved selectivity and ADMET profiles.
The majority of de novo drug design methods generate molecules within genetic algorithms that optimize a desired property such as a docking score by evolving a molecule population through breeding and mutation cycles. In most cases these algorithms generate new molecules by recombining known building blocks with known reactions, which severely limits their innovative potential. To circumvent this limitation, we recently approached the direct enumeration of chemical space by extending an approach to de novo design pioneered by Cayley, the inventor of graph theory, to count acyclic hydrocarbons (16) and later used in computer assisted structure elucidation. (17-19) The idea is to enumerate molecules from first principles starting from mathematical graphs irrespective of pre-existing building blocks to avoid a historical bias in structure selection. Geometrical strain and functional group stability criteria are used to ensure that the molecules produced are chemically meaningful. By this method we obtained the chemical universe database GDB-11 enumerating 26.4 million different molecules up to 11 atoms of C, N, O, and F (110.9 million molecules when including stereoisomers). (20, 21) The number increased to almost 1 billion (not counting stereoisomers) for GDB-13 listing all molecules up to 13 atoms of C, N, O, Cl, and S. (22, 23) Both databases were later shown to be useful sources of molecular diversity to discover new receptor ligands by virtual screening, synthesis, and testing. (24-30)
While GDB-11 and GDB-13 uncovered impressive numbers of possible molecules, the databases only addressed very small organic molecules (MW < 200 Da), which are of interest as relatively small fragments (31) but rarely correspond to actual drugs. Herein we report the enumeration of organic molecules up to 17 atoms of C, N, O, S, and halogens, forming the chemical universe database GDB-17 containing 166.4 billion organic molecules. GDB-17 reaches into molecular sizes compatible with many drugs (367 approved drugs ≤17 atoms) and typical for lead compounds (100 < MW < 350 Da). (32) Millions of isomers of known drugs are readily identified in GDB-17. While molecules up to 17 atoms in the public databases PubChem, (33) ChEMBL, (34) or DrugBank (35) are mostly achiral, aromatic, and heteroaromatic compounds with rodlike shapes, GDB-17 molecules are mostly nonaromatic heterocycles with many quaternary centers and stereoisomers. GDB-17 densely populates the third dimension in shape space and represents many more scaffold types than found in PubChem.

Results and Discussion


Enumeration

The enumeration followed the approach used for GDB-13 starting from the complete list of graphs as given by the program GENG. (36) Graphs corresponding to unstrained hydrocarbons were selected based on geometrical criteria and expanded to “skeletons” (unsaturated hydrocarbons) by substituting bonds (single, double, triple bonds) for graph edges. These skeletons were themselves expanded to molecules by substituting atoms (C, N, O, etc.) for graph nodes, respecting valency rules, and eliminating chemically unstable and problematic functional groups (Figure 1). The code used for GDB-13, which stalled at 14 or more atoms due to inefficient memory usage, was entirely rewritten with attention to process design and efficiency. New graph and molecule selection criteria were also added to constrain the combinatorial explosion above 13 atoms. The code redesign resulted in a 400-fold increase in computing speed, allowing the enumeration up to 17 atoms of C, N, O, S, and halogens to be completed within reasonable time, as detailed below.


Figure 1. Enumeration of GDB-17 starting from mathematical graphs.

GENG (36) was set to enumerate all graphs up to 17 nodes with maximum valency of four (quaternary carbon) considering only topologically planar (i.e., eliminating knotted topologies corresponding to the K5 and K3,3 graphs), connected (e.g., no catenanes) graphs, which returned 114,304,569,097 graphs. The graphs were then converted to hydrocarbons by substituting carbon atoms for graph nodes and carbon–carbon single bonds for graph edges. Hydrocarbons were selected for limited ring strain and topological complexity by applying the hydrocarbon filters H1 to H5 (Table 1), which left 5,422,153 hydrocarbons (0.005% of the graphs) to be considered for molecule generation. These 5.4 million hydrocarbons were converted to “skeletons” by introducing double and triple bonds following valency rules and the unsaturation filters S1 to S5, which primarily restrict ring strain and reactive unsaturations (Table 2). The selected skeletons were then run through an aromatization–dearomatization cycle, and duplicates were removed, which eliminated aromatic tautomers. The introduction of unsaturations generated on average 246 skeletons per hydrocarbon graph, for a total of 1,330,958,530 skeletons.
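For very small node counts, the graph totals reported in Table 5 can be checked by brute force. The sketch below is not GENG and does not scale; it is a minimal pure-Python illustration that relies on two assumptions that hold only for up to five nodes: the only nonplanar graph is the complete graph K5, and the degree limit of four can never be exceeded. The function names are hypothetical.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search connectivity test on nodes 0..n-1."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def count_candidate_graphs(n):
    """Count unlabeled connected, planar graphs with max degree 4 (n <= 5)."""
    if not 1 <= n <= 5:
        raise ValueError("brute force and the K5-only planarity test need n <= 5")
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    unique = set()
    for mask in range(1 << len(pairs)):
        edges = [pair for i, pair in enumerate(pairs) if mask >> i & 1]
        if not is_connected(n, edges):
            continue
        degree = [0] * n
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        if max(degree) > 4:          # valency limit (never triggers for n <= 5)
            continue
        if n == 5 and len(edges) == 10:  # K5, the only nonplanar 5-node graph
            continue
        # Canonical form: smallest sorted edge list over all node relabelings.
        canon = min(
            tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
            for perm in perms
        )
        unique.add(canon)
    return len(unique)


if __name__ == "__main__":
    print([count_candidate_graphs(n) for n in range(1, 6)])
```

The counts for one to five nodes agree with the "graphs" column of Table 5. Larger node counts need a real planarity test and an orderly generation scheme, which is what dedicated tools such as GENG provide.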
The 1.3 billion skeletons were diversified into molecules by combinatorially substituting nitrogen and oxygen for carbon following valency rules but not generating any heteroatom–heteroatom bond. All generated molecules were then checked for undesirable functional groups (FG) to ensure that the molecules are likely to be stable and synthetically accessible (FG filters F1–F12, Table 3). These filters followed in part previously reported criteria for removing problematic functional groups from screening libraries. (37, 38) This diversification of skeletons into “CNO” molecules produced 110.4 billion molecules, corresponding to an average of 83 molecules per skeleton and 20,400 molecules per hydrocarbon graph. Postprocessing steps P1–P7 were finally implemented for additional diversity by combinatorial atom type substitutions (Table 4). These postprocessing steps added another 56 billion molecules, resulting in a total of 166.4 billion molecules in the complete GDB-17. The overall molecule generation procedure was always completed in one run for each graph, and all molecules generated from each graph were checked for duplicates to guarantee that all GDB-17 molecules are different (Table 5). The overall computation consumed 100,000 CPU hours, which is only 2.5-fold more than the computing time originally invested for GDB-13.
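The ratios quoted in the two paragraphs above follow directly from the stage totals. The short calculation below recomputes them from the numbers given in the text; the variable names are illustrative. Note that dividing the skeleton total by the hydrocarbon total gives about 245.5, marginally below the quoted average of 246; the text does not say how that figure was rounded.

```python
# Stage totals as quoted in the text.
graphs = 114_304_569_097
hydrocarbons = 5_422_153
skeletons = 1_330_958_530
cno_molecules = 110.4e9      # after N/O substitution and FG filters
total_molecules = 166.4e9    # after postprocessing steps P1-P7

hydrocarbon_pct = 100 * hydrocarbons / graphs
skeletons_per_hydrocarbon = skeletons / hydrocarbons
molecules_per_skeleton = cno_molecules / skeletons
molecules_per_hydrocarbon = cno_molecules / hydrocarbons
added_by_postprocessing = total_molecules - cno_molecules

if __name__ == "__main__":
    print(f"hydrocarbons kept: {hydrocarbon_pct:.4f}% of graphs")
    print(f"skeletons per hydrocarbon: {skeletons_per_hydrocarbon:.1f}")
    print(f"CNO molecules per skeleton: {molecules_per_skeleton:.1f}")
    print(f"CNO molecules per hydrocarbon: {molecules_per_hydrocarbon:,.0f}")
    print(f"added by postprocessing: {added_by_postprocessing / 1e9:.1f} billion")
```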
Table 1. Ring Strain and Complexity Filters To Select Hydrocarbon Graphs
| filter | description | comment |
| --- | --- | --- |
| H1 | SAV (“smallest atomic volume”): all graphs ≤ C11 are converted to a 3D-structure, and the volume of the tetrahedron around each C atom is checked for a minimum value. | 95.2% of graphs ≤ C11 are discarded due to failed 3D-conversion (using CORINA or ChemAxon molconverter) or due to distorted (planar, pyramidal) centers. Simple polyhedra such as cubane are preserved. See the SI of ref 22 for details. |
| H2 | NA2SR (“no atom shared by two small rings”): removes graphs ≥ C12 containing fused or spiro linkages between 3- or 4-membered rings. | The vast majority of fused small ring systems are highly strained and reactive. 96.7% of C12- and 97.3% of C13-graphs are removed. See also filter H3. |
| H3 | NBH3R (“no bridgehead in 3 rings”): graphs ≥ C14 with three or more “nonzero” bridgehead atoms shared by three or more rings are removed. | These multilooped topologies would correspond to molecules of high synthetic complexity. 99.60% of C14-, 99.77% of C15-, 99.95% of C16-, and 99.95% of C17-graphs are removed by filters H2+H3. |
| H4 | 1SR (“one small ring”): C15 and C16 graphs are allowed at most one 3- or 4-membered ring. | 71.89% of the C15- and 79.50% of the C16-graphs that passed H2+H3 are removed. |
| H5 | 0SR (“no small ring”): C17-graphs are not allowed any small rings. | 96.95% of the C17-graphs that passed H2+H3 are removed. |
Table 2. Unsaturation Filters To Enumerate Skeletons (Unsaturated Hydrocarbons)
| filter | description | comment |
| --- | --- | --- |
| S1 | no allenes (C═C═C) | Although known and sometimes found in bioactive molecules, allenes are usually reactive and quite difficult to prepare but combinatorially extremely frequent. |
| S2 | no unsaturations in 3-membered rings | Cyclopropenes are known but quite reactive and difficult to prepare. Cyclopropynes are unstable. |
| S3 | at most one sp2-center in 4-membered rings | Cyclobutenes and cyclobutynes are not enumerated, but the skeletons leading to β-lactams and β-lactones are generated. |
| S4 | triple bond restrictions | No triple bond in 3- or 4-membered rings, max. one triple bond in rings ≥9, and max. two triple bonds in rings ≥11. Only terminal triple bonds for C17-hydrocarbons (allowing nitriles to be generated). |
| S5 | bridgehead double bond restrictions | If a “nonzero” bridgehead carbon is sp2, the ring sizes of the smallest set of smallest rings are checked. At least one ring containing this carbon must be ≥8. If two such bridgeheads are sp2, the ring size must be ≥10. |
Table 3. Functional Group Filters To Enumerate CNO Molecules (a)
| filter | description | comment |
| --- | --- | --- |
| F1 | XCX: only one N or O next to an sp3 carbon, or two oxygens if both oxygens are ring atoms. | Aminals, hemiacetals, gem-diols, and acyclic acetals are not enumerated. Only cyclic acetals are allowed. |
| F2 | Maximum one N or O in small rings. | Allows epoxides, oxiranes, aziridines, azetidines but no cyclic acetals inside 4-membered rings. |
| F3 | Anhydrides (O═C)–O–(C═O) are removed. | Most anhydrides are unstable toward hydrolysis. |
| F4 | Acetal chains O–Csp3–O–Csp3–O are removed. | Although sometimes found, acetal chains are difficult to plan synthetically. |
| F5 | Molecules with a primary amine and a ketone or aldehyde are removed. | This combination often polymerizes. |
| F6 | C═N are removed unless the sp2 carbon is connected to a further N or O atom. | Removes imines, which are unstable, but retains amidines and guanidines. |
| F7 | (N/O)–C═N–C═(N/O) are removed. | The corresponding (N/O)═C–N–C═(N/O) tautomer is allowed. |
| F8 | Enol/enamine: removes O or N atoms adjacent to a nonaromatic C═C. | Enols, enamines, enol ethers, etc. are almost always unstable toward hydrolysis to the parent carbonyl compound. |
| F9 | Acyclic carbonates C–O–(C═O)–O–C are removed. | Acyclic carbonates are rather unstable toward hydrolysis. |
| F10 | Carbonic acids (O–CO2H), carbamic acids (N–CO2H), and β-carboxylic acid ((C═O)–C–CO2H) are removed. | These FG decarboxylate spontaneously. |
| F11 | Bridgehead amides: if a “nonzero” bridgehead nitrogen is bound to a nonaromatic sp2 atom, the ring sizes of the smallest set of smallest rings are checked. At least one ring containing the nitrogen must be ≥9. | Such amides are “twisted” and nonconjugated and therefore quite unstable toward hydrolysis. |
| F12 | C═C: molecules of 17 atoms with nonaromatic carbon–carbon unsaturations are removed. | Nonaromatic C═C are highly frequent but often reactive toward polymerization, cycloadditions, isomerizations, oxidation, or nucleophilic addition. |

(a) No heteroatom–heteroatom bonds are generated at all.
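The substitution step can be illustrated on a toy scale. The sketch below is not the GDB-17 code; it is a minimal pure-Python example that takes a saturated hydrocarbon graph, tries every C/N/O assignment, and keeps those that respect valency, contain no heteroatom–heteroatom bond, and pass a simplified form of filter F1 (at most one N or O on any carbon, ignoring the cyclic acetal exception). Duplicates are removed with the graph's automorphisms. The function name and the node-label output strings are assumptions made for illustration.

```python
from itertools import permutations, product

# Maximum number of bonds to heavy atoms (hydrogens are implicit).
VALENCE = {"C": 4, "N": 3, "O": 2}


def substitute_heteroatoms(n, edges):
    """Enumerate unique C/N/O labelings of a saturated graph with n nodes."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    edge_set = {frozenset(e) for e in edges}
    # Node permutations that map the graph onto itself.
    automorphisms = [
        p for p in permutations(range(n))
        if {frozenset((p[a], p[b])) for a, b in edges} == edge_set
    ]
    unique = set()
    for labels in product("CNO", repeat=n):
        # Valency rule: node degree may not exceed the element's valence.
        if any(len(adj[i]) > VALENCE[labels[i]] for i in range(n)):
            continue
        # No heteroatom-heteroatom bonds.
        if any(labels[a] != "C" and labels[b] != "C" for a, b in edges):
            continue
        # Simplified F1: at most one N/O neighbor per (sp3) carbon.
        if any(
            labels[i] == "C" and sum(labels[j] != "C" for j in adj[i]) > 1
            for i in range(n)
        ):
            continue
        # Canonical labeling: smallest label tuple over all automorphisms.
        canon = min(tuple(labels[p[i]] for i in range(n)) for p in automorphisms)
        unique.add("".join(canon))
    return sorted(unique)


if __name__ == "__main__":
    propane = [(0, 1), (1, 2)]
    print(substitute_heteroatoms(3, propane))
```

For the propane graph this yields propane, propylamine, propanol, dimethylamine, and dimethyl ether, written as node labels along the chain. The real enumeration additionally handles unsaturations, aromaticity, and the remaining filters of Table 3.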

Table 4. Postprocessing Steps for Aromatic Heterocycles, Oximes, Nitro, CF3, Halogens, and Sulfur
| step | description | comment |
| --- | --- | --- |
| P1 | Aromatic C to N: aromatic C atoms adjacent to an aromatic N or O atom are converted to N if valency allows. | Aromatic heterocycles with heteroatom–heteroatom bonds are created, e.g., 1,2-oxazoles from furans, 1,2,3-triazoles from imidazoles. |
| P2 (a) | Ketone oximes C═N–OH: ketones are converted to oximes. | Note that alkylated oximes, hydroxamates, hydrazides, and hydrazones are not considered. |
| P3 | Aromatic halogens: aromatic OH groups are changed to halogens. | Halogen = F, Cl, Br, I; max. two Br or I per aromatic ring. |
| P4 | Trifluoromethyls: tert-butyl groups are changed to CF3. | |
| P5 | Aromatic nitro groups: aromatic CO2H are converted to NO2. | Aliphatic nitro groups are not considered. |
| P6 | Thiophenes: sulfur is substituted for all heteroaromatic oxygen atoms. | Aliphatic thiols and thioethers are not considered. |
| P7 (a) | Sulfones: carbonyl groups (C═O) in ketones, acids, carboxamides, and carbamates are changed to SO2. | Note that C═S and sulfoxides (S═O) are not generated. |

(a) Steps P2 and P7 increase the heavy atom count (hac), generating for example some 17-atom molecules with small rings. All molecules with hac > 17 were removed to avoid a combinatorial explosion.

Table 5. Database Generation Statistics
| HAC | filters (a) | graphs (b) | hydrocarbons (c) | skeletons (d) | molecules (e) | CPU, h (f) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | SAV, FG | 1 | 1 | 1 | 3 | 0 |
| 2 | | 1 | 1 | 3 | 6 | 0 |
| 3 | | 2 | 2 | 4 | 14 | 0 |
| 4 | | 6 | 4 | 12 | 47 | 0 |
| 5 | | 20 | 10 | 32 | 219 | 0 |
| 6 | | 74 | 31 | 119 | 1,091 | 0 |
| 7 | | 321 | 98 | 448 | 6,029 | 0 |
| 8 | | 1,663 | 370 | 2,004 | 37,435 | 0 |
| 9 | | 9,616 | 1,448 | 9,472 | 243,233 | 0 |
| 10 | | 61,840 | 6,325 | 48,721 | 1,670,163 | 0 |
| 11 | | 427,135 | 29,496 | 264,321 | 12,219,460 | 3 |
| 12 | NA2SR | 3,120,002 | 104,165 | 1,188,127 | 72,051,665 | 18 |
| 13 | | 23,722,244 | 651,850 | 7,370,864 | 836,687,200 | 206 |
| 14 | NBH3R | 186,092,397 | 752,277 | 27,419,837 | 2,921,398,415 | 856 |
| 15 | 1SR | 1,496,007,875 | 960,415 | 118,977,963 | 15,084,103,347 | 5,378 |
| 16 | | 12,176,341,897 | 1,331,875 | 213,259,331 | 38,033,661,355 | 14,415 |
| 17 | 0SR, C═C | 100,418,784,003 | 1,583,786 | 962,417,271 | 109,481,780,580 | 79,259 |
| SUM | | 114,304,569,097 | 5,422,154 | 1,330,958,530 | 166,443,860,262 | 100,134 |

(a) See Tables 1–4 for details.
(b) Graphs produced by GENG for planar, connected graphs up to 17 nodes with maximum node valence of four.
(c) Hydrocarbons generated from graphs and passing the filters in Table 1 for limited ring strain and complexity.
(d) Unsaturated hydrocarbons generated from hydrocarbons using filters in Table 2.
(e) Molecules generated from hydrocarbons by adding heteroatoms (Tables 3 and 4), as 2D-structures and stored as SMILES.
(f) Computation was parallelized on 360 CPU.
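The removal percentages in Table 1 and the hydrocarbon counts in Table 5 can be cross-checked against each other. The sketch below applies the quoted removal fractions to the graph counts for 12 to 17 nodes and compares the result with the tabulated hydrocarbons. It assumes that the C12 and C13 percentages describe filter H2 alone and that the H4/H5 percentages act on what survives H2+H3, which is how the table comments read. Because the percentages are rounded, only approximate agreement can be expected.

```python
# Graph and hydrocarbon counts from Table 5.
graphs = {
    12: 3_120_002, 13: 23_722_244, 14: 186_092_397,
    15: 1_496_007_875, 16: 12_176_341_897, 17: 100_418_784_003,
}
hydrocarbons = {
    12: 104_165, 13: 651_850, 14: 752_277,
    15: 960_415, 16: 1_331_875, 17: 1_583_786,
}
# Fractions removed by successive filters, from the Table 1 comments.
removed = {
    12: [0.967],            # H2
    13: [0.973],            # H2
    14: [0.9960],           # H2+H3
    15: [0.9977, 0.7189],   # H2+H3, then H4
    16: [0.9995, 0.7950],   # H2+H3, then H4
    17: [0.9995, 0.9695],   # H2+H3, then H5
}


def estimated_survivors(n_atoms):
    """Apply the rounded removal fractions to the graph count."""
    estimate = float(graphs[n_atoms])
    for fraction in removed[n_atoms]:
        estimate *= 1.0 - fraction
    return estimate


if __name__ == "__main__":
    for n in sorted(graphs):
        est = estimated_survivors(n)
        print(f"C{n}: estimated {est:,.0f}, tabulated {hydrocarbons[n]:,}")
```

The estimates land within a few percent of the tabulated values, the largest deviation being at 16 atoms, where a removal rate of 99.95% leaves little room for rounding.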

Comparing GDB-17 with PubChem, ChEMBL, and DrugBank

One of the key questions arising from the systematically enumerated chemical space available in GDB-17 is whether this collection significantly differs from the already known chemical space. To perform this comparison, we collected molecules up to 17 atoms in the public archives PubChem, (33) ChEMBL, (34) and DrugBank, (35) to form the reference collections up to 17 atoms PubChem-17 (2,526,453 cpds), ChEMBL-17 (89,156 cpds), and DrugBank-17 (367 cpds, approved drugs only). ChEMBL-17 and DrugBank-17 represent subsets of the larger PubChem-17 that focus on molecules with reported biological activity (ChEMBL-17) or clinical approval (DrugBank-17). Unfortunately, commercial collections such as the CAS or Beilstein archives could not be obtained. However, considering that the 2.5 million molecules in PubChem-17 represent approximately 10% of the 25 million unique molecules listed in PubChem, CAS-17 probably contains around 10% of the entire CAS archive, (1, 2) i.e., slightly more than 6 million molecules (2.4-fold larger than PubChem-17).
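The CAS-17 estimate is a simple proportionality argument, spelled out below with the rounded archive sizes quoted in the text (25 million for PubChem, 60 million for CAS). The variable names are illustrative, and the underlying assumption, that CAS has the same share of molecules up to 17 atoms as PubChem, is the one made in the paragraph above.

```python
pubchem_17 = 2_526_453        # molecules up to 17 atoms in PubChem
pubchem_total = 25e6          # unique molecules in PubChem (rounded)
cas_total = 60e6              # compounds in the CAS archive (rounded)

share_up_to_17 = pubchem_17 / pubchem_total      # about 10%
cas_17_estimate = share_up_to_17 * cas_total     # about 6 million
fold_vs_pubchem_17 = cas_17_estimate / pubchem_17

if __name__ == "__main__":
    print(f"share of PubChem up to 17 atoms: {share_up_to_17:.1%}")
    print(f"estimated CAS-17 size: {cas_17_estimate:,.0f}")
    print(f"fold larger than PubChem-17: {fold_vs_pubchem_17:.1f}")
```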
The comparison of the enumerated chemical space with known molecules starts with considering database size. Thus, GDB-17 (166.4 billion molecules) is much larger than the sum of all known molecules of similar size as found in PubChem-17 (0.001% of GDB-17). The size of GDB-17 originates in the increase of possible molecules as a function of heavy atom count, which is exponential and much steeper than in the reference databases (Figure 2A). As a consequence the MW range of GDB-17 shows a sharp peak at 240 < MW < 250 Da. The same distribution is observed in the leadlike subset GDBLL-17 (29 billion structures, see below) and leadlike/no small ring subset GDBLLnoSR-17 (22 billion structures, see below), while the MW distribution in the reference databases is more even (Figure 2B).
GDB-17 contains an impressive number of molecules in the area of known drugs. For example, millions of isomers can be identified in GDB-17 for fifteen typical marketed drugs of 14 to 17 atoms selected from DrugBank-17 (Table 6, Figure 3). The examples shown in Figure 3 were selected among isomers with a high shape similarity to the parent drug as measured by the OpenEye scoring function ROCS (Rapid Overlay of Chemical Structures), a well validated virtual screening tool to identify bioactive analogs. (39, 40) These isomers include obvious variations of the parent structure such as ″methyl walk″ analogs, for example structures 5 and 11 as isomers of drugs 4 and 10, as well as nontrivial changes such as different aromatic heterocycles (2, 3, 8, 9, 17, 18, 35, 38, 39, 46), different ring size and connectivity (15, 23, 24, 26, 27, 32, 33, 39, 41, 44–48), and different functional groups (14, 29, 30, 32, 33, 36, 39, 42, 48).
On the other hand, GDB-17 represents a selective enumeration and therefore does not contain all molecules in the reference databases. Overall 57% of PubChem-17, 60% of ChEMBL-17, and 68% of DrugBank-17 are compatible with the GDB-17 enumeration rules. The molecules found in the reference databases but not considered for GDB-17 contain nonenumerated features such as certain types of halogens (e.g., aliphatic halogens) or sulfurs (thiols, thioethers, thioureas), functional groups (e.g., acyclic acetals, hemiacetals, aminals, azides, aliphatic nitro groups), elements (P, Si, B, etc.), skeletons (nonaromatic C═C), or graphs (e.g., spiro-fused cyclopropanes) (Figure 4A).


Figure 2. Size and MW profiles of the enumerated chemical space in GDB and the reference databases PubChem, ChEMBL, and DrugBank. The size of the leadlike subsets of GDB (GDBLL, GDBLLnoSR) is extrapolated from analyzing a 1% random subset of GDB-17.

Table 6. Drug Isomers Found in GDB-17
| drug name (a) | elemental formula | no. of isomers |
| --- | --- | --- |
| Acyclovir | C8H11N5O3 | 8,132,952 |
| Aminoglutethimide | C13H16N2O2 | 183,901,628 |
| Aminophenazone | C13H17N3O | 97,853,936 |
| Dexmedetomidine | C13H16N2 | 9,721,191 |
| Diethylcarbamazine | C10H21N3O | 22,409 |
| Ethoxzolamide | C9H10N2S2O3 | 4,563,491 |
| Felbamate | C11H14N2O4 | 369,751,288 |
| Fencamfamine | C15H21N | 53,917,207 |
| Guanadrel | C10H19N3O2 | 60,319,220 |
| Procaine | C13H20N2O2 | 476,975,898 |
| Sulfadiazine | C10H10N4SO2 | 17,003,297 |
| Tinidazole | C8H13N3SO4 | 24,575,941 |
| Tizanidine | C9H8N5SCl | 109,635 |
| Trioxsalen | C14H12O3 | 1,800,849 |
| Varenicline | C13H13N3 | 19,676,640 |

(a) See Figure 3 for structural formulas of the drugs and examples of isomers.
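The heavy atom count of each drug in Table 6 follows from its elemental formula by summing all non-hydrogen atoms. The sketch below parses the formulas as written in the table and confirms that all fifteen drugs fall in the stated range of 14 to 17 atoms. The parser is a minimal illustration that handles only plain formulas without parentheses or charges.

```python
import re

# Elemental formulas as listed in Table 6.
DRUG_FORMULAS = {
    "Acyclovir": "C8H11N5O3",
    "Aminoglutethimide": "C13H16N2O2",
    "Aminophenazone": "C13H17N3O",
    "Dexmedetomidine": "C13H16N2",
    "Diethylcarbamazine": "C10H21N3O",
    "Ethoxzolamide": "C9H10N2S2O3",
    "Felbamate": "C11H14N2O4",
    "Fencamfamine": "C15H21N",
    "Guanadrel": "C10H19N3O2",
    "Procaine": "C13H20N2O2",
    "Sulfadiazine": "C10H10N4SO2",
    "Tinidazole": "C8H13N3SO4",
    "Tizanidine": "C9H8N5SCl",
    "Trioxsalen": "C14H12O3",
    "Varenicline": "C13H13N3",
}


def heavy_atom_count(formula):
    """Sum the counts of all elements except hydrogen."""
    total = 0
    # An element symbol is one capital letter plus an optional lowercase one.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            total += int(count) if count else 1
    return total


if __name__ == "__main__":
    for name, formula in DRUG_FORMULAS.items():
        print(f"{name}: {formula} -> {heavy_atom_count(formula)} heavy atoms")
```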


Figure 3. Drugs and examples of isomers found in GDB-17. All isomers shown have a shape similarity score ROCS > 1.4. None of the isomers shown are known (Scifinder search). Only acyclovir does not occur in GDB-17 because it contains a hemiaminal (N–Csp3–O), a functional group which is excluded from the enumeration.


Figure 4. Molecule topologies and categories in GDB-17 and reference databases. A. Percentage of reference database compatible with GDB-17 enumeration rules or excluded due to nonenumerated halogen (acyl halide, aliphatic halocarbons) or sulfur (thiols, thioethers), functional groups (acyclic acetals, hemiacetals, aminals, azides, aliphatic nitro groups), element (P, Si, B, Bi, Hg, etc.), skeleton (nonaromatic C═C), or graph (e.g., small rings at 17 atoms). B. Fraction of compounds with small rings. C. Topologies D. Database contents as function of molecular categories. Molecules are assigned to one category only with priority order heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

Small Rings, Topology, and Compound Categories

The most striking difference between the enumerated chemical space in GDB-17 and known molecules resides in the occurrence of small rings (3- or 4-membered rings). Small rings are very frequent in the systematic enumeration of graphs and the resulting molecules. However, they are also relatively difficult to synthesize and often unstable, and indeed they are not found very often in known molecules. While only 4–6% of the compounds in the reference databases contain small rings, 83% of the enumerated chemical space up to 16 atoms consists of small ring compounds (Figure 4B). The fraction of small ring compounds in GDB falls to 8.2% at 17 atoms because no small rings were allowed in 17-node graphs (small ring molecules at 17 atoms stem from atom-adding postprocessing steps such as the transformation of carbonyls to sulfonyls and of ketones to oximes, Table 5). Nevertheless, the majority (66%) of GDB-17 are molecules with 17 atoms, and the low percentage of small ring compounds at 17 atoms results in an overall 28% of small ring compounds in GDB-17 (25% in GDBLL-17).
In terms of topology, all databases contain approximately two-thirds of molecules with two or three cycles (Figure 4C). Key differences occur in acyclic compounds, which are relatively rare in GDB-17 (1.8%, 3.0 billion molecules) but make up 25% of DrugBank-17. Tri- and polycyclic molecules furthermore combine to 32% of GDB-17 but are much less frequent in the reference databases (PubChem-17: 7%; ChEMBL-17: 16%, DrugBank-17: 6%). The leadlike subset is also rich in tri- and polycyclic compounds (GDBLL-17: 33%), but their proportion is reduced when small rings are removed (GDBLLnoSR-17: 22%).
In terms of compound categories, heteroaromatic compounds make up a good third of GDB-17 and the reference databases (Figure 4D). By contrast, aromatics, which also make up a third of the reference databases, are quite rare in GDB-17 (0.8%, 1.3 billion molecules). GDB-17 is instead much richer in nonaromatic heterocycles (GDB-17: 57%, GDBLL-17: 41%, GDBLLnoSR-17: 35%) than the reference databases of known compounds (PubChem-17: 12%, ChEMBL-17: 10%, DrugBank-17: 12%).

Polarity and Leadlikeness

The histograms of the calculated octanol:water partition coefficient clogP show that GDB-17 and DrugBank-17 contain more polar molecules than PubChem-17 and ChEMBL-17 (Figure 5A/B). A similar effect is visible in other polarity descriptors such as the number of H-bond donor atoms (Figure 5C/D). The fact that polar molecules often require longer syntheses and are more difficult to purify than apolar ones might explain their lower proportion in PubChem and ChEMBL, which contain mostly synthesized molecules, compared to GDB-17 representing the spectrum of possibilities. The frequency of polar molecules in the systematic enumeration of GDB-17 results in only 18% of GDB-17 being leadlike compounds as defined by the value ranges 1 < clogP < 3 and 100 < MW < 350 Da. (32) These 18% correspond to 29 billion molecules defining the GDBLL-17 subset, 22 billion of which do not have small rings and form the GDBLLnoSR-17 subset. By comparison, approximately half of the compounds from the reference databases are leadlike (PubChem-17: 47%; ChEMBL-17: 49%; DrugBank-17: 36%).
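The leadlikeness criterion used above reduces to two range checks. The sketch below encodes it as a predicate and recomputes the size of the leadlike subset from the quoted 18%. It assumes that clogP and MW have already been calculated by some external tool; the function name is hypothetical and the example values are invented for illustration.

```python
def is_leadlike(clogp, mw):
    """Leadlike range used in the text: 1 < clogP < 3 and 100 < MW < 350 Da."""
    return 1 < clogp < 3 and 100 < mw < 350


gdb17_size = 166.4e9
leadlike_fraction = 0.18
leadlike_estimate = leadlike_fraction * gdb17_size   # about 29-30 billion

if __name__ == "__main__":
    # Hypothetical (clogP, MW) pairs, not taken from the database.
    for clogp, mw in [(2.0, 250.0), (-0.5, 250.0), (2.0, 400.0)]:
        print(clogp, mw, is_leadlike(clogp, mw))
    print(f"leadlike subset: about {leadlike_estimate / 1e9:.1f} billion")
```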


Figure 5. Polarity features. A. clogP histogram in intervals −5.5 to −4.5, −4.5 to −3.5, etc.; B. Average clogP as function of hac; C. H-bond donor atom (HBD) histogram; D. Average HBD as function of hac. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

Molecular Shape

Organic molecules can be classified in terms of shape by analyzing the principal moments of inertia of their 3D structure, which allows molecules to be classified as rods (linear shape, e.g., stretched alkanes), discs (cyclic planar shape, e.g., benzene), or spheres (globular shape, e.g., cubane or adamantane). This analysis shows that the vast majority of currently used druglike molecules are either rodlike or disklike. Only a minority of the molecules used in medicinal chemistry possess any significant third dimension, leading to shape considerations as a design criterion for screening libraries. (41) Closer analyses of successes and failures show that molecules with a significant third dimension in shape are indeed often more successful in drug development programs, suggesting an “escape out of flatland” as a valuable strategy to search for better drug molecules. (42, 43) Nonplanarity is also more pronounced in natural products (NP) and products from diversity-oriented synthesis (DC) compared to commercial screening compounds (CC) and probably contributes to the higher protein binding selectivity of NP and DC compared to CC as observed in small molecule microarray experiments. (44, 45)
A 16.7 million random subset of GDB-17 was subjected to the above shape analysis, and the results were compared with the data for the reference databases. The analysis showed that GDB-17 molecules significantly populate the third dimension, which implies that the ″escape out of flatland″ is statistically unavoidable when considering the enumerated chemical space (Figure 6). The shape distribution into the third dimension is similar for the 29 billion leadlike subset GDBLL-17 and the 22 billion leadlike subset without small rings GDBLLnoSR-17. By comparison the known molecules in PubChem-17, ChEMBL-17, and DrugBank-17 are essentially rods and discs with relatively few spherical molecules. This “flatness” is a direct consequence of the abundance of aromatic systems in these databases of known compounds. Conversely, the occurrence of 3D-shaped molecules in GDB-17 results from the low proportion of acyclic and aromatic compounds and the high frequency of saturated heterocycles in the enumerated chemical space. GDB-17 molecules also contain more quaternary carbon centers (qv, Figure 7A/B) and bonds in fused rings (bfr, Figure 7C/D) compared to known compounds, which are features strongly associated with nonplanarity. The decrease in bfr at 14, 15, and 17 atoms in GDB reflects the introduction of the “no bridgehead in 3-rings”, “one small ring”, and “no small ring” filters which strongly reduce the number of topologies with high bfr. Somewhat unexpectedly, the small-ring filters also reduce the number of quaternary centers per molecule, as seen by the fact that the GDBLLnoSR subset contains fewer quaternary centers than GDB and its leadlike subset.
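The shape classification can be made concrete with a small calculation. The sketch below computes the principal moments of inertia of a set of 3D points and returns two normalized ratios. It assumes the common convention P1 = I1/I3 and P2 = I2/I3 with I1 ≤ I2 ≤ I3, which places rods at (0, 1), discs at (0.5, 0.5), and spheres at (1, 1); the exact definition used for Figure 6 is given in the Methods section and may differ. Unit masses are used by default, and the eigenvalues come from the standard closed-form solution for symmetric 3×3 matrices. All function names are illustrative.

```python
import math


def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix (closed-form trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return [a[0][0], a[1][1], a[2][2]]
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (
        b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
        - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
        + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0])
    )
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding errors
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]


def principal_moments(coords, masses=None):
    """Sorted principal moments of inertia about the center of mass."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    a = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - center[k] for k in range(3))
        a[0][0] += m * (y * y + z * z)
        a[1][1] += m * (x * x + z * z)
        a[2][2] += m * (x * x + y * y)
        a[0][1] -= m * x * y
        a[0][2] -= m * x * z
        a[1][2] -= m * y * z
    a[1][0], a[2][0], a[2][1] = a[0][1], a[0][2], a[1][2]
    return sorted(symmetric_eigenvalues(a))


def normalized_pmi_ratios(coords, masses=None):
    """Return (I1/I3, I2/I3); an assumed convention, see the lead-in."""
    i1, i2, i3 = principal_moments(coords, masses)
    if i3 == 0.0:
        raise ValueError("need at least two distinct points")
    return i1 / i3, i2 / i3


if __name__ == "__main__":
    rod = [(-1, -1, 0), (1, 1, 0)]
    disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    sphere = disc + [(0, 0, 1), (0, 0, -1)]
    for name, pts in [("rod", rod), ("disc", disc), ("sphere", sphere)]:
        print(name, normalized_pmi_ratios(pts))
```

The three idealized point sets map to the three corners of the triangular plot; real molecules fall in between, and a production analysis would use atomic masses and conformers from a 3D generator.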


Figure 6. Molecular shape analyzed by the principal moments of inertia. (41) Occupancy maps are shown in the (P1,P2)-plane, in which P1 and P2 are the normalized ratios of the principal moments of inertia (for details see section Methods), and are colored from blue (1 cpd/pixel) to purple (maximum cpd/pixel for each map: GDB-17: 4,691, GDBLL-17: 889, GDBLLnoSR-17: 684, PubChem-17: 6,202, ChEMBL-17: 487, DrugBank-17: 4). The inserts show an enlarged view of the lower left edge of each triangle where occupancy is highest for PubChem-17, ChEMBL-17, and DrugBank-17. The GDB-17, GDBLL-17, and GDBLLnoSR-17 maps were computed from a random subset of 16.7 million molecules from GDB-17. For all compounds a single stereoisomer was analyzed as generated by CORINA.


Figure 7. Histograms of quaternary centers (qv) and bonds in fused rings (bfr) in the different databases. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

Stereochemistry

The much higher frequency of nonplanar molecules in GDB-17 compared to the reference databases should also be reflected in a larger number of stereocenters and hence possible stereoisomers per compound. The number of possible stereoisomers per molecule was determined using the 3D-generator CORINA, (46) which exhaustively generates stereoisomers from 2D structures. CORINA correctly excludes impossible combinations of stereoisomer flipping (e.g., only one stereoisomer for norbornane). CORINA produces enantiomers as pairs with the exception of atropisomers, which are rather rare, implying that molecules for which only a single diastereoisomer is produced are almost always achiral. Their number provides a lower estimate of the number of achiral molecules because meso compounds (e.g., 1,4-dimethylcyclohexane or (R,S)-2,3-butanediol) and achiral Z/E isomer pairs are not singled out.
The stereoisomer counting with CORINA was performed on the 16.7 million subset of GDB-17 and on the reference databases (Figure 8). GDB-17 molecules produced an average of 6.4 stereoisomers per molecule (GDBLL-17: 5.7 stereoisomers/cpd, GDBLLnoSR: 5.1 stereoisomers/cpd), which is three times more than in the reference databases (PubChem-17: 2.0 stereoisomers/cpd, ChEMBL-17: 2.0 stereoisomers/cpd, DrugBank-17: 2.1 stereoisomers/cpd). More than half of the molecules in the reference databases have only one stereoisomer (PubChem-17: 56%, ChEMBL-17: 58%, DrugBank-17: 55%), while only about 5% are molecules with eight or more possible stereoisomers (PubChem-17: 4.1%, ChEMBL-17: 4.6%, DrugBank-17: 5.2%). By contrast, GDB-17 (respectively GDBLL-17, GDBLLnoSR-17) contains only 22% (respectively 23%, 27%) of molecules with a single stereoisomer but 44% (respectively 38%, 32%) of molecules with eight or more stereoisomers. The smaller average number of stereoisomers per compound as a function of hac in GDBLLnoSR-17 compared to GDBLL-17 shows that the presence of small rings is partly responsible for the larger number of stereoisomers in GDB-17 compared to known compounds.


Figure 8. Stereochemistry. A. Number of stereoisomers per compound. B. Average number of stereoisomers per compound as a function of hac. Stereoisomers were generated from SMILES using CORINA. The data for GDB-17, GDBLL-17, and GDBLLnoSR-17 stem from the analysis of a random 16.7 million subset of GDB-17.

Novelty and Scaffolds

The above analyses show that the novelty of GDB-17 compared to PubChem-17 can be assigned in part to global structural features including the relative rarity of aromatic and acyclic compounds, the frequent occurrence of molecules with small rings and nonaromatic heterocycles, and the higher proportion of polar molecules (clogP < 0). GDB-17 molecules also differ from PubChem-17 molecules in that they contain more structural features leading to 3D-shapes, such as quaternary centers and bonds in fused rings, as well as generally more stereoisomers per molecule. Nevertheless GDB-17 contains impressive numbers of compounds within any constraints, as exemplified with the millions of isomers of known drugs and the size of the leadlike/no small rings subset GDBLLnoSR-17 containing 22 billion molecules. By their number these molecules are necessarily new although they represent variations of known compound types.
One can also analyze the databases for novelty independent of global parameters by focusing on the occurrence of “scaffolds”. As “scaffolds” we considered either the “Murcko scaffolds”, (47) which are defined as the saturated hydrocarbon graph of a molecule pruned of any terminal atom, or “ring systems” defined as hydrocarbon graphs without acyclic bonds. (21) The analysis was performed for GDB-17 by considering the 5.4 million unique graphs used for molecule generation (Table 1) and extracting graphs corresponding to Murcko scaffolds and ring systems. For PubChem-17 each molecule was converted to its parent saturated hydrocarbon. All terminal atoms were then removed iteratively to produce “Murcko scaffolds”, and all acyclic bonds were removed to produce “ring systems”. Each resulting list was reduced to unique structures by removing duplicates. Each series was split into three categories by analyzing the smallest set of smallest rings as follows: a) scaffolds containing at least one small ring (“SR”); b) scaffolds containing only 5–7 membered rings (“5–7”); and c) scaffolds without small rings containing at least one 8-membered or larger ring (“8+”). The scaffolds were further subdivided according to the number of quaternary centers (Table 7).
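The two pruning rules described above can be sketched on a plain graph representation. The following is a minimal illustration, not the authors' Java implementation: the function names, the edge-list input format, and the example graph are assumptions made for this sketch, and the duplicate removal and ring-size classification steps are not covered.

```python
from collections import defaultdict


def _adjacency(edges):
    """Build a neighbor map from an iterable of 2-element edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def murcko_scaffold(edges):
    """Iteratively remove terminal atoms (degree 1) until none remain."""
    edges = {frozenset(e) for e in edges}
    while True:
        adj = _adjacency(tuple(e) for e in edges)
        terminal = {v for v, nb in adj.items() if len(nb) == 1}
        if not terminal:
            return edges
        edges = {e for e in edges if not (e & terminal)}


def _connected(adj, start, goal):
    """Depth-first search: is goal reachable from start?"""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        if v == goal:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def ring_systems(edges):
    """Keep only ring bonds: an edge is acyclic (a bridge) if removing it
    disconnects its two end atoms."""
    edges = {frozenset(e) for e in edges}
    ring_edges = set()
    for e in edges:
        a, b = tuple(e)
        adj = _adjacency(tuple(x) for x in edges if x != e)
        if _connected(adj, a, b):
            ring_edges.add(e)
    return ring_edges


# Hypothetical example: a 3-membered ring (0,1,2) linked through atom 3
# to a 4-membered ring (4,5,6,7), with a terminal atom 8 on atom 0.
EXAMPLE_EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4),
                 (4, 5), (5, 6), (6, 7), (4, 7), (0, 8)]

print(len(murcko_scaffold(EXAMPLE_EDGES)), "bonds in the Murcko scaffold")
print(len(ring_systems(EXAMPLE_EDGES)), "bonds in the ring systems")
```

In this example the Murcko scaffold keeps the linker atom 3 between the two rings, whereas the ring-system graph drops it together with the terminal atom.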
The scaffold and ring system analysis shows that GDB-17 contains 35-fold more Murcko scaffolds and 61-fold more ring systems than PubChem-17. The majority of the imbalance stems from scaffolds containing small rings (Murcko scaffolds: 52-fold excess in GDB-17, ring systems: 109-fold excess in GDB-17), in particular small ring scaffolds with quaternary centers (Murcko scaffolds: 105-fold excess in GDB-17, ring systems: 170-fold excess in GDB-17). If considering only Murcko scaffolds or ring systems with 5- to 7-membered rings and without any quaternary center, which are the easiest to synthesize and most common ring systems, the number of scaffolds is only, but still, 2-fold larger in GDB-17 compared to PubChem-17. Ring systems that are yet unknown even as substructures are readily identified in GDB-17, such as the yet unknown C17-hydrocarbon graphs 49–55 shown in Figure 9.
Table 7. Scaffold Analysis of GDB-17 and PubChem-17 (a)

Murcko scaffolds
no. of quat. C | 0 | 1 | 2 | >2 | SUM
GDB-17, SR | 19,804 | 56,975 | 65,536 | 42,895 | 185,210
GDB-17, 5–7 | 1,736 | 2,561 | 1,195 | 217 | 5,709
GDB-17, 8+ | 1,113 | 466 | 44 | 0 | 1,623
SUM | 22,653 | 60,002 | 66,775 | 43,112 | 192,542
PubChem-17, SR | 1,997 | 1,114 | 405 | 56 | 3,572
PubChem-17, 5–7 | 960 | 562 | 121 | 3 | 1,646
PubChem-17, 8+ | 307 | 41 | 8 | 0 | 356
SUM | 3,264 | 1,717 | 534 | 59 | 5,574

Ring systems
no. of quat. C | 0 | 1 | 2 | >2 | SUM
GDB-17, SR | 12,607 | 45,419 | 60,720 | 42,293 | 161,039
GDB-17, 5–7 | 1,135 | 2,143 | 1,126 | 217 | 4,621
GDB-17, 8+ | 978 | 426 | 44 | 0 | 1,448
SUM | 14,720 | 47,988 | 61,890 | 42,510 | 167,108
PubChem-17, SR | 600 | 521 | 314 | 45 | 1,480
PubChem-17, 5–7 | 480 | 375 | 111 | 3 | 969
PubChem-17, 8+ | 254 | 36 | 8 | 0 | 298
SUM | 1,334 | 932 | 433 | 48 | 2,747

(a) Murcko scaffolds are hydrocarbon graphs without any terminal atoms and ring systems are hydrocarbon graphs without any acyclic bonds. Scaffolds and ring systems are divided into three categories: SR: at least one small (3- or 4-membered) ring; 5–7: containing only 5- to 7-membered rings; 8+: no small ring and at least one 8-membered or larger ring. Rings are analyzed in the smallest set of smallest rings, i.e., bicyclo[2.2.1]heptane (norbornane) contains two 5-membered rings, while its 6-membered ring is not considered.
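The fold excesses quoted in the text can be checked directly against the totals of Table 7. The short sketch below only redoes that arithmetic; the dictionary layout and the function name are assumptions made for illustration.

```python
# Totals copied from the SUM column of Table 7:
# (GDB-17 count, PubChem-17 count)
TABLE7_TOTALS = {
    "Murcko scaffolds, all": (192_542, 5_574),
    "Murcko scaffolds, SR": (185_210, 3_572),
    "Ring systems, all": (167_108, 2_747),
    "Ring systems, SR": (161_039, 1_480),
}


def fold_excess(gdb_count, pubchem_count):
    """Ratio of the number of GDB-17 scaffolds to PubChem-17 scaffolds."""
    return gdb_count / pubchem_count


for label, (gdb, pubchem) in TABLE7_TOTALS.items():
    print(f"{label}: {fold_excess(gdb, pubchem):.1f}-fold")
```

Rounded to whole numbers, these ratios reproduce the 35-, 52-, 61-, and 109-fold values given in the text.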

Figure 9. Examples of yet unknown C17-ring systems from GDB-17. These hydrocarbons do not give any hits in SciFinder using “any atom” types for carbons and “any bond” for bonds, including substructure searches but locking further ring fusions. Stereochemistry is not considered in these searches. The ring systems are shown as one possible stereoisomer.

Conclusion

In summary, the enumeration of organic molecules starting from mathematical graphs was realized up to 17 atoms of C, N, O, S, and halogens, yielding 166.4 billion molecules corresponding to a defined set of functional group and atom types. Compared to the 2.5 million known molecules up to 17 atoms found in PubChem, GDB-17 molecules contain generally more rings, in particular small rings, as well as many nonaromatic heterocycles. On the other hand, acyclic and aromatic compounds form a much smaller fraction of the database compared to PubChem. GDB-17 molecules furthermore contain more quaternary centers and bonds in fused rings than PubChem molecules, resulting in significant 3D-shapes and a larger number of stereoisomers per molecule. GDB-17 molecules are on average also more polar than known molecules (clogP < 0), although a leadlike subset occupying the range 0 < clogP < 3 still contains 22 billion molecules even when excluding small ring compounds. The structural diversity of GDB-17 is evidenced by the presence of a much larger number of scaffolds compared to known molecules. The abundance of nonplanar molecules suggests that the enumerated chemical space might serve as a rich source of inspiration to design new molecular series for drug discovery.
As to the size of GDB-17, working with 166.4 billion structures is challenging and currently not applicable to advanced virtual screening methods such as shape-based analyses or docking, which are computationally relatively intensive. For such applications a randomly selected subset of GDB-17 of a few hundred thousand to a few million structures is statistically significant and can be used as representative of the whole database. On the other hand, the identification of single molecules such as the selected analogs of known drugs shown in Figure 3, or the examples of polycyclic hydrocarbons shown in Figure 9, can deliver many more interesting results with the complete database because every single molecule is different and identifiable in its own right. The assembly of a searchable version of the entire GDB-17 database and its use for identifying drug analogs by virtual screening represented a challenge of its own and will be described in a separate publication.
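One way to draw a uniform random subset from a list that is too large to hold in memory is reservoir sampling, which needs only a single pass over the data. The sketch below illustrates that general technique; the text does not state how the GDB-17 subsets were actually drawn, and the function name and parameters are assumptions.

```python
import random


def reservoir_sample(lines, k, seed=0):
    """Return k items chosen uniformly at random from an iterable of
    unknown length (Algorithm R), reading each item exactly once."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = line         # replace with probability k/(i+1)
    return sample


# Hypothetical usage with placeholder identifiers instead of real SMILES.
demo = reservoir_sample((f"mol{i}" for i in range(100_000)), k=5, seed=42)
print(demo)
```

Because the input is consumed as a stream, the same function could be fed lines read from a compressed SMILES file without decompressing it to disk first.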

Methods

General

All code packages were written in Java 1.6 with the JChem libraries from ChemAxon. Every filter was applied to the imported molecule to define bond and atom positions of functional groups. All computations were parallelized on a 360-CPU cluster and manually controlled (100,000 CPU hours corresponds to 11 CPU years). Every step was completed before starting the next. Altogether around 40,000 single calculations were performed. To preserve disk space every output was compressed directly with gzip, either by piping through the bash command gzip or by wrapping a GZIP stream in the Java BufferedReader/Writer. The complete GDB is more than 400 GB as gzip.
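The idea of writing and reading compressed output directly, without an uncompressed intermediate file, can be sketched as follows. The original work used Java; this Python analog with the standard `gzip` module is only an illustration, and the function names and the file name are assumptions.

```python
import gzip
import os
import tempfile


def write_smiles_gz(path, smiles_iterable):
    """Write one SMILES per line straight into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="ascii") as handle:
        for smiles in smiles_iterable:
            handle.write(smiles + "\n")


def read_smiles_gz(path):
    """Lazily yield the non-empty lines of a gzip-compressed SMILES file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line


# Hypothetical round trip in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.smi.gz")
write_smiles_gz(demo_path, ["CCO", "c1ccccc1", "CC(=O)O"])
print(list(read_smiles_gz(demo_path)))
```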

Enumeration

Graphs

The program Nauty from McKay, available at http://cs.anu.edu.au/people/bdm/, was used to generate the connectivity tables for graphs. The Nauty subprogram GENG was run up to 17 nodes to generate all possible graphs:
./geng -cd1D4 (Number of Nodes)
The graphs were then checked for planarity with PLANARG to avoid molecules with crossed bonds, e.g., Claus’ benzene:
./planarg

Hydrocarbons

The resulting output of GENG is a G6 string for a connection table, which was imported and converted to the corresponding hydrocarbons by exchanging the nodes with carbon atoms and the edges with single bonds. Hydrocarbons were filtered for desirable features (Table 1) because the majority of graphs includes 3- and 4-membered rings or multiple connected globular ring systems.
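The conversion from a G6 string to a hydrocarbon can be sketched as below. The decoder follows the published graph6 format for graphs of up to 62 nodes (sufficient for 17 atoms); the function names and the dictionary used to represent the hydrocarbon are assumptions for this illustration, not the authors' data structures, and the Table 1 filters are not applied.

```python
def decode_graph6(g6):
    """Decode a graph6 string (n <= 62) into (n, list of edges).

    Each character encodes 6 bits as chr(value + 63). The first character
    is the node count; the rest is the upper triangle of the adjacency
    matrix in column order: (0,1), (0,2), (1,2), (0,3), ...
    """
    values = [ord(c) - 63 for c in g6.strip()]
    n = values[0]
    if not 0 <= n <= 62:
        raise ValueError("only the short graph6 form (n <= 62) is handled")
    bits = []
    for value in values[1:]:
        for shift in range(5, -1, -1):
            bits.append((value >> shift) & 1)
    edges, k = [], 0
    for j in range(1, n):
        for i in range(j):
            if bits[k]:
                edges.append((i, j))
            k += 1
    return n, edges


def to_hydrocarbon(g6):
    """Replace every node by a carbon atom and every edge by a single bond."""
    n, edges = decode_graph6(g6)
    return {"atoms": ["C"] * n, "bonds": [(i, j, 1) for i, j in edges]}


# "Bw" is the graph6 string of a triangle, i.e. the cyclopropane skeleton.
print(to_hydrocarbon("Bw"))
```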

Skeletons

Each single bond was checked for the combinatorial introduction of double bonds (only four bonds per carbon are possible). Additionally, every resulting double bond was checked for the combinatorial introduction of triple bonds. The resulting hydrocarbons were aromatized and dearomatized to avoid multiple copies of the same aromatic ring system. Unsaturated hydrocarbons were filtered for desirable features (Table 2); for example, the majority of unsaturated hydrocarbons contain allenes.
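The valence constraint on double-bond placement can be illustrated with a brute-force sketch. It enumerates every subset of bonds that could be raised to double bonds without any carbon exceeding four bonds. This is an assumption-laden toy version: the function name is hypothetical, symmetry-equivalent placements are not merged, triple bonds and aromaticity are not handled, and none of the Table 2 filters are applied.

```python
from itertools import combinations


def double_bond_placements(n_atoms, edges, max_valence=4):
    """Return all subsets of edges that can become double bonds while
    keeping the bond-order sum at every atom <= max_valence."""
    degree = [0] * n_atoms
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    placements = []
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            order_sum = degree[:]            # start from all single bonds
            for a, b in subset:
                order_sum[a] += 1            # a double bond adds one order
                order_sum[b] += 1
            if all(v <= max_valence for v in order_sum):
                placements.append(subset)
    return placements


# Propane skeleton C-C-C: saturated, two propene placements, and the allene.
print(double_bond_placements(3, [(0, 1), (1, 2)]))
```

The exhaustive loop over subsets grows exponentially with the number of bonds, so it is only meant to make the rule concrete on very small graphs.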

CNO Molecules

Every monovalent, divalent, and trivalent carbon position was checked for substitution with nitrogen following valency rules. Each monovalent and divalent carbon position was then checked for substitution with oxygen. Before each exchange it was checked whether the position is adjacent to a nitrogen or oxygen atom, to avoid the generation of heteroatom–heteroatom bonds and to speed up computation. It was also checked beforehand whether the position is next to an sp atom, in which case no N or O atom was introduced. Symmetric positions were calculated only once. The resulting CNO-molecules were checked for desirable features to avoid unstable functional groups and to reduce the combinatorial explosion (Table 3). Each molecule was converted to a unique SMILES string to check for duplicates before storing the molecule.
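The position checks described above can be sketched as a function that lists, for each carbon, which heteroatoms might replace it. Several points are assumptions of this sketch rather than statements from the text: "valence" is taken as the sum of bond orders to neighboring heavy atoms, an sp atom is taken to be one bearing a triple bond or two double bonds, and the function name and input format are hypothetical. Symmetry handling, the Table 3 filters, and the duplicate check via unique SMILES are not covered.

```python
def heteroatom_candidates(elements, bonds):
    """Map each substitutable carbon index to the set of allowed elements.

    elements: list of element symbols, e.g. ["C", "C", "N"]
    bonds:    dict {(i, j): bond_order}
    """
    n = len(elements)
    valence = [0] * n
    doubles = [0] * n
    triples = [0] * n
    neighbors = [set() for _ in range(n)]
    for (i, j), order in bonds.items():
        for a, b in ((i, j), (j, i)):
            valence[a] += order
            neighbors[a].add(b)
            if order == 2:
                doubles[a] += 1
            elif order == 3:
                triples[a] += 1
    # Assumption: sp atoms carry a triple bond or two double bonds.
    is_sp = [triples[a] >= 1 or doubles[a] >= 2 for a in range(n)]

    candidates = {}
    for a in range(n):
        if elements[a] != "C":
            continue
        # Skip positions next to N/O (no heteroatom-heteroatom bonds)
        # and positions next to an sp atom.
        if any(elements[b] in ("N", "O") or is_sp[b] for b in neighbors[a]):
            continue
        allowed = set()
        if 1 <= valence[a] <= 3:
            allowed.add("N")
        if 1 <= valence[a] <= 2:
            allowed.add("O")
        if allowed:
            candidates[a] = allowed
    return candidates


# Isobutane skeleton: the central carbon (index 0) has three bonds.
isobutane = {(0, 1): 1, (0, 2): 1, (0, 3): 1}
print(heteroatom_candidates(["C", "C", "C", "C"], isobutane))
```

In the isobutane example the central carbon can only become nitrogen, while the three terminal carbons can become nitrogen or oxygen; once the center is nitrogen, no further position qualifies.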

Postprocessing for Oximes, Nitro, CF3, Halogens, and Sulfur

Postprocessing for introducing additional diversity was performed as described in Table 4.

Shape Analysis

The shape analysis was adapted from Sauer and Schwarz (41) and was written in Java 1.6. SMILES were converted into 3D structures using CORINA. (46) The position of the molecule was expressed in an (x,y,z)-coordinate system defined by its principal axes. For each principal axis the moment of inertia was calculated using the general equation
$$I = \sum_i m_i r_i^2$$
where $m_i$ is the mass of atom $i$ and $r_i$ its distance from the axis. Specified for each axis it yields
$$I_x = \sum_i m_i \left(y_i^2 + z_i^2\right), \qquad I_y = \sum_i m_i \left(x_i^2 + z_i^2\right), \qquad I_z = \sum_i m_i \left(x_i^2 + y_i^2\right)$$
in which the squares of the radii around the axes are defined as $r_x^2 = y^2 + z^2$, $r_y^2 = x^2 + z^2$, and $r_z^2 = x^2 + y^2$. The moments of inertia $I_x$, $I_y$, and $I_z$ were then sorted in ascending order to yield $I_1$, $I_2$, and $I_3$. $I_1$ and $I_2$ were finally divided by the highest moment of inertia $I_3$ to yield the values $P_1 = I_1/I_3$ and $P_2 = I_2/I_3$.
The (P1,P2)-plane defines a two-dimensional triangular space with distinct boundaries, i.e., structures cannot be found outside the triangle. The triangle also has three distinct corners defining the different dimensionality of molecular shapes: the upper left corner of the triangle (0,1), the lower center corner (0.5,0.5), and the upper right corner (1,1) define 1D rodlike, 2D disklike, and 3D spherical structures, respectively.
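The calculation of P1 and P2 can be sketched in a few lines. Instead of first rotating the molecule into its principal axes, this sketch builds the inertia tensor about the center of mass and takes its eigenvalues, which are the principal moments. This is an illustration under assumptions, not the authors' Java code: the function names are hypothetical, the masses are supplied by the caller, and the closed-form trigonometric eigenvalue method for symmetric 3×3 matrices is used to stay within the standard library.

```python
import math


def _sym3_eigenvalues(a11, a22, a33, a12, a13, a23):
    """Eigenvalues of a symmetric 3x3 matrix, sorted ascending."""
    p1 = a12 ** 2 + a13 ** 2 + a23 ** 2
    if p1 == 0.0:                       # already diagonal
        return sorted([a11, a22, a33])
    q = (a11 + a22 + a33) / 3.0
    p2 = (a11 - q) ** 2 + (a22 - q) ** 2 + (a33 - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b11, b22, b33 = (a11 - q) / p, (a22 - q) / p, (a33 - q) / p
    b12, b13, b23 = a12 / p, a13 / p, a23 / p
    det = (b11 * (b22 * b33 - b23 * b23)
           - b12 * (b12 * b33 - b23 * b13)
           + b13 * (b12 * b23 - b22 * b13))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding errors
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e_mid = 3.0 * q - e_max - e_min
    return sorted([e_min, e_mid, e_max])


def normalized_pmi(coords, masses):
    """Return (P1, P2) = (I1/I3, I2/I3) for atoms at coords with masses."""
    total = sum(masses)
    cx = sum(m * c[0] for m, c in zip(masses, coords)) / total
    cy = sum(m * c[1] for m, c in zip(masses, coords)) / total
    cz = sum(m * c[2] for m, c in zip(masses, coords)) / total
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, (x, y, z) in zip(masses, coords):
        x, y, z = x - cx, y - cy, z - cz
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    i1, i2, i3 = _sym3_eigenvalues(ixx, iyy, izz, ixy, ixz, iyz)
    if i3 <= 0.0:
        raise ValueError("need at least two distinct atom positions")
    return i1 / i3, i2 / i3


# Idealized point sets (unit masses) for the three corners of the triangle.
rod = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
disk = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
sphere = disk + [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
for name, pts in (("rod", rod), ("disk", disk), ("sphere", sphere)):
    print(name, normalized_pmi(pts, [1.0] * len(pts)))
```

The three idealized point sets land on the rodlike, disklike, and spherical corners of the triangle, respectively. For nearly degenerate moments the closed-form method loses some precision, so a library eigenvalue routine would be preferable in production use.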

Stereoisomer Counting

The CORINA command ./corina.lnx -i t=smiles -o t=sdf -d ori,stergen,rs | grep '$$$$' | wc -l was used to count stereoisomers.
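The counting step of this pipeline relies on the SD file convention that every record ends with a line containing `$$$$`, so the number of such lines equals the number of generated structures. A minimal Python equivalent of the `grep | wc -l` part is sketched below; the function name and the toy input are assumptions, and CORINA itself is not invoked.

```python
def count_sdf_records(sdf_text):
    """Count SD file records by their '$$$$' terminator lines."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Toy input standing in for CORINA output: two records with dummy content.
toy_sdf = "record one\n$$$$\nrecord two\n$$$$\n"
print(count_sdf_records(toy_sdf))
```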

Distribution

A 50 million random subset of GDB-17 and the leadlike and leadlike/no small ring fraction of this subset are freely available for download as a SMILES list from www.gdb.unibe.ch.

Author Information

  • Corresponding Author
    • Jean-Louis Reymond - Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland Email: [email protected]
  • Authors
    • Lars Ruddigkeit - Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
    • Ruud van Deursen - Biomolecular Screening Facility, NCCR Chemical Biology, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
    • Lorenz C. Blum - Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
  • Notes
    The authors declare no competing financial interest.

Acknowledgment

This work was supported financially by the University of Berne, the Swiss National Science Foundation, the NCCR TransCure, and the NCCR Chemical Biology.

References

This article references 47 other publications.

  1. Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F.; Schenck, R. J.; Trippe, A. J. Structural diversity of organic chemistry. A scaffold analysis of the CAS Registry. J. Org. Chem. 2008, 73, 4443–4451.
  2. ACS NEWS. Chem. Eng. News 2011, 89, 38.
  3. Bleicher, K. H.; Bohm, H. J.; Muller, K.; Alanine, A. I. Hit and lead generation: Beyond high-throughput screening. Nat. Rev. Drug Discovery 2003, 2, 369–378.
  4. Schreiber, S. L. Small molecules: the missing link in the central dogma. Nat. Chem. Biol. 2005, 1, 64–66.
  5. Mayr, L. M.; Bojanic, D. Novel trends in high-throughput screening. Curr. Opin. Pharmacol. 2009, 9, 580–588.
  6. Renner, S.; Popov, M.; Schuffenhauer, A.; Roth, H. J.; Breitenstein, W.; Marzinzik, A.; Lewis, I.; Krastel, P.; Nigsch, F.; Jenkins, J.; Jacoby, E. Recent trends and observations in the design of high-quality screening collections. Future Med. Chem. 2011, 3, 751–766.
  7. Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discovery 2004, 3, 711–715.
  8. Hann, M. M. Molecular obesity, potency and other addictions in drug discovery. MedChemComm 2011, 2, 349–355.
  9. Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discovery 2005, 4, 649–663.
  10. Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 2009, 42, 724–733.
  11. Reymond, J. L.; Van Deursen, R.; Blum, L. C.; Ruddigkeit, L. Chemical space as a source for new drugs. MedChemComm 2010, 1, 30–38.
  12. Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. Biol. 2011, 672, 299–323.
  13. Klebe, G. Virtual ligand screening: strategies, perspectives and limitations. Drug Discovery Today 2006, 11, 580–594.
  14. Kolb, P.; Ferreira, R. S.; Irwin, J. J.; Shoichet, B. K. Docking and chemoinformatic screens for new ligands and targets. Curr. Opin. Biotechnol. 2009, 20, 429–36.
  15. Geppert, H.; Vogt, M.; Bajorath, J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J. Chem. Inf. Model. 2010, 50, 205–216.
  16. Cayley, E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Chem. Ber. 1875, 8, 1056–1059.
  17. Lederberg, J.; Sutherland, G. L.; Buchanan, B. G.; Feigenbaum, E. A.; Robertson, A. V.; Duffield, A. M.; Djerassi, C. Applications of artificial intelligence for chemical inference. I. Number of possible organic compounds. Acyclic structures containing carbon, hydrogen, oxygen, and nitrogen. J. Am. Chem. Soc. 1969, 91, 2973–2976.
  18. Steinbeck, C. Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 2004, 21, 512–518.
  19. Reymond, J. L.; Ruddigkeit, L.; Blum, L. C.; Van Deursen, R. The enumeration of chemical space. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012, 2, 717–733.
  20. Fink, T.; Bruggesser, H.; Reymond, J. L. Virtual exploration of the small-molecule chemical universe below 160 Da. Angew. Chem., Int. Ed. Engl. 2005, 44, 1504–1508.
  21. Fink, T.; Reymond, J. L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 2007, 47, 342–353.
  22. Blum, L. C.; Reymond, J. L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
  23. Blum, L. C.; van Deursen, R.; Reymond, J. L. Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. J. Comput.-Aided Mol. Des. 2011, 25, 637–647.
  24. Nguyen, K. T.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Discovery of NMDA glycine site inhibitors from the chemical universe database GDB. ChemMedChem 2008, 3, 1520–1524.
  25. Nguyen, K. T.; Luethi, E.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. 3-(aminomethyl)piperazine-2,5-dione as a novel NMDA glycine site inhibitor from the chemical universe database GDB. Bioorg. Med. Chem. Lett. 2009, 19, 3832–3835.
  26. Garcia-Delgado, N.; Bertrand, S.; Nguyen, K. T.; van Deursen, R.; Bertrand, D.; Reymond, J.-L. Exploring a7-nicotinic receptor ligand diversity by scaffold enumeration from the Chemical Universe Database GDB. ACS Med. Chem. Lett. 2010, 1, 422–426.
  27. Luethi, E.; Nguyen, K. T.; Burzle, M.; Blum, L. C.; Suzuki, Y.; Hediger, M.; Reymond, J. L. Identification of selective norbornane-type aspartate analogue inhibitors of the glutamate transporter 1 (GLT-1) from the chemical universe generated database (GDB). J. Med. Chem. 2010, 53, 7236–7250.
  28. Blum, L. C.; van Deursen, R.; Bertrand, S.; Mayer, M.; Burgi, J. J.; Bertrand, D.; Reymond, J. L. Discovery of alpha7-nicotinic receptor ligands by virtual screening of the Chemical Universe Database GDB-13. J. Chem. Inf. Model. 2011, 51, 3105–3112.
  29. Brethous, L.; Garcia-Delgado, N.; Schwartz, J.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Synthesis and nicotinic receptor activity of chemical space analogues of N-(3R)-1-azabicyclo[2.2.2]oct-3-yl-4-chlorobenzamide (PNU-282,987) and 1,4-diazabicyclo[3.2.2]nonane-4-carboxylic acid 4-bromophenyl ester (SSR180711). J. Med. Chem. 2012, 55, 4605–4618.
  30. Reymond, J. L.; Awale, M. Exploring chemical space for drug discovery using the Chemical Universe Database. ACS Chem. Neurosci. 2012, 3, 649–657.
  31. Foloppe, N. The benefits of constructing leads from fragment hits. Future Med. Chem. 2011, 3, 1111–1115.
  32. Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The design of leadlike combinatorial libraries. Angew. Chem., Int. Ed. Engl. 1999, 38, 3743–3748.
  33. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37, W623–W633.
  34. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
  35. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011, 39, D1035–D1041.
  36. McKay, B. D. Practical graph isomorphism. Congressus Numerantium 1981, 30, 45–87.
  37. Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, 382–384.
  38. Rishton, G. M. Nonleadlikeness and leadlikeness in biochemical screening. Drug Discovery Today 2003, 8, 86–96.
  39. Rush, T. S., III; Grant, J. A.; Mosyak, L.; Nicholls, A. A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J. Med. Chem. 2005, 48, 1489–1495.
  40. Nicholls, A.; McGaughey, G. B.; Sheridan, R. P.; Good, A. C.; Warren, G.; Mathieu, M.; Muchmore, S. W.; Brown, S. P.; Grant, J. A.; Haigh, J. A.; Nevins, N.; Jain, A. N.; Kelley, B. Molecular shape and medicinal chemistry: a perspective. J. Med. Chem. 2010, 53, 3862–3886.
  41. Sauer, W. H.; Schwarz, M. K. Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J. Chem. Inf. Comput. Sci. 2003, 43, 987–1003.
  42. Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752–6756.
  43. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; Pickett, S. D. The impact of aromatic ring count on compound developability: further insights by examining carbo- and hetero-aromatic and -aliphatic ring types. Drug Discovery Today 2011, 16, 164–171.
  44. Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 18787–18792.
  45. Clemons, P. A.; Wilson, J. A.; Dancik, V.; Muller, S.; Carrinski, H. A.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 6817–6822.
  46. Sadowski, J.; Gasteiger, J. From atoms and bonds to 3-dimensional atomic coordinates - automatic model builders. Chem. Rev. 1993, 93, 2567–2581.
  47. Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.

Cited By

This article is cited by 743 publications.

  49. Ankur Kumar Gupta, Krishnan Raghavachari. Three-Dimensional Convolutional Neural Networks Utilizing Molecular Topological Features for Accurate Atomization Energy Predictions. Journal of Chemical Theory and Computation 2022, 18 (4) , 2132-2143. https://doi.org/10.1021/acs.jctc.1c00504
  50. Yi Hua, Xiaobao Fang, Guomeng Xing, Yuan Xu, Li Liang, Chenglong Deng, Xiaowen Dai, Haichun Liu, Tao Lu, Yanmin Zhang, Yadong Chen. Effective Reaction-Based De Novo Strategy for Kinase Targets: A Case Study on MERTK Inhibitors. Journal of Chemical Information and Modeling 2022, 62 (7) , 1654-1668. https://doi.org/10.1021/acs.jcim.2c00068
  51. Yury Kostyukevich, Sergey Sosnin, Sergey Osipenko, Oxana Kovaleva, Lidiia Rumiantseva, Albert Kireev, Alexander Zherebker, Maxim Fedorov, Evgeny N. Nikolaev. PyFragMS─A Web Tool for the Investigation of the Collision-Induced Fragmentation Pathways. ACS Omega 2022, 7 (11) , 9710-9719. https://doi.org/10.1021/acsomega.1c07272
  52. Penglei Wang, Shuangjia Zheng, Yize Jiang, Chengtao Li, Junhong Liu, Chang Wen, Atanas Patronov, Dahong Qian, Hongming Chen, Yuedong Yang. Structure-Aware Multimodal Deep Learning for Drug–Protein Interaction Prediction. Journal of Chemical Information and Modeling 2022, 62 (5) , 1308-1317. https://doi.org/10.1021/acs.jcim.2c00060
  53. Raimon Fabregat, Alberto Fabrizio, Edgar A. Engel, Benjamin Meyer, Veronika Juraskova, Michele Ceriotti, Clemence Corminboeuf. Local Kernel Regression and Neural Network Approaches to the Conformational Landscapes of Oligopeptides. Journal of Chemical Theory and Computation 2022, 18 (3) , 1467-1479. https://doi.org/10.1021/acs.jctc.1c00813
  54. André F. Oliveira, Juarez L. F. Da Silva, Marcos G. Quiles. Molecular Property Prediction and Molecular Design Using a Supervised Grammar Variational Autoencoder. Journal of Chemical Information and Modeling 2022, 62 (4) , 817-828. https://doi.org/10.1021/acs.jcim.1c01573
  55. Ruocheng Han, Rangsiman Ketkaew, Sandra Luber. A Concise Review on Recent Developments of Machine Learning for the Prediction of Vibrational Spectra. The Journal of Physical Chemistry A 2022, 126 (6) , 801-812. https://doi.org/10.1021/acs.jpca.1c10417
  56. John M. Simmie. C2H5NO Isomers: From Acetamide to 1,2-Oxazetidine and Beyond. The Journal of Physical Chemistry A 2022, 126 (6) , 924-939. https://doi.org/10.1021/acs.jpca.1c09984
  57. Hideo Doi, Kazuaki Z. Takahashi, Takeshi Aoyagi. Screening toward the Development of Fingerprints of Atomic Environments Using Bond-Orientational Order Parameters. ACS Omega 2022, 7 (5) , 4606-4613. https://doi.org/10.1021/acsomega.1c06587
  58. Viktor Zaverkin, Julia Netz, Fabian Zills, Andreas Köhn, Johannes Kästner. Thermally Averaged Magnetic Anisotropy Tensors via Machine Learning Based on Gaussian Moments. Journal of Chemical Theory and Computation 2022, 18 (1) , 1-12. https://doi.org/10.1021/acs.jctc.1c00853
  59. Lei Tao, Vikas Varshney, Ying Li. Benchmarking Machine Learning Models for Polymer Informatics: An Example of Glass Transition Temperature. Journal of Chemical Information and Modeling 2021, 61 (11) , 5395-5413. https://doi.org/10.1021/acs.jcim.1c01031
  60. Tiago Sousa, João Correia, Vítor Pereira, Miguel Rocha. Generative Deep Learning for Targeted Compound Design. Journal of Chemical Information and Modeling 2021, 61 (11) , 5343-5361. https://doi.org/10.1021/acs.jcim.0c01496
  61. Ying Yang, Kun Yao, Matthew P. Repasky, Karl Leswing, Robert Abel, Brian K. Shoichet, Steven V. Jerome. Efficient Exploration of Chemical Space with Docking and Deep Learning. Journal of Chemical Theory and Computation 2021, 17 (11) , 7106-7119. https://doi.org/10.1021/acs.jctc.1c00810
  62. Xiaochu Tong, Xiaohong Liu, Xiaoqin Tan, Xutong Li, Jiaxin Jiang, Zhaoping Xiong, Tingyang Xu, Hualiang Jiang, Nan Qiao, Mingyue Zheng. Generative Models for De Novo Drug Design. Journal of Medicinal Chemistry 2021, 64 (19) , 14011-14027. https://doi.org/10.1021/acs.jmedchem.1c00927
  63. Viktor Zaverkin, David Holzmüller, Ingo Steinwart, Johannes Kästner. Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments. Journal of Chemical Theory and Computation 2021, 17 (10) , 6658-6670. https://doi.org/10.1021/acs.jctc.1c00527
  64. Philippe Gantzer, Benoit Creton, Carlos Nieto-Draghi. Comparisons of Molecular Structure Generation Methods Based on Fragment Assemblies and Genetic Graphs. Journal of Chemical Information and Modeling 2021, 61 (9) , 4245-4258. https://doi.org/10.1021/acs.jcim.1c00803
  65. Luis Cesar de Azevedo, Gabriel A. Pinheiro, Marcos G. Quiles, Juarez L. F. Da Silva, Ronaldo C. Prati. Systematic Investigation of Error Distribution in Machine Learning Algorithms Applied to the Quantum-Chemistry QM9 Data Set Using the Bias and Variance Decomposition. Journal of Chemical Information and Modeling 2021, 61 (9) , 4210-4223. https://doi.org/10.1021/acs.jcim.1c00503
  66. Ava P. Soleimany, Alexander Amini, Samuel Goldman, Daniela Rus, Sangeeta N. Bhatia, Connor W. Coley. Evidential Deep Learning for Guided Molecular Property Prediction and Discovery. ACS Central Science 2021, 7 (8) , 1356-1367. https://doi.org/10.1021/acscentsci.1c00546
  67. Julia Westermayr, Philipp Marquetand. Machine Learning for Electronically Excited States of Molecules. Chemical Reviews 2021, 121 (16) , 9873-9926. https://doi.org/10.1021/acs.chemrev.0c00749
  68. Oliver T. Unke, Stefan Chmiela, Huziel E. Sauceda, Michael Gastegger, Igor Poltavsky, Kristof T. Schütt, Alexandre Tkatchenko, Klaus-Robert Müller. Machine Learning Force Fields. Chemical Reviews 2021, 121 (16) , 10142-10186. https://doi.org/10.1021/acs.chemrev.0c01111
  69. Aditya Nandy, Chenru Duan, Michael G. Taylor, Fang Liu, Adam H. Steeves, Heather J. Kulik. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chemical Reviews 2021, 121 (16) , 9927-10000. https://doi.org/10.1021/acs.chemrev.1c00347
  70. Bing Huang, O. Anatole von Lilienfeld. Ab Initio Machine Learning in Chemical Compound Space. Chemical Reviews 2021, 121 (16) , 10001-10036. https://doi.org/10.1021/acs.chemrev.0c01303
  71. Michael Tynes, Wenhao Gao, Daniel J. Burrill, Enrique R. Batista, Danny Perez, Ping Yang, Nicholas Lubbers. Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search. Journal of Chemical Information and Modeling 2021, 61 (8) , 3846-3857. https://doi.org/10.1021/acs.jcim.1c00670
  72. Luis Itza Vazquez-Salazar, Eric D. Boittier, Oliver T. Unke, Markus Meuwly. Impact of the Characteristics of Quantum Chemical Databases on Machine Learning Prediction of Tautomerization Energies. Journal of Chemical Theory and Computation 2021, 17 (8) , 4769-4785. https://doi.org/10.1021/acs.jctc.1c00363
  73. Shachar Fite, Omri Nitecki, Zeev Gross. Custom Tokenization Dictionary, CUSTODI: A General, Fast, and Reversible Data-Driven Representation and Regressor. Journal of Chemical Information and Modeling 2021, 61 (7) , 3285-3291. https://doi.org/10.1021/acs.jcim.1c00563
  74. Logan Ward, Naveen Dandu, Ben Blaiszik, Badri Narayanan, Rajeev S. Assary, Paul C. Redfern, Ian Foster, Larry A. Curtiss. Graph-Based Approaches for Predicting Solvation Energy in Multiple Solvents: Open Datasets and Machine Learning Models. The Journal of Physical Chemistry A 2021, 125 (27) , 5990-5998. https://doi.org/10.1021/acs.jpca.1c01960
  75. Ishika Saha, Patrick G. Harran, Jonathan Bohmann, Ryan Gumpper. Virtual Screening for Chemists. 2021. https://doi.org/10.1021/acsinfocus.7e5001
  76. Alan E. Bilsland, Kirsten McAulay, Ryan West, Angelo Pugliese, Justin Bower. Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction. Journal of Chemical Information and Modeling 2021, 61 (6) , 2547-2559. https://doi.org/10.1021/acs.jcim.0c01226
  77. Maarten R. Dobbelaere, Pieter P. Plehiers, Ruben Van de Vijver, Christian V. Stevens, Kevin M. Van Geem. Learning Molecular Representations for Thermochemistry Prediction of Cyclic Hydrocarbons and Oxygenates. The Journal of Physical Chemistry A 2021, 125 (23) , 5166-5179. https://doi.org/10.1021/acs.jpca.1c01956
  78. Guo-Li Xiong, Yue Zhao, Lu Liu, Zhong-Ye Ma, Ai-Ping Lu, Yan Cheng, Ting-Jun Hou, Dong-Sheng Cao. Computational Bioactivity Fingerprint Similarities To Navigate the Discovery of Novel Scaffolds. Journal of Medicinal Chemistry 2021, 64 (11) , 7544-7554. https://doi.org/10.1021/acs.jmedchem.1c00234
  79. R. Han, S. Luber. Fast Estimation of Møller–Plesset Correlation Energies Based on Atomic Contributions. The Journal of Physical Chemistry Letters 2021, 12 (22) , 5324-5331. https://doi.org/10.1021/acs.jpclett.1c00900
  80. Ricardo M. Borges, Sean M. Colby, Susanta Das, Arthur S. Edison, Oliver Fiehn, Tobias Kind, Jesi Lee, Amy T. Merrill, Kenneth M. Merz, Jr., Thomas O. Metz, Jamie R. Nunez, Dean J. Tantillo, Lee-Ping Wang, Shunyang Wang, Ryan S. Renslow. Quantum Chemistry Calculations for Metabolomics. Chemical Reviews 2021, 121 (10) , 5633-5670. https://doi.org/10.1021/acs.chemrev.0c00901
  81. Felix Mayr, Alessio Gagliardi. Global Property Prediction: A Benchmark Study on Open-Source, Perovskite-like Datasets. ACS Omega 2021, 6 (19) , 12722-12732. https://doi.org/10.1021/acsomega.1c00991
  82. Pingshi Yu, Alistair J. Sterling, Jotun Hein. A Novel Automated Screening Method for Combinatorially Generated Small Molecules. Journal of Chemical Information and Modeling 2021, 61 (4) , 1637-1646. https://doi.org/10.1021/acs.jcim.0c01462
  83. Andrzej M. Żurański, Jesus I. Martinez Alvarado, Benjamin J. Shields, Abigail G. Doyle. Predicting Reaction Yields via Supervised Learning. Accounts of Chemical Research 2021, 54 (8) , 1856-1865. https://doi.org/10.1021/acs.accounts.0c00770
  84. Juliette Zito, Ivan Infante. The Future of Ligand Engineering in Colloidal Semiconductor Nanocrystals. Accounts of Chemical Research 2021, 54 (7) , 1555-1564. https://doi.org/10.1021/acs.accounts.0c00765
  85. Jianing Lu, Song Xia, Jieyu Lu, Yingkai Zhang. Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning. Journal of Chemical Information and Modeling 2021, 61 (3) , 1095-1104. https://doi.org/10.1021/acs.jcim.1c00007
  86. Dakota L. Folmsbee, David R. Koes, Geoffrey R. Hutchison. Evaluation of Thermochemical Machine Learning for Potential Energy Curves and Geometry Optimization. The Journal of Physical Chemistry A 2021, 125 (9) , 1987-1993. https://doi.org/10.1021/acs.jpca.0c10147
  87. Felicity F. Nielson, Sean M. Colby, Dennis G. Thomas, Ryan S. Renslow, Thomas O. Metz. Exploring the Impacts of Conformer Selection Methods on Ion Mobility Collision Cross Section Predictions. Analytical Chemistry 2021, 93 (8) , 3830-3838. https://doi.org/10.1021/acs.analchem.0c04341
  88. Pablo A. Unzueta, Chandler S. Greenwell, Gregory J. O. Beran. Predicting Density Functional Theory-Quality Nuclear Magnetic Resonance Chemical Shifts via Δ-Machine Learning. Journal of Chemical Theory and Computation 2021, 17 (2) , 826-840. https://doi.org/10.1021/acs.jctc.0c00979
  89. Jon Paul Janet, Chenru Duan, Aditya Nandy, Fang Liu, Heather J. Kulik. Navigating Transition-Metal Chemical Space: Artificial Intelligence for First-Principles Design. Accounts of Chemical Research 2021, 54 (3) , 532-545. https://doi.org/10.1021/acs.accounts.0c00686
  90. Yao Shi, Paloma L. Prieto, Tara Zepel, Shad Grunert, Jason E. Hein. Automated Experimentation Powers Data Science in Chemistry. Accounts of Chemical Research 2021, 54 (3) , 546-555. https://doi.org/10.1021/acs.accounts.0c00736
  91. Joydeep Munshi, Wei Chen, TeYu Chien, Ganesh Balasubramanian. Transfer Learned Designer Polymers For Organic Solar Cells. Journal of Chemical Information and Modeling 2021, 61 (1) , 134-142. https://doi.org/10.1021/acs.jcim.0c01157
  92. Wenhao Gao, Connor W. Coley. The Synthesizability of Molecules Proposed by Generative Models. Journal of Chemical Information and Modeling 2020, 60 (12) , 5714-5723. https://doi.org/10.1021/acs.jcim.0c00174
  93. Obaidur Rahaman, Alessio Gagliardi. Deep Learning Total Energies and Orbital Energies of Large Organic Molecules Using Hybridization of Molecular Fingerprints. Journal of Chemical Information and Modeling 2020, 60 (12) , 5971-5983. https://doi.org/10.1021/acs.jcim.0c00687
  94. Beomchang Kang, Chaok Seok, Juyong Lee. Prediction of Molecular Electronic Transitions Using Random Forests. Journal of Chemical Information and Modeling 2020, 60 (12) , 5984-5994. https://doi.org/10.1021/acs.jcim.0c00698
  95. Maho Nakata, Tomomi Shimazaki, Masatomo Hashimoto, Toshiyuki Maeda. PubChemQC PM6: Data Sets of 221 Million Molecules with Optimized Molecular Geometries and Electronic Properties. Journal of Chemical Information and Modeling 2020, 60 (12) , 5891-5899. https://doi.org/10.1021/acs.jcim.0c00740
  96. David Balcells, Bastian Bjerkem Skjelstad. tmQM Dataset—Quantum Geometries and Properties of 86k Transition Metal Complexes. Journal of Chemical Information and Modeling 2020, 60 (12) , 6135-6146. https://doi.org/10.1021/acs.jcim.0c01041
  97. Sebastian Mosbach, Angiras Menon, Feroz Farazi, Nenad Krdzavac, Xiaochi Zhou, Jethro Akroyd, Markus Kraft. Multiscale Cross-Domain Thermochemical Knowledge-Graph. Journal of Chemical Information and Modeling 2020, 60 (12) , 6155-6166. https://doi.org/10.1021/acs.jcim.0c01145
  98. Marina P. Oliveira, Maurice Andrey, Salomé R. Rieder, Leyla Kern, David F. Hahn, Sereina Riniker, Bruno A. C. Horta, Philippe H. Hünenberger. Systematic Optimization of a Fragment-Based Force Field against Experimental Pure-Liquid Properties Considering Large Compound Families: Application to Saturated Haloalkanes. Journal of Chemical Theory and Computation 2020, 16 (12) , 7525-7555. https://doi.org/10.1021/acs.jctc.0c00683
  99. Narasimharao Mukku, Prabhakara Madivalappa Davanagere, Kaushik Chanda, Barnali Maiti. A Facile Microwave-Assisted Synthesis of Oxazoles and Diastereoselective Oxazolines Using Aryl-Aldehydes, p-Toluenesulfonylmethyl Isocyanide under Controlled Basic Conditions. ACS Omega 2020, 5 (43) , 28239-28248. https://doi.org/10.1021/acsomega.0c04130
  100. B. Christopher Rinderspacher. Heuristic Global Optimization in Chemical Compound Space. The Journal of Physical Chemistry A 2020, 124 (43) , 9044-9060. https://doi.org/10.1021/acs.jpca.0c05941
  • Abstract

    Figure 1. Enumeration of GDB-17 starting from mathematical graphs.

    Figure 2. Size and MW profiles of the enumerated chemical space in GDB and the reference databases PubChem, ChEMBL, and DrugBank. The size of the leadlike subsets of GDB (GDBLL, GDBLLnoSR) is extrapolated from analyzing a 1% random subset of GDB-17.

    Figure 3. Drugs and examples of isomers found in GDB-17. All isomers shown have a shape similarity score ROCS > 1.4. None of the isomers shown are known (SciFinder search). Acyclovir is the only drug shown that does not occur in GDB-17, because it contains a hemiaminal (N–Csp3–O), a functional group that is excluded from the enumeration.

    Figure 4. Molecule topologies and categories in GDB-17 and reference databases. A. Percentage of each reference database that is compatible with the GDB-17 enumeration rules or excluded due to a nonenumerated halogen group (acyl halides, aliphatic halocarbons), sulfur group (thiols, thioethers), functional group (acyclic acetals, hemiacetals, aminals, azides, aliphatic nitro groups), element (P, Si, B, Bi, Hg, etc.), skeleton (nonaromatic C═C), or graph (e.g., small rings at 17 atoms). B. Fraction of compounds with small rings. C. Topologies. D. Database contents as a function of molecular categories. Molecules are assigned to one category only, with priority order heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

    Figure 5. Polarity features. A. clogP histogram in intervals of −5.5 to −4.5, −4.5 to −3.5, etc. B. Average clogP as a function of hac. C. H-bond donor atom (HBD) histogram. D. Average HBD as a function of hac. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

    Figure 6. Molecular shape analyzed by the principal moments of inertia. (41) Occupancy maps are shown in the (P1,P2)-plane, in which P1 and P2 are the normalized ratios of the principal moments of inertia (for details see section Methods), and are colored from blue (1 cpd/pixel) to purple (maximum cpd/pixel for each map: GDB-17: 4,691, GDBLL-17: 889, GDBLLnoSR-17: 684, PubChem-17: 6,202, ChEMBL-17: 487, DrugBank-17: 4). The insets show an enlarged view of the lower left edge of each triangle, where occupancy is highest for PubChem-17, ChEMBL-17, and DrugBank-17. GDB-17, GDBLL-17, and GDBLLnoSR-17 were analyzed using a random subset of 16.7 million molecules from GDB-17. For all compounds, a single stereoisomer was analyzed as generated by CORINA.

    Figure 7. Histograms of quaternary centers (qv) and bonds in fused rings (bfr) in the different databases. The data for GDB-17 and its subsets were computed from a 1% random subset of the database.

    Figure 8. Stereochemistry. A. Number of stereoisomers per compound. B. Average number of stereoisomers per compound as a function of hac. Stereoisomers were generated from SMILES using CORINA. The data for GDB-17, GDBLL-17, and GDBLLnoSR-17 stem from the analysis of a random subset of 16.7 million molecules from GDB-17.

    Figure 9. Examples of yet unknown C17-ring systems from GDB-17. These hydrocarbons do not give any hits in SciFinder using "any atom" types for carbons and "any bond" for bonds, including substructure searches but locking further ring fusions. Stereochemistry is not considered in these searches. The ring systems are shown as one possible stereoisomer.

  • References

    This article references 47 other publications.

    1. Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F.; Schenck, R. J.; Trippe, A. J. Structural diversity of organic chemistry. A scaffold analysis of the CAS Registry. J. Org. Chem. 2008, 73, 4443–4451.
    2. ACS NEWS. Chem. Eng. News 2011, 89, 38.
    3. Bleicher, K. H.; Bohm, H. J.; Muller, K.; Alanine, A. I. Hit and lead generation: Beyond high-throughput screening. Nat. Rev. Drug Discovery 2003, 2, 369–378.
    4. Schreiber, S. L. Small molecules: the missing link in the central dogma. Nat. Chem. Biol. 2005, 1, 64–66.
    5. Mayr, L. M.; Bojanic, D. Novel trends in high-throughput screening. Curr. Opin. Pharmacol. 2009, 9, 580–588.
    6. Renner, S.; Popov, M.; Schuffenhauer, A.; Roth, H. J.; Breitenstein, W.; Marzinzik, A.; Lewis, I.; Krastel, P.; Nigsch, F.; Jenkins, J.; Jacoby, E. Recent trends and observations in the design of high-quality screening collections. Future Med. Chem. 2011, 3, 751–766.
    7. Kola, I.; Landis, J. Can the pharmaceutical industry reduce attrition rates? Nat. Rev. Drug Discovery 2004, 3, 711–715.
    8. Hann, M. M. Molecular obesity, potency and other addictions in drug discovery. MedChemComm 2011, 2, 349–355.
    9. Schneider, G.; Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discovery 2005, 4, 649–663.
    10. Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 2009, 42, 724–733.
    11. Reymond, J. L.; Van Deursen, R.; Blum, L. C.; Ruddigkeit, L. Chemical space as a source for new drugs. MedChemComm 2010, 1, 30–38.
    12. Hartenfeller, M.; Schneider, G. De novo drug design. Methods Mol. Biol. 2011, 672, 299–323.
    13. Klebe, G. Virtual ligand screening: strategies, perspectives and limitations. Drug Discovery Today 2006, 11, 580–594.
    14. Kolb, P.; Ferreira, R. S.; Irwin, J. J.; Shoichet, B. K. Docking and chemoinformatic screens for new ligands and targets. Curr. Opin. Biotechnol. 2009, 20, 429–436.
    15. Geppert, H.; Vogt, M.; Bajorath, J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J. Chem. Inf. Model. 2010, 50, 205–216.
    16. Cayley, E. Ueber die analytischen Figuren, welche in der Mathematik Bäume genannt werden und ihre Anwendung auf die Theorie chemischer Verbindungen. Chem. Ber. 1875, 8, 1056–1059.
    17. Lederberg, J.; Sutherland, G. L.; Buchanan, B. G.; Feigenbaum, E. A.; Robertson, A. V.; Duffield, A. M.; Djerassi, C. Applications of artificial intelligence for chemical inference. I. Number of possible organic compounds. Acyclic structures containing carbon, hydrogen, oxygen, and nitrogen. J. Am. Chem. Soc. 1969, 91, 2973–2976.
    18. Steinbeck, C. Recent developments in automated structure elucidation of natural products. Nat. Prod. Rep. 2004, 21, 512–518.
    19. Reymond, J. L.; Ruddigkeit, L.; Blum, L. C.; Van Deursen, R. The enumeration of chemical space. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2012, 2, 717–733.
    20. Fink, T.; Bruggesser, H.; Reymond, J. L. Virtual exploration of the small-molecule chemical universe below 160 Da. Angew. Chem., Int. Ed. Engl. 2005, 44, 1504–1508.
    21. Fink, T.; Reymond, J. L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 2007, 47, 342–353.
    22. Blum, L. C.; Reymond, J. L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733.
    23. Blum, L. C.; van Deursen, R.; Reymond, J. L. Visualisation and subsets of the chemical universe database GDB-13 for virtual screening. J. Comput.-Aided Mol. Des. 2011, 25, 637–647.
    24. Nguyen, K. T.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Discovery of NMDA glycine site inhibitors from the chemical universe database GDB. ChemMedChem 2008, 3, 1520–1524.
    25. Nguyen, K. T.; Luethi, E.; Syed, S.; Urwyler, S.; Bertrand, S.; Bertrand, D.; Reymond, J. L. 3-(aminomethyl)piperazine-2,5-dione as a novel NMDA glycine site inhibitor from the chemical universe database GDB. Bioorg. Med. Chem. Lett. 2009, 19, 3832–3835.
    26. Garcia-Delgado, N.; Bertrand, S.; Nguyen, K. T.; van Deursen, R.; Bertrand, D.; Reymond, J.-L. Exploring α7-nicotinic receptor ligand diversity by scaffold enumeration from the Chemical Universe Database GDB. ACS Med. Chem. Lett. 2010, 1, 422–426.
    27. Luethi, E.; Nguyen, K. T.; Burzle, M.; Blum, L. C.; Suzuki, Y.; Hediger, M.; Reymond, J. L. Identification of selective norbornane-type aspartate analogue inhibitors of the glutamate transporter 1 (GLT-1) from the chemical universe generated database (GDB). J. Med. Chem. 2010, 53, 7236–7250.
    28. Blum, L. C.; van Deursen, R.; Bertrand, S.; Mayer, M.; Burgi, J. J.; Bertrand, D.; Reymond, J. L. Discovery of alpha7-nicotinic receptor ligands by virtual screening of the Chemical Universe Database GDB-13. J. Chem. Inf. Model. 2011, 51, 3105–3112.
    29. Brethous, L.; Garcia-Delgado, N.; Schwartz, J.; Bertrand, S.; Bertrand, D.; Reymond, J. L. Synthesis and nicotinic receptor activity of chemical space analogues of N-(3R)-1-azabicyclo[2.2.2]oct-3-yl-4-chlorobenzamide (PNU-282,987) and 1,4-diazabicyclo[3.2.2]nonane-4-carboxylic acid 4-bromophenyl ester (SSR180711). J. Med. Chem. 2012, 55, 4605–4618.
    30. Reymond, J. L.; Awale, M. Exploring chemical space for drug discovery using the Chemical Universe Database. ACS Chem. Neurosci. 2012, 3, 649–657.
    31. Foloppe, N. The benefits of constructing leads from fragment hits. Future Med. Chem. 2011, 3, 1111–1115.
    32. Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The design of leadlike combinatorial libraries. Angew. Chem., Int. Ed. Engl. 1999, 38, 3743–3748.
    33. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37, W623–W633.
    34. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
    35. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Res. 2011, 39, D1035–D1041.
    36. McKay, B. D. Practical graph isomorphism. Congressus Numerantium 1981, 30, 45–87.
    37. Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Today 1997, 2, 382–384.
    38. Rishton, G. M. Nonleadlikeness and leadlikeness in biochemical screening. Drug Discovery Today 2003, 8, 86–96.
    39. Rush, T. S., III; Grant, J. A.; Mosyak, L.; Nicholls, A. A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. J. Med. Chem. 2005, 48, 1489–1495.
    40. Nicholls, A.; McGaughey, G. B.; Sheridan, R. P.; Good, A. C.; Warren, G.; Mathieu, M.; Muchmore, S. W.; Brown, S. P.; Grant, J. A.; Haigh, J. A.; Nevins, N.; Jain, A. N.; Kelley, B. Molecular shape and medicinal chemistry: a perspective. J. Med. Chem. 2010, 53, 3862–3886.
    41. Sauer, W. H.; Schwarz, M. K. Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J. Chem. Inf. Comput. Sci. 2003, 43, 987–1003.
    42. Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752–6756.
    43. Ritchie, T. J.; Macdonald, S. J.; Young, R. J.; Pickett, S. D. The impact of aromatic ring count on compound developability: further insights by examining carbo- and hetero-aromatic and -aliphatic ring types. Drug Discovery Today 2011, 16, 164–171.
    44. Clemons, P. A.; Bodycombe, N. E.; Carrinski, H. A.; Wilson, J. A.; Shamji, A. F.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Small molecules of different origins have distinct distributions of structural complexity that correlate with protein-binding profiles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 18787–18792.
    45. Clemons, P. A.; Wilson, J. A.; Dancik, V.; Muller, S.; Carrinski, H. A.; Wagner, B. K.; Koehler, A. N.; Schreiber, S. L. Quantifying structure and performance diversity for sets of small molecules comprising small-molecule screening collections. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 6817–6822.
    46. Sadowski, J.; Gasteiger, J. From atoms and bonds to 3-dimensional atomic coordinates - automatic model builders. Chem. Rev. 1993, 93, 2567–2581.
    47. Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.
