Systematic Comparison of Experimental Crystallographic Geometries and Gas-Phase Computed Conformers for Torsion Preferences
- Dakota L. Folmsbee. Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States; Department of Anesthesiology & Perioperative Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- David R. Koes. Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Geoffrey R. Hutchison* (Email: [email protected]). Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States; Department of Chemical & Petroleum Engineering, University of Pittsburgh, 3700 O’Hara Street, Pittsburgh, Pennsylvania 15261, United States
Abstract
We performed exhaustive torsion sampling on more than 3 million compounds using the GFN2-xTB method and compared the resulting gas-phase conformers with experimental crystallographic geometries. Many conformer sampling methods derive torsional angle distributions from experimental crystallographic data, limiting the torsion preferences to molecules that must be stable, synthetically accessible, and able to be crystallized. In this work, we evaluate the differences in torsional preferences between experimental crystallographic geometries and gas-phase computed conformers from a broad selection of compounds to determine whether torsional angle distributions obtained from semiempirical methods are suitable priors for conformer sampling. We find that differences in torsion preferences can be mostly attributed to a lack of available experimental crystallographic data, with small deviations derived from gas-phase geometry differences. GFN2 demonstrates the ability to provide accurate and reliable torsional preferences that can provide a basis for new methods free from the limitations of experimental data collection. We provide Gaussian-based fits and sampling distributions suitable for torsion sampling and propose an alternative to the widely used “experimental torsion and knowledge distance geometry” (ETKDG) method using quantum torsion-derived distance geometry (QTDG) methods.
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
Methods
Results and Discussion
Comparing Overall Geometries
Figure 1. Comparison of the smallest non-hydrogen RMSD between experimental crystallographic geometry and CREST or ETKDG conformers for the (left) Crystallographic Open Database (COD) and (right) Platinum Diverse data set. (c–f) Captions indicate best-fit linear regression in Å. Note that for both data sets, CREST produces smaller RMSD for molecules with few rotatable bonds but a larger slope indicating generally worse RMSD for larger compounds with more rotatable bonds.
Figure 2. Calculated radius of gyration for the lowest-energy CREST conformer and experimental geometries as a function of the number of rotatable bonds for the (a) Crystallographic Open Database (COD) and (b) Platinum data sets and scatterplot of compounds from (c) COD and (d) Platinum sets, comparing the radius of gyration from the CREST/GFN2 lowest-energy conformation with that of the experimental crystal structure geometry. Dashed line indicates a 1:1 correspondence, with approximate bounds indicated by solid lines.
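The radius of gyration compared in Figure 2 can be illustrated with a minimal sketch. It assumes the common definition (root of the weighted mean squared distance of atoms from their centroid); whether the published values were mass-weighted is not stated in this section, so the function accepts optional masses and defaults to equal weights. The coordinates are hypothetical and only contrast an extended chain with a folded one.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Radius of gyration in the same length unit as coords (e.g., angstroms).

    coords: list of (x, y, z) tuples.
    masses: optional per-atom weights; equal weights are assumed if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Weighted centroid of the atoms.
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    # Weighted mean squared distance from the centroid.
    msd = sum(m * sum((c[k] - center[k]) ** 2 for k in range(3))
              for m, c in zip(masses, coords)) / total
    return math.sqrt(msd)

# Hypothetical four-atom chains with 1.5 angstrom spacing (illustration only).
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]

print(radius_of_gyration(extended), radius_of_gyration(folded))
```

A more compact (folded) geometry gives a smaller value than an extended one with the same number of atoms, which is the kind of difference the scatterplots in panels (c) and (d) compare between computed and experimental geometries.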
Individual Torsion Preferences
Figure 3. Correlation between experimental and gas-phase torsions across the COD data set for (a) acyclic patterns and (b) ring patterns.
Figure 4. Histograms for pattern 229, including COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and gas-phase lowest-energy conformers across the entire data set, indicating the strong correlations and that the increased quantity of data greatly refines the histograms.
Figure 5. Histograms for pattern 307, including COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and gas-phase lowest-energy conformers across the entire data set, again indicating how the increased quantity of data greatly refines the torsional preferences (e.g., peaks near 50, 90, and 132°).
Figure 6. Histograms for COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and conformers across the entire data set, again indicating that the increased quantity of data greatly refines the torsional preferences (e.g., a strong peak at 60°).
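Torsion histograms such as those in Figures 4–6 are built from dihedral angles measured over four bonded atoms. The following is a minimal sketch of that measurement and binning, not the authors' pipeline: the 10° bin width, the 0–360° range, and the function names are assumptions made for illustration, and the matching of atoms to torsion patterns is omitted.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

def torsion_histogram(angles_deg, bin_width=10):
    """Count angles in bins over 0-360 degrees (bin width is an assumption)."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for a in angles_deg:
        # Wrap negative angles onto 0-360 before binning.
        counts[int((a % 360.0) // bin_width) % n_bins] += 1
    return counts

# Hypothetical coordinates giving a 90 degree torsion.
angle = torsion_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                    (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(angle)
```

Accumulating such counts separately for experimental geometries and for computed conformers gives pairs of histograms that can be compared pattern by pattern.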
Comparisons between GFN2- and DFT-Optimized Geometries
Figure 7. Differences in torsional preferences for experimental and gas-phase geometries.
Figure 8. Example of a compound matching torsion pattern 270 with a steric constraint forcing an angle of ∼90°. Figure from Avogadro2. (67,68)
Fitting and Sampling Torsion Angles
Figure 9. Histograms of correlation between (a,d) ETKDG, (b,e) cosine fits, or (c,f) Gaussian fits and derived torsional histograms for (a–c) acyclic and (d–f) ring torsional patterns.
| method | median acyclic r² | median ring r² |
|---|---|---|
| ETKDG | 0.26 | 0.03 |
| cosine fits | 0.71 | 0.73 |
| Gaussian fits | 0.91 | 0.93 |
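To make the idea of a Gaussian-based torsion distribution concrete, the sketch below samples angles from a sum of Gaussians and scores the model against a histogram. Everything specific here is an assumption for illustration: the weights, centers, and widths in `EXAMPLE_TERMS` are hypothetical rather than fitted values from this work, r² is taken to be the squared Pearson correlation, and periodicity is handled only through the nearest periodic image, which is adequate for narrow peaks.

```python
import math
import random

# Hypothetical (weight, center, width) terms in degrees; not from the article.
EXAMPLE_TERMS = [(0.5, 180.0, 12.0), (0.25, 60.0, 15.0), (0.25, 300.0, 15.0)]

def periodic_delta(a, b):
    """Smallest signed difference a - b on a 360 degree circle."""
    return (a - b + 180.0) % 360.0 - 180.0

def gaussian_density(angle, terms):
    """Weighted sum of normalized Gaussians using the nearest periodic image."""
    return sum(w / (s * math.sqrt(2.0 * math.pi))
               * math.exp(-0.5 * (periodic_delta(angle, mu) / s) ** 2)
               for w, mu, s in terms)

def sample_torsion(terms, rng):
    """Pick one Gaussian by weight, draw from it, and wrap onto 0-360 degrees."""
    weights = [w for w, _, _ in terms]
    _, mu, s = rng.choices(terms, weights=weights, k=1)[0]
    return rng.gauss(mu, s) % 360.0

def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_torsion(EXAMPLE_TERMS, rng) for _ in range(20000)]

bin_width, n_bins = 10, 36
counts = [0] * n_bins
for a in samples:
    counts[int(a // bin_width) % n_bins] += 1

# Evaluate the model at bin centers and compare with the sampled histogram.
centers = [(i + 0.5) * bin_width for i in range(n_bins)]
model = [gaussian_density(c, EXAMPLE_TERMS) for c in centers]
print(round(r_squared(counts, model), 3))
```

In this sketch, replacing `counts` with a histogram derived from experimental or computed torsions would give a per-pattern r² of the kind summarized by the medians in the table above.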
Conclusions
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01278.
Histograms of the rotatable bonds and conformers in the COD set, comparisons of RMSD and radius of gyration for the COD and Platinum sets, and histograms of torsional angle deviations for GFN2-optimized and ωB97X-D3/def2-SVP-optimized geometries (PDF)
Acknowledgments
We acknowledge the National Science Foundation (CHE-1800435 and CHE-2102474) for support and the University of Pittsburgh Center for Research Computing for the resources provided. Specifically, this work used the H2P cluster, which is supported by NSF award OAC-2117681, as well as resources provided by the Open Science Grid, (72,73) which is supported by NSF award OAC-2030508 and the U.S. Department of Energy’s Office of Science. We also thank Sereina Riniker and Greg Landrum for helpful discussions.
References
This article references 73 other publications.
- 1. Hawkins, P. C. D. Conformation Generation: The State of the Art. J. Chem. Inf. Model. 2017, 57, 1747–1756, DOI: 10.1021/acs.jcim.7b00221
- 2. Friedrich, N.-O.; de Bruyn Kops, C.; Flachsenberg, F.; Sommer, K.; Rarey, M.; Kirchmair, J. Benchmarking Commercial Conformer Ensemble Generators. J. Chem. Inf. Model. 2017, 57, 2719–2728, DOI: 10.1021/acs.jcim.7b00505
- 3. Friedrich, N.-O.; Meyder, A.; de Bruyn Kops, C.; Sommer, K.; Flachsenberg, F.; Rarey, M.; Kirchmair, J. High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators. J. Chem. Inf. Model. 2017, 57, 529–539, DOI: 10.1021/acs.jcim.6b00613
- 4. Ebejer, J.-P.; Morris, G. M.; Deane, C. M. Freely Available Conformer Generation Methods: How Good Are They? J. Chem. Inf. Model. 2012, 52, 1146–1158, DOI: 10.1021/ci2004658
- 5. Penner, P.; Guba, W.; Schmidt, R.; Meyder, A.; Stahl, M.; Rarey, M. The Torsion Library: Semiautomated Improvement of Torsion Rules with SMARTScompare. J. Chem. Inf. Model. 2022, 62, 1644–1653, DOI: 10.1021/acs.jcim.2c00043
- 6. Wang, S.; Witek, J.; Landrum, G. A.; Riniker, S. Improving Conformer Generation for Small Rings and Macrocycles Based on Distance Geometry and Experimental Torsional-Angle Preferences. J. Chem. Inf. Model. 2020, 60, 2044–2058, DOI: 10.1021/acs.jcim.0c00025
- 7. Riniker, S.; Landrum, G. A. Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation. J. Chem. Inf. Model. 2015, 55, 2562–2574, DOI: 10.1021/acs.jcim.5b00654
- 8. Guba, W.; Meyder, A.; Rarey, M.; Hert, J. Torsion Library Reloaded: A New Version of Expert-Derived SMARTS Rules for Assessing Conformations of Small Molecules. J. Chem. Inf. Model. 2016, 56, 1–5, DOI: 10.1021/acs.jcim.5b00522
- 9. Gražulis, S.; Chateigner, D.; Downs, R. T.; Yokochi, A. F. T.; Quirós, M.; Lutterotti, L.; Manakova, E.; Butkus, J.; Moeck, P.; Le Bail, A. Crystallography Open Database – an open-access collection of crystal structures. J. Appl. Crystallogr. 2009, 42, 726–729, DOI: 10.1107/S0021889809016690
- 10. Gražulis, S.; Daškevič, A.; Merkys, A.; Chateigner, D.; Lutterotti, L.; Quirós, M.; Serebryanaya, N. R.; Moeck, P.; Downs, R. T.; Le Bail, A. Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res. 2012, 40, D420–D427, DOI: 10.1093/nar/gkr900
- 11. Groom, C. R.; Bruno, I. J.; Lightfoot, M. P.; Ward, S. C. The Cambridge Structural Database. Acta Crystallogr. B 2016, 72, 171–179, DOI: 10.1107/S2052520616003954
- 12. Sadowski, P.; Baldi, P. Small-Molecule 3D Structure Prediction Using Open Crystallography Data. J. Chem. Inf. Model. 2013, 53, 3127–3130, DOI: 10.1021/ci4005282
- 13. Wicker, J. G. P.; Cooper, R. I. Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor. J. Chem. Inf. Model. 2016, 56, 2347–2352, DOI: 10.1021/acs.jcim.6b00565
- 14. Das, S.; Dinpazhoh, L.; Tanemura, K. A.; Merz, K. M. Rapid and Automated Ab Initio Metabolite Collisional Cross Section Prediction from SMILES Input. J. Chem. Inf. Model. 2023, 63, 4995–5000, DOI: 10.1021/acs.jcim.3c00890
- 15. Das, S.; Tanemura, K. A.; Dinpazhoh, L.; Keng, M.; Schumm, C.; Leahy, L.; Asef, C. K.; Rainey, M.; Edison, A. S.; Fernández, F. M.; Merz, K. M. In Silico Collision Cross Section Calculations to Aid Metabolite Annotation. J. Am. Soc. Mass Spectrom. 2022, 33, 750–759, DOI: 10.1021/jasms.1c00315
- 16. Insausti, A.; Alonso, E. R.; Tercero, B.; Santos, J. I.; Calabrese, C.; Vogt, N.; Corzana, F.; Demaison, J.; Cernicharo, J.; Cocinero, E. J. Laboratory Observation of, Astrochemical Search for, and Structure of Elusive Erythrulose in the Interstellar Medium. J. Phys. Chem. Lett. 2021, 12, 1352–1359, DOI: 10.1021/acs.jpclett.0c03050
- 17. Alonso, E. R.; Peña, I.; Cabezas, C.; Alonso, J. L. Structural Expression of Exo-Anomeric Effect. J. Phys. Chem. Lett. 2016, 7, 845–850, DOI: 10.1021/acs.jpclett.6b00028
- 18Peña, I.; Cocinero, E. J.; Cabezas, C.; Lesarri, A.; Mata, S.; Écija, P.; Daly, A. M.; Cimas, A.; Bermúdez, C.; Basterretxea, F. J.; Blanco, S.; Fernández, J. A.; López, J. C.; Castaño, F.; Alonso, J. L. Six Pyranoside Forms of Free 2-Deoxy-D-ribose. Angew. Chem., Int. Ed. 2013, 52, 11840– 11845, DOI: 10.1002/anie.201305589Google Scholar18Six Pyranoside Forms of Free 2-Deoxy-D-ribosePena, Isabel; Cocinero, Emilio J.; Cabezas, Carlos; Lesarri, Alberto; Mata, Santiago; Ecija, Patricia; Daly, Adam M.; Cimas, Alvaro; Bermudez, Celina; Basterretxea, Francisco J.; Blanco, Susana; Fernandez, Jose A.; Lopez, Juan C.; Castano, Fernando; Alonso, Jose L.Angewandte Chemie, International Edition (2013), 52 (45), 11840-11845CODEN: ACIEF5; ISSN:1433-7851. (Wiley-VCH Verlag GmbH & Co. KGaA)The gas phase rotational spectrum of 2-deoxy-D-ribose (I) was obsd. using a UV ultrafast laser ablation technique as a source for Balle-Flygare and chirped pulsed FTMW spectrometers and was assigned with the aide of MP2(FULL) calcns. In the gas phase, I exists as a mixt. of two α- (10%) and four β-pyranose (90%) conformers. the conformational behavior is controlled by both anomeric effects and hydrogen bonding. the OH groups are preferentially oriented so as to favor cooperative hydrogen bonding. In this context, the previous exptl. ionization potential of 9.1 eV of gas phase I (obtained using tunable vacuum UV synchrotron radiation ) assigned to the α-pyranose should actually correspond to the β-pyranose. the solvent effect on furanose-pyranose equil. and the occurrence of these conformers in RNA and DNA were discussed.
- 19Baldi, P. Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response to the Letter by the Cambridge Crystallographic Data Centre. J. Chem. Inf. Model. 2011, 51, 3029, DOI: 10.1021/ci200460zGoogle Scholar19Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response to the Letter by the Cambridge Crystallographic Data CentreBaldi, PierreJournal of Chemical Information and Modeling (2011), 51 (12), 3029CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)A polemic in response to Groom, C.R. (ibid., S1, 2787 - 2787, 2011) is presented to sentiments expressed in "Data-Driven High-Throughput Prediction of the 3-D Structure of Small Mols.: Review and Progress. A Response from The Cambridge Crystallog. Data Center", recently published in the Journal of Chem. Information and Modeling, which may give readers a misleading impression regarding significant impediments to scientific research posed by the CCDC.
- 20Rappe, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A.; Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 1992, 114, 10024– 10035, DOI: 10.1021/ja00051a040Google Scholar20UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulationsRappe, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A., III; Skiff, W. M.Journal of the American Chemical Society (1992), 114 (25), 10024-35CODEN: JACSAT; ISSN:0002-7863.A new mol. mechanics force field, the Universal force field (UFF), is described wherein the force field parameters are estd. using general rules based only on the element, its hybridization and its connectivity. The force field functional forms, parameters, and generating formulas for the full periodic table are presented.
- 21Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 2021, 121, e26381 DOI: 10.1002/qua.26381Google Scholar21Assessing conformer energies using electronic structure and machine learning methodsFolmsbee, Dakota; Hutchison, GeoffreyInternational Journal of Quantum Chemistry (2021), 121 (1), e26381CODEN: IJQCB2; ISSN:0020-7608. (John Wiley & Sons, Inc.)A review. We have performed a large-scale evaluation of current computational methods, including conventional small-mol. force fields; semiempirical, d. functional, ab initio electronic structure methods; and current machine learning (ML) techniques to evaluate relative single-point energies. Using up to 10 local min. geometries across ∼700 mols., each optimized by B3LYP-D3BJ with single-point DLPNO-CCSD(T) triple-zeta energies, we consider over 6500 single points to compare the correlation between different methods for both relative energies and ordered rankings of min. We find that the current ML methods have potential and recommend methods at each tier of the accuracy-time tradeoff, particularly the recent GFN2 semiempirical method, the B97-3c d. functional approxn., and RI-MP2 for accurate conformer energies. The ANI family of ML methods shows promise, particularly the ANI-1ccx variant trained in part on coupled-cluster energies. Multiple methods suggest continued improvements should be expected in both performance and accuracy.
- 22Kanal, I. Y.; Keith, J. A.; Hutchison, G. R. A sobering assessment of small-molecule force field methods for low energy conformer predictions. Int. J. Quantum Chem. 2018, 118, e25512 DOI: 10.1002/qua.25512Google ScholarThere is no corresponding record for this reference.
- 23Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB - An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 2019, 15, 1652– 1671, DOI: 10.1021/acs.jctc.8b01176Google Scholar23GFN2-xTB-An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion ContributionsBannwarth, Christoph; Ehlert, Sebastian; Grimme, StefanJournal of Chemical Theory and Computation (2019), 15 (3), 1652-1671CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)An extended semiempirical tight-binding model is presented, which is primarily designed for the fast calcn. of structures and noncovalent interactions energies for mol. systems with roughly 1000 atoms. The essential novelty in this so-called GFN2-xTB method is the inclusion of anisotropic second order d. fluctuation effects via short-range damped interactions of cumulative at. multipole moments. Without noticeable increase in the computational demands, this results in a less empirical and overall more phys. sound method, which does not require any classical halogen or hydrogen bonding corrections and which relies solely on global and element-specific parameters (available up to radon, Z = 86). Moreover, the at. partial charge dependent D4 London dispersion model is incorporated self-consistently, which can be naturally obtained in a tight-binding picture from second order d. fluctuations. Fully anal. and numerically precise gradients (nuclear forces) are implemented. The accuracy of the method is benchmarked for a wide variety of systems and compared with other semiempirical methods. Along with excellent performance for the "target" properties, we also find lower errors for "off-target" properties such as barrier heights and mol. dipole moments. 
High computational efficiency along with the improved physics compared to it precursor GFN-xTB makes this method well-suited to explore the conformational space of mol. systems. Significant improvements are furthermore obsd. for various benchmark sets, which are prototypical for biomol. systems in aq. soln.
- 24Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192– 3203, DOI: 10.1039/C6SC05720AGoogle Scholar24ANI-1: an extensible neural network potential with DFT accuracy at force field computational costSmith, J. S.; Isayev, O.; Roitberg, A. E.Chemical Science (2017), 8 (4), 3192-3203CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)A review. Deep learning is revolutionizing many areas of science and technol., esp. image, text, and speech recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum mech. (QM) DFT calcns. can learn an accurate and transferable potential for org. mols. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Mol. Energies) or ANI for short. ANI is a new method designed with the intent of developing transferable neural network potentials that utilize a highly-modified version of the Behler and Parrinello symmetry functions to build single-atom at. environment vectors (AEV) as a mol. representation. AEVs provide the ability to train neural networks to data that spans both configurational and conformational space, a feat not previously accomplished on this scale. We utilized ANI to build a potential called ANI-1, which was trained on a subset of the GDB databases with up to 8 heavy atoms in order to predict total energies for org. mols. contg. four atom types: H, C, N, and O. To obtain an accelerated but phys. relevant sampling of mol. potential surfaces, we also proposed a Normal Mode Sampling (NMS) method for generating mol. conformations. Through a series of case studies, we show that ANI-1 is chem. accurate compared to ref. DFT calcns. on much larger mol. systems (up to 54 atoms) than those included in the training data set.
- 25Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733, DOI: 10.1063/1.5023802Google Scholar25Less is more: Sampling chemical space with active learningSmith, Justin S.; Nebgen, Ben; Lubbers, Nicholas; Isayev, Olexandr; Roitberg, Adrian E.Journal of Chemical Physics (2018), 148 (24), 241733/1-241733/10CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)The development of accurate and transferable machine learning (ML) potentials for predicting mol. energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chem. space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of org. mols. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. 
Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single mols. or materials, while remaining applicable to the general class of org. mols. composed of the elements CHNO. (c) 2018 American Institute of Physics.
- 26Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. E. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 2019, 10, 2903, DOI: 10.1038/s41467-019-10827-4Google Scholar26Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learningSmith Justin S; Devereux Christian; Roitberg Adrian E; Smith Justin S; Nebgen Benjamin T; Zubatyuk Roman; Lubbers Nicholas; Barros Kipton; Tretiak Sergei; Smith Justin S; Lubbers Nicholas; Nebgen Benjamin T; Tretiak Sergei; Zubatyuk Roman; Isayev OlexandrNature communications (2019), 10 (1), 2903 ISSN:.Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.
- 27Devereux, C.; Smith, J. S.; Huddleston, K. K.; Barros, K.; Zubatyuk, R.; Isayev, O.; Roitberg, A. E. Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens. J. Chem. Theory Comput. 2020, 16, 4192– 4202, DOI: 10.1021/acs.jctc.0c00121Google Scholar27Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and HalogensDevereux, Christian; Smith, Justin S.; Huddleston, Kate K.; Barros, Kipton; Zubatyuk, Roman; Isayev, Olexandr; Roitberg, Adrian E.Journal of Chemical Theory and Computation (2020), 16 (7), 4192-4202CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)Machine learning (ML) methods have become powerful, predictive tools in a wide range of applications, such as facial recognition and autonomous vehicles. In the sciences, computational chemists and physicists have been using ML for the prediction of phys. phenomena, such as atomistic potential energy surfaces and reaction pathways. Transferable ML potentials, such as ANI-1x, have been developed with the goal of accurately simulating org. mols. contg. the chem. elements H, C, N, and O. Here, we provide an extension of the ANI-1x model. The new model, dubbed ANI-2x, is trained to three addnl. chem. elements: S, F, and Cl. Addnl., ANI-2x underwent torsional refinement training to better predict mol. torsion profiles. These new features open a wide range of new applications within org. chem. and drug development. These seven elements (H, C, N, O, F, Cl, and S) make up ∼ 90% of drug-like mols. To show that these addns. do not sacrifice accuracy, we have tested this model across a range of org. mols. and applications, including the COMP6 benchmark, dihedral rotations, conformer scoring, and nonbonded interactions. ANI-2x is shown to accurately predict mol. energies compared to d. functional theory with a ~ 106 factor speedup and a negligible slowdown compared to ANI-1x and shows subchem. accuracy across most of the COMP6 benchmark. 
The resulting model is a valuable tool for drug development which can potentially replace both quantum calcns. and classical force fields for a myriad of applications.
- 28Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111, DOI: 10.1063/5.0021955Google Scholar28OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital featuresQiao, Zhuoran; Welborn, Matthew; Anandkumar, Animashree; Manby, Frederick R.; Miller, Thomas F.Journal of Chemical Physics (2020), 153 (12), 124111CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)We introduce a machine learning method in which energy solns. from the Schroedinger equation are predicted using symmetry adapted AO features and a graph neural-network architecture. OrbNet is shown to outperform existing methods in terms of learning efficiency and transferability for the prediction of d. functional theory results while employing low-cost features that are obtained from semi-empirical electronic structure calcns. For applications to datasets of drug-like mols., including QM7b-T, QM9, GDB-13-T, DrugBank, and the conformer benchmark dataset of Folmsbee and Hutchison [Int. J. Quantum Chem. (published online) (2020)], OrbNet predicts energies within chem. accuracy of d. functional theory at a computational cost that is 1000-fold or more reduced. (c) 2020 American Institute of Physics.
- 29Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G. A.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R.; Miller, T. F. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103, DOI: 10.1063/5.0061990Google Scholar29OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracyChristensen, Anders S.; Sirumalla, Sai Krishna; Qiao, Zhuoran; O'Connor, Michael B.; Smith, Daniel G. A.; Ding, Feizhi; Bygrave, Peter J.; Anandkumar, Animashree; Welborn, Matthew; Manby, Frederick R.; Miller, Thomas F.Journal of Chemical Physics (2021), 155 (20), 204103CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)We present OrbNet Denali, a machine learning model for an electronic structure that is designed as a drop-in replacement for ground-state d. functional theory (DFT) energy calcns. The model is a message-passing graph neural network that uses symmetry-adapted AO features from a low-cost quantum calcn. to predict the energy of a mol. OrbNet Denali is trained on a vast dataset of 2.3 x 106 DFT calcns. on mols. and geometries. This dataset covers the most common elements in biochem. and org. chem. (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and charged mols. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chem. problems that are not present in the training set, OrbNet Denali produces a mean abs. error comparable to those of DFT methods. 
For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coeff. of R2 = 0.90 compared to the ref. DLPNO-CCSD(T) calcn. and R2 = 0.97 compared to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chem. accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of ωB97X-D3/def2-TZVP with an av. mean abs. error of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset. (c) 2021 American Institute of Physics.
- 30Nakata, M.; Shimazaki, T. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry. J. Chem. Inf. Model. 2017, 57, 1300– 1308, DOI: 10.1021/acs.jcim.7b00083Google Scholar30PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven ChemistryNakata, Maho; Shimazaki, TomomiJournal of Chemical Information and Modeling (2017), 57 (6), 1300-1308CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Large-scale mol. databases play an essential role in the investigation of various subjects such as the development of org. materials, in-silico drug designs, and data-driven studies with machine learning, among others. We developed a large-scale quantum chem. database based on the first-principles method without performing any expt. Our database currently contains three million mol. electronic structures based on the d. functional theory method at the B3LYP/6-31G* level, and we successively calcd. 10 low-lying excited states of over two million mols. by the time-dependent DFT method with the 6-31+G* basis set. To select the mols. calcd. in our project, we mainly referred to the PubChem project, and it was used as a source of the mol. structures in short strings using the InChI and the SMILES representations. Accordingly, we named our quantum chem. database project as "PubChemQC" (http://pubchemqc.riken.jp/) and placed it in the public domain. In this paper, we showed the fundamental features of the PubChemQC database and dis- cussed the techniques used to construct the dataset for large-scale quantum chem. calcns. We also presented a machine-learning approach to predict the electronic structure of mols. as an example to demonstrate the suitability of the large-scale quantum chem. database.
- 31Smith, D. G. A.; Altarawy, D.; Burns, L. A.; Welborn, M.; Naden, L. N.; Ward, L.; Ellis, S.; Pritchard, B. P.; Crawford, T. D. The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020, 11, e1491 DOI: 10.1002/wcms.1491Google ScholarThere is no corresponding record for this reference.
- 32Lim, V. T.; Hahn, D. F.; Tresadern, G.; Bayly, C. I.; Mobley, D. L. Benchmark assessment of molecular geometries and energies from small molecule force fields [version 1; peer review: 2 approved]. F1000Research 2020, 9, 1390, DOI: 10.12688/f1000research.27141.1Google Scholar32Benchmark assessment of molecular geometries and energies from small molecule force fieldsLim, Victoria T.; Hahn, David F.; Tresadern, Gary; Bayly, Christopher I.; Mobley, David L.F1000Research (2020), 9 (), 1390CODEN: FRESJL; ISSN:2046-1402. (F1000 Research Ltd.)Background: Force fields are used in a wide variety of contexts for classical mol. simulation, including studies on protein-ligand binding, membrane permeation, and thermophys. property prediction. The quality of these studies relies on the quality of the force fields used to represent the systems. Methods: Focusing on small mols. of fewer than 50 heavy atoms, our aim in this work is to compare nine force fields: GAFF, GAFF2, MMFF94, MMFF94S, OPLS3e, SMIRNOFF99Frosst, and the Open Force Field Parsley, versions 1.0, 1.1, and 1.2. On a dataset comprising 22,675 mol. structures of 3,271 mols., we analyzed force field-optimized geometries and conformer energies compared to ref. quantum mech. (QM) data. Results: We show that while OPLS3e performs best, the latest Open Force Field Parsley release is approaching a comparable level of accuracy in reproducing QM geometries and energetics for this set of mols. Meanwhile, the performance of established force fields such as MMFF94S and GAFF2 is generally somewhat worse. We also fiend that the series of recent Open Force Field versions provide significant increases in accuracy. Conclusions: This study provides an extensive test of the performance of different mol. mechanics force fields on a diverse mol. set, and highlights two (OPLS3e and OpenFF 1.2) that perform better than the others tested on the present comparison. Our mol. 
set and results are available for other researchers to use in testing.
- 33Gražulis, S.; Merkys, A.; Vaitkus, A.; Chateigner, D.; Lutterotti, L.; Moeck, P.; Quiros, M.; Downs, R. T.; Kaminsky, W.; Bail, A. L. Materials Informatics: Methods, Tools and Applications; Isayev, O., Tropsha, A., Curtarolo, S., Eds.; Wiley, 2019; Chapter 1, pp 1– 39.Google ScholarThere is no corresponding record for this reference.
- 34Chai, J.-D.; Head-Gordon, M. Systematic optimization of long-range corrected hybrid density functionals. J. Chem. Phys. 2008, 128, 084106, DOI: 10.1063/1.2834918Google Scholar34Systematic optimization of long-range corrected hybrid density functionalsChai, Jeng-Da; Head-Gordon, MartinJournal of Chemical Physics (2008), 128 (8), 084106/1-084106/15CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)A general scheme for systematically modeling long-range cor. (LC) hybrid d. functionals is proposed. Our resulting two LC hybrid functionals are shown to be accurate in thermochem., kinetics, and noncovalent interactions, when compared with common hybrid d. functionals. The qual. failures of the commonly used hybrid d. functionals in some "difficult problems," such as dissocn. of sym. radical cations and long-range charge-transfer excitations, are significantly reduced by the present LC hybrid d. functionals. (c) 2008 American Institute of Physics.
- 35Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297, DOI: 10.1039/b508541aGoogle Scholar35Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracyWeigend, Florian; Ahlrichs, ReinhartPhysical Chemistry Chemical Physics (2005), 7 (18), 3297-3305CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)Gaussian basis sets of quadruple zeta valence quality for Rb-Rn are presented, as well as bases of split valence and triple zeta valence quality for H-Rn. The latter were obtained by (partly) modifying bases developed previously. A large set of more than 300 mols. representing (nearly) all elements-except lanthanides-in their common oxidn. states was used to assess the quality of the bases all across the periodic table. Quantities investigated were atomization energies, dipole moments and structure parameters for Hartree-Fock, d. functional theory and correlated methods, for which we had chosen Moller-Plesset perturbation theory as an example. Finally recommendations are given which type of basis set is used best for a certain level of theory and a desired quality of results.
- 36Weigend, F. Accurate Coulomb-fitting basis sets for H to Rn. Phys. Chem. Chem. Phys. 2006, 8, 1057, DOI: 10.1039/b515623hGoogle Scholar36Accurate Coulomb-fitting basis sets for H to RnWeigend, FlorianPhysical Chemistry Chemical Physics (2006), 8 (9), 1057-1065CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)A series of auxiliary basis sets to fit Coulomb potentials for the elements H to Rn (except lanthanides) is presented. For each element only one auxiliary basis set is needed to approx. Coulomb energies in conjunction with orbital basis sets of split valence, triple zeta valence and quadruple zeta valence quality with errors of typically below ca. 0.15 kJ mol-1 per atom; this was demonstrated in conjunction with the recently developed orbital basis sets of types def2-SV(P), def2-TZVP and def2-QZVPP for a large set of small mols. representing (nearly) each element in all of its common oxidn. states. These auxiliary bases are slightly more than three times larger than orbital bases of split valence quality. Compared to non-approximated treatments, computation times for the Coulomb part are reduced by a factor of ca. 8 for def2-SV(P) orbital bases, ca. 25 for def2-TZVP and ca. 100 for def2-QZVPP orbital bases.
- 37Sterling, T.; Irwin, J. J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324– 2337, DOI: 10.1021/acs.jcim.5b00559Google Scholar37ZINC 15 - Ligand Discovery for EveryoneSterling, Teague; Irwin, John J.Journal of Chemical Information and Modeling (2015), 55 (11), 2324-2337CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Many questions about the biol. activity and availability of small mols. remain inaccessible to investigators who could most benefit from their answers. To narrow the gap between chemoinformatics and biol., we have developed a suite of ligand annotation, purchasability, target, and biol. assocn. tools, incorporated into ZINC and meant for investigators who are not computer specialists. The new version contains over 120 million purchasable "drug-like" compds. - effectively all org. mols. that are for sale - a quarter of which are available for immediate delivery. ZINC connects purchasable compds. to high-value ones such as metabolites, drugs, natural products, and annotated compds. from the literature. Compds. may be accessed by the genes for which they are annotated as well as the major and minor target classes to which those genes belong. It offers new anal. tools that are easy for nonspecialists yet with few limitations for experts. ZINC retains its original 3D roots - all mols. are available in biol. relevant, ready-to-dock formats. ZINC is freely available at http://zinc15.docking.org.
- 38Yoshikawa, N.; Hutchison, G. R. Fast, efficient fragment-based coordinate generation for Open Babel. J. Cheminf. 2019, 11, 49, DOI: 10.1186/s13321-019-0372-5Google ScholarThere is no corresponding record for this reference.
- 39O’Boyle, N. M.; Morley, C.; Hutchison, G. R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2008, 2, 5, DOI: 10.1186/1752-153x-2-5Google Scholar39Pybel: a Python wrapper for the OpenBabel cheminformatics toolkitO'Boyle Noel M; Morley Chris; Hutchison Geoffrey RChemistry Central journal (2008), 2 (), 5 ISSN:.BACKGROUND: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit. RESULTS: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel. CONCLUSION: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.
- 40O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An open chemical toolbox. J. Cheminf. 2011, 3, 33, DOI: 10.1186/1758-2946-3-33Google Scholar40Open Babel: an open chemical toolboxO'Boyle, Noel M.; Banck, Michael; James, Craig A.; Morley, Chris; Vandermeersch, Tim; Hutchison, Geoffrey R.Journal of Cheminformatics (2011), 3 (), 33CODEN: JCOHB3; ISSN:1758-2946. (Chemistry Central Ltd.)Background: A frequent problem in computational modeling is the interconversion of chem. structures between different formats. While std. interchange formats exist (for example, Chem. Markup Language) and de facto stds. have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chem. data, differences in the data stored by different formats (0D vs. 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chem. toolbox that speaks the many languages of chem. data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chem. and mol. data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a soln. to the proliferation of multiple chem. file formats. In addn., it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chem. 
data in areas such as org. chem., drug design, materials science, and computational chem. It is freely available under an open-source license.
- 41Wishart, D. S.; Feunang, Y. D.; Guo, A. C.; Lo, E. J.; Marcu, A.; Grant, J. R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074– D1082, DOI: 10.1093/nar/gkx1037Google Scholar41DrugBank 5.0: a major update to the DrugBank database for 2018Wishart, David S.; Feunang, Yannick D.; Guo, An C.; Lo, Elvis J.; Marcu, Ana; Grant, Jason R.; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Sayeeda, Zinat; Assempour, Nazanin; Iynkkaran, Ithayavani; Liu, Yifeng; Maciejewski, Adam; Gale, Nicola; Wilson, Alex; Chin, Lucy; Cummings, Ryan; Le, Diana; Pon, Allison; Knox, Craig; Wilson, MichaelNucleic Acids Research (2018), 46 (D1), D1074-D1082CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)DrugBank is a web-enabled database contg. comprehensivemol. information about drugs, their mechanisms, their interactions and their targets. First described in 2006, Drug- Bank has continued to evolve over the past 12 years in response to marked improvements to web stds. and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total no. of investigational drugs in the database has grown by almost 300%, the no. of drug-drug interactions has grown by nearly 600% and the no. of SNP-assocd. drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). 
New data have also been added on the status of hundreds of newdrug clin. trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacol. research, pharmaceutical science and drug education.
- 42Pracht, P.; Bohle, F.; Grimme, S. Automated Exploration of the low-energy Chemical Space with fast Quantum Chemical Methods. Phys. Chem. Chem. Phys. 2020, 22, 7169– 7192, DOI: 10.1039/C9CP06869DGoogle Scholar42Automated exploration of the low-energy chemical space with fast quantum chemical methodsPracht, Philipp; Bohle, Fabian; Grimme, StefanPhysical Chemistry Chemical Physics (2020), 22 (14), 7169-7192CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)We propose and discuss an efficient scheme for the in silico sampling for parts of the mol. chem. space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodn. ensembles at a quantum chem. level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated min. energy structures normally form the basis of further, mostly DFT computational work, such as the calcn. of spectra or macroscopic properties. By using basic quantum chem. methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chem. methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized mols.) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chem. and drug discovery. The procedures have been implemented in a freely available computer code called CREST, that makes use of the fast and reliable GFNn-xTB methods.
- 43Grimme, S. Exploration of Chemical Compound, Conformer, and Reaction Space with Meta-Dynamics Simulations Based on Tight-Binding Quantum Chemical Calculations. J. Chem. Theory Comput. 2019, 15, 2847– 2862, DOI: 10.1021/acs.jctc.9b00143Google Scholar43Exploration of Chemical Compound, Conformer, and Reaction Space with Meta-Dynamics Simulations Based on Tight-Binding Quantum Chemical CalculationsGrimme, StefanJournal of Chemical Theory and Computation (2019), 15 (5), 2847-2862CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The semiempirical tight-binding based quantum chem. method GFN2-xTB is used in the framework of meta-dynamics (MTD) to globally explore chem. compd., conformer, and reaction space. The biasing potential given as a sum of Gaussian functions is expressed with the root-mean-square-deviation (RMSD) in Cartesian space as a metric for the collective variables. This choice makes the approach robust and generally applicable to three common problems (i.e., conformer search, chem. reaction space exploration in a virtual nanoreactor, and for guessing reaction paths). Because of the inherent locality of the at. RMSD, functional group or fragment selective treatments are possible facilitating the investigation of catalytic processes where, for example, only the substrate is thermally activated. Due to the approx. character of the GFN2-xTB method, the resulting structure ensembles require further refinement with more sophisticated, for example, d. functional or wave function theory methods. However, the approach is extremely efficient running routinely on common laptop computers in minutes to hours of computation time even for realistically sized mols. with a few hundred atoms. Furthermore, the underlying potential energy surface for mols. contg. almost all elements (Z = 1-86) is globally consistent including the covalent dissocn. process and electronically complicated situations in, for example, transition metal systems. 
As examples, thermal decompn., ethyne oligomerization, the oxidn. of hydrocarbons (by oxygen and a P 450 enzyme model), a Miller-Urey model system, a thermally forbidden dimerization, and a multistep intramol. cyclization reaction are shown. For typical conformational search problems of org. drug mols., the new MTD(RMSD) algorithm yields lower energy structures and more complete conformer ensembles at reduced computational effort compared with its already well performing predecessor.
- 44Chan, L.; Morris, G. M.; Hutchison, G. R. Understanding Conformational Entropy in Small Molecules. J. Chem. Theory Comput. 2021, 17, 2099– 2106, DOI: 10.1021/acs.jctc.0c01213Google Scholar44Understanding Conformational Entropy in Small MoleculesChan, Lucian; Morris, Garrett M.; Hutchison, Geoffrey R.Journal of Chemical Theory and Computation (2021), 17 (4), 2099-2106CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The calcn. of the entropy of flexible mols. can be challenging, since the no. of possible conformers can grow exponentially with mol. size and many low-energy conformers may be thermally accessible. Different methods have been proposed to approx. the contribution of conformational entropy to the mol. std. entropy, including performing thermochem. calcns. with all possible stable conformations and developing empirical corrections from exptl. data. We have performed conformer sampling on over 120,000 small mols. generating some 12 million conformers, to develop models to predict conformational entropy across a wide range of mols. Using insight into the nature of conformational disorder, our cross-validated phys. motivated statistical model gives a mean abs. error of ∼ 4.8 J/mol·K or under 0.4 kcal/mol at 300 K. Beyond predicting mol. entropies and free energies, the model implies a high degree of correlation between torsions in most mols., often assumed to be independent. While individual dihedral rotations may have low energetic barriers, the shape and chem. functionality of most mols. necessarily correlate their torsional degrees of freedom and hence restrict the no. of low-energy conformations immensely. Our simple models capture these correlations and advance our understanding of small mol. conformational entropy.
- 45Landrum, G. RDKit: Open-Source Cheminformatics. Available at http://www.rdkit.org, 2020; http://www.rdkit.org (accesses Oct 1, 2022).Google ScholarThere is no corresponding record for this reference.
- 46https://smarts.plus/.Google ScholarThere is no corresponding record for this reference.
- 47Schomburg, K.; Ehrlich, H.-C.; Stierand, K.; Rarey, M. From Structure Diagrams to Visual Chemical Patterns. J. Chem. Inf. Model. 2010, 50, 1529– 1535, DOI: 10.1021/ci100209aGoogle Scholar47From Structure Diagrams to Visual Chemical PatternsSchomburg, Karen; Ehrlich, Hans-Christian; Stierand, Katrin; Rarey, MatthiasJournal of Chemical Information and Modeling (2010), 50 (9), 1529-1535CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The intuitive way of chemists to communicate mols. is via two-dimensional structure diagrams. The straightforward visual representations are mostly preferred to the often complicated systematic chem. names. For chem. patterns, however, no comparable visualization stds. have evolved so far. Chem. patterns denoting descriptions of chem. features are needed whenever a set of mols. is filtered for certain properties. The currently available representations are constrained to linear mol. pattern languages which are hardly human readable and therefore keep chemists without computational background from systematically formulating patterns. Therefore, we introduce a new visualization concept for chem. patterns. The common std. concept of structure diagrams is extended to account for property descriptions and logic combinations of chem. features in patterns. As a first application of the new concept, we developed the SMARTSviewer, a tool that converts chem. patterns encoded in SMARTS strings to a visual representation. The graphic pattern depiction provides an overview of the specified chem. features, variations, and similarities without needing to decode the often cryptic linear expressions. Taking recent chem. publications from various fields, we demonstrate the wide application range of a graphical chem. pattern language.
- 48Neese, F.; Wennmohs, F.; Becker, U.; Riplinger, C. The ORCA quantum chemistry program package. J. Chem. Phys. 2020, 152, 224108, DOI: 10.1063/5.0004608Google Scholar48The ORCA quantum chemistry program packageNeese, Frank; Wennmohs, Frank; Becker, Ute; Riplinger, ChristophJournal of Chemical Physics (2020), 152 (22), 224108CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)In this contribution to the special software-centered issue, the ORCA program package is described. We start with a short historical perspective of how the project began and go on to discuss its current feature set. ORCA has grown into a rather comprehensive general-purpose package for theor. research in all areas of chem. and many neighboring disciplines such as materials sciences and biochem. ORCA features d. functional theory, a range of wavefunction based correlation methods, semi-empirical methods, and even force-field methods. A range of solvation and embedding models is featured as well as a complete intrinsic to ORCA quantum mechanics/mol. mechanics engine. A specialty of ORCA always has been a focus on transition metals and spectroscopy as well as a focus on applicability of the implemented methods to "real-life" chem. applications involving systems with a few hundred atoms. In addn. to being efficient, user friendly, and, to the largest extent possible, platform independent, ORCA features a no. of methods that are either unique to ORCA or have been first implemented in the course of the ORCA development. Next to a range of spectroscopic and magnetic properties, the linear- or low-order single- and multi-ref. local correlation methods based on pair natural orbitals (domain based local pair natural orbital methods) should be mentioned here. Consequently, ORCA is a widely used program in various areas of chem. and spectroscopy with a current user base of over 22 000 registered users in academic research and in industry. (c) 2020 American Institute of Physics.
- 49Lin, J. B.; Jin, Y.; Lopez, S. A.; Druckerman, N.; Wheeler, S. E.; Houk, K. N. Torsional Barriers to Rotation and Planarization in Heterocyclic Oligomers of Value in Organic Electronics. J. Chem. Theory Comput. 2017, 13, 5624– 5638, DOI: 10.1021/acs.jctc.7b00709Google Scholar49Torsional Barriers to Rotation and Planarization in Heterocyclic Oligomers of Value in Organic ElectronicsLin, Janice B.; Jin, Yu; Lopez, Steven A.; Druckerman, Nathaniel; Wheeler, Steven E.; Houk, K. N.Journal of Chemical Theory and Computation (2017), 13 (11), 5624-5638CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)In order to understand the conformational behavior of org. components in org. electronic devices, we have computed the torsional potentials for a library of thiophene-based heterodimers. The accuracy and efficiencies of computational methods for these org. materials were benchmarked for 11 common d. functionals with three Pople basis sets against a Focal Point Anal. (FPA) on a model oligothiophene 2,5-bis(3-tetradecylthiophen-2-yl)thieno[3,2-b]-thiophene (BTTT) system. This study establishes a set of general trends in regards to conformational preferences, as well as planarization and rotational barriers for a library comprised of common fragments found in org. materials. These gas phase structures are compared to exptl. crystal structures to det. the effect of crystal packing on geometry. Finally, we analyze the structure of hole-transporting material DERDTS-TBDT and design a new oligomer likely to be planar in the solid state.
- 50Perkins, M. A.; Cline, L. M.; Tschumper, G. S. Torsional Profiles of Thiophene and Furan Oligomers: Probing the Effects of Heterogeneity and Chain Length. J. Phys. Chem. A 2021, 125, 6228– 6237, DOI: 10.1021/acs.jpca.1c04714Google Scholar50Torsional Profiles of Thiophene and Furan Oligomers: Probing the Effects of Heterogeneity and Chain LengthPerkins, Morgan A.; Cline, Laura M.; Tschumper, Gregory S.Journal of Physical Chemistry A (2021), 125 (28), 6228-6237CODEN: JPCAFH; ISSN:1089-5639. (American Chemical Society)A systematic anal. of the torsional profiles of 55 unique oligomers composed of two to four thiophene and/or furan rings (n = 2 to 4) has been conducted using three d. functional theory (DFT) methods along with MP2 and three different coupled-cluster methods. Two planar or quasi-planar min. were identified for each n = 2 oligomer system. In every case, the torsional angle (τ) between the heteroatoms about the carbon-carbon bond connecting the two rings is at or near 180° for the global min. and 0° for the local min., referred to as anti and syn conformations, resp. These oligomers have rotational barrier heights ranging from ca. 2 kcal mol-1 for 2,2'-bithiophene to 4 kcal mol-1 for 2,2'-bifuran, based on electronic energies computed near the CCSD(T) complete basis set (CBS) limit. The corresponding rotational barrier for the heterogeneous 2-(2-thienyl)furan counterpart falls approx. halfway between those values. The energy differences between the min. are approx. 2 and 0.4 kcal mol-1 for the homogeneous 2,2'-bifuran and 2,2'-bithiophene, resp., whereas the energy difference between the planar local and global min. (at τ = 0 and 180°, resp.) is only 0.3 kcal mol-1 for 2-(2-thienyl)furan. Extending these three oligomers by adding one or two addnl. thiophene and/or furan rings resulted in only minor changes to the torsional profiles when rotating around the same carbon-carbon bond as the two-ring profiles. 
Relative energy differences between the syn and anti conformations were changed by no more than 0.4 kcal mol-1 for the corresponding n = 3 and 4 oligomers, while the rotational barrier height increased by no more than 0.8 kcal mol-1.
- 51Johansson, M. P.; Olsen, J. Torsional Barriers and Equilibrium Angle of Biphenyl: Reconciling Theory with Experiment. J. Chem. Theory Comput. 2008, 4, 1460– 1471, DOI: 10.1021/ct800182eGoogle Scholar51Torsional Barriers and Equilibrium Angle of Biphenyl: Reconciling Theory with ExperimentJohansson, Mikael P.; Olsen, JeppeJournal of Chemical Theory and Computation (2008), 4 (9), 1460-1471CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The barriers of internal rotation of the two Ph groups in biphenyl are investigated using a combination of coupled cluster and d. functional theory. The exptl. barriers are for the first time accurately reproduced; our best ests. of the barriers are 8.0 and 8.3 kJ/mol around the planar and perpendicular conformations, resp. The use of flexible basis sets of at least augmented quadruple-ζ quality is shown to be a crucial prerequisite. Further, to finally reconcile theory with expt., extrapolations of both the basis set toward the basis set limit and electron correlation toward the full configuration-interaction limit are necessary. The min. of the torsional angle is significantly increased by free energy corrections, which are needed to reach an agreement with expt. The d. functional B3LYP approach is found to perform well compared with the highest level ab initio results.
- 52Nam, S.; Cho, E.; Sim, E.; Burke, K. Explaining and Fixing DFT Failures for Torsional Barriers. J. Phys. Chem. Lett. 2021, 12, 2796– 2804, DOI: 10.1021/acs.jpclett.1c00426Google Scholar52Explaining and Fixing DFT Failures for Torsional BarriersNam, Seungsoo; Cho, Eunbyol; Sim, Eunji; Burke, KieronJournal of Physical Chemistry Letters (2021), 12 (11), 2796-2804CODEN: JPCLCD; ISSN:1948-7185. (American Chemical Society)Most torsional barriers are predicted with high accuracies (about 1 kJ/mol) by std. semilocal functionals, but a small subset was found to have much larger errors. We created a database of almost 300 carbon-carbon torsional barriers, including 12 poorly behaved barriers, that stem from the Y=C-X group, where Y is O or S and X is a halide. Functionals with enhanced exchange mixing (about 50%) worked well for all barriers. We found that poor actors have delocalization errors caused by hyperconjugation. These problematic calcns. are d.-sensitive (i.e., DFT predictions change noticeably with the d.), and using HF densities (HF-DFT) fixes these issues. For example, conventional B3LYP performs as accurately as exchange-enhanced functionals if the HF d. is used. For long-chain conjugated mols., HF-DFT can be much better than exchange-enhanced functionals. We suggest that HF-PBE0 has the best overall performance.
- 53Jackson, N. E.; Savoie, B. M.; Kohlstedt, K. L.; Olvera de la Cruz, M.; Schatz, G. C.; Chen, L. X.; Ratner, M. A. Controlling Conformations of Conjugated Polymers and Small Molecules: The Role of Nonbonding Interactions. J. Am. Chem. Soc. 2013, 135, 10475– 10483, DOI: 10.1021/ja403667sGoogle Scholar53Controlling Conformations of Conjugated Polymers and Small Molecules: The Role of Nonbonding InteractionsJackson, Nicholas E.; Savoie, Brett M.; Kohlstedt, Kevin L.; Olvera de la Cruz, Monica; Schatz, George C.; Chen, Lin X.; Ratner, Mark A.Journal of the American Chemical Society (2013), 135 (28), 10475-10483CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)The chem. variety present in the org. electronics literature has motivated us to investigate potential nonbonding interactions often incorporated into conformational "locking" schemes. We examine a variety of potential interactions, including oxygen-sulfur, nitrogen-sulfur, and fluorine-sulfur, using accurate quantum-chem. wave function methods and noncovalent interaction (NCI) anal. on a selection of high-performing conjugated polymers and small mols. found in the literature. In addn., we evaluate a set of nonbonding interactions occurring between various heterocyclic and pendant atoms taken from a group of representative π-conjugated mols. Together with our survey and set of interactions, it is detd. that while many nonbonding interactions possess weak binding capabilities, nontraditional hydrogen-bonding interactions, oxygen-hydrogen (CH···O) and nitrogen-hydrogen (CH···N), are alone in inducing conformational control and enhanced planarity along a polymer or small mol. backbone at room temp.
- 54Greenwell, C.; Beran, G. J. O. Inaccurate Conformational Energies Still Hinder Crystal Structure Prediction in Flexible Organic Molecules. Cryst. Growth Des. 2020, 20, 4875– 4881, DOI: 10.1021/acs.cgd.0c00676Google Scholar54Inaccurate Conformational Energies Still Hinder Crystal Structure Prediction in Flexible Organic MoleculesGreenwell, Chandler; Beran, Gregory J. O.Crystal Growth & Design (2020), 20 (8), 4875-4881CODEN: CGDEFU; ISSN:1528-7483. (American Chemical Society)Crystal structure prediction driven by d. functional theory has become an increasingly useful tool for the pharmaceutical industry and others interested in understanding and controlling org. mol. crystal packing. However, delocalization error in widely used d. functionals leads to problematic conformational energies that can cause incorrect predictions of polymorph stabilities. In 5 examples ranging from small mols. to the polymorphically challenging pharmaceuticals axitinib and galunisertib, inexpensively correcting the intramol. conformational energies with higher-level electronic structure methods leads to polymorph stability predictions that agree far better with expt. This approach also provides a valuable diagnostic for when skepticism about predicted polymorph stabilities is warranted. Commonly used d. functionals have difficulty ranking certain types of conformational polymorph structures correctly. Correcting the intramol. conformational energies with higher-level quantum chem. methods can improve the accuracy of crystal structure prediction stability rankings considerably.
- 55Smith, J. S.; Zubatyuk, R.; Nebgen, B.; Lubbers, N.; Barros, K.; Roitberg, A. E.; Isayev, O.; Tretiak, S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 2020, 7, 134, DOI: 10.1038/s41597-020-0473-zGoogle Scholar55The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for moleculesSmith, Justin S.; Zubatyuk, Roman; Nebgen, Benjamin; Lubbers, Nicholas; Barros, Kipton; Roitberg, Adrian E.; Isayev, Olexandr; Tretiak, SergeiScientific Data (2020), 7 (1), 134CODEN: SDCABS; ISSN:2052-4463. (Nature Research)Abstr.: Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chem., ML has been used to develop models for predicting mol. properties, for example quantum mechanics (QM) calcd. potential energy surfaces and at. charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for org. mols. were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality redn. scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M d. functional theory calcns., while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approx. 14 million CPU core-hours were expended to generate this data. Multiple QM calcd. properties for the chem. elements C, H, N, and O are provided: energies, at. forces, multipole moments, at. charges, etc. We provide this data to the community to aid research and development of ML models for chem.
- 56Axelrod, S.; Gomez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 2022, 9, 185, DOI: 10.1038/s41597-022-01288-4Google Scholar56GEOM, energy-annotated molecular conformations for property prediction and molecular generationAxelrod, Simon; Gomez-Bombarelli, RafaelScientific Data (2022), 9 (1), 185CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning (ML) outperforms traditional approaches in many mol. design tasks. ML models usually predict mol. properties from a 2D chem. graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a mol. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and exptl. data. Here we use advanced sampling and semi-empirical d. functional theory (DFT) to generate 37 million mol. conformations for over 450,000 mols. The Geometric Ensemble Of Mols. (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with exptl. data related to biophysics, physiol., and phys. chem. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
- 57Isert, C.; Atz, K.; Jiménez-Luna, J.; Schneider, G. QMugs, quantum mechanical properties of drug-like molecules. Sci. Data 2022, 9, 273, DOI: 10.1038/s41597-022-01390-7Google Scholar57QMugs, quantum mechanical properties of drug-like moleculesIsert, Clemens; Atz, Kenneth; Jimenez-Luna, Jose; Schneider, GisbertScientific Data (2022), 9 (1), 273CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning approaches in drug discovery, as well as in other areas of the chem. sciences, benefit from curated datasets of phys. mol. properties. However, there currently is a lack of data collections featuring large bioactive mols. alongside first-principle quantum chem. information. The open-access QMugs (Quantum-Mech. Properties of Drug-like Mols.) dataset fills this void. The QMugs collection comprises quantum mech. properties of more than 665 k biol. and pharmacol. relevant mols. extd. from the ChEMBL database, totaling ∼2 M conformers. QMugs contains optimized mol. geometries and thermodn. data obtained via the semi-empirical method GFN2-xTB. Atomic and mol. properties are provided on both the GFN2-xTB and on the d.-functional levels of theory (DFT, ωB97X-D/def2-SVP). QMugs features mols. of significantly larger size than previously-reported collections and comprises their resp. quantum mech. wave functions, including DFT d. and orbital matrixes. This dataset is intended to facilitate the development of models that learn from mol. data on different levels of theory while also providing insight into the corresponding relationships between mol. structure and biol. activity.
- 58Eastman, P.; Behara, P. K.; Dotson, D. L.; Galvelis, R.; Herr, J. E.; Horton, J. T.; Mao, Y.; Chodera, J. D.; Pritchard, B. P.; Wang, Y.; Fabritiis, G. D.; Markland, T. E. SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. Sci. Data 2023, 10, 11, DOI: 10.1038/s41597-022-01882-6Google Scholar58SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning PotentialsEastman, Peter; Behara, Pavan Kumar; Dotson, David L.; Galvelis, Raimondas; Herr, John E.; Horton, Josh T.; Mao, Yuezhi; Chodera, John D.; Pritchard, Benjamin P.; Wang, Yuanqing; De Fabritiis, Gianni; Markland, Thomas E.Scientific Data (2023), 10 (1), 11CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning potentials are an important tool for mol. simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chem. dataset for training potentials relevant to simulating drug-like small mols. interacting with proteins. It contains over 1.1 million conformations for a diverse set of small mols., dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged mols., and a wide range of covalent and non-covalent interactions. It provides both forces and energies calcd. at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chem. accuracy across a broad region of chem. space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in mol. simulations.
- 59. McNutt, A. T.; Bisiriyu, F.; Song, S.; Vyas, A.; Hutchison, G. R.; Koes, D. R. Conformer Generation for Structure-Based Drug Design: How Many and How Good? J. Chem. Inf. Model. 2023, 63, 6598–6607, DOI: 10.1021/acs.jcim.3c01245
- 60. Foloppe, N.; Chen, I.-J. Energy windows for computed compound conformers: covering artefacts or truly large reorganization energies? Future Med. Chem. 2019, 11, 97–118, DOI: 10.4155/fmc-2018-0400
- 61. Rai, B. K.; Sresht, V.; Yang, Q.; Unwalla, R.; Tu, M.; Mathiowetz, A. M.; Bakken, G. A. Comprehensive Assessment of Torsional Strain in Crystal Structures of Small Molecules and Protein–Ligand Complexes using ab Initio Calculations. J. Chem. Inf. Model. 2019, 59, 4195–4208, DOI: 10.1021/acs.jcim.9b00373
- 62. Taylor, R.; Wood, P. A. A Million Crystal Structures: The Whole Is Greater than the Sum of Its Parts. Chem. Rev. 2019, 119, 9427–9477, DOI: 10.1021/acs.chemrev.9b00155
- 63. Liebeschuetz, J. W. The Good, the Bad, and the Twisted Revisited: An Analysis of Ligand Geometry in Highly Resolved Protein–Ligand X-ray Structures. J. Med. Chem. 2021, 64, 7533–7543, DOI: 10.1021/acs.jmedchem.1c00228
- 64. Tong, J.; Zhao, S. Large-Scale Analysis of Bioactive Ligand Conformational Strain Energy by Ab Initio Calculation. J. Chem. Inf. Model. 2021, 61, 1180–1192, DOI: 10.1021/acs.jcim.0c01197
- 65. Chan, L.; Hutchison, G. R.; Morris, G. M. Understanding Ring Puckering in Small Molecules and Cyclic Peptides. J. Chem. Inf. Model. 2021, 61, 743–755, DOI: 10.1021/acs.jcim.0c01144
- 66. Lemm, D.; von Rudorff, G. F.; von Lilienfeld, O. A. Machine learning based energy-free structure predictions of molecules, transition states, and solids. Nat. Commun. 2021, 12, 4468, DOI: 10.1038/s41467-021-24525-7
- 67. Hanwell, M. D.; Curtis, D. E.; Lonie, D. C.; Vandermeersch, T.; Zurek, E.; Hutchison, G. R. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminf. 2012, 4, 17, DOI: 10.1186/1758-2946-4-17
- 68. Avogadro2 Version 1.97. https://two.avogadro.cc/.
- 69. Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272, DOI: 10.1038/s41592-019-0686-2
- 70. Chan, L.; Hutchison, G. R.; Morris, G. M. Bayesian optimization for conformer generation. J. Cheminf. 2019, 11, 32, DOI: 10.1186/s13321-019-0354-7
- 71. Chan, L.; Hutchison, G. R.; Morris, G. M. BOKEI: Bayesian optimization using knowledge of correlated torsions and expected improvement for conformer generation. Phys. Chem. Chem. Phys. 2020, 22, 5211–5219, DOI: 10.1039/C9CP06688H
- 72. Pordes, R.; Petravick, D.; Kramer, B.; Olson, D.; Livny, M.; Roy, A.; Avery, P.; Blackburn, K.; Wenaus, T.; Würthwein, F. The Open Science Grid. J. Phys.: Conf. Ser. 2007, 78, 012057, DOI: 10.1088/1742-6596/78/1/012057
- 73. Sfiligoi, I.; Bradley, D. C.; Holzman, B.; Mhashilkar, P.; Padhi, S.; Wurthwein, F. The Pilot Way to Grid Resources Using glideinWMS. World Congr. Comput. Sci. Inf. Eng. 2009, 2, 428–432, DOI: 10.1109/CSIE.2009.950
Figure 1. Comparison of the smallest non-hydrogen RMSD between experimental crystallographic geometry and CREST or ETKDG conformers for the (left) Crystallography Open Database (COD) and (right) Platinum Diverse data set. (c–f) Captions indicate best-fit linear regression in Å. Note that for both data sets, CREST produces smaller RMSD for molecules with few rotatable bonds but a larger slope, indicating generally worse RMSD for larger compounds with more rotatable bonds.
Figure 2. Calculated radius of gyration for the lowest-energy CREST conformer and experimental geometries as a function of the number of rotatable bonds for the (a) Crystallography Open Database (COD) and (b) Platinum data sets and scatterplot of compounds from (c) COD and (d) Platinum sets, comparing the radius of gyration from the CREST/GFN2 lowest-energy conformation with that of the experimental crystal structure geometry. Dashed line indicates a 1:1 correspondence, with approximate bounds indicated by solid lines.
Figure 3. Correlation between experimental and gas-phase torsions across the COD data set for (a) acyclic patterns and (b) ring patterns.
Figure 4. Histograms for pattern 229, including COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and gas-phase lowest-energy conformers across the entire data set, indicating the strong correlations and that the increased quantity of data greatly refines the histograms.
Figure 5. Histograms for pattern 307, including COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and gas-phase lowest-energy conformers across the entire data set, again indicating how the increased quantity of data greatly refines the torsional preferences (e.g., peaks near 50, 90, and 132°).
Figure 6. Histograms for COD experimental torsional angles, gas-phase lowest-energy conformers from the same COD molecules, and conformers across the entire data set, again indicating that the increased quantity of data greatly refines the torsional preferences (e.g., a strong peak at 60°).
Figure 7. Differences in torsional preferences for experimental and gas-phase geometries.
Figure 8. Example of a compound matching torsion pattern 270, with a steric constraint forcing an angle of ∼90°. Figure from Avogadro2. (67,68)
Figure 9. Histograms of correlation between (a,d) ETKDG, (b,e) cosine fits, or (c,f) Gaussian fits and derived torsional histograms for (a–c) acyclic and (d–f) ring torsional patterns.
References
This article references 73 other publications.
- 1. Hawkins, P. C. D. Conformation Generation: The State of the Art. J. Chem. Inf. Model. 2017, 57, 1747–1756, DOI: 10.1021/acs.jcim.7b00221
- 2. Friedrich, N.-O.; de Bruyn Kops, C.; Flachsenberg, F.; Sommer, K.; Rarey, M.; Kirchmair, J. Benchmarking Commercial Conformer Ensemble Generators. J. Chem. Inf. Model. 2017, 57, 2719–2728, DOI: 10.1021/acs.jcim.7b00505
- 3. Friedrich, N.-O.; Meyder, A.; de Bruyn Kops, C.; Sommer, K.; Flachsenberg, F.; Rarey, M.; Kirchmair, J. High-Quality Dataset of Protein-Bound Ligand Conformations and Its Application to Benchmarking Conformer Ensemble Generators. J. Chem. Inf. Model. 2017, 57, 529–539, DOI: 10.1021/acs.jcim.6b00613
- 4. Ebejer, J.-P.; Morris, G. M.; Deane, C. M. Freely Available Conformer Generation Methods: How Good Are They? J. Chem. Inf. Model. 2012, 52, 1146–1158, DOI: 10.1021/ci2004658
- 5. Penner, P.; Guba, W.; Schmidt, R.; Meyder, A.; Stahl, M.; Rarey, M. The Torsion Library: Semiautomated Improvement of Torsion Rules with SMARTScompare. J. Chem. Inf. Model. 2022, 62, 1644–1653, DOI: 10.1021/acs.jcim.2c00043
- 6. Wang, S.; Witek, J.; Landrum, G. A.; Riniker, S. Improving Conformer Generation for Small Rings and Macrocycles Based on Distance Geometry and Experimental Torsional-Angle Preferences. J. Chem. Inf. Model. 2020, 60, 2044–2058, DOI: 10.1021/acs.jcim.0c00025
- 7. Riniker, S.; Landrum, G. A. Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation. J. Chem. Inf. Model. 2015, 55, 2562–2574, DOI: 10.1021/acs.jcim.5b00654
- 8. Guba, W.; Meyder, A.; Rarey, M.; Hert, J. Torsion Library Reloaded: A New Version of Expert-Derived SMARTS Rules for Assessing Conformations of Small Molecules. J. Chem. Inf. Model. 2016, 56, 1–5, DOI: 10.1021/acs.jcim.5b00522
- 9. Gražulis, S.; Chateigner, D.; Downs, R. T.; Yokochi, A. F. T.; Quirós, M.; Lutterotti, L.; Manakova, E.; Butkus, J.; Moeck, P.; Le Bail, A. Crystallography Open Database – an open-access collection of crystal structures. J. Appl. Crystallogr. 2009, 42, 726–729, DOI: 10.1107/S0021889809016690
- 10. Gražulis, S.; Daškevič, A.; Merkys, A.; Chateigner, D.; Lutterotti, L.; Quirós, M.; Serebryanaya, N. R.; Moeck, P.; Downs, R. T.; Le Bail, A. Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res. 2012, 40, D420–D427, DOI: 10.1093/nar/gkr900
- 11. Groom, C. R.; Bruno, I. J.; Lightfoot, M. P.; Ward, S. C. The Cambridge Structural Database. Acta Crystallogr. B 2016, 72, 171–179, DOI: 10.1107/S2052520616003954
- 12. Sadowski, P.; Baldi, P. Small-Molecule 3D Structure Prediction Using Open Crystallography Data. J. Chem. Inf. Model. 2013, 53, 3127–3130, DOI: 10.1021/ci4005282
- 13. Wicker, J. G. P.; Cooper, R. I. Beyond Rotatable Bond Counts: Capturing 3D Conformational Flexibility in a Single Descriptor. J. Chem. Inf. Model. 2016, 56, 2347–2352, DOI: 10.1021/acs.jcim.6b00565
- 14. Das, S.; Dinpazhoh, L.; Tanemura, K. A.; Merz, K. M. Rapid and Automated Ab Initio Metabolite Collisional Cross Section Prediction from SMILES Input. J. Chem. Inf. Model. 2023, 63, 4995–5000, DOI: 10.1021/acs.jcim.3c00890
- 15. Das, S.; Tanemura, K. A.; Dinpazhoh, L.; Keng, M.; Schumm, C.; Leahy, L.; Asef, C. K.; Rainey, M.; Edison, A. S.; Fernández, F. M.; Merz, K. M. In Silico Collision Cross Section Calculations to Aid Metabolite Annotation. J. Am. Soc. Mass Spectrom. 2022, 33, 750–759, DOI: 10.1021/jasms.1c00315
- 16. Insausti, A.; Alonso, E. R.; Tercero, B.; Santos, J. I.; Calabrese, C.; Vogt, N.; Corzana, F.; Demaison, J.; Cernicharo, J.; Cocinero, E. J. Laboratory Observation of, Astrochemical Search for, and Structure of Elusive Erythrulose in the Interstellar Medium. J. Phys. Chem. Lett. 2021, 12, 1352–1359, DOI: 10.1021/acs.jpclett.0c03050
- 17. Alonso, E. R.; Peña, I.; Cabezas, C.; Alonso, J. L. Structural Expression of Exo-Anomeric Effect. J. Phys. Chem. Lett. 2016, 7, 845–850, DOI: 10.1021/acs.jpclett.6b00028
- 18. Peña, I.; Cocinero, E. J.; Cabezas, C.; Lesarri, A.; Mata, S.; Écija, P.; Daly, A. M.; Cimas, A.; Bermúdez, C.; Basterretxea, F. J.; Blanco, S.; Fernández, J. A.; López, J. C.; Castaño, F.; Alonso, J. L. Six Pyranoside Forms of Free 2-Deoxy-D-ribose. Angew. Chem., Int. Ed. 2013, 52, 11840–11845, DOI: 10.1002/anie.201305589
- 19. Baldi, P. Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response to the Letter by the Cambridge Crystallographic Data Centre. J. Chem. Inf. Model. 2011, 51, 3029, DOI: 10.1021/ci200460z
- 20. Rappe, A. K.; Casewit, C. J.; Colwell, K. S.; Goddard, W. A.; Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 1992, 114, 10024–10035, DOI: 10.1021/ja00051a040
- 21. Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. Int. J. Quantum Chem. 2021, 121, e26381, DOI: 10.1002/qua.26381
- 22. Kanal, I. Y.; Keith, J. A.; Hutchison, G. R. A sobering assessment of small-molecule force field methods for low energy conformer predictions. Int. J. Quantum Chem. 2018, 118, e25512, DOI: 10.1002/qua.25512
- 23. Bannwarth, C.; Ehlert, S.; Grimme, S. GFN2-xTB - An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method with Multipole Electrostatics and Density-Dependent Dispersion Contributions. J. Chem. Theory Comput. 2019, 15, 1652–1671, DOI: 10.1021/acs.jctc.8b01176
- 24. Smith, J. S.; Isayev, O.; Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 2017, 8, 3192–3203, DOI: 10.1039/C6SC05720A
- 25Smith, J. S.; Nebgen, B.; Lubbers, N.; Isayev, O.; Roitberg, A. E. Less is more: Sampling chemical space with active learning. J. Chem. Phys. 2018, 148, 241733, DOI: 10.1063/1.502380225Less is more: Sampling chemical space with active learningSmith, Justin S.; Nebgen, Ben; Lubbers, Nicholas; Isayev, Olexandr; Roitberg, Adrian E.Journal of Chemical Physics (2018), 148 (24), 241733/1-241733/10CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)The development of accurate and transferable machine learning (ML) potentials for predicting mol. energetics is a challenging task. The process of data generation to train such ML potentials is a task neither well understood nor researched in detail. In this work, we present a fully automated approach for the generation of datasets with the intent of training universal ML potentials. It is based on the concept of active learning (AL) via Query by Committee (QBC), which uses the disagreement between an ensemble of ML potentials to infer the reliability of the ensemble's prediction. QBC allows the presented AL algorithm to automatically sample regions of chem. space where the ML potential fails to accurately predict the potential energy. AL improves the overall fitness of ANAKIN-ME (ANI) deep learning potentials in rigorous test cases by mitigating human biases in deciding what new training data to use. AL also reduces the training set size to a fraction of the data required when using naive random sampling techniques. To provide validation of our AL approach, we develop the COmprehensive Machine-learning Potential (COMP6) benchmark (publicly available on GitHub) which contains a diverse set of org. mols. Active learning-based ANI potentials outperform the original random sampled ANI-1 potential with only 10% of the data, while the final active learning-based model vastly outperforms ANI-1 on the COMP6 benchmark after training to only 25% of the data. 
Finally, we show that our proposed AL technique develops a universal ANI potential (ANI-1x) that provides accurate energy and force predictions on the entire COMP6 benchmark. This universal ML potential achieves a level of accuracy on par with the best ML potentials for single mols. or materials, while remaining applicable to the general class of org. mols. composed of the elements CHNO. (c) 2018 American Institute of Physics.
- 26Smith, J. S.; Nebgen, B. T.; Zubatyuk, R.; Lubbers, N.; Devereux, C.; Barros, K.; Tretiak, S.; Isayev, O.; Roitberg, A. E. Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning. Nat. Commun. 2019, 10, 2903, DOI: 10.1038/s41467-019-10827-426Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learningSmith Justin S; Devereux Christian; Roitberg Adrian E; Smith Justin S; Nebgen Benjamin T; Zubatyuk Roman; Lubbers Nicholas; Barros Kipton; Tretiak Sergei; Smith Justin S; Lubbers Nicholas; Nebgen Benjamin T; Tretiak Sergei; Zubatyuk Roman; Isayev OlexandrNature communications (2019), 10 (1), 2903 ISSN:.Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.
- 27Devereux, C.; Smith, J. S.; Huddleston, K. K.; Barros, K.; Zubatyuk, R.; Isayev, O.; Roitberg, A. E. Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens. J. Chem. Theory Comput. 2020, 16, 4192– 4202, DOI: 10.1021/acs.jctc.0c0012127Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and HalogensDevereux, Christian; Smith, Justin S.; Huddleston, Kate K.; Barros, Kipton; Zubatyuk, Roman; Isayev, Olexandr; Roitberg, Adrian E.Journal of Chemical Theory and Computation (2020), 16 (7), 4192-4202CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)Machine learning (ML) methods have become powerful, predictive tools in a wide range of applications, such as facial recognition and autonomous vehicles. In the sciences, computational chemists and physicists have been using ML for the prediction of phys. phenomena, such as atomistic potential energy surfaces and reaction pathways. Transferable ML potentials, such as ANI-1x, have been developed with the goal of accurately simulating org. mols. contg. the chem. elements H, C, N, and O. Here, we provide an extension of the ANI-1x model. The new model, dubbed ANI-2x, is trained to three addnl. chem. elements: S, F, and Cl. Addnl., ANI-2x underwent torsional refinement training to better predict mol. torsion profiles. These new features open a wide range of new applications within org. chem. and drug development. These seven elements (H, C, N, O, F, Cl, and S) make up ∼ 90% of drug-like mols. To show that these addns. do not sacrifice accuracy, we have tested this model across a range of org. mols. and applications, including the COMP6 benchmark, dihedral rotations, conformer scoring, and nonbonded interactions. ANI-2x is shown to accurately predict mol. energies compared to d. functional theory with a ~ 106 factor speedup and a negligible slowdown compared to ANI-1x and shows subchem. accuracy across most of the COMP6 benchmark. 
The resulting model is a valuable tool for drug development which can potentially replace both quantum calcns. and classical force fields for a myriad of applications.
- 28Qiao, Z.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 2020, 153, 124111, DOI: 10.1063/5.002195528OrbNet: Deep learning for quantum chemistry using symmetry-adapted atomic-orbital featuresQiao, Zhuoran; Welborn, Matthew; Anandkumar, Animashree; Manby, Frederick R.; Miller, Thomas F.Journal of Chemical Physics (2020), 153 (12), 124111CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)We introduce a machine learning method in which energy solns. from the Schroedinger equation are predicted using symmetry adapted AO features and a graph neural-network architecture. OrbNet is shown to outperform existing methods in terms of learning efficiency and transferability for the prediction of d. functional theory results while employing low-cost features that are obtained from semi-empirical electronic structure calcns. For applications to datasets of drug-like mols., including QM7b-T, QM9, GDB-13-T, DrugBank, and the conformer benchmark dataset of Folmsbee and Hutchison [Int. J. Quantum Chem. (published online) (2020)], OrbNet predicts energies within chem. accuracy of d. functional theory at a computational cost that is 1000-fold or more reduced. (c) 2020 American Institute of Physics.
- 29Christensen, A. S.; Sirumalla, S. K.; Qiao, Z.; O’Connor, M. B.; Smith, D. G. A.; Ding, F.; Bygrave, P. J.; Anandkumar, A.; Welborn, M.; Manby, F. R.; Miller, T. F. OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy. J. Chem. Phys. 2021, 155, 204103, DOI: 10.1063/5.006199029OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracyChristensen, Anders S.; Sirumalla, Sai Krishna; Qiao, Zhuoran; O'Connor, Michael B.; Smith, Daniel G. A.; Ding, Feizhi; Bygrave, Peter J.; Anandkumar, Animashree; Welborn, Matthew; Manby, Frederick R.; Miller, Thomas F.Journal of Chemical Physics (2021), 155 (20), 204103CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)We present OrbNet Denali, a machine learning model for an electronic structure that is designed as a drop-in replacement for ground-state d. functional theory (DFT) energy calcns. The model is a message-passing graph neural network that uses symmetry-adapted AO features from a low-cost quantum calcn. to predict the energy of a mol. OrbNet Denali is trained on a vast dataset of 2.3 x 106 DFT calcns. on mols. and geometries. This dataset covers the most common elements in biochem. and org. chem. (H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I) and charged mols. OrbNet Denali is demonstrated on several well-established benchmark datasets, and we find that it provides accuracy that is on par with modern DFT methods while offering a speedup of up to three orders of magnitude. For the GMTKN55 benchmark set, OrbNet Denali achieves WTMAD-1 and WTMAD-2 scores of 7.19 and 9.84, on par with modern DFT functionals. For several GMTKN55 subsets, which contain chem. problems that are not present in the training set, OrbNet Denali produces a mean abs. error comparable to those of DFT methods. 
For the Hutchison conformer benchmark set, OrbNet Denali has a median correlation coeff. of R2 = 0.90 compared to the ref. DLPNO-CCSD(T) calcn. and R2 = 0.97 compared to the method used to generate the training data (ωB97X-D3/def2-TZVP), exceeding the performance of any other method with a similar cost. Similarly, the model reaches chem. accuracy for non-covalent interactions in the S66x10 dataset. For torsional profiles, OrbNet Denali reproduces the torsion profiles of ωB97X-D3/def2-TZVP with an av. mean abs. error of 0.12 kcal/mol for the potential energy surfaces of the diverse fragments in the TorsionNet500 dataset. (c) 2021 American Institute of Physics.
- 30Nakata, M.; Shimazaki, T. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry. J. Chem. Inf. Model. 2017, 57, 1300– 1308, DOI: 10.1021/acs.jcim.7b0008330PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven ChemistryNakata, Maho; Shimazaki, TomomiJournal of Chemical Information and Modeling (2017), 57 (6), 1300-1308CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Large-scale mol. databases play an essential role in the investigation of various subjects such as the development of org. materials, in-silico drug designs, and data-driven studies with machine learning, among others. We developed a large-scale quantum chem. database based on the first-principles method without performing any expt. Our database currently contains three million mol. electronic structures based on the d. functional theory method at the B3LYP/6-31G* level, and we successively calcd. 10 low-lying excited states of over two million mols. by the time-dependent DFT method with the 6-31+G* basis set. To select the mols. calcd. in our project, we mainly referred to the PubChem project, and it was used as a source of the mol. structures in short strings using the InChI and the SMILES representations. Accordingly, we named our quantum chem. database project as "PubChemQC" (http://pubchemqc.riken.jp/) and placed it in the public domain. In this paper, we showed the fundamental features of the PubChemQC database and dis- cussed the techniques used to construct the dataset for large-scale quantum chem. calcns. We also presented a machine-learning approach to predict the electronic structure of mols. as an example to demonstrate the suitability of the large-scale quantum chem. database.
- 31Smith, D. G. A.; Altarawy, D.; Burns, L. A.; Welborn, M.; Naden, L. N.; Ward, L.; Ellis, S.; Pritchard, B. P.; Crawford, T. D. The MolSSI QCArchive project: An open-source platform to compute, organize, and share quantum chemistry data. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020, 11, e1491 DOI: 10.1002/wcms.1491There is no corresponding record for this reference.
- 32Lim, V. T.; Hahn, D. F.; Tresadern, G.; Bayly, C. I.; Mobley, D. L. Benchmark assessment of molecular geometries and energies from small molecule force fields [version 1; peer review: 2 approved]. F1000Research 2020, 9, 1390, DOI: 10.12688/f1000research.27141.132Benchmark assessment of molecular geometries and energies from small molecule force fieldsLim, Victoria T.; Hahn, David F.; Tresadern, Gary; Bayly, Christopher I.; Mobley, David L.F1000Research (2020), 9 (), 1390CODEN: FRESJL; ISSN:2046-1402. (F1000 Research Ltd.)Background: Force fields are used in a wide variety of contexts for classical mol. simulation, including studies on protein-ligand binding, membrane permeation, and thermophys. property prediction. The quality of these studies relies on the quality of the force fields used to represent the systems. Methods: Focusing on small mols. of fewer than 50 heavy atoms, our aim in this work is to compare nine force fields: GAFF, GAFF2, MMFF94, MMFF94S, OPLS3e, SMIRNOFF99Frosst, and the Open Force Field Parsley, versions 1.0, 1.1, and 1.2. On a dataset comprising 22,675 mol. structures of 3,271 mols., we analyzed force field-optimized geometries and conformer energies compared to ref. quantum mech. (QM) data. Results: We show that while OPLS3e performs best, the latest Open Force Field Parsley release is approaching a comparable level of accuracy in reproducing QM geometries and energetics for this set of mols. Meanwhile, the performance of established force fields such as MMFF94S and GAFF2 is generally somewhat worse. We also fiend that the series of recent Open Force Field versions provide significant increases in accuracy. Conclusions: This study provides an extensive test of the performance of different mol. mechanics force fields on a diverse mol. set, and highlights two (OPLS3e and OpenFF 1.2) that perform better than the others tested on the present comparison. Our mol. set and results are available for other researchers to use in testing.
- 33Gražulis, S.; Merkys, A.; Vaitkus, A.; Chateigner, D.; Lutterotti, L.; Moeck, P.; Quiros, M.; Downs, R. T.; Kaminsky, W.; Bail, A. L. Materials Informatics: Methods, Tools and Applications; Isayev, O., Tropsha, A., Curtarolo, S., Eds.; Wiley, 2019; Chapter 1, pp 1– 39.There is no corresponding record for this reference.
- 34Chai, J.-D.; Head-Gordon, M. Systematic optimization of long-range corrected hybrid density functionals. J. Chem. Phys. 2008, 128, 084106, DOI: 10.1063/1.283491834Systematic optimization of long-range corrected hybrid density functionalsChai, Jeng-Da; Head-Gordon, MartinJournal of Chemical Physics (2008), 128 (8), 084106/1-084106/15CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)A general scheme for systematically modeling long-range cor. (LC) hybrid d. functionals is proposed. Our resulting two LC hybrid functionals are shown to be accurate in thermochem., kinetics, and noncovalent interactions, when compared with common hybrid d. functionals. The qual. failures of the commonly used hybrid d. functionals in some "difficult problems," such as dissocn. of sym. radical cations and long-range charge-transfer excitations, are significantly reduced by the present LC hybrid d. functionals. (c) 2008 American Institute of Physics.
- 35Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Phys. Chem. Chem. Phys. 2005, 7, 3297, DOI: 10.1039/b508541a35Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracyWeigend, Florian; Ahlrichs, ReinhartPhysical Chemistry Chemical Physics (2005), 7 (18), 3297-3305CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)Gaussian basis sets of quadruple zeta valence quality for Rb-Rn are presented, as well as bases of split valence and triple zeta valence quality for H-Rn. The latter were obtained by (partly) modifying bases developed previously. A large set of more than 300 mols. representing (nearly) all elements-except lanthanides-in their common oxidn. states was used to assess the quality of the bases all across the periodic table. Quantities investigated were atomization energies, dipole moments and structure parameters for Hartree-Fock, d. functional theory and correlated methods, for which we had chosen Moller-Plesset perturbation theory as an example. Finally recommendations are given which type of basis set is used best for a certain level of theory and a desired quality of results.
- 36Weigend, F. Accurate Coulomb-fitting basis sets for H to Rn. Phys. Chem. Chem. Phys. 2006, 8, 1057, DOI: 10.1039/b515623h36Accurate Coulomb-fitting basis sets for H to RnWeigend, FlorianPhysical Chemistry Chemical Physics (2006), 8 (9), 1057-1065CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)A series of auxiliary basis sets to fit Coulomb potentials for the elements H to Rn (except lanthanides) is presented. For each element only one auxiliary basis set is needed to approx. Coulomb energies in conjunction with orbital basis sets of split valence, triple zeta valence and quadruple zeta valence quality with errors of typically below ca. 0.15 kJ mol-1 per atom; this was demonstrated in conjunction with the recently developed orbital basis sets of types def2-SV(P), def2-TZVP and def2-QZVPP for a large set of small mols. representing (nearly) each element in all of its common oxidn. states. These auxiliary bases are slightly more than three times larger than orbital bases of split valence quality. Compared to non-approximated treatments, computation times for the Coulomb part are reduced by a factor of ca. 8 for def2-SV(P) orbital bases, ca. 25 for def2-TZVP and ca. 100 for def2-QZVPP orbital bases.
- 37Sterling, T.; Irwin, J. J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324– 2337, DOI: 10.1021/acs.jcim.5b0055937ZINC 15 - Ligand Discovery for EveryoneSterling, Teague; Irwin, John J.Journal of Chemical Information and Modeling (2015), 55 (11), 2324-2337CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Many questions about the biol. activity and availability of small mols. remain inaccessible to investigators who could most benefit from their answers. To narrow the gap between chemoinformatics and biol., we have developed a suite of ligand annotation, purchasability, target, and biol. assocn. tools, incorporated into ZINC and meant for investigators who are not computer specialists. The new version contains over 120 million purchasable "drug-like" compds. - effectively all org. mols. that are for sale - a quarter of which are available for immediate delivery. ZINC connects purchasable compds. to high-value ones such as metabolites, drugs, natural products, and annotated compds. from the literature. Compds. may be accessed by the genes for which they are annotated as well as the major and minor target classes to which those genes belong. It offers new anal. tools that are easy for nonspecialists yet with few limitations for experts. ZINC retains its original 3D roots - all mols. are available in biol. relevant, ready-to-dock formats. ZINC is freely available at http://zinc15.docking.org.
- 38Yoshikawa, N.; Hutchison, G. R. Fast, efficient fragment-based coordinate generation for Open Babel. J. Cheminf. 2019, 11, 49, DOI: 10.1186/s13321-019-0372-5There is no corresponding record for this reference.
- 39O’Boyle, N. M.; Morley, C.; Hutchison, G. R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2008, 2, 5, DOI: 10.1186/1752-153x-2-539Pybel: a Python wrapper for the OpenBabel cheminformatics toolkitO'Boyle Noel M; Morley Chris; Hutchison Geoffrey RChemistry Central journal (2008), 2 (), 5 ISSN:.BACKGROUND: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit. RESULTS: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel. CONCLUSION: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.
- 40O’Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R. Open Babel: An open chemical toolbox. J. Cheminf. 2011, 3, 33, DOI: 10.1186/1758-2946-3-3340Open Babel: an open chemical toolboxO'Boyle, Noel M.; Banck, Michael; James, Craig A.; Morley, Chris; Vandermeersch, Tim; Hutchison, Geoffrey R.Journal of Cheminformatics (2011), 3 (), 33CODEN: JCOHB3; ISSN:1758-2946. (Chemistry Central Ltd.)Background: A frequent problem in computational modeling is the interconversion of chem. structures between different formats. While std. interchange formats exist (for example, Chem. Markup Language) and de facto stds. have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chem. data, differences in the data stored by different formats (0D vs. 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chem. toolbox that speaks the many languages of chem. data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chem. and mol. data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a soln. to the proliferation of multiple chem. file formats. In addn., it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chem. 
data in areas such as org. chem., drug design, materials science, and computational chem. It is freely available under an open-source license.
- 41Wishart, D. S.; Feunang, Y. D.; Guo, A. C.; Lo, E. J.; Marcu, A.; Grant, J. R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074– D1082, DOI: 10.1093/nar/gkx103741DrugBank 5.0: a major update to the DrugBank database for 2018Wishart, David S.; Feunang, Yannick D.; Guo, An C.; Lo, Elvis J.; Marcu, Ana; Grant, Jason R.; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Sayeeda, Zinat; Assempour, Nazanin; Iynkkaran, Ithayavani; Liu, Yifeng; Maciejewski, Adam; Gale, Nicola; Wilson, Alex; Chin, Lucy; Cummings, Ryan; Le, Diana; Pon, Allison; Knox, Craig; Wilson, MichaelNucleic Acids Research (2018), 46 (D1), D1074-D1082CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)DrugBank is a web-enabled database contg. comprehensivemol. information about drugs, their mechanisms, their interactions and their targets. First described in 2006, Drug- Bank has continued to evolve over the past 12 years in response to marked improvements to web stds. and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total no. of investigational drugs in the database has grown by almost 300%, the no. of drug-drug interactions has grown by nearly 600% and the no. of SNP-assocd. drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). 
New data have also been added on the status of hundreds of newdrug clin. trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacol. research, pharmaceutical science and drug education.
- 42Pracht, P.; Bohle, F.; Grimme, S. Automated Exploration of the low-energy Chemical Space with fast Quantum Chemical Methods. Phys. Chem. Chem. Phys. 2020, 22, 7169– 7192, DOI: 10.1039/C9CP06869D42Automated exploration of the low-energy chemical space with fast quantum chemical methodsPracht, Philipp; Bohle, Fabian; Grimme, StefanPhysical Chemistry Chemical Physics (2020), 22 (14), 7169-7192CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)We propose and discuss an efficient scheme for the in silico sampling for parts of the mol. chem. space by semiempirical tight-binding methods combined with a meta-dynamics driven search algorithm. The focus of this work is set on the generation of proper thermodn. ensembles at a quantum chem. level for conformers, but similar procedures for protonation states, tautomerism and non-covalent complex geometries are also discussed. The conformational ensembles consisting of all significantly populated min. energy structures normally form the basis of further, mostly DFT computational work, such as the calcn. of spectra or macroscopic properties. By using basic quantum chem. methods, electronic effects or possible bond breaking/formation are accounted for and a very reasonable initial energetic ranking of the candidate structures is obtained. Due to the huge computational speedup gained by the fast low-cost quantum chem. methods, overall short computation times even for systems with hundreds of atoms (typically drug-sized mols.) are achieved. Furthermore, specialized applications, such as sampling with implicit solvation models or constrained conformational sampling for transition-states, metal-, surface-, or noncovalently bound complexes are discussed, opening many possible applications in modern computational chem. and drug discovery. The procedures have been implemented in a freely available computer code called CREST, that makes use of the fast and reliable GFNn-xTB methods.
- 43Grimme, S. Exploration of Chemical Compound, Conformer, and Reaction Space with Meta-Dynamics Simulations Based on Tight-Binding Quantum Chemical Calculations. J. Chem. Theory Comput. 2019, 15, 2847– 2862, DOI: 10.1021/acs.jctc.9b0014343Exploration of Chemical Compound, Conformer, and Reaction Space with Meta-Dynamics Simulations Based on Tight-Binding Quantum Chemical CalculationsGrimme, StefanJournal of Chemical Theory and Computation (2019), 15 (5), 2847-2862CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The semiempirical tight-binding based quantum chem. method GFN2-xTB is used in the framework of meta-dynamics (MTD) to globally explore chem. compd., conformer, and reaction space. The biasing potential given as a sum of Gaussian functions is expressed with the root-mean-square-deviation (RMSD) in Cartesian space as a metric for the collective variables. This choice makes the approach robust and generally applicable to three common problems (i.e., conformer search, chem. reaction space exploration in a virtual nanoreactor, and for guessing reaction paths). Because of the inherent locality of the at. RMSD, functional group or fragment selective treatments are possible facilitating the investigation of catalytic processes where, for example, only the substrate is thermally activated. Due to the approx. character of the GFN2-xTB method, the resulting structure ensembles require further refinement with more sophisticated, for example, d. functional or wave function theory methods. However, the approach is extremely efficient running routinely on common laptop computers in minutes to hours of computation time even for realistically sized mols. with a few hundred atoms. Furthermore, the underlying potential energy surface for mols. contg. almost all elements (Z = 1-86) is globally consistent including the covalent dissocn. process and electronically complicated situations in, for example, transition metal systems. 
As examples, thermal decompn., ethyne oligomerization, the oxidn. of hydrocarbons (by oxygen and a P 450 enzyme model), a Miller-Urey model system, a thermally forbidden dimerization, and a multistep intramol. cyclization reaction are shown. For typical conformational search problems of org. drug mols., the new MTD(RMSD) algorithm yields lower energy structures and more complete conformer ensembles at reduced computational effort compared with its already well performing predecessor.
- 44Chan, L.; Morris, G. M.; Hutchison, G. R. Understanding Conformational Entropy in Small Molecules. J. Chem. Theory Comput. 2021, 17, 2099– 2106, DOI: 10.1021/acs.jctc.0c0121344Understanding Conformational Entropy in Small MoleculesChan, Lucian; Morris, Garrett M.; Hutchison, Geoffrey R.Journal of Chemical Theory and Computation (2021), 17 (4), 2099-2106CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The calcn. of the entropy of flexible mols. can be challenging, since the no. of possible conformers can grow exponentially with mol. size and many low-energy conformers may be thermally accessible. Different methods have been proposed to approx. the contribution of conformational entropy to the mol. std. entropy, including performing thermochem. calcns. with all possible stable conformations and developing empirical corrections from exptl. data. We have performed conformer sampling on over 120,000 small mols. generating some 12 million conformers, to develop models to predict conformational entropy across a wide range of mols. Using insight into the nature of conformational disorder, our cross-validated phys. motivated statistical model gives a mean abs. error of ∼ 4.8 J/mol·K or under 0.4 kcal/mol at 300 K. Beyond predicting mol. entropies and free energies, the model implies a high degree of correlation between torsions in most mols., often assumed to be independent. While individual dihedral rotations may have low energetic barriers, the shape and chem. functionality of most mols. necessarily correlate their torsional degrees of freedom and hence restrict the no. of low-energy conformations immensely. Our simple models capture these correlations and advance our understanding of small mol. conformational entropy.
- 45Landrum, G. RDKit: Open-Source Cheminformatics. Available at http://www.rdkit.org, 2020; http://www.rdkit.org (accesses Oct 1, 2022).There is no corresponding record for this reference.
- 46https://smarts.plus/.There is no corresponding record for this reference.
- 47Schomburg, K.; Ehrlich, H.-C.; Stierand, K.; Rarey, M. From Structure Diagrams to Visual Chemical Patterns. J. Chem. Inf. Model. 2010, 50, 1529– 1535, DOI: 10.1021/ci100209a47From Structure Diagrams to Visual Chemical PatternsSchomburg, Karen; Ehrlich, Hans-Christian; Stierand, Katrin; Rarey, MatthiasJournal of Chemical Information and Modeling (2010), 50 (9), 1529-1535CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The intuitive way of chemists to communicate mols. is via two-dimensional structure diagrams. The straightforward visual representations are mostly preferred to the often complicated systematic chem. names. For chem. patterns, however, no comparable visualization stds. have evolved so far. Chem. patterns denoting descriptions of chem. features are needed whenever a set of mols. is filtered for certain properties. The currently available representations are constrained to linear mol. pattern languages which are hardly human readable and therefore keep chemists without computational background from systematically formulating patterns. Therefore, we introduce a new visualization concept for chem. patterns. The common std. concept of structure diagrams is extended to account for property descriptions and logic combinations of chem. features in patterns. As a first application of the new concept, we developed the SMARTSviewer, a tool that converts chem. patterns encoded in SMARTS strings to a visual representation. The graphic pattern depiction provides an overview of the specified chem. features, variations, and similarities without needing to decode the often cryptic linear expressions. Taking recent chem. publications from various fields, we demonstrate the wide application range of a graphical chem. pattern language.
- 48Neese, F.; Wennmohs, F.; Becker, U.; Riplinger, C. The ORCA quantum chemistry program package. J. Chem. Phys. 2020, 152, 224108, DOI: 10.1063/5.000460848The ORCA quantum chemistry program packageNeese, Frank; Wennmohs, Frank; Becker, Ute; Riplinger, ChristophJournal of Chemical Physics (2020), 152 (22), 224108CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)In this contribution to the special software-centered issue, the ORCA program package is described. We start with a short historical perspective of how the project began and go on to discuss its current feature set. ORCA has grown into a rather comprehensive general-purpose package for theor. research in all areas of chem. and many neighboring disciplines such as materials sciences and biochem. ORCA features d. functional theory, a range of wavefunction based correlation methods, semi-empirical methods, and even force-field methods. A range of solvation and embedding models is featured as well as a complete intrinsic to ORCA quantum mechanics/mol. mechanics engine. A specialty of ORCA always has been a focus on transition metals and spectroscopy as well as a focus on applicability of the implemented methods to "real-life" chem. applications involving systems with a few hundred atoms. In addn. to being efficient, user friendly, and, to the largest extent possible, platform independent, ORCA features a no. of methods that are either unique to ORCA or have been first implemented in the course of the ORCA development. Next to a range of spectroscopic and magnetic properties, the linear- or low-order single- and multi-ref. local correlation methods based on pair natural orbitals (domain based local pair natural orbital methods) should be mentioned here. Consequently, ORCA is a widely used program in various areas of chem. and spectroscopy with a current user base of over 22 000 registered users in academic research and in industry. (c) 2020 American Institute of Physics.
- 49Lin, J. B.; Jin, Y.; Lopez, S. A.; Druckerman, N.; Wheeler, S. E.; Houk, K. N. Torsional Barriers to Rotation and Planarization in Heterocyclic Oligomers of Value in Organic Electronics. J. Chem. Theory Comput. 2017, 13, 5624– 5638, DOI: 10.1021/acs.jctc.7b0070949Torsional Barriers to Rotation and Planarization in Heterocyclic Oligomers of Value in Organic ElectronicsLin, Janice B.; Jin, Yu; Lopez, Steven A.; Druckerman, Nathaniel; Wheeler, Steven E.; Houk, K. N.Journal of Chemical Theory and Computation (2017), 13 (11), 5624-5638CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)In order to understand the conformational behavior of org. components in org. electronic devices, we have computed the torsional potentials for a library of thiophene-based heterodimers. The accuracy and efficiencies of computational methods for these org. materials were benchmarked for 11 common d. functionals with three Pople basis sets against a Focal Point Anal. (FPA) on a model oligothiophene 2,5-bis(3-tetradecylthiophen-2-yl)thieno[3,2-b]-thiophene (BTTT) system. This study establishes a set of general trends in regards to conformational preferences, as well as planarization and rotational barriers for a library comprised of common fragments found in org. materials. These gas phase structures are compared to exptl. crystal structures to det. the effect of crystal packing on geometry. Finally, we analyze the structure of hole-transporting material DERDTS-TBDT and design a new oligomer likely to be planar in the solid state.
- 50Perkins, M. A.; Cline, L. M.; Tschumper, G. S. Torsional Profiles of Thiophene and Furan Oligomers: Probing the Effects of Heterogeneity and Chain Length. J. Phys. Chem. A 2021, 125, 6228– 6237, DOI: 10.1021/acs.jpca.1c0471450Torsional Profiles of Thiophene and Furan Oligomers: Probing the Effects of Heterogeneity and Chain LengthPerkins, Morgan A.; Cline, Laura M.; Tschumper, Gregory S.Journal of Physical Chemistry A (2021), 125 (28), 6228-6237CODEN: JPCAFH; ISSN:1089-5639. (American Chemical Society)A systematic anal. of the torsional profiles of 55 unique oligomers composed of two to four thiophene and/or furan rings (n = 2 to 4) has been conducted using three d. functional theory (DFT) methods along with MP2 and three different coupled-cluster methods. Two planar or quasi-planar min. were identified for each n = 2 oligomer system. In every case, the torsional angle (τ) between the heteroatoms about the carbon-carbon bond connecting the two rings is at or near 180° for the global min. and 0° for the local min., referred to as anti and syn conformations, resp. These oligomers have rotational barrier heights ranging from ca. 2 kcal mol-1 for 2,2'-bithiophene to 4 kcal mol-1 for 2,2'-bifuran, based on electronic energies computed near the CCSD(T) complete basis set (CBS) limit. The corresponding rotational barrier for the heterogeneous 2-(2-thienyl)furan counterpart falls approx. halfway between those values. The energy differences between the min. are approx. 2 and 0.4 kcal mol-1 for the homogeneous 2,2'-bifuran and 2,2'-bithiophene, resp., whereas the energy difference between the planar local and global min. (at τ = 0 and 180°, resp.) is only 0.3 kcal mol-1 for 2-(2-thienyl)furan. Extending these three oligomers by adding one or two addnl. thiophene and/or furan rings resulted in only minor changes to the torsional profiles when rotating around the same carbon-carbon bond as the two-ring profiles. 
Relative energy differences between the syn and anti conformations were changed by no more than 0.4 kcal mol-1 for the corresponding n = 3 and 4 oligomers, while the rotational barrier height increased by no more than 0.8 kcal mol-1.
- 51Johansson, M. P.; Olsen, J. Torsional Barriers and Equilibrium Angle of Biphenyl: Reconciling Theory with Experiment. J. Chem. Theory Comput. 2008, 4, 1460– 1471, DOI: 10.1021/ct800182e51Torsional Barriers and Equilibrium Angle of Biphenyl: Reconciling Theory with ExperimentJohansson, Mikael P.; Olsen, JeppeJournal of Chemical Theory and Computation (2008), 4 (9), 1460-1471CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)The barriers of internal rotation of the two Ph groups in biphenyl are investigated using a combination of coupled cluster and d. functional theory. The exptl. barriers are for the first time accurately reproduced; our best ests. of the barriers are 8.0 and 8.3 kJ/mol around the planar and perpendicular conformations, resp. The use of flexible basis sets of at least augmented quadruple-ζ quality is shown to be a crucial prerequisite. Further, to finally reconcile theory with expt., extrapolations of both the basis set toward the basis set limit and electron correlation toward the full configuration-interaction limit are necessary. The min. of the torsional angle is significantly increased by free energy corrections, which are needed to reach an agreement with expt. The d. functional B3LYP approach is found to perform well compared with the highest level ab initio results.
- 52Nam, S.; Cho, E.; Sim, E.; Burke, K. Explaining and Fixing DFT Failures for Torsional Barriers. J. Phys. Chem. Lett. 2021, 12, 2796– 2804, DOI: 10.1021/acs.jpclett.1c0042652Explaining and Fixing DFT Failures for Torsional BarriersNam, Seungsoo; Cho, Eunbyol; Sim, Eunji; Burke, KieronJournal of Physical Chemistry Letters (2021), 12 (11), 2796-2804CODEN: JPCLCD; ISSN:1948-7185. (American Chemical Society)Most torsional barriers are predicted with high accuracies (about 1 kJ/mol) by std. semilocal functionals, but a small subset was found to have much larger errors. We created a database of almost 300 carbon-carbon torsional barriers, including 12 poorly behaved barriers, that stem from the Y=C-X group, where Y is O or S and X is a halide. Functionals with enhanced exchange mixing (about 50%) worked well for all barriers. We found that poor actors have delocalization errors caused by hyperconjugation. These problematic calcns. are d.-sensitive (i.e., DFT predictions change noticeably with the d.), and using HF densities (HF-DFT) fixes these issues. For example, conventional B3LYP performs as accurately as exchange-enhanced functionals if the HF d. is used. For long-chain conjugated mols., HF-DFT can be much better than exchange-enhanced functionals. We suggest that HF-PBE0 has the best overall performance.
- 53Jackson, N. E.; Savoie, B. M.; Kohlstedt, K. L.; Olvera de la Cruz, M.; Schatz, G. C.; Chen, L. X.; Ratner, M. A. Controlling Conformations of Conjugated Polymers and Small Molecules: The Role of Nonbonding Interactions. J. Am. Chem. Soc. 2013, 135, 10475– 10483, DOI: 10.1021/ja403667s53Controlling Conformations of Conjugated Polymers and Small Molecules: The Role of Nonbonding InteractionsJackson, Nicholas E.; Savoie, Brett M.; Kohlstedt, Kevin L.; Olvera de la Cruz, Monica; Schatz, George C.; Chen, Lin X.; Ratner, Mark A.Journal of the American Chemical Society (2013), 135 (28), 10475-10483CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)The chem. variety present in the org. electronics literature has motivated us to investigate potential nonbonding interactions often incorporated into conformational "locking" schemes. We examine a variety of potential interactions, including oxygen-sulfur, nitrogen-sulfur, and fluorine-sulfur, using accurate quantum-chem. wave function methods and noncovalent interaction (NCI) anal. on a selection of high-performing conjugated polymers and small mols. found in the literature. In addn., we evaluate a set of nonbonding interactions occurring between various heterocyclic and pendant atoms taken from a group of representative π-conjugated mols. Together with our survey and set of interactions, it is detd. that while many nonbonding interactions possess weak binding capabilities, nontraditional hydrogen-bonding interactions, oxygen-hydrogen (CH···O) and nitrogen-hydrogen (CH···N), are alone in inducing conformational control and enhanced planarity along a polymer or small mol. backbone at room temp.
- 54Greenwell, C.; Beran, G. J. O. Inaccurate Conformational Energies Still Hinder Crystal Structure Prediction in Flexible Organic Molecules. Cryst. Growth Des. 2020, 20, 4875– 4881, DOI: 10.1021/acs.cgd.0c0067654Inaccurate Conformational Energies Still Hinder Crystal Structure Prediction in Flexible Organic MoleculesGreenwell, Chandler; Beran, Gregory J. O.Crystal Growth & Design (2020), 20 (8), 4875-4881CODEN: CGDEFU; ISSN:1528-7483. (American Chemical Society)Crystal structure prediction driven by d. functional theory has become an increasingly useful tool for the pharmaceutical industry and others interested in understanding and controlling org. mol. crystal packing. However, delocalization error in widely used d. functionals leads to problematic conformational energies that can cause incorrect predictions of polymorph stabilities. In 5 examples ranging from small mols. to the polymorphically challenging pharmaceuticals axitinib and galunisertib, inexpensively correcting the intramol. conformational energies with higher-level electronic structure methods leads to polymorph stability predictions that agree far better with expt. This approach also provides a valuable diagnostic for when skepticism about predicted polymorph stabilities is warranted. Commonly used d. functionals have difficulty ranking certain types of conformational polymorph structures correctly. Correcting the intramol. conformational energies with higher-level quantum chem. methods can improve the accuracy of crystal structure prediction stability rankings considerably.
- 55Smith, J. S.; Zubatyuk, R.; Nebgen, B.; Lubbers, N.; Barros, K.; Roitberg, A. E.; Isayev, O.; Tretiak, S. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 2020, 7, 134, DOI: 10.1038/s41597-020-0473-z55The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for moleculesSmith, Justin S.; Zubatyuk, Roman; Nebgen, Benjamin; Lubbers, Nicholas; Barros, Kipton; Roitberg, Adrian E.; Isayev, Olexandr; Tretiak, SergeiScientific Data (2020), 7 (1), 134CODEN: SDCABS; ISSN:2052-4463. (Nature Research)Abstr.: Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chem., ML has been used to develop models for predicting mol. properties, for example quantum mechanics (QM) calcd. potential energy surfaces and at. charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for org. mols. were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality redn. scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M d. functional theory calcns., while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approx. 14 million CPU core-hours were expended to generate this data. Multiple QM calcd. properties for the chem. elements C, H, N, and O are provided: energies, at. forces, multipole moments, at. charges, etc. We provide this data to the community to aid research and development of ML models for chem.
- 56Axelrod, S.; Gomez-Bombarelli, R. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Sci. Data 2022, 9, 185, DOI: 10.1038/s41597-022-01288-456GEOM, energy-annotated molecular conformations for property prediction and molecular generationAxelrod, Simon; Gomez-Bombarelli, RafaelScientific Data (2022), 9 (1), 185CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning (ML) outperforms traditional approaches in many mol. design tasks. ML models usually predict mol. properties from a 2D chem. graph or a single 3D structure, but neither of these representations accounts for the ensemble of 3D conformers that are accessible to a mol. Property prediction could be improved by using conformer ensembles as input, but there is no large-scale dataset that contains graphs annotated with accurate conformers and exptl. data. Here we use advanced sampling and semi-empirical d. functional theory (DFT) to generate 37 million mol. conformations for over 450,000 mols. The Geometric Ensemble Of Mols. (GEOM) dataset contains conformers for 133,000 species from QM9, and 317,000 species with exptl. data related to biophysics, physiol., and phys. chem. Ensembles of 1,511 species with BACE-1 inhibition data are also labeled with high-quality DFT free energies in an implicit water solvent, and 534 ensembles are further optimized with DFT. GEOM will assist in the development of models that predict properties from conformer ensembles, and generative models that sample 3D conformations.
- 57Isert, C.; Atz, K.; Jiménez-Luna, J.; Schneider, G. QMugs, quantum mechanical properties of drug-like molecules. Sci. Data 2022, 9, 273, DOI: 10.1038/s41597-022-01390-757QMugs, quantum mechanical properties of drug-like moleculesIsert, Clemens; Atz, Kenneth; Jimenez-Luna, Jose; Schneider, GisbertScientific Data (2022), 9 (1), 273CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning approaches in drug discovery, as well as in other areas of the chem. sciences, benefit from curated datasets of phys. mol. properties. However, there currently is a lack of data collections featuring large bioactive mols. alongside first-principle quantum chem. information. The open-access QMugs (Quantum-Mech. Properties of Drug-like Mols.) dataset fills this void. The QMugs collection comprises quantum mech. properties of more than 665 k biol. and pharmacol. relevant mols. extd. from the ChEMBL database, totaling ∼2 M conformers. QMugs contains optimized mol. geometries and thermodn. data obtained via the semi-empirical method GFN2-xTB. Atomic and mol. properties are provided on both the GFN2-xTB and on the d.-functional levels of theory (DFT, ωB97X-D/def2-SVP). QMugs features mols. of significantly larger size than previously-reported collections and comprises their resp. quantum mech. wave functions, including DFT d. and orbital matrixes. This dataset is intended to facilitate the development of models that learn from mol. data on different levels of theory while also providing insight into the corresponding relationships between mol. structure and biol. activity.
- 58Eastman, P.; Behara, P. K.; Dotson, D. L.; Galvelis, R.; Herr, J. E.; Horton, J. T.; Mao, Y.; Chodera, J. D.; Pritchard, B. P.; Wang, Y.; Fabritiis, G. D.; Markland, T. E. SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. Sci. Data 2023, 10, 11, DOI: 10.1038/s41597-022-01882-658SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning PotentialsEastman, Peter; Behara, Pavan Kumar; Dotson, David L.; Galvelis, Raimondas; Herr, John E.; Horton, Josh T.; Mao, Yuezhi; Chodera, John D.; Pritchard, Benjamin P.; Wang, Yuanqing; De Fabritiis, Gianni; Markland, Thomas E.Scientific Data (2023), 10 (1), 11CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Machine learning potentials are an important tool for mol. simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chem. dataset for training potentials relevant to simulating drug-like small mols. interacting with proteins. It contains over 1.1 million conformations for a diverse set of small mols., dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged mols., and a wide range of covalent and non-covalent interactions. It provides both forces and energies calcd. at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chem. accuracy across a broad region of chem. space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in mol. simulations.
- 59McNutt, A. T.; Bisiriyu, F.; Song, S.; Vyas, A.; Hutchison, G. R.; Koes, D. R. Conformer Generation for Structure-Based Drug Design: How Many and How Good?. J. Chem. Inf. Model. 2023, 63, 6598– 6607, DOI: 10.1021/acs.jcim.3c0124559Conformer Generation for Structure-Based Drug Design: How Many and How Good?McNutt, Andrew T.; Bisiriyu, Fatimah; Song, Sophia; Vyas, Ananya; Hutchison, Geoffrey R.; Koes, David RyanJournal of Chemical Information and Modeling (2023), 63 (21), 6598-6607CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Conformer generation, the assignment of realistic 3D coordinates to a small mol., is fundamental to structure-based drug design. Conformational ensembles are required for rigid-body matching algorithms, such as shape-based or pharmacophore approaches, and even methods that treat the ligand flexibly, such as docking, are dependent on the quality of the provided conformations due to not sampling all degrees of freedom (e.g., only sampling torsions). Here, we empirically elucidate some general principles about the size, diversity, and quality of the conformational ensembles needed to get the best performance in common structure-based drug discovery tasks. In many cases, our findings may parallel "common knowledge" well-known to practitioners of the field. Nonetheless, we feel that it is valuable to quantify these conformational effects while reproducing and expanding upon previous studies. Specifically, we investigate the performance of a state-of-the-art generative deep learning approach vs. a more classical geometry-based approach, the effect of energy minimization as a postprocessing step, the effect of ensemble size (max. no. of conformers), and construction (filtering by root-mean-square deviation for diversity) and how these choices influence the ability to recapitulate bioactive conformations and perform pharmacophore screening and mol. docking.
- 60Foloppe, N.; Chen, I.-J. Energy windows for computed compound conformers: covering artefacts or truly large reorganization energies?. Future Med. Chem. 2019, 11, 97– 118, DOI: 10.4155/fmc-2018-040060Energy windows for computed compound conformers: covering artefacts or truly large reorganization energies?Foloppe, Nicolas; Chen, I-JenFuture Medicinal Chemistry (2019), 11 (2), 97-118CODEN: FMCUA7; ISSN:1756-8919. (Future Science Ltd.)The generation of 3D conformers of small mols. underpins most computational drug discovery. Thus, the conformer quality is crit. and depends on their energetics. A key parameter is the empirical conformational energy window (ΔEw), since only conformers within ΔEw are retained. However, ΔEw values in use appear unrealistically large. We analyze the factors pertaining to the conformer energetics and ΔEw. We argue that more attention must be focused on the problem of collapsed low-energy conformers. That is due to artificial intramol. stabilization and occurs even with continuum solvation. Consequently, the conformational energy of extended bioactive structures is artifactually increased, which inflates ΔEw. Thus, this Perspective highlights the issues arising from low-energy conformers and suggests improvements via empirical or physics-based strategies. Graphical abstr. :.
- 61Rai, B. K.; Sresht, V.; Yang, Q.; Unwalla, R.; Tu, M.; Mathiowetz, A. M.; Bakken, G. A. Comprehensive Assessment of Torsional Strain in Crystal Structures of Small Molecules and Protein–Ligand Complexes using ab Initio Calculations. J. Chem. Inf. Model. 2019, 59, 4195– 4208, DOI: 10.1021/acs.jcim.9b0037361Comprehensive Assessment of Torsional Strain in Crystal Structures of Small Molecules and Protein-Ligand Complexes using ab Initio CalculationsRai, Brajesh K.; Sresht, Vishnu; Yang, Qingyi; Unwalla, Ray; Tu, Meihua; Mathiowetz, Alan M.; Bakken, Gregory A.Journal of Chemical Information and Modeling (2019), 59 (10), 4195-4208CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The energetics of rotation around single bonds (torsions) is a key determinant of the 3D shape that drug-like mols. adopt in soln., the solid state, and in different biol. environments, which in turn defines their unique phys. and pharmacol. properties. Therefore, accurate characterization of torsion angle preference and energetics is essential for the success of computational drug discovery and design. Here, the authors analyze torsional strain in crystal structures of drug-like mols. in CSD and bioactive ligand conformations in PDB, expressing the total strain energy as a sum of strain energy from constituent rotatable bonds. The authors utilized Cloud computing to generate torsion scan profiles of a very large collection of chem. diverse neutral fragments at DFT(B3LYP)/6-31G*//6-31G** or DFT(B3LYP)/6-31+G*//6-31+G** (for sulfur-contg. mol.). With the data generated from these ab initio calcns., the authors performed rigorous anal. of strain due to deviation of obsd. torsion angles relative to their ideal gas-phase geometries. Contrary to the previous studies based on mol. mechanics, the authors find that in the cryst.-state mols. generally adopt low-strain conformations, with median per-torsion strain energy in CSD and PDB under 1/10th and 1/3rd of a kcal/mol, resp. 
However, for a small fraction (<5%) of motifs, external effects such as steric hindrance and hydrogen bonds result in strain penalty exceeding 2.5 kcal/mol. The authors find that due to poor quality of PDB structures in general, bioactive structures tend to have higher torsional strain compared to small mol. crystal conformations. However, in the absence of structural fitting artifacts in PDB structures, protein-induced strain in bioactive conformations is quant. similar to those due to the packing forces in small mol. crystal structures. This anal. allows us to establish strain energy thresholds to help to identify biol. relevant conformers in a given ensemble. The work presented here is the most comprehensive study to date that demonstrates the utility and feasibility of gas-phase QM calcns. to study conformational preference and energetics of drug-size mols. Potential applications of this study in computational lead discovery and structure-based design are discussed.
- 62Taylor, R.; Wood, P. A. A Million Crystal Structures: The Whole Is Greater than the Sum of Its Parts. Chem. Rev. 2019, 119, 9427– 9477, DOI: 10.1021/acs.chemrev.9b0015562A Million Crystal Structures: The Whole Is Greater than the Sum of Its PartsTaylor, Robin; Wood, Peter A.Chemical Reviews (Washington, DC, United States) (2019), 119 (16), 9427-9477CODEN: CHREAY; ISSN:0009-2665. (American Chemical Society)A review. The founding in 1965 of what is now called the Cambridge Structural Database (CSD) has reaped dividends in numerous and diverse areas of chem. research. Each of the million or so crystal structures in the database was solved for its own particular reason, but collected together, the structures can be reused to address a multitude of new problems. In this Review, which is focused mainly on the last 10 years, we chronicle the contribution of the CSD to research into mol. geometries, mol. interactions, and mol. assemblies and demonstrate its value in the design of biol. active mols. and the solid forms in which they are delivered. Its potential in other com. relevant areas is described, including gas storage and delivery, thin films, and (opto)electronics. The CSD also aids the soln. of new crystal structures. Because no scientific instrument is without shortcomings, the limitations of CSD research are assessed. We emphasize the importance of maintaining database quality: notwithstanding the arrival of big data and machine learning, it remains perilous to ignore the principle of garbage in, garbage out. Finally, we explain why the CSD must evolve with the world around it to ensure it remains fit for purpose in the years ahead.
- 63Liebeschuetz, J. W. The Good, the Bad, and the Twisted Revisited: An Analysis of Ligand Geometry in Highly Resolved Protein–Ligand X-ray Structures. J. Med. Chem. 2021, 64, 7533– 7543, DOI: 10.1021/acs.jmedchem.1c0022863The Good, the Bad, and the Twisted Revisited: An Analysis of Ligand Geometry in Highly Resolved Protein-Ligand X-ray StructuresLiebeschuetz, John W.Journal of Medicinal Chemistry (2021), 64 (11), 7533-7543CODEN: JMCMAR; ISSN:0022-2623. (American Chemical Society)An anal. of the rotatable bond geometry of drug-like ligand models is reported for high-resoln. (<1.1 Å) crystallog. protein-ligand complexes. In cases where the ligand fit to the electron d. is very good, unusual torsional geometry is rare and, most often, though not exclusively, assocd. with strong polar, metal, or covalent ligand-protein interactions. It is rarely assocd. with a torsional strain of greater than 2 kcal mol-1 by calcn. An unusual torsional geometry is more prevalent where the fit to electron d. is not perfect. Multiple low-strain conformer bindings were obsd. in 21% of the set and, it is suggested, may also lie behind many of the 35% of single-occupancy cases, where a poor fit to the e-d. was found. It is concluded that multiple conformer ligand binding is an under-recognized phenomenon in structure-based drug design and that there is a need for more robust crystallog. refinement methods to better handle such cases.
- 64Tong, J.; Zhao, S. Large-Scale Analysis of Bioactive Ligand Conformational Strain Energy by Ab Initio Calculation. J. Chem. Inf. Model. 2021, 61, 1180– 1192, DOI: 10.1021/acs.jcim.0c0119764Large-Scale Analysis of Bioactive Ligand Conformational Strain Energy by Ab Initio CalculationTong, Jiahui; Zhao, SuwenJournal of Chemical Information and Modeling (2021), 61 (3), 1180-1192CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Ligand conformational strain energy (LCSE) plays an important role in virtual screening and lead optimization. While various studies have provided insights into LCSE for small-mol. ligands in the Protein Data Bank (PDB), conclusions are inconsistent mainly due to small datasets, poor quality control of crystal structures, and mol. mechanics (MM) or low-level quantum mechanics (QM) calcns. Here, we built a high-quality dataset (LigBoundConf) of 8145 ligand-bound conformations from PDB crystal structures and calcd. LCSE at the M062X-D3/ma-TZVPP (SMD)//M062X-D3/def2-SVP(SMD) level for each case in the dataset. The mean/median LCSE is 4.6/3.7 kcal/mol for 6672 successfully calcd. cases, which is significantly lower than the ests. based on mol. mechanics in many previous analyses. Esp., when removing ligands with nonarom. ring(s) that are prone to have large LCSEs due to electron d. overfitting, the mean/median LCSE was reduced to 3.3/2.5 kcal/mol. We further reveal that LCSE is correlated with several ligand properties, including formal at. charge, mol. wt., no. of rotatable bonds, and no. of hydrogen-bond donors and acceptors. In addn., our results show that although summation of torsion strains is a good approxn. of LCSE for most cases, for a small fraction (about 6%) of our dataset, it underestimates LCSEs if ligands could form nonlocal intramol. interactions in the unbound state. 
Taken together, our work provides a comprehensive profile of LCSE for ligands in PDB, which could help ligand conformation generation, ligand docking pose evaluation, and lead optimization.
- 65 Chan, L.; Hutchison, G. R.; Morris, G. M. Understanding Ring Puckering in Small Molecules and Cyclic Peptides. J. Chem. Inf. Model. 2021, 61, 743–755, DOI: 10.1021/acs.jcim.0c01144
- 66 Lemm, D.; von Rudorff, G. F.; von Lilienfeld, O. A. Machine learning based energy-free structure predictions of molecules, transition states, and solids. Nat. Commun. 2021, 12, 4468, DOI: 10.1038/s41467-021-24525-7
- 67 Hanwell, M. D.; Curtis, D. E.; Lonie, D. C.; Vandermeersch, T.; Zurek, E.; Hutchison, G. R. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminf. 2012, 4, 17, DOI: 10.1186/1758-2946-4-17
- 68 Avogadro2 Version 1.97. https://two.avogadro.cc/.
- 69 Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272, DOI: 10.1038/s41592-019-0686-2
- 70 Chan, L.; Hutchison, G. R.; Morris, G. M. Bayesian optimization for conformer generation. J. Cheminf. 2019, 11, 32, DOI: 10.1186/s13321-019-0354-7
- 71 Chan, L.; Hutchison, G. R.; Morris, G. M. BOKEI: Bayesian optimization using knowledge of correlated torsions and expected improvement for conformer generation. Phys. Chem. Chem. Phys. 2020, 22, 5211–5219, DOI: 10.1039/C9CP06688H
- 72 Pordes, R.; Petravick, D.; Kramer, B.; Olson, D.; Livny, M.; Roy, A.; Avery, P.; Blackburn, K.; Wenaus, T.; Würthwein, F. The Open Science Grid. J. Phys. Conf. 2007, 78, 012057, DOI: 10.1088/1742-6596/78/1/012057
- 73 Sfiligoi, I.; Bradley, D. C.; Holzman, B.; Mhashilkar, P.; Padhi, S.; Wurthwein, F. The Pilot Way to Grid Resources Using glideinWMS. World Congr. Comput. Sci. Inf. Eng. 2009, 2, 428–432, DOI: 10.1109/CSIE.2009.950
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01278.
Histograms of the rotatable bonds and conformers in the COD set, comparisons of RMSD and radius of gyration for the COD and Platinum sets, and histograms of torsional angle deviations for GFN2-optimized and ωB97X-D3/def2-SVP-optimized geometries (PDF)