Small-Molecule 3D Structure Prediction Using Open Crystallography Data
Abstract

Predicting the 3D structures of small molecules is a common problem in chemoinformatics. Even the best methods are inaccurate for complex molecules, and there is a large gap in accuracy between proprietary and free algorithms. Previous work presented COSMOS, a novel data-driven algorithm that uses knowledge of known structures from the Cambridge Structural Database and demonstrated performance competitive with proprietary algorithms. However, its dependence on the Cambridge Structural Database prevented widespread use. Here, we present an updated version of the COSMOS structure predictor, complete with a free structure library derived from open data sources. We demonstrate that COSMOS performs better than other freely available methods, with mean RMSDs of 1.16 and 1.68 Å for organic and metal–organic structures, respectively, and a mean prediction time of 60 ms per molecule. These RMSDs are 17% and 20% lower, respectively, than those of the free predictor provided by Open Babel, and COSMOS is 10 times faster. The ChemDB Web portal provides a COSMOS prediction Web server, as well as downloadable copies of the COSMOS executable and its library of molecular substructures.
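The accuracy figures above rest on two pieces of arithmetic: the root-mean-square deviation (RMSD) between predicted and reference atomic coordinates, and the relative reduction in RMSD against a baseline. The following is a minimal sketch of both, not the COSMOS implementation. It assumes the two structures are already superimposed and list their atoms in the same order; the abstract does not describe the alignment or atom-matching protocol. The function names and the toy coordinates are hypothetical, and the implied baseline values are back-calculated from the rounded percentages quoted above, so they are approximations rather than reported results.

```python
import math


def rmsd(predicted, reference):
    """RMSD in the input units between two equally ordered coordinate lists.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_error = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared_error / len(predicted))


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


def implied_baseline(improved, reduction):
    """Baseline value implied by an improved value and a fractional reduction."""
    return improved / (1.0 - reduction)


# Toy four-atom example (made-up coordinates in angstroms, for illustration only).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.6), (0.0, 1.8, 0.0), (0.0, 0.0, 1.0)]
toy_rmsd = rmsd(predicted, reference)

# Baseline RMSDs implied by the abstract's figures (approximate, because the
# quoted percentages are rounded): organic and metal-organic structures.
organic_baseline = implied_baseline(1.16, 0.17)
metal_organic_baseline = implied_baseline(1.68, 0.20)

print(f"toy RMSD: {toy_rmsd:.2f} A")
print(f"implied baseline RMSD, organic: {organic_baseline:.2f} A")
print(f"implied baseline RMSD, metal-organic: {metal_organic_baseline:.2f} A")
```

In practice, an RMSD comparison between a predicted and a crystallographic structure would normally be preceded by an optimal superposition step, which this sketch deliberately leaves out.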