Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties

Cite this: Acc. Chem. Res. 2021, 54, 4, 827–836
Publication Date (Web): February 3, 2021
Copyright © 2021 American Chemical Society

    Machine-readable chemical structure representations are foundational to any attempt to harness machine learning for predicting reactivity, selectivity, and chemical properties directly from molecular structure. Featurization, the mapping of discrete chemical structures into a continuous vector space, is a critical step undertaken before model selection, and developing new ways to encode molecules quantitatively is an active area of research. In this Account, we highlight the application and suitability of different representations, from expert-guided “engineered” descriptors to automatically “learned” features, across prediction tasks in organic and organometallic chemistry for which differing amounts of training data are available. These tasks include statistical models of stereo- and enantioselectivity, thermochemistry, and kinetics developed using experimental and quantum chemical data.
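As a minimal illustration of featurization (a toy, not one of the representations used in this Account), the sketch below maps a hand-entered molecular graph with explicit hydrogens onto a fixed-length vector of element and bond-order counts. The vocabularies `ELEMENTS` and `BOND_ORDERS` and the function `featurize` are hypothetical; practical work would use a cheminformatics toolkit and far richer descriptors.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical fixed vocabulary, so every molecule maps to a vector of the same length.
ELEMENTS = ("C", "H", "N", "O")
BOND_ORDERS = (1, 2, 3)

Bond = Tuple[int, int, int]  # (atom index i, atom index j, bond order)


def featurize(atoms: List[str], bonds: List[Bond]) -> List[float]:
    """Map a molecular graph with explicit hydrogens to a fixed-length count vector.

    Layout: [count of each element in ELEMENTS..., count of each order in BOND_ORDERS...]
    Elements or bond orders outside the vocabulary are silently ignored.
    """
    element_counts = Counter(atoms)
    order_counts = Counter(order for _, _, order in bonds)
    return [float(element_counts[e]) for e in ELEMENTS] + [
        float(order_counts[o]) for o in BOND_ORDERS
    ]


# Ethanol, CH3-CH2-OH, with explicit hydrogens.
ethanol_atoms = ["C", "C", "O"] + ["H"] * 6
ethanol_bonds = [(0, 1, 1), (1, 2, 1),
                 (0, 3, 1), (0, 4, 1), (0, 5, 1),  # CH3 hydrogens
                 (1, 6, 1), (1, 7, 1),             # CH2 hydrogens
                 (2, 8, 1)]                        # O-H hydrogen

# Dimethyl ether, CH3-O-CH3: same formula, different connectivity.
ether_atoms = ["C", "O", "C"] + ["H"] * 6
ether_bonds = [(0, 1, 1), (1, 2, 1),
               (0, 3, 1), (0, 4, 1), (0, 5, 1),
               (2, 6, 1), (2, 7, 1), (2, 8, 1)]

if __name__ == "__main__":
    print(featurize(ethanol_atoms, ethanol_bonds))  # [2.0, 6.0, 0.0, 1.0, 8.0, 0.0, 0.0]
    print(featurize(ether_atoms, ether_bonds))      # identical vector
```

Because ethanol and dimethyl ether share a formula and bond counts, this toy featurizer maps both to the same vector: a representation that discards connectivity cannot tell constitutional isomers apart, which is one reason richer engineered or learned features are needed.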

    The use of expert-guided molecular descriptors provides an opportunity to incorporate chemical knowledge, domain expertise, and physical constraints into statistical modeling. In applications to stereoselective organic and organometallic catalysis, where data sets may be relatively small and 3D geometries and conformations play an important role, mechanistically informed features can be used to obtain predictive statistical models that are also chemically interpretable. We provide an overview of several recent applications of this approach to quantitative models of reactivity and selectivity, in which topological descriptors, quantum mechanically computed electronic and steric properties, and conformational ensembles all feature as essential ingredients of the molecular representation.
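Interpretable models of this kind are often linear regressions of a selectivity measure on a handful of computed descriptors. The sketch below assumes kinetic control and plain ordinary least squares; the descriptor columns and all function names are hypothetical, and this is not necessarily the regression protocol used in the Account. It converts a product ratio to ΔΔG‡ = RT ln(major/minor) and fits descriptors (standardized, so coefficient magnitudes can be compared directly) by solving the normal equations.

```python
import math
from typing import List, Sequence

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_er(major: float, minor: float, temp_k: float = 298.15) -> float:
    """Selectivity as ddG-double-dagger (kcal/mol) from a product ratio, assuming kinetic control."""
    return R_KCAL * temp_k * math.log(major / minor)


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score a descriptor so fitted coefficients are comparable in magnitude."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(descriptors: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations.

    descriptors: one row per catalyst/substrate, one column per descriptor
    (hypothetical, e.g. a steric and an electronic parameter).
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + list(r) for r in descriptors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return _solve(xtx, xty)
```

For example, `ddg_from_er(95, 5)` gives about 1.74 kcal/mol at 298.15 K. In practice each descriptor column would be passed through `standardize` before fitting, and model quality would be judged on held-out data rather than on the training fit.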

    Alternatively, more flexible, general-purpose molecular representations such as attributed molecular graphs can be used with machine learning approaches to learn the complex relationship between a structure and a prediction target. This approach has the potential to outperform more traditional representations such as “hand-crafted” molecular descriptors, particularly as data set sizes grow. This is especially relevant when large sets of quantum mechanical data are used to train quantitative structure–property relationships. A general approach to curating useful data sets and training highly accurate graph neural network models is discussed in the context of organic bond dissociation enthalpies, where this strategy outperforms regression using precomputed descriptors.
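The sketch below shows, in plain Python, the generic idea behind a graph neural network on an attributed molecular graph: atom feature vectors are repeatedly updated from their neighbors (sum aggregation), and a bond-level quantity such as a bond dissociation enthalpy is read out from the two atoms a bond joins. The two-number atom features, the weight matrices, and all function names are hypothetical, the weights are untrained, and this is not the architecture described in the Account.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def message_passing(node_feats: List[Vector], edges: List[Edge],
                    w_self: Matrix, w_nbr: Matrix, steps: int = 2) -> List[Vector]:
    """Sum-aggregation message passing: h_i <- relu(W_self h_i + W_nbr * sum_j h_j)."""
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(node_feats))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    h = [list(f) for f in node_feats]
    for _ in range(steps):
        new_h = []
        for i, h_i in enumerate(h):
            agg = [0.0] * len(h_i)
            for j in neighbors[i]:
                agg = [a + b for a, b in zip(agg, h[j])]
            update = [s + n for s, n in zip(matvec(w_self, h_i), matvec(w_nbr, agg))]
            new_h.append(relu(update))
        h = new_h
    return h


def bond_readout(h: List[Vector], edges: List[Edge], w_out: Vector) -> Dict[Edge, float]:
    """Score each bond from the sum of its two atom embeddings (order-independent)."""
    return {(i, j): sum(w * (a + b) for w, a, b in zip(w_out, h[i], h[j]))
            for i, j in edges}


# Hypothetical propane heavy-atom graph; features = [is_carbon, attached H count].
feats = [[1.0, 3.0], [1.0, 2.0], [1.0, 3.0]]
bonds = [(0, 1), (1, 2)]
# Untrained, arbitrary weights: in a real model these are learned from data.
W_SELF = [[0.5, 0.1], [0.0, 0.3]]
W_NBR = [[0.2, 0.0], [0.1, 0.2]]
W_OUT = [1.0, -0.5]

embeddings = message_passing(feats, bonds, W_SELF, W_NBR)
scores = bond_readout(embeddings, bonds, W_OUT)
```

Because the update depends only on connectivity and atom attributes, the two symmetry-equivalent methyl carbons receive identical embeddings and the two equivalent C–C bonds receive identical scores; training would fit `W_SELF`, `W_NBR`, and `W_OUT` to reference data such as computed bond dissociation enthalpies.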

    Finally, we describe how graph neural network predictions can be incorporated into mechanistically informed statistical models of chemical reactivity and selectivity. Once the network is trained, this approach avoids the computational overhead of quantum mechanical calculations while maintaining chemical interpretability. We illustrate examples in which fast predictions of bond dissociation enthalpies, and of the identities of the radicals formed by cleaving a molecule’s weakest bond, are used in simple physical models of site-selectivity and reactivity.
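A minimal sketch of such a simple physical model, assuming the bond dissociation enthalpies have already been predicted (for example by a trained network; no specific predictor is assumed here): the weakest bond is the one with the lowest BDE, and relative rates of hydrogen-atom abstraction across competing sites follow an Evans–Polanyi relation. The transfer coefficient `alpha`, the site labels, and the function names are hypothetical placeholders, not values or code from the Account.

```python
import math
from typing import Dict, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def weakest_bond(bdes: Dict[str, float]) -> Tuple[str, float]:
    """Return the bond label with the lowest BDE and its value."""
    label = min(bdes, key=bdes.get)
    return label, bdes[label]


def site_selectivity(bdes: Dict[str, float], n_h: Dict[str, int],
                     alpha: float = 0.3, temp_k: float = 298.15) -> Dict[str, float]:
    """Predicted product fractions for H-atom abstraction at competing C-H sites.

    Assumes an Evans-Polanyi relation, delta(Ea) = alpha * delta(BDE), equal
    pre-exponential factors per hydrogen, and irreversible abstraction.
    BDEs are in kcal/mol; alpha = 0.3 is an arbitrary illustrative default.
    """
    rt = R_KCAL * temp_k
    ref = min(bdes.values())  # shifting by the minimum keeps exp() from overflowing
    weights = {s: n_h[s] * math.exp(-alpha * (bdes[s] - ref) / rt) for s in bdes}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```

Weighting each site by its number of equivalent hydrogens makes the statistical factor explicit: with equal BDEs, a site bearing three hydrogens competing against a site bearing one is predicted at 3:1, and any BDE difference then shifts that ratio exponentially.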

    Cited By

    This article is cited by 57 publications.
