Barrier Height Prediction by Machine Learning Correction of Semiempirical Calculations
- Xabier García-Andrade
- Pablo García Tahoces, Department of Electronics and Computer Science, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
- Jesús Pérez-Ríos, Department of Physics, Stony Brook University, Stony Brook, New York 11794, United States; Institute for Advanced Computational Science, Stony Brook University, Stony Brook, New York 11794-3800, United States
- Emilio Martínez Núñez* (Email: [email protected]), Department of Physical Chemistry, University of Santiago de Compostela, Santiago de Compostela 15782, Spain
Abstract
Different machine learning (ML) models are proposed in the present work to predict density functional theory-quality barrier heights (BHs) from semiempirical quantum mechanical (SQM) calculations. The ML models include a multitask deep neural network, gradient-boosted trees by means of the XGBoost interface, and Gaussian process regression. The obtained mean absolute errors are similar to those of previous models considering the same number of data points. The ML corrections proposed in this paper could be useful for rapid screening of the large reaction networks that appear in combustion chemistry or in astrochemistry. Finally, our results show that 70% of the features with the highest impact on model output are bespoke predictors. This custom-made set of predictors could be employed by future Δ-ML models to improve the quantitative prediction of other reaction properties.
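As a reading aid, the generic Δ-ML setup that the abstract alludes to can be written as follows. This is a sketch under assumptions: the abstract does not give the exact regression target, and the symbols (the learned correction f_ML, the descriptor vector x_i, and the number of reactions N) are introduced here for illustration only.

```latex
\begin{align}
\Delta_i &= \mathrm{BH}_i^{\mathrm{DFT}} - \mathrm{BH}_i^{\mathrm{SQM}}
  && \text{(correction to be learned for reaction } i\text{)} \\
\widehat{\mathrm{BH}}_i &= \mathrm{BH}_i^{\mathrm{SQM}} + f_{\mathrm{ML}}(\mathbf{x}_i)
  && \text{(SQM barrier plus ML-predicted correction)} \\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N}
  \left|\widehat{\mathrm{BH}}_i - \mathrm{BH}_i^{\mathrm{DFT}}\right|
  && \text{(mean absolute error against the DFT reference)}
\end{align}
```

In this form the model only has to learn the usually smaller and smoother difference between the two levels of theory, rather than the full barrier height.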
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
1. Introduction
2. Methods
2.1. Performance of PM7-TS on the GPOC Data Set
2.2. Data Set Curation
2.3. Machine Learning Models
3. Results and Discussion
3.1. Performance of the Machine Learning Models
3.2. Interpretability
3.3. Entropic Effects
4. Conclusions
a) Cheap SQM calculations can be leveraged to obtain DFT-quality BHs by means of ML.
b) The MAEs of our ML models (multitask DNN, gradient-boosted trees by means of the XGB interface, and GP regression) are of the same magnitude as those obtained in previous work.
c) The analysis of the models shows that the custom-made descriptors obtained from the MOPAC calculations are, in general, considered more important than those obtained from standard cheminformatics libraries.
d) Our MOPAC-based descriptors could be widely adopted in future quantitative predictions of reaction properties.
e) Our ML models could be used for screening large reaction networks, or they could be implemented in automated reaction mechanism programs based on SQM calculations.
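The idea behind conclusion (a), correcting a cheap SQM barrier with a learned term, can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the numbers are invented toy data in kcal/mol, the single `descriptor` stands in for the much richer descriptor set of the paper, and a one-variable least-squares line replaces the DNN, XGBoost, and GP models actually studied.

```python
# Minimal delta-learning sketch (standard library only).
# All data below are hypothetical toy values, NOT results from the paper.

def fit_linear_correction(features, corrections):
    """Closed-form least-squares fit: correction ~ slope * feature + intercept."""
    n = len(features)
    x_bar = sum(features) / n
    y_bar = sum(corrections) / n
    sxx = sum((x - x_bar) ** 2 for x in features)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(features, corrections))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over all reactions."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical descriptor and barrier heights (kcal/mol) for six reactions.
descriptor = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
bh_sqm = [20.0, 35.0, 28.0, 50.0, 42.0, 60.0]   # cheap semiempirical barriers
bh_dft = [22.3, 38.7, 32.1, 55.5, 48.9, 67.8]   # reference DFT barriers

# First four reactions train the correction; the last two are held out.
train, test = slice(0, 4), slice(4, 6)

# Regression target is the difference between the two levels of theory.
delta_train = [d - s for d, s in zip(bh_dft[train], bh_sqm[train])]
slope, intercept = fit_linear_correction(descriptor[train], delta_train)

# Corrected barrier = SQM barrier + predicted correction.
bh_corrected = [
    s + slope * x + intercept
    for s, x in zip(bh_sqm[test], descriptor[test])
]

mae_raw = mean_absolute_error(bh_sqm[test], bh_dft[test])
mae_corrected = mean_absolute_error(bh_corrected, bh_dft[test])
print(f"MAE of raw SQM barriers:       {mae_raw:.2f} kcal/mol")
print(f"MAE of corrected SQM barriers: {mae_corrected:.2f} kcal/mol")
```

On these toy data the corrected barriers sit much closer to the reference than the raw SQM values, which is the qualitative behavior a Δ-ML correction is meant to deliver. It says nothing about the accuracy reported in the paper.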
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpca.2c08340.
Exploratory data analysis; details of the hyperparameter optimization; descriptor explanation; links to the data and code employed in this work; and free energies of activation obtained with the GP model (PDF)
Acknowledgments
This work was partially supported by Consellería de Cultura, Educación e Ordenación Universitaria (Grupo de referencia competitiva ED431C 2021/40) and by Ministerio de Ciencia e Innovación through Grant #PID2019-107307RB-I00. J.P.-R. acknowledges the support of the Simons Foundation.
References
This article references 79 other publications.
- 1Truhlar, D. G.; Garrett, B. C.; Klippenstein, S. J. Current Status of Transition-State Theory. J. Phys. Chem. 1996, 100, 12771– 12800, DOI: 10.1021/jp953748qGoogle Scholar1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK28Xkt1ansr8%253D&md5=5663e2f23815cdc1c0bbb6bbb91adabeCurrent Status of Transition-State TheoryTruhlar, Donald G.; Garrett, Bruce C.; Klippenstein, Stephen J.Journal of Physical Chemistry (1996), 100 (31), 12771-12800CODEN: JPCHAX; ISSN:0022-3654. (American Chemical Society)A review with 843 refs.; we present an overview of the current status of transition-state theory and its generalizations. We emphasize (i) recent improvements in available methodol. for calcns. on complex systems, including the interface with electronic structure theory, (ii) progress in the theory and application of transition-state theory to condensed-phase reactions, and (iii) insight into the relation of transition-state theory to accurate quantum dynamics and tests of its accuracy via comparisons with both exptl. and other theor. dynamical approxns.
- 2Bao, J. L.; Truhlar, D. G. Variational transition state theory: theoretical framework and recent developments. Chem. Soc. Rev. 2017, 46, 7548– 7596, DOI: 10.1039/C7CS00602KGoogle Scholar2https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhvVGms7zO&md5=a34ca9337c59c7eb0c7e4eb7be2be025Variational transition state theory: theoretical framework and recent developmentsBao, Junwei Lucas; Truhlar, Donald G.Chemical Society Reviews (2017), 46 (24), 7548-7596CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)This article reviews the fundamentals of variational transition state theory (VTST), its recent theor. development, and some modern applications. The theor. methods reviewed here include multidimensional quantum mech. tunneling, multistructural VTST (MS-VTST), multi-path VTST (MP-VTST), both reaction-path VTST (RP-VTST) and variable reaction coordinate VTST (VRC-VTST), system-specific quantum Rice-Ramsperger-Kassel theory (SS-QRRK) for predicting pressure-dependent rate consts., and VTST in the solid phase, liq. phase, and enzymes. We also provide some perspectives regarding the general applicability of VTST.
- 3Zhang, J.; Valeev, E. F. Prediction of Reaction Barriers and Thermochemical Properties with Explicitly Correlated Coupled-Cluster Methods: A Basis Set Assessment. J. Chem. Theor. Comput. 2012, 8, 3175– 3186, DOI: 10.1021/ct3005547Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38Xps1Ghs74%253D&md5=5906e9036c7e8f340cfbb1d53258968bPrediction of Reaction Barriers and Thermochemical Properties with Explicitly Correlated Coupled-Cluster Methods: A Basis Set AssessmentZhang, Jinmei; Valeev, Edward F.Journal of Chemical Theory and Computation (2012), 8 (9), 3175-3186CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)We assessed the performance of our perturbative explicitly correlated coupled-cluster method, CCSD(T)F12, for accurate prediction of chem. reactivity. The ref. data included reaction barrier heights, electronic reaction energies, atomization energies, and enthalpies of formation from the following sources: (1) the DBH24/08 database of 22 reaction barriers (Truhlar et al.), (2) the HJO12 set of isogyric reaction energies (Helgaker et al.), and (3) a HEAT set of atomization energies and heats of formation (Stanton et al.). We performed two types of analyses targeting the two distinct uses of explicitly correlated CCSD(T) models: as a replacement for basis-set-extrapolated CCSD(T) in highly accurate composite methods like HEAT and as a distinct model chem. for standalone applications. Hence, we analyzed in detail (1) the basis set error of each component of the CCSD(T)F12 contribution to the chem. energy difference in question and (2) the total error of the CCSD(T)F12 model chem. relative to the benchmark values. Two basis set families were utilized in the calcns.: the std. 
aug-cc-p(C)VXZ-F12 (X = D, T, Q) basis sets for the conventional correlation methods and the cc-p(C)VXZ-F12 (X = D, T, Q) basis sets of Peterson and co-workers that are specifically designed for explicitly correlated methods. Our conclusion is that the performance of the two families for CCSD correlation contributions (which are the only components affected by the explicitly correlated terms in our formation) are nearly identical with triple- and quadruple-ζ quality basis sets, with some differences at the double-ζ level. Chem. accuracy (∼4.18 kJ/mol) for reaction barrier heights, electronic reaction energies, atomization energies, and enthalpies of formation is attained on av. with the aug-cc-pVDZ, aug-cc-pVTZ, cc-pCVTZ-F12/aug-cc-pCVTZ, and cc-pCVDZ-F12 basis sets, resp., at the CCSD(T)F12 level of theory. The corresponding mean unsigned errors are 1.72 kJ/mol, 1.5 kJ/mol, ∼2 kJ/mol, and 2.17 kJ/mol, and the corresponding max. unsigned errors are 4.44 kJ/mol, 3.6 kJ/mol, ∼5 kJ/mol, and 5.75 kJ/mol.
- 4Mardirossian, N.; Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 2017, 115, 2315– 2372, DOI: 10.1080/00268976.2017.1333644Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtVCltb3O&md5=ba27d707ee3f5fcdd949644d3d2cbd5eThirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionalsMardirossian, Narbe; Head-Gordon, MartinMolecular Physics (2017), 115 (19), 2315-2372CODEN: MOPHAM; ISSN:0026-8976. (Taylor & Francis Ltd.)In the past 30 years, Kohn-Sham d. functional theory has emerged as the most popular electronic structure method in computational chem. To assess the ever-increasing no. of approx. exchange-correlation functionals, this review benchmarks a total of 200 d. functionals on a mol. database (MGCDB84) of nearly 5000 data points. The database employed, provided as Supplemental Data, is comprised of 84 data-sets and contains non-covalent interactions, isomerisation energies, thermochem., and barrier heights. In addn., the evolution of non-empirical and semi-empirical d. functional design is reviewed, and guidelines are provided for the proper and effective use of d. functionals. The most promising functional considered is ωB97M-V, a range-sepd. hybrid meta-GGA with VV10 nonlocal correlation, designed using a combinatorial approach. From the local GGAs, B97-D3, revPBE-D3, and BLYP-D3 are recommended, while from the local meta-GGAs, B97M-rV is the leading choice, followed by MS1-D3 and M06-L-D3. The best hybrid GGAs are ωB97X-V, ωB97X-D3, and ωB97X-D, while useful hybrid meta-GGAs (besides ωB97M-V) include ωM05-D, M06-2X-D3, and MN15. Ultimately, today's state-of-the-art functionals are close to achieving the level of accuracy desired for a broad range of chem. 
applications, and the principal remaining limitations are assocd. with systems that exhibit significant self-interaction/delocalisation errors and/or strong correlation effects.
- 5Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of Activation Energy Prediction of Gas-Phase Reactions by Machine Learning. Chem. – Eur. J. 2018, 24, 12354– 12358, DOI: 10.1002/chem.201800345Google Scholar5https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXotlCrs7Y%253D&md5=52481ca6747d597cce88f0178278b8d8Feasibility of Activation Energy Prediction of Gas-Phase Reactions by Machine LearningChoi, Sunghwan; Kim, Yeonjoon; Kim, Jin Woo; Kim, Zeehyo; Kim, Woo YounChemistry - A European Journal (2018), 24 (47), 12354-12358CODEN: CEUJED; ISSN:0947-6539. (Wiley-VCH Verlag GmbH & Co. KGaA)Machine learning based on big data has emerged as a powerful soln. in various chem. problems. The authors studied the feasibility of machine learning models for the prediction of activation energies of gas-phase reactions. Six different models with three different types, including the artificial neural network, the support vector regression, and the tree boosting methods, were tested. The authors used the structural and thermodn. properties of mols. and their differences as input features without resorting to specific reaction types so as to maintain the most general input form for broad applicability. The tree boosting method showed the best performance among others in terms of the coeff. of detn., mean abs. error, and root mean square error, the values of which were 0.89, 1.95, and 4.49 kcal mol-1, resp. Computation time for the prediction of activation energies for 2541 test reactions was about one 2nd on a single computing node without using accelerators.
- 6Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep Learning of Activation Energies. J. Phys. Chem. Lett. 2020, 11, 2992– 2997, DOI: 10.1021/acs.jpclett.0c00500Google Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXls12gsLY%253D&md5=78f5a7c3984860caaeee00f1763f2dd8Deep Learning of Activation EnergiesGrambow, Colin A.; Pattanaik, Lagnajit; Green, William H.Journal of Physical Chemistry Letters (2020), 11 (8), 2992-2997CODEN: JPCLCD; ISSN:1948-7185. (American Chemical Society)Quant. predictions of reaction properties, such as activation energy, have been limited due to a lack of available training data. Such predictions would be useful for computer-assisted reaction mechanism generation and org. synthesis planning. We develop a template-free deep learning model to predict the activation energy given reactant and product graphs and train the model on a new, diverse data set of gas-phase quantum chem. reactions. We demonstrate that our model achieves accurate predictions and agrees with an intuitive understanding of chem. reactivity. With the continued generation of quant. chem. reaction data and the development of methods that leverage such data, we expect many more methods for reactivity prediction to become available in the near future.
- 7Grambow, C. A.; Pattanaik, L.; Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137, DOI: 10.1038/s41597-020-0460-4Google Scholar7https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXpt1Glsbs%253D&md5=9cf761c2bbe35c8329bf7d4f9d8887bcReactants, products, and transition states of elementary chemical reactions based on quantum chemistryGrambow, Colin A.; Pattanaik, Lagnajit; Green, William H.Scientific Data (2020), 7 (1), 137CODEN: SDCABS; ISSN:2052-4463. (Nature Research)Reaction times, activation energies, branching ratios, yields, and many other quant. attributes are important for precise org. syntheses and generating detailed reaction mechanisms. Often, it would be useful to be able to classify proposed reactions as fast or slow. However, quant. chem. reaction data, esp. for atom-mapped reactions, are difficult to find in existing databases. Therefore, we used automated potential energy surface exploration to generate 12,000 org. reactions involving H, C, N, and O atoms calcd. at the ωB97X-D3/def2-TZVP quantum chem. level. We report the results of geometry optimizations and frequency calcns. for reactants, products, and transition states of all reactions. Addnl., we extd. atom-mapped reaction SMILES, activation energies, and enthalpies of reaction. We believe that this data will accelerate progress in automated methods for org. synthesis and reaction mechanism generation-for example, by enabling the development of novel machine learning models for quant. reaction prediction.
- 8Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J. Phys. Chem. A 2022, 126, 3976– 3986, DOI: 10.1021/acs.jpca.2c02614Google Scholar8https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38XhsFeqsbvJ&md5=019207c4eefd18ce30260ffe4285136fFast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster AccuracySpiekermann, Kevin A.; Pattanaik, Lagnajit; Green, William H.Journal of Physical Chemistry A (2022), 126 (25), 3976-3986CODEN: JPCAFH; ISSN:1089-5639. (American Chemical Society)Quant. ests. of reaction barriers are essential for developing kinetic mechanisms and predicting reaction outcomes. However, the lack of exptl. data and the steep scaling of accurate quantum calcns. often hinder the ability to obtain reliable kinetic values. Here, we train a directed message passing neural network on nearly 24,000 diverse gas-phase reactions calcd. at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP. Our model uses 75% fewer parameters than previous studies, an improved reaction representation, and proper data splits to accurately est. performance on unseen reactions. Using information from only the reactant and product, our model quickly predicts barrier heights with a testing MAE of 2.6 kcal mol-1 relative to the coupled-cluster data, making it more accurate than a good d. functional theory calcn. Furthermore, our results show that future modeling efforts to est. reaction properties would significantly benefit from fine-tuning calibration using a transfer learning technique. We anticipate this model will accelerate and improve kinetic predictions for small mol. chem.
- 9Vargas, S.; Hennefarth, M. R.; Liu, Z.; Alexandrova, A. N. Machine Learning to Predict Diels–Alder Reaction Barriers from the Reactant State Electron Density. J. Chem. Theor. Comput. 2021, 17, 6203– 6213, DOI: 10.1021/acs.jctc.1c00623Google Scholar9https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXhvFeqsrvN&md5=99e09ef5503d2992be46e06bf38bb233Machine Learning to Predict Diels-Alder Reaction Barriers from the Reactant State Electron DensityVargas, Santiago; Hennefarth, Matthew R.; Liu, Zhihao; Alexandrova, Anastassia N.Journal of Chemical Theory and Computation (2021), 17 (10), 6203-6213CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)Reaction barriers are key to our understanding of chem. reactivity and catalysis. Certain reactions are so seminal in chem. that countless variants, with or without catalysts, have been studied, and their barriers have been computed or measured exptl. This wealth of data represents a perfect opportunity to leverage machine learning models, which could quickly predict barriers without explicit calcns. or measurement. Here, we show that the topol. descriptors of the quantum mech. charge d. in the reactant state constitute a set that is both rigorous and continuous and can be used effectively for the prediction of reaction barrier energies to a high degree of accuracy. We demonstrate this on the Diels-Alder reaction, highly important in biol. and medicinal chem., and as such, studied extensively. This reaction exhibits a range of barriers as large as 270 kJ/mol. While we trained our single-objective supervised (labeled) regression algorithms on simpler Diels-Alder reactions in soln., they predict reaction barriers also in significantly more complicated contexts, such a Diels-Alder reaction catalyzed by an artificial enzyme and its evolved variants, in agreement with exptl. changes in kcat. We expect this tool to apply broadly to a variety of reactions in soln. 
or in the presence of a catalyst, for screening and circumventing heavily involved computations or expts.
- 10Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163– 1175, DOI: 10.1039/D0SC04896HGoogle Scholar10https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXit1GnsrfF&md5=50e3f6b5c94a6c882208e4fd745a19ffMachine learning meets mechanistic modelling for accurate prediction of experimental activation energiesJorner, Kjell; Brinck, Tore; Norrby, Per-Ola; Buttar, DavidChemical Science (2021), 12 (3), 1163-1175CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Accurate prediction of chem. reactions in soln. is challenging for current state-of-the-art approaches based on transition state modeling with d. functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modeling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality exptl. kinetic data for the nucleophilic arom. substitution reaction and use it to predict barriers with a mean abs. error of 0.77 kcal mol-1 for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate consts. are available for a reaction class. 
With recent advances in deep learning for quickly predicting barriers and transition state geometries from d. functional theory, we envision that hybrid models will soon become a std. alternative to complement current machine learning approaches based on ground-state phys. org. descriptors or structural information such as mol. graphs or fingerprints.
- 11Ravasco, J. M. J. M.; Coelho, J. A. S. Predictive Multivariate Models for Bioorthogonal Inverse-Electron Demand Diels–Alder Reactions. J. Am. Chem. Soc. 2020, 142, 4235– 4241, DOI: 10.1021/jacs.9b11948Google Scholar11https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXjtFGitLo%253D&md5=f2808c6ad2a71806558210f606204fd5Predictive Multivariate Models for Bioorthogonal Inverse-Electron Demand Diels-Alder ReactionsRavasco, Joao M. J. M.; Coelho, Jaime A. S.Journal of the American Chemical Society (2020), 142 (9), 4235-4241CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)Inverse-electron demand Diels-Alder cycloaddns. have emerged as important bioorthogonal reactions in chem. biol. Understanding and predicting reaction rates for bioconjugation reactions is fundamental for evaluating their efficacy in biol. systems. Here, we present multivariate models to predict the second order rate consts. of bioorthogonal inverse-electron demand Diels-Alder reactions involving 1,2,4,5-tetrazines derivs. A data-driven approach was used to model these reactions by parametrizing both the dienophiles and the dienes partners. The models are statistically robust and were used to predict/extrapolate the outcome of several reactions as well as to identify mechanistic differences among similar reactants.
- 12Glavatskikh, M.; Madzhidov, T.; Horvath, D.; Nugmanov, R.; Gimadiev, T.; Malakhova, D.; Marcou, G.; Varnek, A. Predictive Models for Kinetic Parameters of Cycloaddition Reactions. Mol. Inf. 2019, 38, e1800077 DOI: 10.1002/minf.201800077Google Scholar12https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhsFGls7fN&md5=5b44369615c780dc3a196a3b8a2ca379Predictive Models for Kinetic Parameters of Cycloaddition ReactionsGlavatskikh, Marta; Madzhidov, Timur; Horvath, Dragos; Nugmanov, Ramil; Gimadiev, Timur; Malakhova, Daria; Marcou, Gilles; Varnek, AlexandreMolecular Informatics (2019), 38 (1-2), 1800077CODEN: MIONBS; ISSN:1868-1743. (Wiley-VCH Verlag GmbH & Co. KGaA)This paper reports SVR (Support Vector Regression) and GTM (Generative Topog. Mapping) modeling of three kinetic properties of cycloaddn. reactions: rate const. (logk), activation energy (Ea) and pre-exponential factor (logA). A data set of 1849 reactions, comprising (4+2), (3+2) and (2+2) cycloaddns. (CA) were studied in different solvents and at different temps. The reactions were encoded by the ISIDA fragment descriptors generated for Condensed Graph of Reaction (CGR). For a given reaction, a CGR condenses structures of all the reactants and products into one single mol. graph, described both by conventional chem. bonds and "dynamical" bonds characterizing chem. transformations. Different scenarios of logk assessment were exploited: direct modeling, application of the Arrhenius equation and temp.-scaled GTM landscapes. The logk models with optimal cross-validated statistics (Q2=0.78-0.94 RMSE=0.45-0.86) have been challenged to predict rates for the external test set of 200 reactions, comprising both reactions that were not present in the training set, and training set transformations performed under different reaction conditions. The models are freely available on our web-server: http://cimm.kpfu.ru/models.
- 13Gimadiev, T.; Madzhidov, T.; Tetko, I.; Nugmanov, R.; Casciuc, I.; Klimchuk, O.; Bodrov, A.; Polishchuk, P.; Antipin, I.; Varnek, A. Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis. Mol. Inf. 2019, 38, 1800104 DOI: 10.1002/minf.201800104Google ScholarThere is no corresponding record for this reference.
- 14Madzhidov, T. I.; Gimadiev, T. R.; Malakhova, D. A.; Nugmanov, R. I.; Baskin, I. I.; Antipin, I. S.; Varnek, A. A. Structure–reactivity relationship in Diels–Alder reactions obtained using the condensed reaction graph approach. J. Struct. Chem. 2017, 58, 650– 656, DOI: 10.1134/S0022476617040023Google Scholar14https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhsFSmt7%252FN&md5=ce2180b7476a67bf73ee774d5e5749d1Structure-reactivity relationship in Diels-Alder reactions obtained using the condensed reaction graph approachMadzhidov, T. I.; Gimadiev, T. R.; Malakhova, D. A.; Nugmanov, R. I.; Baskin, I. I.; Antipin, I. S.; Varnek, A. A.Journal of Structural Chemistry (2017), 58 (4), 650-656CODEN: JSTCAM; ISSN:0022-4766. (Springer)By the structural representation of a chem. reaction in the form of a condensed graph a model allowing the prediction of rate consts. (logk) of Diels-Alder reactions performed in different solvents and at different temps. is constructed for the first time. The model demonstrates good agreement between the predicted and exptl. logk values: the mean squared error is less than 0.75 log units. Erroneous predictions correspond to reactions in which reagents contain rarely occurring structural fragments. The model is available for users at https://cimm.kpfu.ru/predictor/.
- 15Friederich, P.; dos Passos Gomes, G.; De Bin, R.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584– 4601, DOI: 10.1039/D0SC00445FGoogle Scholar15https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXmsFSmtbo%253D&md5=e5d26fb19a97ffb2a5e0660865859394Machine learning dihydrogen activation in the chemical space surrounding Vaska's complexFriederich, Pascal; Gomes, Gabriel dos Passos; De Bin, Riccardo; Aspuru-Guzik, Alan; Balcells, DavidChemical Science (2020), 11 (18), 4584-4601CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)A review. Homogeneous catalysis using transition metal complexes is ubiquitously used for org. synthesis, as well as technol. relevant in applications such as water splitting and CO2 redn. The key steps underlying homogeneous catalysis require a specific combination of electronic and steric effects from the ligands bound to the metal center. Finding the optimal combination of ligands is a challenging task due to the exceedingly large no. of possibilities and the non-trivial ligand-ligand interactions. The classic example of Vaska's complex, trans-[Ir(PPh3)2(CO)(Cl)], illustrates this scenario. The ligands of this species activate iridium for the oxidative addn. of hydrogen, yielding the dihydride cis-[Ir(H)2(PPh3)2(CO)(Cl)] complex. Despite the simplicity of this system, thousands of derivs. can be formulated for the activation of H2, with a limited no. of ligands belonging to the same general categories found in the original complex. In this work, we show how DFT and machine learning (ML) methods can be combined to enable the prediction of reactivity within large chem. spaces contg. thousands of complexes. In a space of 2574 species derived from Vaska's complex, data from DFT calcns. are used to train and test ML models that predict the H2-activation barrier. In contrast to expts. 
and calcns. requiring several days to be completed, the ML models were trained and used on a laptop on a time-scale of minutes. As a first approach, we combined Bayesian-optimized artificial neural networks (ANN) with features derived from autocorrelation and deltametric functions. The resulting ANNs achieved high accuracies, with mean abs. errors (MAE) between 1 and 2 kcal mol-1, depending on the size of the training set. By using a Gaussian process (GP) model trained with a set of selected features, including fingerprints, accuracy was further enhanced. Remarkably, this GP model minimized the MAE below 1 kcal mol-1, by using only 20% or less of the data available for training. The gradient boosting (GB) method was also used to assess the relevance of the features, which was used for both feature selection and model interpretation purposes. Features accounting for chem. compn., atom size and electronegativity were found to be the most determinant in the predictions. Further, the ligand fragments with the strongest influence on the H2-activation barrier were identified.
- 16Spiekermann, K.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417, DOI: 10.1038/s41597-022-01529-6Google Scholar16https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38Xit1SlsLbI&md5=41543ede9b34d45867196d0c969e7912High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactionsSpiekermann, Kevin; Pattanaik, Lagnajit; Green, William H.Scientific Data (2022), 9 (1), 417CODEN: SDCABS; ISSN:2052-4463. (Nature Portfolio)Abstr.: Quant. chem. reaction data, including activation energies and reaction rates, are crucial for developing detailed kinetic mechanisms and accurately predicting reaction outcomes. However, such data are often difficult to find, and high-quality datasets are esp. rare. Here, we use CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP to obtain high-quality single point calcns. for nearly 22,000 unique stable species and transition states. We report the results from these quantum chem. calcns. and ext. the barrier heights and reaction enthalpies to create a kinetics dataset of nearly 12,000 gas-phase reactions. These reactions involve H, C, N, and O, contain up to seven heavy atoms, and have cleaned atom-mapped SMILES. Our higher-accuracy coupled-cluster barrier heights differ significantly (RMSE of ∼5 kcal mol-1) relative to those calcd. at ωB97X-D3/def2-TZVP. We also report accurate transition state theory rate coeffs. k∞(T) between 300 K and 2000 K and the corresponding Arrhenius parameters for a subset of rigid reactions. We believe this data will accelerate development of automated and reliable methods for quant. reaction prediction.
- 17. Ismail, I.; Robertson, C.; Habershon, S. Successes and challenges in using machine-learned activation energies in kinetic simulations. J. Chem. Phys. 2022, 157, 014109, DOI: 10.1063/5.0096027
- 18. Stewart, J. J. P. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J. Mol. Model. 2013, 19, 1–32, DOI: 10.1007/s00894-012-1667-x
- 19. Martinez-Nunez, E.; Vazquez, S. A. Three-center vs. four-center HF elimination from vinyl fluoride: a direct dynamics study. Chem. Phys. Lett. 2000, 332, 583–590, DOI: 10.1016/S0009-2614(00)01198-2
- 20. Gonzalez-Lafont, A.; Truong, T. N.; Truhlar, D. G. Direct dynamics calculations with NDDO (neglect of diatomic differential overlap) molecular orbital theory with specific reaction parameters. J. Phys. Chem. 1991, 95, 4618–4627, DOI: 10.1021/j100165a009
- 21. Martinez-Nunez, E.; Estevez, C. M.; Flores, J. R.; Vazquez, S. A. Product energy distributions for the four-center HF elimination from 1,1-difluoroethylene. A direct dynamics study. Chem. Phys. Lett. 2001, 348, 81–88, DOI: 10.1016/S0009-2614(01)01092-2
- 22. Gonzalez-Vazquez, J.; Fernandez-Ramos, A.; Martinez-Nunez, E.; Vazquez, S. A. Dissociation of difluoroethylenes. I. Global potential energy surface, RRKM, and VTST calculations. J. Phys. Chem. A 2003, 107, 1389–1397, DOI: 10.1021/jp021901s
- 23. Gonzalez-Vazquez, J.; Martinez-Nunez, E.; Fernandez-Ramos, A.; Vazquez, S. A. Dissociation of difluoroethylenes. II. Direct classical trajectory study of the HF elimination from 1,2-difluoroethylene. J. Phys. Chem. A 2003, 107, 1398–1404, DOI: 10.1021/jp021902k
- 24. Kromann, J. C.; Christensen, A. S.; Cui, Q.; Jensen, J. H. Towards a barrier height benchmark set for biologically relevant systems. PeerJ 2016, 4, e1994, DOI: 10.7717/peerj.1994
- 25. Iron, M. A.; Janes, T. Evaluating Transition Metal Barrier Heights with the Latest Density Functional Theory Exchange–Correlation Functionals: The MOBH35 Benchmark Database. J. Phys. Chem. A 2019, 123, 3761–3781, DOI: 10.1021/acs.jpca.9b01546
- 26. Pérez-Tabero, S.; Fernández, B.; Cabaleiro-Lago, E. M.; Martínez-Núñez, E.; Vázquez, S. A. New Approach for Correcting Noncovalent Interactions in Semiempirical Quantum Mechanical Methods: The Importance of Multiple-Orientation Sampling. J. Chem. Theor. Comput. 2021, 17, 5556–5567, DOI: 10.1021/acs.jctc.1c00365
- 27. Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theor. Comput. 2015, 11, 2087–2096, DOI: 10.1021/acs.jctc.5b00099
- 28. Plehiers, P. P.; Lengyel, I.; West, D. H.; Marin, G. B.; Stevens, C. V.; Van Geem, K. M. Fast estimation of standard enthalpy of formation with chemical accuracy by artificial neural network correction of low-level-of-theory ab initio calculations. Chem. Eng. J. 2021, 426, 131304, DOI: 10.1016/j.cej.2021.131304
- 29. Bogojeski, M.; Vogt-Maranto, L.; Tuckerman, M. E.; Müller, K.-R.; Burke, K. Quantum chemical accuracy from density functional approximations via machine learning. Nat. Commun. 2020, 11, 5223, DOI: 10.1038/s41467-020-19093-1
- 30. Gao, T.; Li, H.; Li, W.; Li, L.; Fang, C.; Li, H.; Hu, L.; Lu, Y.; Su, Z.-M. A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases. J. Cheminform. 2016, 8, 24, DOI: 10.1186/s13321-016-0133-7
- 31. Wan, Z.; Wang, Q.-D.; Liang, J. Accurate prediction of standard enthalpy of formation based on semiempirical quantum chemistry methods with artificial neural network and molecular descriptors. Int. J. Quantum Chem. 2021, 121, e26441, DOI: 10.1002/qua.26441
- 32. Zhu, J.; Vuong, V. Q.; Sumpter, B. G.; Irle, S. Artificial neural network correction for density-functional tight-binding molecular dynamics simulations. MRS Commun. 2019, 9, 867–873, DOI: 10.1557/mrc.2019.80
- 33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: San Francisco, California, USA, 2016; pp 785–794.
- 34. Cui, J.; Krems, R. V. Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with Gaussian processes. J. Phys. B At. Mol. Opt. Phys. 2016, 49, 224001, DOI: 10.1088/0953-4075/49/22/224001
- 35. Christianen, A.; Karman, T.; Vargas-Hernández, R. A.; Groenenboom, G. C.; Krems, R. V. Six-dimensional potential energy surface for NaK–NaK collisions: Gaussian process representation with correct asymptotic form. J. Chem. Phys. 2019, 150, 064106, DOI: 10.1063/1.5082740
- 36Dai, J.; Krems, R. V. Interpolation and Extrapolation of Global Potential Energy Surfaces for Polyatomic Systems by Gaussian Processes with Composite Kernels. J. Chem. Theor. Comput. 2020, 16, 1386– 1395, DOI: 10.1021/acs.jctc.9b00700Google ScholarThere is no corresponding record for this reference.
- 37Sugisawa, H.; Ida, T.; Krems, R. V. Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimer. J. Chem. Phys. 2020, 153, 114101, DOI: 10.1063/5.0023492Google Scholar37https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhvVOnt77N&md5=f2eed09a73cb02dde58b8c1505953dd7Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimerSugisawa, Hiroki; Ida, Tomonori; Krems, R. V.Journal of Chemical Physics (2020), 153 (11), 114101CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)The goal of the present work is to obtain accurate potential energy surfaces (PESs) for high-dimensional mol. systems with a small no. of ab initio calcns. in a system-agnostic way. We use probabilistic modeling based on Gaussian processes (GPs). We illustrate that it is possible to build an accurate GP model of a 51-dimensional PES based on 5000 randomly distributed ab initio calcns. with a global accuracy of <0.2 kcal/mol. Our approach uses GP models with composite kernels designed to enhance the Bayesian information content and represents the global PES as a sum of a full-dimensional GP and several GP models for mol. fragments of lower dimensionality. We demonstrate the potency of these algorithms by constructing the global PES for the protonated imidazole dimer, a mol. system with 19 atoms. We illustrate that GP models thus constructed can extrapolate the PES from low energies (<10 000 cm-1), yielding a PES at high energies (>20 000 cm-1). This opens the prospect for new applications of GPs, such as mapping out phase transitions by extrapolation or accelerating Bayesian optimization, for high-dimensional physics and chem. problems with a restricted no. of inputs, i.e., for high-dimensional problems where obtaining training data is very difficult. (c) 2020 American Institute of Physics.
- 38Liu, X.; Meijer, G.; Pérez-Ríos, J. On the relationship between spectroscopic constants of diatomic molecules: a machine learning approach. RSC Adv. 2021, 11, 14552– 14561, DOI: 10.1039/D1RA02061GGoogle Scholar38https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXptVWqs7w%253D&md5=45158691e3b4f11c56144b035107437dOn the relationship between spectroscopic constants of diatomic molecules: a machine learning approachLiu, Xiangyue; Meijer, Gerard; Perez-Rios, JesusRSC Advances (2021), 11 (24), 14552-14561CODEN: RSCACL; ISSN:2046-2069. (Royal Society of Chemistry)Through a machine learning approach, we show that the equil. distance, harmonic vibrational frequency and binding energy of diat. mols. are related, independently of the nature of the bond of a mol.; they depend solely on the group and period of the constituent atoms. As a result, we show that by employing the group and period of the atoms that form a mol., the spectroscopic consts. are predicted with an accuracy of <5%, whereas for the A-excited electronic state it is needed to include other at. properties leading to an accuracy of <11%.
- 39Liu, X.; Meijer, G.; Pérez-Ríos, J. A data-driven approach to determine dipole moments of diatomic molecules. Phys. Chem. Chem. Phys. 2020, 22, 24191– 24200, DOI: 10.1039/D0CP03810EGoogle Scholar39https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXhs1Ggsb%252FN&md5=287ca9e42d6876e565fd74c9e67d5549A data-driven approach to determine dipole moments of diatomic moleculesLiu, Xiangyue; Meijer, Gerard; Perez-Rios, JesusPhysical Chemistry Chemical Physics (2020), 22 (42), 24191-24200CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)We present a data-driven approach for the prediction of the elec. dipole moment of diat. mols., which is one of the most relevant mol. properties. In particular, we apply Gaussian process regression to a novel dataset to show that dipole moments of diat. mols. can be learned, and hence predicted, with a relative error .ltorsim.5%. The dataset contains the dipole moment of 162 diat. mols., the most exhaustive and unbiased dataset of dipole moments up to date. Our findings show that the dipole moment of diat. mols. depends on at. properties of the constituents atoms: electron affinity and ionization potential, as well as on (a feature related to) the first deriv. of the electronic kinetic energy at the equil. distance.
- 40Cretu, M. T.; Pérez-Ríos, J. Predicting second virial coefficients of organic and inorganic compounds using Gaussian process regression. Phys. Chem. Chem. Phys. 2021, 23, 2891– 2898, DOI: 10.1039/D0CP05509CGoogle Scholar40https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXoslSktg%253D%253D&md5=401138bbf8d2864d470f67bd84bb3e10Predicting second virial coefficients of organic and inorganic compounds using Gaussian process regressionCretu, Miruna T.; Perez-Rios, JesusPhysical Chemistry Chemical Physics (2021), 23 (4), 2891-2898CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)We show that by using intuitive and accessible mol. features it is possible to predict the temp.-dependent second virial coeff. of org. and inorg. compds. with Gaussian process regression. In particular, we built a low dimensional representation of features based on intrinsic mol. properties, topol. and phys. properties relevant for the characterization of mol.-mol. interactions. The featurization was used to predict second virial coeffs. in the interpolative regime with a relative error .ltorsim.1% and to extrapolate the prediction to temps. outside of the training range for each compd. in the dataset with a relative error of 2.1%. Addnl., the model's predictive abilities were extended to org. mols. unseen in the training process, yielding a prediction with a relative error of 2.7%. Test mols. must be well-represented in the training set by instances of their families, which are high in variety. The method shows a generally better performance when compared to several semiempirical procedures employed in the prediction of the quantity. Therefore, apart from being robust, the present Gaussian process regression model is extensible to a variety of org. and inorg. compds.
- 41Stewart, J. J. P. MOPAC2016, Stewart Computational Chemistry: Colorado Springs, CO, USA, 2016, HTTP://OpenMOPAC.net (accessed July 01, 2022).Google ScholarThere is no corresponding record for this reference.
- 42Carpenter, B. K.; Ellison, G. B.; Nimlos, M. R.; Scheer, A. M. A Conical Intersection Influences the Ground State Rearrangement of Fulvene to Benzene. J. Phys. Chem. A 2022, 126, 1429– 1447, DOI: 10.1021/acs.jpca.2c00038Google Scholar42https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38Xkt1Chtrc%253D&md5=1474a85851256f5f7825a5a0e7957c30A Conical Intersection Influences the Ground State Rearrangement of Fulvene to BenzeneCarpenter, Barry K.; Ellison, G. Barney; Nimlos, Mark R.; Scheer, Adam M.Journal of Physical Chemistry A (2022), 126 (8), 1429-1447CODEN: JPCAFH; ISSN:1089-5639. (American Chemical Society)The rearrangement of fulvene to benzene is believed to play an important role in the formation of soot during hydrocarbon combustion. Previous work has identified two possible mechanisms for the rearrangement-a unimol. path and a hydrogen-atom-assisted, bimol. path. Computational results to date have suggested that the unimol. mechanism faces a barrier of about 74 kcal/mol, which makes it unable to compete with the bimol. mechanism under typical combustion conditions. This computed barrier is about 10 kcal/mol higher than the exptl. value, which is an unusually large discrepancy for modern electronic structure theory. In the present work, we have reinvestigated the unimol. mechanism computationally, and we have found a second transition state that is approx. 10 kcal/mol lower in energy than the previously identified one and, therefore, in excellent agreement with the exptl. value. The existence of two transition states for the same rearrangement arises because there is a conical intersection between the two lowest singlet states which occurs in the vicinity of the reaction coordinates. The two possible paths around the cone on the lower adiabatic surface give rise to the two distinct saddle points. The lower barrier for the unimol. mechanism now makes it competitive with the bimol. 
one, according to our calcns. In support of this conclusion, we have reanalyzed some previous exptl. results on anisole pyrolysis, which leads to benzene as a significant product and have shown that the unimol. and bimol. mechanisms for fulvene → benzene must be occurring competitively in that system. Finally, we have identified that similar conical intersections arise during the isomerizations of benzofulvene and isobenzofulvene to naphthalene.
- 43Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594– 7603, DOI: 10.1039/D2SC02925AGoogle Scholar43https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38XhsFCksLbJ&md5=7c2876fe04038a1bb3abf57fa0cd51bbMachine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier predictionFarrar, Elliot H. E.; Grayson, Matthew N.Chemical Science (2022), 13 (25), 7594-7603CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Modern QM modeling methods, such as DFT, have provided detailed mechanistic insights into countless reactions. However, their computational cost inhibits their ability to rapidly screen large nos. of substrates and catalysts in reaction discovery. For a C-C bond forming nitro-Michael addn., we introduce a synergistic semi-empirical quantum mech. (SQM) and machine learning (ML) approach that allows the prediction of DFT-quality reaction barriers in minutes, even on a std. laptop using widely available modeling software. Mean abs. errors (MAEs) are obtained that are below the accepted chem. accuracy threshold of 1 kcal mol-1 and substantially better than SQM methods without ML correction (5.71 kcal mol-1). Predictive power is shown to hold when the ML models are applied to an unseen set of compds. from the toxicol. literature. Mechanistic insight is also achieved via the generation of full SQM transition state (TS) structures which are found to be very good approxns. for the DFT-level geometries, revealing important steric interactions in some TSs. This combination of speed, accuracy, and mechanistic insight is unprecedented; current ML barrier models compromise on at least one of these important criteria.
- 44Martínez-Núñez, E. An automated method to find transition states using chemical dynamics simulations. J. Comput. Chem. 2015, 36, 222– 234, DOI: 10.1002/jcc.23790Google Scholar44https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXhvFGgtLbO&md5=e22946d4913acb912ffd139e36d6c11cAn automated method to find transition states using chemical dynamics simulationsMartinez-Nunez, EmilioJournal of Computational Chemistry (2015), 36 (4), 222-234CODEN: JCCHDD; ISSN:0192-8651. (John Wiley & Sons, Inc.)A procedure to automatically find the transition states (TSs) of a mol. system (MS) is proposed. It has two components: high-energy chem. dynamics simulations (CDS), and an algorithm that analyzes the geometries along the trajectories to find reactive pathways. Two levels of electronic structure calcns. are involved: a low level (LL) is used to integrate the trajectories and also to optimize the TSs, and a higher level (HL) is used to reoptimize the structures. The method has been tested in three MSs: formaldehyde, formic acid (FA), and vinyl cyanide (VC), using MOPAC2012 and Gaussian09 to run the LL and HL calcns., resp. Both the efficacy and efficiency of the method are very good, with around 15 TS structures optimized every 10 trajectories, which gives a total of 7, 12, and 83 TSs for formaldehyde, FA, and VC, resp. The use of CDS makes it a powerful tool to unveil possible nonstatistical behavior of the system under study. © 2014 Wiley Periodicals, Inc.
- 45Martínez-Núñez, E. An automated transition state search using classical trajectories initialized at multiple minima. Phys. Chem. Chem. Phys. 2015, 17, 14912– 14921, DOI: 10.1039/C5CP02175HGoogle Scholar45https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXnslOqsLo%253D&md5=695f6ea8566cc28afe6bb7c89a434bbeAn automated transition state search using classical trajectories initialized at multiple minimaMartinez-Nunez, EmilioPhysical Chemistry Chemical Physics (2015), 17 (22), 14912-14921CODEN: PPCPFQ; ISSN:1463-9076. (Royal Society of Chemistry)Very recently, we proposed an automated method for finding transition states of chem. reactions using dynamics simulations; the method has been termed Transition State Search using Chem. Dynamics Simulations (TSSCDS) (E. Martinez-Nunez, J. Comput. Chem., 2015, 36, 222-234). In the present work, an improved automated search procedure is developed, which consists of iteratively running different ensembles of trajectories initialized at different min. The iterative TSSCDS method is applied to the complex C3H4O system, obtaining a total of 66 different min. and 276 transition states. With the obtained transition states and paths, statistical RRKM calcns. and Kinetic Monte Carlo simulations are carried out to study the fragmentation dynamics of propenal, which is the global min. of the system. The kinetic simulations provide a (three-body dissocn.)/(CO elimination) ratio of 1.49 for an excitation energy of 148 kcal mol-1, which agrees well with the corresponding value obtained in the photolysis of propenal at 193 nm (1.1), suggesting that at least these two channels: three-body dissocn. (to give H2 + CO + C2H2) and CO elimination occur on the ground electronic state.
- 46Varela, J. A.; Vazquez, S. A.; Martinez-Nunez, E. An automated method to find reaction mechanisms and solve the kinetics in organometallic catalysis. Chem. Sci. 2017, 8, 3843– 3851, DOI: 10.1039/C7SC00549KGoogle Scholar46https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXjsl2rtr8%253D&md5=df77d0eab3dac5fc42530661b1d57954An automated method to find reaction mechanisms and solve the kinetics in organometallic catalysisVarela, J. A.; Vazquez, S. A.; Martinez-Nunez, E.Chemical Science (2017), 8 (5), 3843-3851CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)A novel computational method is proposed in this work for use in discovering reaction mechanisms and solving the kinetics of transition metal-catalyzed reactions. The method does not rely on either chem.intuition or assumed a priori mechanisms, and it works in a fully automated fashion. Its core is a procedure, recently developed by one of the authors, that combines accelerated direct dynamics with an efficient geometry-based post-processing algorithm to find transition states. In the present work, several auxiliary tools have been added to deal with the specific features of transition metal catalytic reactions. As a test case, we chose the cobalt-catalyzed hydroformylation of ethylene because of its well-established mechanism, and the fact that it has already been used in previous automated computational studies. Besides the generally accepted mechanism of Heck and Breslow, several side reactions, such as hydrogenation of the alkene, emerged from our calcns. Addnl., the calcd.rate law for the hydroformylation reaction agrees reasonably well with those obtained in previous exptl.and theor.studies.
- 47Martínez-Núñez, E.; Barnes, G. L.; Glowacki, D. R.; Kopec, S.; Peláez, D.; Rodríguez, A.; Rodríguez-Fernández, R.; Shannon, R. J.; Stewart, J. J. P.; Tahoces, P. G.; Vazquez, S. A. AutoMeKin2021: An open-source program for automated reaction discovery. J. Comput. Chem. 2021, 42, 2036– 2048, DOI: 10.1002/jcc.26734Google Scholar47https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXhsl2ksbbO&md5=22e37b1a05073f7d3f680cd1947f87f5AutoMeKin2021 : An open-source program for automated reaction discoveryMartinez-Nunez, Emilio; Barnes, George L.; Glowacki, David R.; Kopec, Sabine; Pelaez, Daniel; Rodriguez, Aurelio; Rodriguez-Fernandez, Roberto; Shannon, Robin J.; Stewart, James J. P.; Tahoces, Pablo G.; Vazquez, Saulo A.Journal of Computational Chemistry (2021), 42 (28), 2036-2048CODEN: JCCHDD; ISSN:0192-8651. (John Wiley & Sons, Inc.)AutoMeKin2021 is an updated version of tsscds2018, a program for the automated discovery of reaction mechanisms (J. Comput. Chem. 2018, 39, 1922). This release features a no. of new capabilities: rare-event mol. dynamics simulations to enhance reaction discovery, extension of the original search algorithm to study van der Waals complexes, use of chem. knowledge, a new search algorithm based on bond-order time series anal., statistics of the chem. reaction networks, a web application to submit jobs, and other features. The source code, manual, installation instructions and the website link are available at https://rxnkin.usc.es/index.php/AutoMeKin.
- 48Taketsugu, T.; Gordon, M. S. Dynamic reaction path analysis based on an intrinsic reaction coordinate. J. Chem. Phys. 1995, 103, 10042– 10049, DOI: 10.1063/1.470704Google Scholar48https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK2MXhtVShtr%252FJ&md5=7cc12ffdcdc03fe1a196e8ba7723e449Dynamic reaction path analysis based on an intrinsic reaction coordinateTaketsugu, Tetsuya; Gordon, Mark S.Journal of Chemical Physics (1995), 103 (23), 10042-9CODEN: JCPSA6; ISSN:0021-9606. (American Institute of Physics)We propose two methods that can be used to describe the dynamic reaction path (DRP) based on an intrinsic reaction coordinate (IRC) or min. energy path, to exam. how the actual dynamics proceeds relative to the IRC path. In the first of these, any point on the DRP is expressed in terms of the IRC and the distance from the IRC path. In the second method, any DRP point is expressed in terms of the IRC, the curvature coordinate, and the distance from a two-dimensional "reaction plane" detd. by the IRC path tangent and curvature vectors. The latter representation is based on the fact that the 3N-8 dimensional space orthogonal to the reaction plane is independent of an internal centrifugal force caused by the motion along the IRC path. To analyze the relation between geometric features of the IRC path and the dynamics, we introduce a function that ests. the variation of the reaction plane along the IRC path. As demonstrations, the methods are applied to the dissocn. reaction of thioformaldehyde (H2CS → H2 + CS).
- 49Vazquez, S. A.; Otero, X. L.; Martinez-Nunez, E. A Trajectory-Based Method to Explore Reaction Mechanisms. Molecules 2018, 23, 3156, DOI: 10.3390/molecules23123156Google Scholar49https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXms1yrsbk%253D&md5=2ae83ce1fc9891be638a87e62ca2b29fA trajectory-based method to explore reaction mechanismsVazquez, Saulo A.; Otero, Xose L.; Martinez-Nunez, EmilioMolecules (2018), 23 (12), 3156/1-3156/21CODEN: MOLEFW; ISSN:1420-3049. (MDPI AG)The tsscds method, recently developed in our group, discovers chem. reaction mechanisms with minimal human intervention. It employs accelerated mol. dynamics, spectral graph theory, statistical rate theory and stochastic simulations to uncover chem. reaction paths and to solve the kinetics at the exptl. conditions. In the present review, its application to solve mechanistic/kinetics problems in different research areas will be presented. Examples will be given of reactions involved in photodissocn. dynamics, mass spectrometry, combustion chem. and organometallic catalysis. Some planned improvements will also be described.
- 50Landrum, G. RDKit: Open-source cheminformatics (2016). https://www.rdkit.org (accessed July 01, 2022).Google ScholarThere is no corresponding record for this reference.
- 51Randic, M. Characterization of molecular branching. J. Am. Chem. Soc. 1975, 97, 6609– 6615, DOI: 10.1021/ja00856a001Google Scholar51https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaE28XjvFSq&md5=65f6f5f51ebd82c9103d93a90ceaf897Characterization of molecular branchingRandic, MilanJournal of the American Chemical Society (1975), 97 (23), 6609-15CODEN: JACSAT; ISSN:0002-7863.A theor. characterization of mol. branching is considered. Members of homologous series are ordered in a sequence and a numerical index is assigned to individual structures based on a differentiation of edge types of mol. graphs. Linear and branched alkanes having eight or less carbon atoms were considered in particular and correlations between the derived branching index and properties which critically depend on mol. size and shape are established. The proposed index is also in satisfactory agreement with the empirical of Kovats. The approach reveals some inherent relationships between isomers which can be traced to connectivity and mol. topology. It points, in some cases, to a considerable redn. in the no. of exptl. deduced consts. characterizing mol. properties provided a sacrifice in precision can be tolerated and is compensated for by the significance of the indicated inter-relations. This point is illustrated on an anal. of the empirical consts. of the Antoine equation.
- 52Estrada, E. Characterization of the folding degree of proteins. Bioinformatics 2002, 18, 697– 704, DOI: 10.1093/bioinformatics/18.5.697Google Scholar52https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XkvV2htr4%253D&md5=488d8a035099f225020d92cd3fa5e280Characterization of the folding degree of proteinsEstrada, ErnestoBioinformatics (2002), 18 (5), 697-704CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)Motivation: The characterization of the folding degree of chains is central to the elucidation of structure-function relationships in proteins. Here we present a new index for characterizing the folding degree of a (protein) chain. This index shows a range of features that are desirable for the study of the relation between structure and function in proteins. Results: A novel index characterizing the folding degree of (protein) chains is developed based on the spectral moments of a matrix representing the dihedral angles (.vphi., ω and ψ) of the protein main chain. The proposed index is normalized to the chain size, is not correlated to the gyration radius of the backbone chain and is able to distinguish between structures for which the sum of the main-chain dihedral angles is identical. The index is well correlated to the percentages of helix and strand in proteins, shows a linear dependence with temp. changes, and is able to differentiate among protein families.
- 53Gutman, I.; Trinajstić, N. Graph theory and molecular orbitals. Total φ-electron energy of alternant hydrocarbons. Chem. Phys. Lett. 1972, 17, 535– 538, DOI: 10.1016/0009-2614(72)85099-1Google Scholar53https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaE3sXpslSjsw%253D%253D&md5=656781f016e47c7e5072559fef7d7f7cGraph theory and molecular orbitals. Total π-electron energy of alternant hydrocarbonsGutman, I.; Trinajstic, N.Chemical Physics Letters (1972), 17 (4), 535-8CODEN: CHPLBC; ISSN:0009-2614.The dependence of the Hueckel total π-electron energy on the mol. topology is shown. General rules governing the structural dependence of the π-electron energy in conjugated molecules are derived.
- 54Parr, R. G.; Pearson, R. G. Absolute hardness: companion parameter to absolute electronegativity. J. Am. Chem. Soc. 1983, 105, 7512– 7516, DOI: 10.1021/ja00364a005Google Scholar54https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL2cXht1yrtw%253D%253D&md5=2053ed462c19c9b6da9e0f8472b5c356Absolute hardness: companion parameter to absolute electronegativityParr, Robert G.; Pearson, Ralph G.Journal of the American Chemical Society (1983), 105 (26), 7512-16CODEN: JACSAT; ISSN:0002-7863.For neutral and charged species, at. and mol., a property called abs. hardness η is defined. Let E(N) be a ground-state electronic energy as a function of the no. of electrons N. As is well-known, the deriv. of E(N) with respect to N, keeping nuclear charges Z fixed, is the chem. potential μ or the neg. of the abs. electronegativity χ. The corresponding second deriv. is hardness. Operational definitions of χ and η are provided by the finite difference formulas. The principle of hard and soft Acids and Bases is derived theor. by making use of the hypothesis that extra stability attends bonding of A to B when the ionization potentials of A and B in the mol. (after charge transfer) are the same. For bases B, hardness is identified as the hardness of the species B+. Tables of abs. hardness are given for a no. of free atoms, Lewis acids, and Lewis bases, and the value are found to agree well with chem. facts.
- 55Mulliken, R. S. A New Electroaffinity Scale; Together with Data on Valence States and on Valence Ionization Potentials and Electron Affinities. J. Chem. Phys. 1934, 2, 782– 793, DOI: 10.1063/1.1749394Google Scholar55https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaA2MXhvFI%253D&md5=68f1fbe16a4bd07afa2a6d8b74a72866New electroaffinity scale; together with data on valence states and on valence ionization potentials and electron affinitiesMulliken, Robert S.Journal of Chemical Physics (1934), 2 (), 782-93CODEN: JCPSA6; ISSN:0021-9606.A new "absolute" scale of electronegativity, or electroaffinity, is set up. The abs. electroaffinity is the average of ionization potential and electron affinity. Electroaffinity values are calcd. for H, Li, B, C, N, O, F, Cl, Br and I; they agree well with Pauling's electronegativity scale and with the dipole-moment scale.
- 56Coulson, C. A.; Longuet-Higgins, H. C.; Bell, R. P. The electronic structure of conjugated systems II. Unsaturated hydrocarbons and their hetero-derivatives. Proc. R. Soc. Lond. A 1947, 192, 16– 32, DOI: 10.1098/rspa.1947.0136Google Scholar56https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaH1MXivFSnsg%253D%253D&md5=93ef9a022cf4cc3bce837fa4f0a008f5Electronic structure of conjugated systems. II. Unsaturated hydrocarbons and their hetero derivativesCoulson, C. A.; Longuet-Higgins, H. C.Proceedings of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences (1947), 192 (), 16-32CODEN: PRLAAZ; ISSN:1364-5021.Theoretical. The theory of C.A. 42, 1489i, is applied to hydrocarbons and their hetero derivs. An equation is given relating differences in activation energy to electron ds. and atom polarizabilities for a heterolytic reaction at different positions in a conjugated system. The equations are then applied to hydrocarbons contg. no odd-membered unsatd. rings. When one coulomb integral is altered slightly, the electron ds. are alternately increased or decreased throughout the mol. Thus, a theoretical basis is provided for the exptl. law of alternating polarity in conjugated systems contg. a hetero atom. Furthermore, the theory allows assessment of the relative extents to which substitution affects different positions in a mol. Applications to other mols. are indicated.
- 57Ye, Z.; Yang, Y.; Li, X.; Cao, D.; Ouyang, D. An Integrated Transfer Learning and Multitask Learning Approach for Pharmacokinetic Parameter Prediction. Mol. Pharmaceutics 2019, 16, 533– 541, DOI: 10.1021/acs.molpharmaceut.8b00816Google Scholar57https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXisFKitbzE&md5=b07f3cb3eaa74859ab442f3d52238694An Integrated Transfer Learning and Multitask Learning Approach for Pharmacokinetic Parameter PredictionYe, Zhuyifan; Yang, Yilong; Li, Xiaoshan; Cao, Dongsheng; Ouyang, DefangMolecular Pharmaceutics (2019), 16 (2), 533-541CODEN: MPOHBP; ISSN:1543-8384. (American Chemical Society)Background: Pharmacokinetic evaluation is one of the key processes in drug discovery and development. However, current absorption, distribution, metab., and excretion prediction models still have limited accuracy. Aim: This study aims to construct an integrated transfer learning and multitask learning approach for developing quant. structure-activity relationship models to predict four human pharmacokinetic parameters. Methods: A pharmacokinetic data set included 1104 U.S. FDA approved small mol. drugs. The data set included four human pharmacokinetic parameter subsets (oral bioavailability, plasma protein binding rate, apparent vol. of distribution at steady-state, and elimination half-life). The pretrained model was trained on over 30 million bioactivity data entries. An integrated transfer learning and multitask learning approach was established to enhance the model generalization. Results: The pharmacokinetic data set was split into three parts (60:20:20) for training, validation, and testing by the improved max. dissimilarity algorithm with the representative initial set selection algorithm and the weighted distance function. The multitask learning techniques enhanced the model predictive ability. 
The integrated transfer learning and multitask learning model demonstrated the best accuracies, because deep neural networks have the general feature extn. ability; transfer learning and multitask learning improve the model generalization. Conclusions: The integrated transfer learning and multitask learning approach with the improved data set splitting algorithm was first introduced to predict the pharmacokinetic parameters. This method can be further employed in drug discovery and development.
- 58Popov, S.; Morozov, S.; Babenko, A., Neural oblivious decision ensembles for deep learning on tabular data. In International Conference on Learning Representations; Addis Ababa, Ethiopia, 2020.Google ScholarThere is no corresponding record for this reference.
- 59Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M., Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: Anchorage, AK, USA, 2019; 2623– 2631.Google ScholarThere is no corresponding record for this reference.
- 60Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G. S.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Goodfellow, I.; Harp, A.; Irving, G.; Isard, M.; Jozefowicz, R.; Jia, Y.; Kaiser, L.; Kudlur, M.; Levenberg, J.; Mané, D.; Schuster, M.; Monga, R.; Moore, S.; Murray, D.; Olah, C.; Shlens, J.; Steiner, B.; Sutskever, I.; Talwar, K.; Tucker, P.; Vanhoucke, V.; Vasudevan, V.; Viégas, F.; Vinyals, O.; Warden, P.; Wattenberg, M.; Wicke, M.; Yu, Y.; Zheng, X. TensorFlow: Large-scale machine learning on heterogeneous systems , 2015, Software available from tensorflow.org.Google ScholarThere is no corresponding record for this reference.
- 61MATLAB, R2022a; The MathWorks Inc.: Natick, Massachussetts, 2022.Google ScholarThere is no corresponding record for this reference.
- 62 Lundberg, S. M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J. M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67, DOI: 10.1038/s42256-019-0138-9
- 63 Koopmans, T. Über die Zuordnung von Wellenfunktionen und Eigenwerten zu den Einzelnen Elektronen Eines Atoms. Physica 1934, 1, 104–113, DOI: 10.1016/S0031-8914(34)90011-2
- 64 Datta, D. "Hardness profile" of a reaction path. J. Phys. Chem. 1992, 96, 2409–2410, DOI: 10.1021/j100185a005
- 65 Ordon, P.; Tachibana, A. Nuclear reactivity indices within regional density functional theory. J. Mol. Model. 2005, 11, 312–316, DOI: 10.1007/s00894-005-0248-7
- 66 Chandra, A. K.; Nguyen, M. T. Density Functional Approach to Regiochemistry, Activation Energy, and Hardness Profile in 1,3-Dipolar Cycloadditions. J. Phys. Chem. A 1998, 102, 6181–6185, DOI: 10.1021/jp980949w
- 67 Zhan, C.-G.; Nichols, J. A.; Dixon, D. A. Ionization Potential, Electron Affinity, Electronegativity, Hardness, and Electron Excitation Energy: Molecular Properties from Density Functional Theory Orbital Energies. J. Phys. Chem. A 2003, 107, 4184–4195, DOI: 10.1021/jp0225774
- 68 Alfrey, T., Jr.; Price, C. C. Relative reactivities in vinyl copolymerization. J. Polym. Sci. 1947, 2, 101–106, DOI: 10.1002/pol.1947.120020112
- 69 Geerlings, P.; De Proft, F.; Langenaeker, W. Conceptual Density Functional Theory. Chem. Rev. 2003, 103, 1793–1874, DOI: 10.1021/cr990029p
- 70 De Proft, F.; Geerlings, P. Conceptual and Computational DFT in the Study of Aromaticity. Chem. Rev. 2001, 101, 1451–1464, DOI: 10.1021/cr9903205
- 71 Beg, H.; De, S. P.; Ash, S.; Misra, A. Use of polarizability and chemical hardness to locate the transition state and the potential energy curve for double proton transfer reaction: A DFT based study. Comput. Theor. Chem. 2012, 984, 13–18, DOI: 10.1016/j.comptc.2011.12.018
- 72 Qu, X.; Latino, D. A. R. S.; Aires-de-Sousa, J. A big data approach to the ultra-fast prediction of DFT-calculated bond energies. J. Cheminform. 2013, 5, 34, DOI: 10.1186/1758-2946-5-34
- 73 Labute, P. A widely applicable set of descriptors. J. Mol. Graphics Modell. 2000, 18, 464–477, DOI: 10.1016/S1093-3263(00)00068-1
- 74 Balaban, A. T. Highly discriminating distance-based topological index. Chem. Phys. Lett. 1982, 89, 399–404, DOI: 10.1016/0009-2614(82)80009-2
- 75 Hall, L. H.; Kier, L. B. The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. In Rev. Comput. Chem. 2007, 367–422.
- 76 Wildman, S. A.; Crippen, G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873, DOI: 10.1021/ci990307l
- 77 Vazquez, S. A.; Martinez-Nunez, E. HCN elimination from vinyl cyanide: product energy partitioning, the role of hydrogen-deuterium exchange reactions and a new pathway. Phys. Chem. Chem. Phys. 2015, 17, 6948–6955, DOI: 10.1039/C4CP05626D
- 78 Kesharwani, M. K.; Brauer, B.; Martin, J. M. L. Frequency and Zero-Point Vibrational Energy Scale Factors for Double-Hybrid Density Functionals (and Other Selected Methods): Can Anharmonic Force Fields Be Avoided? J. Phys. Chem. A 2015, 119, 1701–1714, DOI: 10.1021/jp508422u
- 79 Rozanska, X.; Stewart, J. J. P.; Ungerer, P.; Leblanc, B.; Freeman, C.; Saxe, P.; Wimmer, E. High-Throughput Calculations of Molecular Properties in the MedeA Environment: Accuracy of PM7 in Predicting Vibrational Frequencies, Ideal Gas Entropies, Heat Capacities, and Gibbs Free Energies of Organic Molecules. J. Chem. Eng. Data 2014, 59, 3136–3143, DOI: 10.1021/je500201y
References
This article references 79 other publications.
- 1Truhlar, D. G.; Garrett, B. C.; Klippenstein, S. J. Current Status of Transition-State Theory. J. Phys. Chem. 1996, 100, 12771– 12800, DOI: 10.1021/jp953748q1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK28Xkt1ansr8%253D&md5=5663e2f23815cdc1c0bbb6bbb91adabeCurrent Status of Transition-State TheoryTruhlar, Donald G.; Garrett, Bruce C.; Klippenstein, Stephen J.Journal of Physical Chemistry (1996), 100 (31), 12771-12800CODEN: JPCHAX; ISSN:0022-3654. (American Chemical Society)A review with 843 refs.; we present an overview of the current status of transition-state theory and its generalizations. We emphasize (i) recent improvements in available methodol. for calcns. on complex systems, including the interface with electronic structure theory, (ii) progress in the theory and application of transition-state theory to condensed-phase reactions, and (iii) insight into the relation of transition-state theory to accurate quantum dynamics and tests of its accuracy via comparisons with both exptl. and other theor. dynamical approxns.
- 2Bao, J. L.; Truhlar, D. G. Variational transition state theory: theoretical framework and recent developments. Chem. Soc. Rev. 2017, 46, 7548– 7596, DOI: 10.1039/C7CS00602K2https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhvVGms7zO&md5=a34ca9337c59c7eb0c7e4eb7be2be025Variational transition state theory: theoretical framework and recent developmentsBao, Junwei Lucas; Truhlar, Donald G.Chemical Society Reviews (2017), 46 (24), 7548-7596CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)This article reviews the fundamentals of variational transition state theory (VTST), its recent theor. development, and some modern applications. The theor. methods reviewed here include multidimensional quantum mech. tunneling, multistructural VTST (MS-VTST), multi-path VTST (MP-VTST), both reaction-path VTST (RP-VTST) and variable reaction coordinate VTST (VRC-VTST), system-specific quantum Rice-Ramsperger-Kassel theory (SS-QRRK) for predicting pressure-dependent rate consts., and VTST in the solid phase, liq. phase, and enzymes. We also provide some perspectives regarding the general applicability of VTST.
- 3. Zhang, J.; Valeev, E. F. Prediction of Reaction Barriers and Thermochemical Properties with Explicitly Correlated Coupled-Cluster Methods: A Basis Set Assessment. J. Chem. Theory Comput. 2012, 8, 3175–3186, DOI: 10.1021/ct3005547
- 4. Mardirossian, N.; Head-Gordon, M. Thirty years of density functional theory in computational chemistry: an overview and extensive assessment of 200 density functionals. Mol. Phys. 2017, 115, 2315–2372, DOI: 10.1080/00268976.2017.1333644
- 5. Choi, S.; Kim, Y.; Kim, J. W.; Kim, Z.; Kim, W. Y. Feasibility of Activation Energy Prediction of Gas-Phase Reactions by Machine Learning. Chem. – Eur. J. 2018, 24, 12354–12358, DOI: 10.1002/chem.201800345
- 6. Grambow, C. A.; Pattanaik, L.; Green, W. H. Deep Learning of Activation Energies. J. Phys. Chem. Lett. 2020, 11, 2992–2997, DOI: 10.1021/acs.jpclett.0c00500
- 7. Grambow, C. A.; Pattanaik, L.; Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 2020, 7, 137, DOI: 10.1038/s41597-020-0460-4
- 8. Spiekermann, K. A.; Pattanaik, L.; Green, W. H. Fast Predictions of Reaction Barrier Heights: Toward Coupled-Cluster Accuracy. J. Phys. Chem. A 2022, 126, 3976–3986, DOI: 10.1021/acs.jpca.2c02614
- 9. Vargas, S.; Hennefarth, M. R.; Liu, Z.; Alexandrova, A. N. Machine Learning to Predict Diels–Alder Reaction Barriers from the Reactant State Electron Density. J. Chem. Theory Comput. 2021, 17, 6203–6213, DOI: 10.1021/acs.jctc.1c00623
- 10. Jorner, K.; Brinck, T.; Norrby, P.-O.; Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 2021, 12, 1163–1175, DOI: 10.1039/D0SC04896H
- 11. Ravasco, J. M. J. M.; Coelho, J. A. S. Predictive Multivariate Models for Bioorthogonal Inverse-Electron Demand Diels–Alder Reactions. J. Am. Chem. Soc. 2020, 142, 4235–4241, DOI: 10.1021/jacs.9b11948
- 12. Glavatskikh, M.; Madzhidov, T.; Horvath, D.; Nugmanov, R.; Gimadiev, T.; Malakhova, D.; Marcou, G.; Varnek, A. Predictive Models for Kinetic Parameters of Cycloaddition Reactions. Mol. Inf. 2019, 38, e1800077, DOI: 10.1002/minf.201800077
- 13. Gimadiev, T.; Madzhidov, T.; Tetko, I.; Nugmanov, R.; Casciuc, I.; Klimchuk, O.; Bodrov, A.; Polishchuk, P.; Antipin, I.; Varnek, A. Bimolecular Nucleophilic Substitution Reactions: Predictive Models for Rate Constants and Molecular Reaction Pairs Analysis. Mol. Inf. 2019, 38, 1800104, DOI: 10.1002/minf.201800104
- 14. Madzhidov, T. I.; Gimadiev, T. R.; Malakhova, D. A.; Nugmanov, R. I.; Baskin, I. I.; Antipin, I. S.; Varnek, A. A. Structure–reactivity relationship in Diels–Alder reactions obtained using the condensed reaction graph approach. J. Struct. Chem. 2017, 58, 650–656, DOI: 10.1134/S0022476617040023
- 15. Friederich, P.; dos Passos Gomes, G.; De Bin, R.; Aspuru-Guzik, A.; Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 2020, 11, 4584–4601, DOI: 10.1039/D0SC00445F
- 16. Spiekermann, K.; Pattanaik, L.; Green, W. H. High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions. Sci. Data 2022, 9, 417, DOI: 10.1038/s41597-022-01529-6
- 17. Ismail, I.; Robertson, C.; Habershon, S. Successes and challenges in using machine-learned activation energies in kinetic simulations. J. Chem. Phys. 2022, 157, 014109, DOI: 10.1063/5.0096027
- 18. Stewart, J. J. P. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J. Mol. Model. 2013, 19, 1–32, DOI: 10.1007/s00894-012-1667-x
- 19. Martinez-Nunez, E.; Vazquez, S. A. Three-center vs. four-center HF elimination from vinyl fluoride: a direct dynamics study. Chem. Phys. Lett. 2000, 332, 583–590, DOI: 10.1016/S0009-2614(00)01198-2
- 20. Gonzalez-Lafont, A.; Truong, T. N.; Truhlar, D. G. Direct dynamics calculations with NDDO (neglect of diatomic differential overlap) molecular orbital theory with specific reaction parameters. J. Phys. Chem. 1991, 95, 4618–4627, DOI: 10.1021/j100165a009
- 21. Martinez-Nunez, E.; Estevez, C. M.; Flores, J. R.; Vazquez, S. A. Product energy distributions for the four-center HF elimination from 1,1-difluoroethylene. A direct dynamics study. Chem. Phys. Lett. 2001, 348, 81–88, DOI: 10.1016/S0009-2614(01)01092-2
- 22. Gonzalez-Vazquez, J.; Fernandez-Ramos, A.; Martinez-Nunez, E.; Vazquez, S. A. Dissociation of difluoroethylenes. I. Global potential energy surface, RRKM, and VTST calculations. J. Phys. Chem. A 2003, 107, 1389–1397, DOI: 10.1021/jp021901s
- 23. Gonzalez-Vazquez, J.; Martinez-Nunez, E.; Fernandez-Ramos, A.; Vazquez, S. A. Dissociation of difluoroethylenes. II. Direct Classical Trajectory Study of the HF elimination from 1,2-difluoroethylene. J. Phys. Chem. A 2003, 107, 1398–1404, DOI: 10.1021/jp021902k
- 24. Kromann, J. C.; Christensen, A. S.; Cui, Q.; Jensen, J. H. Towards a barrier height benchmark set for biologically relevant systems. PeerJ 2016, 4, e1994 DOI: 10.7717/peerj.1994
- 25. Iron, M. A.; Janes, T. Evaluating Transition Metal Barrier Heights with the Latest Density Functional Theory Exchange–Correlation Functionals: The MOBH35 Benchmark Database. J. Phys. Chem. A 2019, 123, 3761–3781, DOI: 10.1021/acs.jpca.9b01546
- 26. Pérez-Tabero, S.; Fernández, B.; Cabaleiro-Lago, E. M.; Martínez-Núñez, E.; Vázquez, S. A. New Approach for Correcting Noncovalent Interactions in Semiempirical Quantum Mechanical Methods: The Importance of Multiple-Orientation Sampling. J. Chem. Theory Comput. 2021, 17, 5556–5567, DOI: 10.1021/acs.jctc.1c00365
- 27. Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096, DOI: 10.1021/acs.jctc.5b00099
- 28. Plehiers, P. P.; Lengyel, I.; West, D. H.; Marin, G. B.; Stevens, C. V.; Van Geem, K. M. Fast estimation of standard enthalpy of formation with chemical accuracy by artificial neural network correction of low-level-of-theory ab initio calculations. Chem. Eng. J. 2021, 426, 131304 DOI: 10.1016/j.cej.2021.131304
- 29. Bogojeski, M.; Vogt-Maranto, L.; Tuckerman, M. E.; Müller, K.-R.; Burke, K. Quantum chemical accuracy from density functional approximations via machine learning. Nat. Commun. 2020, 11, 5223, DOI: 10.1038/s41467-020-19093-1
- 30. Gao, T.; Li, H.; Li, W.; Li, L.; Fang, C.; Li, H.; Hu, L.; Lu, Y.; Su, Z.-M. A machine learning correction for DFT non-covalent interactions based on the S22, S66 and X40 benchmark databases. J. Cheminform. 2016, 8, 24, DOI: 10.1186/s13321-016-0133-7
- 31. Wan, Z.; Wang, Q.-D.; Liang, J. Accurate prediction of standard enthalpy of formation based on semiempirical quantum chemistry methods with artificial neural network and molecular descriptors. Int. J. Quantum Chem. 2021, 121, e26441 DOI: 10.1002/qua.26441
- 32. Zhu, J.; Vuong, V. Q.; Sumpter, B. G.; Irle, S. Artificial neural network correction for density-functional tight-binding molecular dynamics simulations. MRS Commun. 2019, 9, 867–873, DOI: 10.1557/mrc.2019.80
- 33. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: San Francisco, California, USA, 2016; 785–794.
- 34. Cui, J.; Krems, R. V. Efficient non-parametric fitting of potential energy surfaces for polyatomic molecules with Gaussian processes. J. Phys. B At. Mol. Opt. Phys. 2016, 49, 224001 DOI: 10.1088/0953-4075/49/22/224001
- 35. Christianen, A.; Karman, T.; Vargas-Hernández, R. A.; Groenenboom, G. C.; Krems, R. V. Six-dimensional potential energy surface for NaK–NaK collisions: Gaussian process representation with correct asymptotic form. J. Chem. Phys. 2019, 150, 064106 DOI: 10.1063/1.5082740
- 36. Dai, J.; Krems, R. V. Interpolation and Extrapolation of Global Potential Energy Surfaces for Polyatomic Systems by Gaussian Processes with Composite Kernels. J. Chem. Theory Comput. 2020, 16, 1386–1395, DOI: 10.1021/acs.jctc.9b00700
- 37. Sugisawa, H.; Ida, T.; Krems, R. V. Gaussian process model of 51-dimensional potential energy surface for protonated imidazole dimer. J. Chem. Phys. 2020, 153, 114101, DOI: 10.1063/5.0023492
- 38. Liu, X.; Meijer, G.; Pérez-Ríos, J. On the relationship between spectroscopic constants of diatomic molecules: a machine learning approach. RSC Adv. 2021, 11, 14552–14561, DOI: 10.1039/D1RA02061G
- 39. Liu, X.; Meijer, G.; Pérez-Ríos, J. A data-driven approach to determine dipole moments of diatomic molecules. Phys. Chem. Chem. Phys. 2020, 22, 24191–24200, DOI: 10.1039/D0CP03810E
- 40. Cretu, M. T.; Pérez-Ríos, J. Predicting second virial coefficients of organic and inorganic compounds using Gaussian process regression. Phys. Chem. Chem. Phys. 2021, 23, 2891–2898, DOI: 10.1039/D0CP05509C
- 41. Stewart, J. J. P. MOPAC2016, Stewart Computational Chemistry: Colorado Springs, CO, USA, 2016, HTTP://OpenMOPAC.net (accessed July 01, 2022).
- 42. Carpenter, B. K.; Ellison, G. B.; Nimlos, M. R.; Scheer, A. M. A Conical Intersection Influences the Ground State Rearrangement of Fulvene to Benzene. J. Phys. Chem. A 2022, 126, 1429–1447, DOI: 10.1021/acs.jpca.2c00038
- 43 Farrar, E. H. E.; Grayson, M. N. Machine learning and semi-empirical calculations: a synergistic approach to rapid, accurate, and mechanism-based reaction barrier prediction. Chem. Sci. 2022, 13, 7594–7603, DOI: 10.1039/D2SC02925A
- 44 Martínez-Núñez, E. An automated method to find transition states using chemical dynamics simulations. J. Comput. Chem. 2015, 36, 222–234, DOI: 10.1002/jcc.23790
- 45 Martínez-Núñez, E. An automated transition state search using classical trajectories initialized at multiple minima. Phys. Chem. Chem. Phys. 2015, 17, 14912–14921, DOI: 10.1039/C5CP02175H
- 46 Varela, J. A.; Vazquez, S. A.; Martinez-Nunez, E. An automated method to find reaction mechanisms and solve the kinetics in organometallic catalysis. Chem. Sci. 2017, 8, 3843–3851, DOI: 10.1039/C7SC00549K
- 47 Martínez-Núñez, E.; Barnes, G. L.; Glowacki, D. R.; Kopec, S.; Peláez, D.; Rodríguez, A.; Rodríguez-Fernández, R.; Shannon, R. J.; Stewart, J. J. P.; Tahoces, P. G.; Vazquez, S. A. AutoMeKin2021: An open-source program for automated reaction discovery. J. Comput. Chem. 2021, 42, 2036–2048, DOI: 10.1002/jcc.26734
- 48 Taketsugu, T.; Gordon, M. S. Dynamic reaction path analysis based on an intrinsic reaction coordinate. J. Chem. Phys. 1995, 103, 10042–10049, DOI: 10.1063/1.470704
- 49 Vazquez, S. A.; Otero, X. L.; Martinez-Nunez, E. A Trajectory-Based Method to Explore Reaction Mechanisms. Molecules 2018, 23, 3156, DOI: 10.3390/molecules23123156
- 50 Landrum, G. RDKit: Open-source cheminformatics (2016). https://www.rdkit.org (accessed July 01, 2022).
- 51 Randic, M. Characterization of molecular branching. J. Am. Chem. Soc. 1975, 97, 6609–6615, DOI: 10.1021/ja00856a001
- 52 Estrada, E. Characterization of the folding degree of proteins. Bioinformatics 2002, 18, 697–704, DOI: 10.1093/bioinformatics/18.5.697
- 53 Gutman, I.; Trinajstić, N. Graph theory and molecular orbitals. Total φ-electron energy of alternant hydrocarbons. Chem. Phys. Lett. 1972, 17, 535–538, DOI: 10.1016/0009-2614(72)85099-1
- 54 Parr, R. G.; Pearson, R. G. Absolute hardness: companion parameter to absolute electronegativity. J. Am. Chem. Soc. 1983, 105, 7512–7516, DOI: 10.1021/ja00364a005
- 55 Mulliken, R. S. A New Electroaffinity Scale; Together with Data on Valence States and on Valence Ionization Potentials and Electron Affinities. J. Chem. Phys. 1934, 2, 782–793, DOI: 10.1063/1.1749394
- 56 Coulson, C. A.; Longuet-Higgins, H. C.; Bell, R. P. The electronic structure of conjugated systems II. Unsaturated hydrocarbons and their hetero-derivatives. Proc. R. Soc. Lond. A 1947, 192, 16–32, DOI: 10.1098/rspa.1947.0136
- 57 Ye, Z.; Yang, Y.; Li, X.; Cao, D.; Ouyang, D. An Integrated Transfer Learning and Multitask Learning Approach for Pharmacokinetic Parameter Prediction. Mol. Pharmaceutics 2019, 16, 533–541, DOI: 10.1021/acs.molpharmaceut.8b00816
- 58 Popov, S.; Morozov, S.; Babenko, A. Neural oblivious decision ensembles for deep learning on tabular data. In International Conference on Learning Representations; Addis Ababa, Ethiopia, 2020.
- 59 Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: Anchorage, AK, USA, 2019; 2623–2631.
- 60 Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G. S.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Goodfellow, I.; Harp, A.; Irving, G.; Isard, M.; Jozefowicz, R.; Jia, Y.; Kaiser, L.; Kudlur, M.; Levenberg, J.; Mané, D.; Schuster, M.; Monga, R.; Moore, S.; Murray, D.; Olah, C.; Shlens, J.; Steiner, B.; Sutskever, I.; Talwar, K.; Tucker, P.; Vanhoucke, V.; Vasudevan, V.; Viégas, F.; Vinyals, O.; Warden, P.; Wattenberg, M.; Wicke, M.; Yu, Y.; Zheng, X. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
- 61 MATLAB, R2022a; The MathWorks Inc.: Natick, Massachusetts, 2022.
- 62 Lundberg, S. M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J. M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67, DOI: 10.1038/s42256-019-0138-9
- 63 Koopmans, T. Über die Zuordnung von Wellenfunktionen und Eigenwerten zu den Einzelnen Elektronen Eines Atoms. Physica 1934, 1, 104–113, DOI: 10.1016/S0031-8914(34)90011-2
- 64 Datta, D. "Hardness profile" of a reaction path. J. Phys. Chem. 1992, 96, 2409–2410, DOI: 10.1021/j100185a005
- 65 Ordon, P.; Tachibana, A. Nuclear reactivity indices within regional density functional theory. J. Mol. Model. 2005, 11, 312–316, DOI: 10.1007/s00894-005-0248-7
- 66 Chandra, A. K.; Nguyen, M. T. Density Functional Approach to Regiochemistry, Activation Energy, and Hardness Profile in 1,3-Dipolar Cycloadditions. J. Phys. Chem. A 1998, 102, 6181–6185, DOI: 10.1021/jp980949w
- 67 Zhan, C.-G.; Nichols, J. A.; Dixon, D. A. Ionization Potential, Electron Affinity, Electronegativity, Hardness, and Electron Excitation Energy: Molecular Properties from Density Functional Theory Orbital Energies. J. Phys. Chem. A 2003, 107, 4184–4195, DOI: 10.1021/jp0225774
- 68 Alfrey, T., Jr.; Price, C. C. Relative reactivities in vinyl copolymerization. J. Polym. Sci. 1947, 2, 101–106, DOI: 10.1002/pol.1947.120020112
- 69. Geerlings, P.; De Proft, F.; Langenaeker, W. Conceptual Density Functional Theory. Chem. Rev. 2003, 103, 1793–1874, DOI: 10.1021/cr990029p
- 70. De Proft, F.; Geerlings, P. Conceptual and Computational DFT in the Study of Aromaticity. Chem. Rev. 2001, 101, 1451–1464, DOI: 10.1021/cr9903205
- 71. Beg, H.; De, S. P.; Ash, S.; Misra, A. Use of polarizability and chemical hardness to locate the transition state and the potential energy curve for double proton transfer reaction: A DFT based study. Comput. Theor. Chem. 2012, 984, 13–18, DOI: 10.1016/j.comptc.2011.12.018
- 72. Qu, X.; Latino, D. A. R. S.; Aires-de-Sousa, J. A big data approach to the ultra-fast prediction of DFT-calculated bond energies. J. Cheminform. 2013, 5, 34, DOI: 10.1186/1758-2946-5-34
- 73. Labute, P. A widely applicable set of descriptors. J. Mol. Graphics Modell. 2000, 18, 464–477, DOI: 10.1016/S1093-3263(00)00068-1
- 74. Balaban, A. T. Highly discriminating distance-based topological index. Chem. Phys. Lett. 1982, 89, 399–404, DOI: 10.1016/0009-2614(82)80009-2
- 75. Hall, L. H.; Kier, L. B. The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling. In Rev. Comput. Chem. 2007, 367–422.
- 76. Wildman, S. A.; Crippen, G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868–873, DOI: 10.1021/ci990307l
- 77. Vazquez, S. A.; Martinez-Nunez, E. HCN elimination from vinyl cyanide: product energy partitioning, the role of hydrogen-deuterium exchange reactions and a new pathway. Phys. Chem. Chem. Phys. 2015, 17, 6948–6955, DOI: 10.1039/C4CP05626D
- 78. Kesharwani, M. K.; Brauer, B.; Martin, J. M. L. Frequency and Zero-Point Vibrational Energy Scale Factors for Double-Hybrid Density Functionals (and Other Selected Methods): Can Anharmonic Force Fields Be Avoided? J. Phys. Chem. A 2015, 119, 1701–1714, DOI: 10.1021/jp508422u
- 79. Rozanska, X.; Stewart, J. J. P.; Ungerer, P.; Leblanc, B.; Freeman, C.; Saxe, P.; Wimmer, E. High-Throughput Calculations of Molecular Properties in the MedeA Environment: Accuracy of PM7 in Predicting Vibrational Frequencies, Ideal Gas Entropies, Heat Capacities, and Gibbs Free Energies of Organic Molecules. J. Chem. Eng. Data 2014, 59, 3136–3143, DOI: 10.1021/je500201y
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpca.2c08340.
Exploratory data analysis; details of the hyperparameter optimization; descriptor explanation; links to the data and code employed in this work; and free energies of activation obtained with the GP model (PDF)