Using Machine Learning To Predict Suitable Conditions for Organic Reactions
- Hanyu Gao, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Thomas J. Struble, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Connor W. Coley, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Yuran Wang, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- William H. Green, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Klavs F. Jensen*, Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States. *E-mail: [email protected]
Abstract
Reaction condition recommendation is an essential element for the realization of computer-assisted synthetic planning. Accurate suggestions of reaction conditions are required for experimental validation and can have a significant effect on the success or failure of an attempted transformation. However, de novo condition recommendation remains a challenging and under-explored problem and relies heavily on chemists’ knowledge and experience. In this work, we develop a neural-network model to predict the chemical context (catalyst(s), solvent(s), reagent(s)), as well as the temperature most suitable for any particular organic reaction. Trained on ∼10 million examples from Reaxys, the model is able to propose conditions where a close match to the recorded catalyst, solvent, and reagent is found within the top-10 predictions 69.6% of the time, with top-10 accuracies for individual species reaching 80–90%. Temperature is accurately predicted within ±20 °C from the recorded temperature in 60–70% of test cases, with higher accuracy for cases with correct chemical context predictions. The utility of the model is illustrated through several examples spanning a range of common reaction classes. We also demonstrate that the model implicitly learns a continuous numerical embedding of solvent and reagent species that captures their functional similarity.
Synopsis
Machine learning model predicts conditions for organic synthesis reactions and quantifies solvent and reagent similarity.
Introduction
(1) There has not been a published method that accurately predicts complete reaction conditions (catalysts, solvents, reagents, and temperature) suitable for use with a very large reaction corpus.
(2) The compatibility and interdependence of chemical context and temperature are not taken into account in previous approaches.
(3) No previous studies have performed quantitative evaluation of reaction condition predictions on a large-scale reaction data set. There are two major challenges which have impeded progress: (i) There is not a machine readable large data set available with catalysts/solvents/reagents classified into different types. (ii) For the similarity-based approaches it is difficult to quantitatively assess the level of “correctness” of conditions when comparing entire sets of conditions associated with different literature reactions.
(4) Closer attention should be paid to balancing the generality/specificity of representing chemical context. If the representation is too general, such as manually encoded types/groups, it might not fully characterize functionality, and if it is too specific, e.g., copy–pasting the entire conditions from other reactions, it does not provide further information about chemical similarity.
Results and Discussion
Figure 1. Change of the loss functions with the number of epochs (left figure, overall; right figure, chemical context and temperature).
Statistical Analysis
Prediction task | Top-3 exact matches | Top-10 exact matches | Top-3 close matches | Top-10 close matches |
---|---|---|---|---|
c | 93.6% | 94.9% | 94.9% | 96.4% |
s1 | 75.8% | 83.0% | 78.2% | 85.4% |
s2 | 90.1% | 91.7% | 90.2% | 91.9% |
r1 | 73.2% | 83.1% | 74.8% | 84.9% |
r2 | 89.3% | 91.8% | 89.3% | 92.1% |
c, s1, r1 | 57.3% | 66.0% | 60.4% | 69.6% |
c, s1, s2, r1, r2 | 50.1% | 57.3% | 53.2% | 60.3% |
c, s1, s2, r1, and r2 refer to catalyst, solvent 1, solvent 2, reagent 1, and reagent 2, respectively.
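As a rough illustration of how a top-k exact-match figure like those in the table could be computed, the sketch below counts a test reaction as a hit when the recorded label appears among the first k ranked predictions. The function name `top_k_accuracy` and the toy solvent lists are hypothetical and are not taken from the authors' code; the close-match criterion is not reproduced here.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], recorded: Sequence[str], k: int) -> float:
    """Fraction of cases whose recorded label is among the first k ranked predictions."""
    if len(ranked) != len(recorded):
        raise ValueError("ranked and recorded must have the same length")
    if not recorded:
        return 0.0
    hits = sum(1 for preds, truth in zip(ranked, recorded) if truth in preds[:k])
    return hits / len(recorded)


# Toy data (hypothetical): ranked solvent suggestions for four reactions.
ranked_solvents = [
    ["THF", "DCM", "toluene"],
    ["methanol", "ethanol", "water"],
    ["DMF", "DMSO", "acetonitrile"],
    ["toluene", "THF", "DCM"],
]
recorded_solvents = ["DCM", "methanol", "water", "DCM"]

top1 = top_k_accuracy(ranked_solvents, recorded_solvents, k=1)
top3 = top_k_accuracy(ranked_solvents, recorded_solvents, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

For the combined rows (for example c, s1, r1), the same idea would apply with a hit requiring every listed element of a ranked combination to match at once, which is why those accuracies are lower than the single-element ones.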
Figure 2. Relationship between the true temperature and the top-one predicted temperature (left panel), and the predicted temperature when the predicted context matches the recorded chemical context (right panel).
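The temperature metric quoted in the abstract (predictions within ±20 °C of the recorded value) can be sketched as a simple tolerance check. The helper name `fraction_within`, the inclusive boundary, and the toy temperatures are assumptions made for illustration.

```python
from typing import Sequence


def fraction_within(pred: Sequence[float], true: Sequence[float], tol: float = 20.0) -> float:
    """Fraction of predictions whose absolute error is at most tol (inclusive)."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    if not true:
        return 0.0
    hits = sum(1 for p, t in zip(pred, true) if abs(p - t) <= tol)
    return hits / len(true)


# Toy temperatures in degrees Celsius (hypothetical values).
predicted_c = [25.0, 78.0, 0.0, 110.0, 60.0]
recorded_c = [20.0, 100.0, -78.0, 100.0, 45.0]

share = fraction_within(predicted_c, recorded_c, tol=20.0)
print(f"within 20 degrees: {share:.0%}")
```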
Qualitative Evaluation of Reaction Examples
Figure 3. Examples of model predictions compared with recorded context (temperature rounded to the nearest integer; black text represents the recorded conditions, and blue text represents the predicted conditions). (A) Nucleophilic epoxidation. (B) Deprotection of fluorenylmethyloxycarbonyl (Fmoc). (C) Luche reduction of enone, TBS = tert-butyl(dimethyl)silyl. (D) Buchwald–Hartwig aryl amination, BINAP = 2,2′-bis(diphenylphosphino)-1,1′-binaphthyl. (E) Suzuki–Miyaura coupling, CyJohnPhos = (2-biphenyl)dicyclohexylphosphine. (F) Hoveyda–Grubbs cross metathesis.
Figure 4. Examples of the reactions with the fewest chemical elements matching the recorded context (temperature rounded to the nearest integer; black text represents the recorded conditions, and red text represents the predicted conditions). (A) Birch alkylation. (B) Hoveyda–Grubbs cross metathesis, TBS = tert-butyl(dimethyl)silyl. (C) Suzuki–Miyaura coupling. (D) Azide reduction.
Learned Embedding of Solvents and Reagents
Figure 5. Embedding of the most common 50 solvents projected onto a two-dimensional space using t-SNE. Solvents are naturally clustered into their corresponding classes (manually annotated).
Figure 6. Embedding of the most common 50 reagents projected onto a two-dimensional space using t-SNE. Reagents are naturally clustered into their corresponding classes (manually annotated).
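One way such learned embeddings might be used to quantify functional similarity is a nearest-neighbor lookup. The sketch below uses cosine similarity, which is a measure chosen here for illustration (the figures use a t-SNE projection for visualization); the three-dimensional vectors are made up and do not come from the trained model, and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(name: str, embeddings: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Other species sorted by decreasing cosine similarity to `name`."""
    query = embeddings[name]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up vectors for illustration only; real embeddings would be read
# from the trained network's weights.
toy_embeddings = {
    "methanol": [0.9, 0.1, 0.0],
    "ethanol": [0.8, 0.2, 0.1],
    "toluene": [0.0, 0.9, 0.3],
    "benzene": [0.1, 0.8, 0.4],
}

neighbors = most_similar("methanol", toy_embeddings)
for solvent, score in neighbors:
    print(f"{solvent}: {score:.3f}")
```

A ranking of this kind is the sort of lookup that could support a search for alternatives to a given solvent or reagent, subject to the quality of the underlying embedding.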
Strengths and Limitations
(1) By training on ∼10 million reactions from Reaxys, the model covers a wide range of organic reactions.
(2) With a hierarchical neural network, the model predicts all the elements in the reaction condition sequentially with interdependence and relatively high accuracy.
(3) The chemicals are not precategorized into classes so predictions can point to the specific chemicals that might, for example, be used as either an acid or an oxidant based on the reaction (e.g., sulfuric acid). On the other hand, individual chemical species are modeled as separate entities so that functional similarity can be learned during training and extracted from the model.
(4) The model recommends reaction conditions much faster (less than 100 ms for one reaction) than nearest-neighbor search methods, and allows quantitative evaluation of model predictions on a large scale. It can also be used for efficient condition recommendation for a large number of reactions suggested by computer-assisted retrosynthetic analysis. The reaction conditions can be utilized by forward evaluation tools to better predict reactivity, especially for condition dependent reactions. In addition, the learned embeddings of solvents and reagents can be used to quantify similarity of conditions of sequential reactions to estimate separation requirements, and to find potential green alternatives to toxic solvents/reagents, both of which are helpful for pathway-level route screening and prioritization.
(1) Since the chemical context is predicted in a sequential manner, we must limit the number of predictions at each stage to obtain approximate top-10 combinations in a short time period (similar to a beam search).
(2) Truncating the data based on minimal frequencies of catalysts, reagents, and solvents lowers the total number of trainable parameters and avoids data sparsity issues during training, but also limits the ability to predict rare contexts that are used by highly specific reactions.
(3) There are various other limitations imposed by the imperfection of the training data. For example, even after filtering, some reactions with multiple transformations remain, which confuses model prediction; the labeling of reagents is sometimes misleading (e.g., quenching chemicals included as a reagent); and there are some duplicated records or different labels for the same chemical. While these situations are relatively uncommon in the entire data set, a better curated data set can potentially further improve the model performance.
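The stage-wise truncation described in limitation (1) can be sketched as a small beam search. This is a minimal sketch under stated assumptions, not the authors' implementation: combinations are scored by the product of per-stage probabilities, and `beam_search`, `per_stage`, `beam_width`, and the toy stage functions with their probabilities are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A stage maps the choices made so far to a probability for each candidate label.
Stage = Callable[[Tuple[str, ...]], Dict[str, float]]
Beam = Tuple[Tuple[str, ...], float]


def beam_search(stages: Sequence[Stage], beam_width: int, per_stage: int) -> List[Beam]:
    """Approximate the highest-scoring combinations by pruning at every stage."""
    beams: List[Beam] = [((), 1.0)]
    for stage in stages:
        candidates: List[Beam] = []
        for prefix, score in beams:
            probs = stage(prefix)
            # Keep only the most probable labels at this stage (assumed pruning rule).
            best = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:per_stage]
            for label, p in best:
                candidates.append((prefix + (label,), score * p))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams


def catalyst_stage(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Toy probabilities, independent of the (empty) prefix.
    return {"NULL": 0.7, "Pd(PPh3)4": 0.3}


def solvent_stage(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Toy probabilities that depend on the catalyst chosen earlier.
    if prefix[0] == "Pd(PPh3)4":
        return {"toluene": 0.6, "THF": 0.3, "water": 0.1}
    return {"THF": 0.5, "DCM": 0.4, "water": 0.1}


results = beam_search([catalyst_stage, solvent_stage], beam_width=3, per_stage=2)
for combo, score in results:
    print(combo, round(score, 3))
```

Because low-probability branches are discarded early, the returned list is only an approximation of the true top combinations, which is the trade-off the limitation describes.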
Methods
Overview
Data
criterion | number of reactions |
---|---|
originally from Reaxys | 53 143 003 |
temperature out of range | 56 235 |
multistep | 23 536 281 |
multiproduct | 3 335 439 |
missing SMILES (including half reactions) | 92 472 |
cannot be sanitized by RDKit | 1 693 625 |
no condition information | 9 684 738 |
exceeding one catalyst, two solvents, or two reagents | 2 645 058 |
using rare catalysts, solvents, or reagents | 676 848 |
final data set | 11 422 307 |
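The counts in the table are consistent with a sequential filter in which each row removes reactions from what remains after the previous row. The short check below, which assumes that reading of the table, subtracts the removed counts from the original total and compares the result with the reported final data set size.

```python
# Counts copied from the data-filtering table above.
initial = 53_143_003
removed = {
    "temperature out of range": 56_235,
    "multistep": 23_536_281,
    "multiproduct": 3_335_439,
    "missing SMILES (including half reactions)": 92_472,
    "cannot be sanitized by RDKit": 1_693_625,
    "no condition information": 9_684_738,
    "exceeding one catalyst, two solvents, or two reagents": 2_645_058,
    "using rare catalysts, solvents, or reagents": 676_848,
}
final_reported = 11_422_307

# Assumption: filters are applied in order and the removed sets do not overlap.
remaining = initial
for criterion, count in removed.items():
    remaining -= count
    print(f"{criterion:<55} -{count:>12,} -> {remaining:>12,}")

retained = remaining / initial
print(f"matches reported final size: {remaining == final_reported}")
print(f"fraction of original records retained: {retained:.3f}")
```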
Molecular Representation
Model Structure
(1) Reaction and product fingerprints are concatenated and passed through two fully connected layers (ReLU activation, size 1000; ReLU activation, size 1000, with a 0.5 dropout) to generate a dense representation of the fingerprints (referred to as Dense FP).
(2) Dense FP is passed through two fully connected layers (ReLU activation, size 300; Softmax activation, size 803) to predict the catalyst (or NULL) for the reaction.
(3) The one-hot vector of the catalyst prediction is then concatenated with Dense FP and passed through two fully connected layers (ReLU activation, size 300; Softmax activation, size 232) to predict the first solvent (or NULL).
(4) Step 3 is repeated for prediction of the second solvent (size 228), the first reagent (size 2240), and the second reagent (size 1979). The numbers are smaller than the total class of solvents/reagents because some solvents/reagents are only present in one of the fields (i.e., only as Solvent/Reagent 1 or Solvent/Reagent 2).
(5) One-hot vectors of the catalyst, solvents, and reagents and Dense FP are all concatenated and passed through two fully connected layers (ReLU activation, size 300; Linear activation, size 1) to predict the temperature.
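The five steps above can be sketched as a plain-Python forward pass. This is an illustrative, untrained toy and not the authors' code: layer sizes are scaled down (the text gives 1000, 300, and output sizes 803, 232, 228, 2240, 1979), the fingerprint length is a placeholder, weights are random, and the cumulative concatenation of all earlier one-hot picks at each head is an assumed reading of "Step 3 is repeated". The names `Dense`, `hard_select`, and `ContextModel` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class Dense:
    """Fully connected layer with randomly initialised (untrained) weights."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x: Vector) -> Vector:
        return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(self.w, self.b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def hard_select(probs: Vector) -> Vector:
    """One-hot vector with 1.0 at the most probable class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


class ContextModel:
    def __init__(self, fp_size: int, hidden: int, head_hidden: int,
                 class_sizes: Sequence[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        # Step 1: two layers turn the concatenated fingerprints into "Dense FP".
        self.encoder = [Dense(2 * fp_size, hidden, rng), Dense(hidden, hidden, rng)]
        # Steps 2-4: one two-layer head per element, in the order c, s1, s2, r1, r2.
        self.heads = []
        width = hidden
        for n_classes in class_sizes:
            self.heads.append((Dense(width, head_hidden, rng), Dense(head_hidden, n_classes, rng)))
            width += n_classes  # later heads also see the earlier one-hot picks
        # Step 5: temperature regression on Dense FP plus all one-hot picks.
        self.temperature = (Dense(width, head_hidden, rng), Dense(head_hidden, 1, rng))

    def predict(self, reaction_fp: Vector, product_fp: Vector) -> Tuple[List[int], float]:
        x = reaction_fp + product_fp
        for layer in self.encoder:
            x = relu(layer(x))  # dropout is omitted: it is inactive at inference
        context = x
        picks: List[int] = []
        for hidden_layer, out_layer in self.heads:
            probs = softmax(out_layer(relu(hidden_layer(context))))
            one_hot = hard_select(probs)
            picks.append(one_hot.index(1.0))
            context = context + one_hot
        hidden_layer, out_layer = self.temperature
        return picks, out_layer(relu(hidden_layer(context)))[0]


CLASS_SIZES = (5, 4, 4, 7, 6)  # toy stand-ins for 803, 232, 228, 2240, 1979
rng = random.Random(1)
rxn_fp = [float(rng.randint(0, 1)) for _ in range(16)]
prod_fp = [float(rng.randint(0, 1)) for _ in range(16)]
model = ContextModel(fp_size=16, hidden=8, head_hidden=6, class_sizes=CLASS_SIZES)
picks, temperature = model.predict(rxn_fp, prod_fp)
print(picks, temperature)
```

Feeding each head a one-hot vector rather than the full probability distribution means later predictions are conditioned on a single concrete earlier choice, which is what makes it possible to enumerate alternative combinations by swapping in a different earlier pick.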
Figure 7. Graphical representation of the neural-network model for context recommendation (“Hard Selection” refers to setting the value of the maximal element to one and zero for the rest, although the output of each classification task is a probability distribution).
(1) One feature is the order of the prediction tasks. The earlier a task appears in the model, the more it can be performed solely based on the reaction, independent of the other predictions. We experimented with predicting single elements using fingerprint information only and found that the validation accuracy (top-one accuracy) is highest for catalyst (92.1%), and similar for solvent and reagent (60.6% and 60.6%, respectively). This is consistent with how chemists generally approach this problem manually, i.e., first identifying whether the reaction requires a catalyst. Reagents are placed last in the sequential prediction, so that information about catalyst and solvent selection is included in predicting reagents, which have the most unique possibilities and a greater level of flexibility even when the catalysts and solvents are fixed.
(2) Another feature is the number of catalysts, solvents, and reagents for each reaction. Most of the reactions in the data set have no catalyst or at most one catalyst recorded, so the number of catalysts is limited to one. A majority of reactions use one solvent, but there are still many examples that use multiple solvent or multiple reagent combinations. Few reactions in the data set use three or more solvents or reagents, so limiting the number of solvents and reagents to two for each category keeps the model at a reasonable size. The final model has 38 M parameters.
Training and Evaluation
Conclusion
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscentsci.8b00357.
Description of computational methods, frequency vs rank plots, extended lists of reaction examples, and comparison of the model performance against baseline models (PDF)
Acknowledgments
This work was supported by the DARPA Make-It program under Contract ARO W911NF-16-2-0023. We thank Elsevier for the permission to access the Reaxys API to obtain detailed information on the reaction records. The code and models are available online at https://github.com/Coughy1991/Reaction_condition_recommendation. A user-friendly web module is available at http://askcos.mit.edu/context.
- 17Kayala, M. A.; Baldi, P. F. Learning to Predict Chemical Reactions. J. Chem. Inf. Model. 2011, 51 (9), 2209– 2222, DOI: 10.1021/ci200207yGoogle Scholar17https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXhtFSks7jP&md5=1b40eaa05deaeb36da0ede14bf8d29f9Learning to Predict Chemical ReactionsKayala, Matthew A.; Azencott, Chloe-Agathe; Chen, Jonathan H.; Baldi, PierreJournal of Chemical Information and Modeling (2011), 51 (9), 2209-2222CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Being able to predict the course of arbitrary chem. reactions is essential to the theory and applications of org. chem. Approaches to the reaction prediction problems can be organized around three poles corresponding to: (1) phys. laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles, resp., are not high throughput, are not generalizable or scalable, and lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction utilizing elements from each pole. Using a phys. inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approxns. of MOs (MOs) and use topol. and physicochem. attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chem. data set consisting of 1630 full multistep reactions with 2358 distinct starting materials and intermediates, assocd. with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. And from machine learning, we pose identifying productive mechanistic steps as a statistical ranking, information retrieval problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top-ranked mechanistic steps yield the major products. 
The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.00% of nonproductive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over mechanistic steps in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanism at the top 89.05% of the time, rising to 99.86% of the time when the top four are considered. Furthermore, the system is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal (http://cdb.ics.uci.edu) under the Toolkits section.
- 18Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52 (10), 2526– 2540, DOI: 10.1021/ci3003039Google Scholar18https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhtlGjtrjF&md5=3507988197eb3c3400045d0bfc939ff1ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine LearningKayala, Matthew A.; Baldi, PierreJournal of Chemical Information and Modeling (2012), 52 (10), 2526-2540CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Proposing reasonable mechanisms and predicting the course of chem. reactions is important to the practice of org. chem. Approaches to reaction prediction have historically used obfuscating representations and manually encoded patterns or rules. Here we present ReactionPredictor, a machine learning approach to reaction prediction that models elementary, mechanistic reactions as interactions between approx. MOs (MOs). A training data set of productive reactions known to occur at reasonable rates and yields and verified by inclusion in the literature or textbooks is derived from an existing rule-based system and expanded upon with manual curation from graduate level textbooks. Using this training data set of complex polar, hypervalent, radical, and pericyclic reactions, a two-stage machine learning prediction framework is trained and validated. In the first stage, filtering models trained at the level of individual MOs are used to reduce the space of possible reactions to consider. In the second stage, ranking models over the filtered space of possible reactions are used to order the reactions such that the productive reactions are the top ranked. The resulting model, ReactionPredictor, perfectly ranks polar reactions 78.1% of the time and recovers all productive reactions 95.7% of the time when allowing for small nos. of errors. 
Pericyclic and radical reactions are perfectly ranked 85.8% and 77.0% of the time, resp., rising to >93% recovery for both reaction types with a small no. of allowed errors. Decisions about which of the polar, pericyclic, or radical reaction type ranking models to use can be made with >99% accuracy. Finally, for multistep reaction pathways, we implement the first mechanistic pathway predictor using constrained tree-search to discover a set of reasonable mechanistic steps from given reactants to given products. Webserver implementations of both the single step and pathway versions of ReactionPredictor are available via the chemoinformatics portal http://cdb.ics.uci.edu/.
- 19Segler, M. H.S.; Waller, M. P. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem. - Eur. J. 2017, 23 (25), 5966– 5971, DOI: 10.1002/chem.201605499Google Scholar19https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXjtlynsrw%253D&md5=e4f689a132ea6ad8713fa2d3d9422c78Neural-Symbolic Machine Learning for Retrosynthesis and Reaction PredictionSegler, Marwin H. S.; Waller, Mark P.Chemistry - A European Journal (2017), 23 (25), 5966-5971CODEN: CEUJED; ISSN:0947-6539. (Wiley-VCH Verlag GmbH & Co. KGaA)Reaction prediction and retrosynthesis are the cornerstones of org. chem. Rule-based expert systems have been the most widespread approach to computationally solve these two related challenges to date. However, reaction rules often fail because they ignore the mol. context, which leads to reactivity conflicts. Herein, we report that deep neural networks can learn to resolve reactivity conflicts and to prioritize the most suitable transformation rules. We show that by training our model on 3.5 million reactions taken from the collective published knowledge of the entire discipline of chem., our model exhibits a top10-accuracy of 95 % in retrosynthesis and 97 % for reaction prediction on a validation set of almost 1 million reactions.
- 20Jin, W.; Coley, C. W.; Barzilay, R.; Jaakkola, T. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network , 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA; 2017; pp 2604– 2613.Google ScholarThere is no corresponding record for this reference.
- 21Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3 (5), 434– 443, DOI: 10.1021/acscentsci.7b00064Google Scholar21https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXmtVyqtb0%253D&md5=cc3e8a50d8dd9d68e294a21a558200dcPrediction of Organic Reaction Outcomes Using Machine LearningColey, Connor W.; Barzilay, Regina; Jaakkola, Tommi S.; Green, William H.; Jensen, Klavs F.ACS Central Science (2017), 3 (5), 434-443CODEN: ACSCII; ISSN:2374-7951. (American Chemical Society)Computer assistance in synthesis design has existed for over 40 years, yet retrosynthesis planning software has struggled to achieve widespread adoption. One crit. challenge in developing high-quality pathway suggestions is that proposed reaction steps often fail when attempted in the lab., despite initially seeming viable. The true measure of success for any synthesis program is whether the predicted outcome matches what is obsd. exptl. We report a model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks. Using 15 000 exptl. reaction records from granted United States patents, a model is trained to select the major (recorded) product by ranking a self-generated list of candidates where one candidate is known to be the major product. Candidate reactions are represented using a unique edit-based representation that emphasizes the fundamental transformation from reactants to products, rather than the constituent mols.' overall structures. In a 5-fold cross-validation, the trained model assigns the major product rank 1 in 71.8% of cases, rank ≤3 in 86.7% of cases, and rank ≤5 in 90.8% of cases.
- 22Schwaller, P.; Laino, T. “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-Sequence Models. 2017, arXiv:1711.04810. arXiv.org e-Print archive. https://arxiv.org/abs/1711.04810.Google ScholarThere is no corresponding record for this reference.
- 23Reizman, B. J.; Jensen, K. F. Simultaneous Solvent Screening and Reaction Optimization in Microliter Slugs. Chem. Commun. 2015, 51 (68), 13290– 13293, DOI: 10.1039/C5CC03651HGoogle Scholar23https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXht1SqtL7O&md5=a27eeadd9328ade293e218e0167c87b8Simultaneous solvent screening and reaction optimization in microliter slugsReizman, Brandon J.; Jensen, Klavs F.Chemical Communications (Cambridge, United Kingdom) (2015), 51 (68), 13290-13293CODEN: CHCOFS; ISSN:1359-7345. (Royal Society of Chemistry)An automated, continuous flow droplet screening system is presented, enabling real-time simultaneous solvent and continuous variable optimization. An optimal design of expts. strategy is applied to the alkylation of 1,2-diaminocyclohexane in 16 μL droplets, with scale-up demonstrated. Anal. of segmented flow results suggests correlation of yield with solvent hydrogen bond basicity.
- 24Sans, V.; Porwol, L.; Dragone, V.; Cronin, L. A Self Optimizing Synthetic Organic Reactor System Using Real-Time In-Line NMR Spectroscopy. Chem. Sci. 2015, 6 (2), 1258– 1264, DOI: 10.1039/C4SC03075CGoogle Scholar24https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXhvFeisb%252FO&md5=e3d6d3dbfb1ba77055defb4078800dc1A self optimizing synthetic organic reactor system using real-time in-line NMR spectroscopySans, Victor; Porwol, Luzian; Dragone, Vincenza; Cronin, LeroyChemical Science (2015), 6 (2), 1258-1264CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)A configurable platform for synthetic chem. incorporating an in-line bench-top NMR capable of monitoring and controlling org. reactions in real-time is discussed. The platform is controlled by a modular LabView software control system for hardware, NMR, data anal., and feedback optimization. Using this platform, real-time advanced structural characterization of reaction mixts., including 19F, 13C, DEPT, 2-dimensional NMR spectroscopy (COSY, HSQC, 19F-COSY), are reported for the first time. The potential of this technique was demonstrated by optimizing a catalytic org. reaction in real-time, showing its applicability to self-optimizing systems using criteria such as stereo-selectivity, multi-nuclear measurements, or 2-dimensional correlations.
- 25Holmes, N.; Akien, G. R.; Savage, R. J.D.; Stanetty, C.; Baxendale, I. R.; Blacker, A. J.; Taylor, B. A.; Woodward, R. L.; Meadows, R. E.; Bourne, R. A. Reaction Chemistry & Engineering Online Quantitative Mass Spectrometry for the Reactors †. React. Chem. Eng. 2016, 1 (1), 96– 100, DOI: 10.1039/C5RE00083AGoogle Scholar25https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XhtlSlt7zM&md5=a8755576a3a4e7c07c10d7f1f3f675f2Online quantitative mass spectrometry for the rapid adaptive optimisation of automated flow reactorsHolmes, Nicholas; Akien, Geoffrey R.; Savage, Robert J. D.; Stanetty, Christian; Baxendale, Ian R.; Blacker, A. John; Taylor, Brian A.; Woodward, Robert L.; Meadows, Rebecca E.; Bourne, Richard A.Reaction Chemistry & Engineering (2016), 1 (1), 96-100CODEN: RCEEBW; ISSN:2058-9883. (Royal Society of Chemistry)An automated continuous reactor for the synthesis of org. compds., which uses online mass spectrometry (MS) for reaction monitoring and product quantification, is presented. Quant. and rapid MS monitoring was developed and calibrated using HPLC. The amidation of Me nicotinate with aq. MeNH2 was optimized using design of expts. and a self-optimization algorithm approach to produce >93% yield.
- 26Reizman, B. J.; Jensen, K. F. Feedback in Flow for Accelerated Reaction Development. Acc. Chem. Res. 2016, 49 (9), 1786– 1796, DOI: 10.1021/acs.accounts.6b00261Google Scholar26https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28Xhtlaru7%252FI&md5=ddaef4153972b8b045df12015e5b5d14Feedback in Flow for Accelerated Reaction DevelopmentReizman, Brandon J.; Jensen, Klavs F.Accounts of Chemical Research (2016), 49 (9), 1786-1796CODEN: ACHRE4; ISSN:0001-4842. (American Chemical Society)The pharmaceutical industry is investing in continuous flow and high-throughput experimentation as tools for rapid process development accelerated scale-up. Coupled with automation, these technologies offer the potential for comprehensive reaction characterization and optimization, but with the cost of conducting exhaustive multifactor screens. Automated feedback in flow offers researchers an alternative strategy for efficient characterization of reactions based on the use of continuous technol. to control chem. reaction conditions and optimize in lieu of screening. Optimization with feedback allows expts. to be conducted where the most information can be gained from the chem., enabling product yields to be maximized and kinetic models to be generated while the total no. of expts. is minimized. This account opens by reviewing select examples of feedback optimization in flow and applications to chem. research. Systems in the literature are classified into (i) deterministic black box optimization systems that do not model the reaction system and are therefore limited in the utility of results for scale-up, (ii) deterministic model-based optimization systems from which reaction kinetics and/or mechanisms can be automatically evaluated, and (iii) stochastic systems. Though diverse in application, flow feedback systems have predominantly focused upon the optimization of continuous variables, i.e., variables such as time, temp., and concn. 
that can be ramped from one expt. to the next. Unfortunately, this implies that the screening of discrete variables such as catalyst, ligand, or solvent generally does not factor into automated flow optimization, resulting in incomplete process knowledge. Herein, a system and strategy is presented developed for optimizing discrete and continuous variables of a chem. reaction simultaneously. The approach couples automated feedback with high-throughput reaction screening in droplet flow microfluidics. This account details the system configuration for on-demand creation of sub-20 μL droplets with interchangeable reagents and catalysts. These droplets are reacted in a fully automated microfluidic system and analyzed online by LC/MS. Feeding back from the online anal. results, a design of expts. (DoE)-based adaptive response surface algorithm is employed that deductively removes candidate reagents from the optimization as optimal reaction conditions are refined, leading to rapid convergence. Using the automated optimization platform, case studies are presented for solvent selection in a competitive alkylation chem. and for catalyst-ligand selection in heteroarom. Suzuki-Miyaura cross-coupling chemistries. For the monoalkylation of trans-1,2-diaminocyclohexane, polar aprotic solvents at moderate temps. are shown to be favorable, with optimality accurately identified with DMSO as the solvent in 67 expts. For Suzuki-Miyaura cross-couplings, the optimality of precatalysts and continuous variable conditions are obsd. to change in accordance with the coupling reagents, providing insights into catalyst behavior in the context of the reaction mechanism. Future opportunities in automated reaction development include the incorporation of chemoinformatics for faster anal. and machine-learning algorithms to guide and optimize the synthesis. Adoption of this technol. 
stands to reduce graduate student and postdoc time on routine tasks in the lab., while feeding back knowledge used to guide new research directions. Moreover, the application of this technol. in industry promises to lessen the cost and time assocd. with advancing pharmaceutical mols. through development and scale-up.
- 27Baumgartner, L. M.; Coley, C. W.; Reizman, B. J.; Gao, K. W.; Jensen, K. F. Optimum Catalyst Selection over Continuous and Discrete Process Variables with a Single Droplet Microfluidic Reaction Platform. React. Chem. Eng. 2018, 3 (3), 301– 311, DOI: 10.1039/C8RE00032HGoogle Scholar27https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXntlCntbo%253D&md5=a77e6dec5646d61ecc5eb42d04786fb4Optimum catalyst selection over continuous and discrete process variables with a single droplet microfluidic reaction platformBaumgartner, Lorenz M.; Coley, Connor W.; Reizman, Brandon J.; Gao, Kevin W.; Jensen, Klavs F.Reaction Chemistry & Engineering (2018), 3 (3), 301-311CODEN: RCEEBW; ISSN:2058-9883. (Royal Society of Chemistry)A mixed-integer nonlinear program (MINLP) algorithm to optimize catalyst turnover no. (TON) and product yield by simultaneously modulating discrete variables-catalyst types-and continuous variables-temp., residence time, and catalyst loading-was implemented and validated. Several simulated case studies, with and without random measurement error, demonstrate the algorithm's robustness in finding optimal conditions in the presence of side reactions and other complicating nonlinearities. This algorithm was applied to the real-time optimization of a Suzuki-Miyaura cross-coupling reaction in an automated microfluidic reaction platform comprising a liq. handler, an oscillatory flow reactor, and an online LC/MS. The algorithm, based on a combination of branch and bound and adaptive response surface methods, identified exptl. conditions that maximize TON subject to a yield constraint from a pool of eight catalyst candidates in just 60 expts., considerably fewer than a previous version of the algorithm.
- 28Kamlet, M. J.; Abboud, J. L.M.; Abraham, M. H.; Taft, R. W. Linear Solvation Energy Relationships. 23. A Comprehensive Collection of the Solvatochromic Parameters,.Pi.*,.Alpha., and.Beta., and Some Methods for Simplifying the Generalized Solvatochromic Equation. J. Org. Chem. 1983, 48 (17), 2877– 2887, DOI: 10.1021/jo00165a018Google Scholar28https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaL3sXkvVOgsbc%253D&md5=99b60c3cc817d5508b28d85f3ad6c8a1Linear solvation energy relationships. 23. A comprehensive collection of the solvatochromic parameters, π*, α, and β, and some methods for simplifying the generalized solvatochromic equationKamlet, Mortimer J.; Abboud, Jose Luis M.; Abraham, Michael H.; Taft, R. W.Journal of Organic Chemistry (1983), 48 (17), 2877-87CODEN: JOCEAH; ISSN:0022-3263.A generalized equation for linear solvation energy relations or complexation energy relations is developed which involves 6 terms: π* (a solvent dipolarity-polarizability term), α (solvent hydrogen-bond acceptor term), β (solvent hydrogen-bond donor term), δ (solvent polarizability correction term), δH (Hildebrand soly. parameter), and ξ. This equation is reduced to a more manageable form by a judicious choice of solvents and reactants or indicators. One-, two- or three-parameter LFER involving different combinations of the above parameters and various types of physicochem. properties are obsd. A comprehensive collection of π*, α, and β for 217 solvents is presented.
- 29Struebing, H.; Ganase, Z.; Karamertzanis, P. G.; Siougkrou, E.; Haycock, P.; Piccione, P. M.; Armstrong, A.; Galindo, A.; Adjiman, C. S. Computer-Aided Molecular Design of Solvents for Accelerated Reaction Kinetics. Nat. Chem. 2013, 5 (11), 952– 957, DOI: 10.1038/nchem.1755Google Scholar29https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhsV2jsrbP&md5=2fcd40cb79d2e6db80227791a628da0bComputer-aided molecular design of solvents for accelerated reaction kineticsStruebing, Heiko; Ganase, Zara; Karamertzanis, Panagiotis G.; Siougkrou, Eirini; Haycock, Peter; Piccione, Patrick M.; Armstrong, Alan; Galindo, Amparo; Adjiman, Claire S.Nature Chemistry (2013), 5 (11), 952-957CODEN: NCAHBB; ISSN:1755-4330. (Nature Publishing Group)Solvents can significantly alter the rates and selectivity of liq.-phase org. reactions, often hindering the development of new synthetic routes or, if chosen wisely, facilitating routes by improving rates and selectivities. To address this challenge, a systematic methodol. is proposed that quickly identifies improved reaction solvents by combining quantum mech. computations of the reaction rate const. in a few solvents with a computer-aided mol. design (CAMD) procedure. The approach allows the identification of a high-performance solvent within a very large set of possible mols. The validity of the authors' CAMD approach is demonstrated through application to a classical nucleophilic substitution reaction for the study of solvent effects, the Menschutkin reaction. The results were validated successfully by in situ kinetic expts. A space of 1,341 solvents was explored in silico, but required quantum-mech. calcns. of the rate const. in only nine solvents, and uncovered a solvent that increases the rate const. by 40%.
- 30Marcou, G.; Aires De Sousa, J.; Latino, D. A.R.S.; De Luca, A.; Horvath, D.; Rietsch, V.; Varnek, A. Expert System for Predicting Reaction Conditions: The Michael Reaction Case. J. Chem. Inf. Model. 2015, 55 (2), 239– 250, DOI: 10.1021/ci500698aGoogle Scholar30https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXnvFGnsQ%253D%253D&md5=205b5b0bd2af7f06f63174e2f6f799b1Expert System for Predicting Reaction Conditions: The Michael Reaction CaseMarcou, G.; Aires de Sousa, J.; Latino, D. A. R. S.; de Luca, A.; Horvath, D.; Rietsch, V.; Varnek, A.Journal of Chemical Information and Modeling (2015), 55 (2), 239-250CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)A generic chem. transformation may often be achieved under various synthetic conditions. However, for any specific reagents, only one or a few among the reported synthetic protocols may be successful. For example, Michael β-addn. reactions may proceed under different choices of solvent (e.g., hydrophobic, aprotic polar, protic) and catalyst (e.g., Bronsted acid, Lewis acid, Lewis base, etc.). Chemoinformatics methods could be efficiently used to establish a relationship between the reagent structures and the required reaction conditions, which would allow synthetic chemists to waste less time and resources in trying out various protocols in search for the appropriate one. In order to address this problem, a no. of 2-classes classification models have been built on a set of 198 Michael reactions retrieved from literature. Trained models discriminate between processes that are compatible and resp. processes not feasible under a specific reaction condition option (feasible or not with a Lewis acid catalyst, feasible or not in hydrophobic solvent, etc.). Eight distinct models were built to decide the compatibility of a Michael addn. 
process with each considered reaction condition option, while a ninth model was aimed to predict whether the assumed Michael addn. is feasible at all. Different machine-learning methods (Support Vector Machine, Naive Bayes, and Random Forest) in combination with different types of descriptors (ISIDA fragments issued from Condensed Graphs of Reactions, MOLMAP, Electronic Effect Descriptors, and Chem. Development Kit computed descriptors) have been used. Models have good predictive performance in 3-fold cross-validation done three times: balanced accuracy varies from 0.7 to 1. Developed models are available for the users at http://infochim.u-strasbg.fr/webserv/VSEngine.html. Eventually, these were challenged to predict feasibility conditions for ∼50 novel Michael reactions from the eNovalys database (originally from patent literature).
- 31Lin, A. I.; Madzhidov, T. I.; Klimchuk, O.; Nugmanov, R. I.; Antipin, I. S.; Varnek, A. Automatized Assessment of Protective Group Reactivity: A Step toward Big Reaction Data Analysis. J. Chem. Inf. Model. 2016, 56 (11), 2140– 2148, DOI: 10.1021/acs.jcim.6b00319Google Scholar31https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XhslemsL%252FM&md5=734545a7f2a3b34f49af6713d443b6beAutomatized Assessment of Protective Group Reactivity: A Step Toward Big Reaction Data AnalysisLin, Arkadii I.; Madzhidov, Timur I.; Klimchuk, Olga; Nugmanov, Ramil I.; Antipin, Igor S.; Varnek, AlexandreJournal of Chemical Information and Modeling (2016), 56 (11), 2140-2148CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)The authors report a new method to assess protective groups (PGs) reactivity as a function of reaction conditions (catalyst, solvent) using raw reaction data. It is based on an intuitive similarity principle for chem. reactions: similar reactions proceed under similar conditions. Tech., reaction similarity can be assessed using the Condensed Graph of Reaction (CGR) approach representing an ensemble of reactants and products as a single mol. graph, i.e., as a pseudomol. for which mol. descriptors or fingerprints can be calcd. CGR-based inhouse tools were used to process data for 142,111 catalytic hydrogenation reactions extd. from the Reaxys database. Results reveal some contradictions with famous Greene's Reactivity Charts based on manual expert anal. Models developed in this study show high accuracy (∼90%) for predicting optimal exptl. conditions of protective group deprotection.
- 32Segler, M. H.S.; Waller, M. P. Modelling Chemical Reasoning to Predict and Invent Reactions. Chem. - Eur. J. 2017, 23 (25), 6118– 6128, DOI: 10.1002/chem.201604556Google Scholar32https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXislersw%253D%253D&md5=ac3b304ec62b8d90110b7722305e2b3dModelling Chemical Reasoning to Predict and Invent ReactionsSegler, Marwin H. S.; Waller, Mark P.Chemistry - A European Journal (2017), 23 (25), 6118-6128CODEN: CEUJED; ISSN:0947-6539. (Wiley-VCH Verlag GmbH & Co. KGaA)The ability to reason beyond established knowledge allows org. chemists to solve synthetic problems and invent novel transformations. Herein, we propose a model that mimics chem. reasoning, and formalises reaction prediction as finding missing links in a knowledge graph. We have constructed a knowledge graph contg. 14.4 million mols. and 8.2 million binary reactions, which represents the bulk of all chem. reactions ever published in the scientific literature. Our model outperforms a rule-based expert system in the reaction prediction task for 180 000 randomly selected binary reactions. The data-driven model generalises even beyond known reaction types, and is thus capable of effectively (re-)discovering novel transformations (even including transition metal-catalyzed reactions). Our model enables computers to infer hypotheses about reactivity and reactions by only considering the intrinsic local structure of the graph and because each single reaction prediction is typically achieved in a sub-second time frame, the model can be used as a high-throughput generator of reaction hypotheses for reaction discovery.
- 33Mikie Kanada, R.; Taniguchi, T.; Ogasawara, K. Asymmetric Hydrogen Transfer Protocol for a Synthesis of (+)-Frontalin and (−)-Malyngolide. Tetrahedron Lett. 2000, 41 (19), 3631– 3635, DOI: 10.1016/S0040-4039(00)00430-5Google ScholarThere is no corresponding record for this reference.
- 34Faroux-Corlay, B.; Clary, L.; Gadras, C.; Hammache, D.; Greiner, J.; Santaella, C.; Aubertin, A. M.; Vierling, P.; Fantini, J. Synthesis of Single- and Double-Chain Fluorocarbon and Hydrocarbon Galactosyl Amphiphiles and Their Anti-HIV-1 Activity. Carbohydr. Res. 2000, 327 (3), 223– 260, DOI: 10.1016/S0008-6215(00)00055-0Google ScholarThere is no corresponding record for this reference.
- 35. Wang, H.; Yu, S. Synthesis of Isoquinolones Using Visible-Light-Promoted Denitrogenative Alkyne Insertion of 1,2,3-Benzotriazinones. Org. Lett. 2015, 17 (17), 4272–4275, DOI: 10.1021/acs.orglett.5b01960
- 36. Li, K.; Zeng, Y.; Neuenswander, B. Sequential Pd(II)-Pd(0) Catalysis for the Rapid Synthesis of Coumarins. J. Org. Chem. 2005, 70 (16), 6515–6518, DOI: 10.1021/jo050671l
- 37. Mavunkel, B.; Xu, Y.; Goyal, B.; Lim, D.; Lu, Q.; Chen, Z.; Wang, D.-X.; Higaki, J.; Chakraborty, I.; Liclican, A. Pyrimidine-Based Inhibitors of CaMKIIδ. Bioorg. Med. Chem. Lett. 2008, 18 (7), 2404–2408, DOI: 10.1016/j.bmcl.2008.02.056
- 38. Lautens, M.; Maddess, M. L. Chemoselective Cross Metathesis of Bishomoallylic Alcohols: Rapid Access to Fragment A of the Cryptophycins. Org. Lett. 2004, 6 (12), 1883–1886, DOI: 10.1021/ol049883f
- 39. Krüger, T.; Vorndran, K.; Linker, T. Regioselective Arene Functionalization: Simple Substitution of Carboxylate by Alkyl Groups. Chem. - Eur. J. 2009, 15 (44), 12082–12091, DOI: 10.1002/chem.200901774
- 40. Palmes, J. A.; Paioti, P. H. S.; De Souza, L. P.; Aponick, A. PdII-Catalyzed Spiroketalization of Ketoallylic Diols. Chem. - Eur. J. 2013, 19 (35), 11613–11621, DOI: 10.1002/chem.201301723
- 41. Liu, J.; Fitzgerald, A. E.; Mani, N. S. Facile Assembly of Fused Benzo[4,5]Furo Heterocycles. J. Org. Chem. 2008, 73 (7), 2951–2954, DOI: 10.1021/jo8000595
- 42. Schaub, C.; Müller, B.; Schmidt, R. R. Sialyltransferase Inhibitors Based on CMP-Quinic Acid. Eur. J. Org. Chem. 2000, 2000 (9), 1745–1758, DOI: 10.1002/(SICI)1099-0690(200005)2000:9<1745::AID-EJOC1745>3.0.CO;2-8
- 43. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in Neural Information Processing Systems; NIPS: Lake Tahoe, 2013; pp 3111–3119.
- 44. García-Alonso, C. R.; Pérez-Naranjo, L. M.; Fernández-Caballero, J. C. Multiobjective evolutionary algorithms to identify highly autocorrelated areas: the case of spatial distribution in financially compromised farms. Ann. Oper. Res. 2014, 219 (1), 187–202, DOI: 10.1007/s10479-011-0841-3
- 45. Open-source. RDKit: Open-Source Cheminformatics Software, 2006 (accessed on Sept 28, 2017).
- 46. Schneider, N.; Lowe, D. M.; Sayle, R. A.; Landrum, G. A. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model. 2015, 55 (1), 39–53, DOI: 10.1021/ci5006614
- 47. Williams, R. J.; Zipser, D. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Comput. 1989, 1 (2), 270–280, DOI: 10.1162/neco.1989.1.2.270
- 48. Gobbi, A.; Poppinger, D. Genetic Optimization of Combinatorial Libraries. Biotechnol. Bioeng. 1998, 61 (1), 47–54, DOI: 10.1002/(SICI)1097-0290(199824)61:1<47::AID-BIT9>3.0.CO;2-Z
- Zexi Zhang, Zhanxiang Cai, Wenbin Zhang, Hua Lu, Mao Chen. Machine learning-assisted investigations toward polymer synthesis. Chinese Science Bulletin 2025, 70
(4-5)
, 471-480. https://doi.org/10.1360/TB-2024-0800
- Xinghai Li, Zhisen Wu, Lijing Zhang, Shengyang Tao. Machine learning enables the prediction of amide bond synthesis based on small datasets. Acta Physico-Chimica Sinica 2025, 41
(2)
, 100010. https://doi.org/10.3866/PKU.WHXB202309041
- P. Selvakumar, C. Preethi, P. Nehru, A. Saravanan, Abhijeet Das. AI in Green Chemistry Sustainable Manufacturing Processes. 2025, 297-318. https://doi.org/10.4018/979-8-3693-7483-2.ch011
- Weilong Hu, Enzhe Jing, Haoke Qiu, Zhao-Yan Sun. Discovering polyimides and their composites with targeted mechanical properties through explainable machine learning. Journal of Materials Informatics 2025, 5
(1)
https://doi.org/10.20517/jmi.2024.59
- Zihan Wang, Kangjie Lin, Jianfeng Pei, Luhua Lai. Reacon: a template- and cluster-based framework for reaction condition prediction. Chemical Science 2025, 16
(2)
, 854-866. https://doi.org/10.1039/D4SC05946H
- Inbal Lorena Eshel, Shahar Barkai, Sergio Barranco, Monica Hevia Perez-Temprano, Anat Milo. Probability Guided Chemical Reaction Scopes. 2025https://doi.org/10.2139/ssrn.5138219
- Zhenzhi Tan, Qi Yang, Sanzhong Luo. AI molecular catalysis: where are we now?. Organic Chemistry Frontiers 2025, 2 https://doi.org/10.1039/D4QO02363C
- Joseph C. Davies, Jonathan D. Hirst. Software Tools for Green and Sustainable Chemistry. 2025, 414-425. https://doi.org/10.1016/B978-0-443-15742-4.00049-1
- Yixin Wei, Leyu Shan, Tong Qiu, Diannan Lu, Zheng Liu. Machine learning-assisted retrosynthesis planning: Current status and future prospects. Chinese Journal of Chemical Engineering 2025, 77 , 273-292. https://doi.org/10.1016/j.cjche.2024.10.014
- C.F. Blanco, N. Pauliks, F. Donati, N. Engberg, J. Weber. Machine learning to support prospective life cycle assessment of emerging chemical technologies. Current Opinion in Green and Sustainable Chemistry 2024, 50 , 100979. https://doi.org/10.1016/j.cogsc.2024.100979
- Wen Luo, Yangyi Shen, Chengfan Fu, Xiao Feng, Qiang Huang. Exploring the CO2 conversion activated by the dielectric barrier discharge plasma assisted with photocatalyst via machine learning. Journal of Environmental Chemical Engineering 2024, 12
(6)
, 114428. https://doi.org/10.1016/j.jece.2024.114428
- Buyong Ma, Yiguo Wang, Xingzi Li, Chang Shen, Hao Lin, Chenxi Du, Shanlin Yang, Ruoqing Zeng, Xuyang Tang, Jinglei Hu, Yukun Yang, Jingwen Wang, Jiawei Zhu, Xingqian Shan, Yu Zhang, Jiaqing Hu. Recent Advancements in the Application of Artificial Intelligence in Drug Molecular Generation and Synthesis Planning. Pharmaceutical Fronts 2024, 06
(04)
, e394-e405. https://doi.org/10.1055/s-0044-1796647
- Lung-Yi Chen, Yi-Pei Li. Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions. Journal of Cheminformatics 2024, 16
(1)
https://doi.org/10.1186/s13321-024-00805-4
- Maarten R. Dobbelaere, István Lengyel, Christian V. Stevens, Kevin M. Van Geem. Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices. Journal of Cheminformatics 2024, 16
(1)
https://doi.org/10.1186/s13321-024-00834-z
- Lung-Yi Chen, Yi-Pei Li. AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry. Journal of Cheminformatics 2024, 16
(1)
https://doi.org/10.1186/s13321-024-00869-2
- Tieu-Long Phan, Klaus Weinbauer, Thomas Gärtner, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, Peter F. Stadler. Reaction rebalancing: a novel approach to curating reaction databases. Journal of Cheminformatics 2024, 16
(1)
https://doi.org/10.1186/s13321-024-00875-4
- Saman Raza, Satya, Tahmeena Khan, Manisha Singh. AI Tools for Teaching-Learning Chemistry. 2024, 173-193. https://doi.org/10.2174/9789815305180124010011
- Xinyu Zhang, Daobin Mu, Shijie Lu, Yuanxing Zhang, Yuxiang Zhang, Zhuolin Yang, Zhikun Zhao, Borong Wu, Feng Wu. Ab Initio Design of Ni‐Rich Cathode Material with Assistance of Machine Learning for High Energy Lithium‐Ion Batteries. ENERGY & ENVIRONMENTAL MATERIALS 2024, 7
(6)
https://doi.org/10.1002/eem2.12744
Figure 1. Change of the loss functions with the number of epochs (left panel, overall; right panel, chemical context and temperature).
Figure 2. Relationship between the true temperature and the top-one predicted temperature (left panel), and the predicted temperature when the predicted context matches the recorded chemical context (right panel).
Figure 3. Examples of model predictions compared with the recorded context (temperature rounded to the nearest integer; black text represents the recorded conditions, and blue text represents the predicted conditions). (A) Nucleophilic epoxidation. (B) Deprotection of fluorenylmethyloxycarbonyl (Fmoc). (C) Luche reduction of an enone, TBS = tert-butyl(dimethyl)silyl. (D) Buchwald–Hartwig aryl amination, BINAP = 2,2′-bis(diphenylphosphino)-1,1′-binaphthyl. (E) Suzuki–Miyaura coupling, CyJohnPhos = (2-biphenyl)dicyclohexylphosphine. (F) Hoveyda–Grubbs cross metathesis.
Figure 4. Examples of the reactions with the fewest chemical elements matching the recorded context (temperature rounded to the nearest integer; black text represents the recorded conditions, and red text represents the predicted conditions). (A) Birch alkylation. (B) Hoveyda–Grubbs cross metathesis, TBS = tert-butyl(dimethyl)silyl. (C) Suzuki–Miyaura coupling. (D) Azide reduction.
Figure 5. Embedding of the 50 most common solvents projected onto a two-dimensional space using t-SNE. Solvents are naturally clustered into their corresponding classes (manually annotated).
Figure 6. Embedding of the 50 most common reagents projected onto a two-dimensional space using t-SNE. Reagents are naturally clustered into their corresponding classes (manually annotated).
Figure 7. Graphical representation of the neural-network model for context recommendation (“Hard Selection” refers to setting the value of the maximal element to one and the rest to zero, although the output of each classification task is a probability distribution).
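The “Hard Selection” step described in the Figure 7 caption can be illustrated with a minimal sketch. The function name `hard_select`, the example probabilities, and the tie-breaking rule (first maximum wins) are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def hard_select(probs):
    """Convert a probability distribution into a one-hot vector.

    The largest element is set to 1.0 and all others to 0.0.
    Ties are broken by taking the first maximum (an assumption).
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    # Index of the largest probability; max() returns the first one on ties.
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]


# Hypothetical classifier output over four candidate species (illustrative values).
candidate_probs = [0.10, 0.65, 0.20, 0.05]
one_hot = hard_select(candidate_probs)
print(one_hot)
```

The one-hot vector keeps only the identity of the top-ranked candidate and discards the rest of the distribution, which is the distinction the caption draws between the hard selection and the probabilistic output of each classification task.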
References
This article references 48 other publications.
- 1. Robinson, R. LXIII.–A Synthesis of Tropinone. J. Chem. Soc., Trans. 1917, 111 (0), 762–768, DOI: 10.1039/CT9171100762
- 2. Corey, E. J.; Wipke, W. T. Computer Assisted Design of Complex Organic Syntheses. Science (Washington, DC, U. S.) 1969, 166 (3902), 178–192, DOI: 10.1126/science.166.3902.178
- 3. Cook, A.; Johnson, A. P.; Law, J.; Mirzazadeh, M.; Ravitz, O.; Simon, A. Computer-Aided Synthesis Design: 40 Years On. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2 (1), 79–107, DOI: 10.1002/wcms.61
- 4. Warr, W. A. A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility. Mol. Inf. 2014, 33 (6–7), 469–476, DOI: 10.1002/minf.201400052
- 5. Engkvist, O.; Norrby, P. O.; Selmi, N.; Lam, Y. H.; Peng, Z.; Sherer, E. C.; Amberg, W.; Erhard, T.; Smyth, L. A. Computational Prediction of Chemical Reactions: Current Status and Outlook. Drug Discovery Today 2018, 23 (6), 1203–1218, DOI: 10.1016/j.drudis.2018.02.014
- 6. Coley, C. W.; Green, W. H.; Jensen, K. F. Machine Learning in Computer-Aided Synthesis Planning. Acc. Chem. Res. 2018, 51 (5), 1281–1289, DOI: 10.1021/acs.accounts.8b00087
- 7. Goodman, J. M. Reaction Prediction and Synthesis Design. Appl. Chemoinformatics Achiev. Futur. Oppor. 2018, 86–105, DOI: 10.1002/9783527806539.ch4b
- 8. Reaxys. https://new.reaxys.com/ (accessed on Sept 28, 2017).
- 9. Lowe, D. M. Patent Reaction Extractor (v1.0); 2014.
- 10. Szymkuć, S.; Gajewska, E. P.; Klucznik, T.; Molga, K.; Dittwald, P.; Startek, M.; Bajczyk, M.; Grzybowski, B. A. Computer-Assisted Synthetic Planning: The End of the Beginning. Angew. Chem., Int. Ed. 2016, 55 (20), 5904–5937, DOI: 10.1002/anie.201506101
- 11. Segler, M. H.S.; Preuss, M.; Waller, M. P. Learning to Plan Chemical Syntheses. 2017, arXiv:1708.04202. arXiv.org e-Print archive. https://arxiv.org/abs/1708.04202.
- 12. Law, J.; Zsoldos, Z.; Simon, A.; Reid, D.; Liu, Y.; Khew, S. Y.; Johnson, A. P.; Major, S.; Wade, R. A.; Ando, H. Y. Route Designer: A Retrosynthetic Analysis Tool Utilizing Automated Retrosynthetic Rule Generation. J. Chem. Inf. Model. 2009, 49 (3), 593–602, DOI: 10.1021/ci800228y
- 13. Liu, B.; Ramsundar, B.; Kawthekar, P.; Shi, J.; Gomes, J.; Luu Nguyen, Q.; Ho, S.; Sloane, J.; Wender, P.; Pande, V. Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models. ACS Cent. Sci. 2017, 3 (10), 1103–1113, DOI: 10.1021/acscentsci.7b00303
- 14. Bøgevig, A.; Federsel, H.-J.; Huerta, F.; Hutchings, M. G.; Kraut, H.; Langer, T.; Löw, P.; Oppawsky, C.; Rein, T.; Saller, H. Software Tool as an Idea Generator for Synthesis Prediction. Org. Process Res. Dev. 2015, 19 (2), 357–368, DOI: 10.1021/op500373e
- 15. Coley, C. W.; Rogers, L.; Green, W. H.; Jensen, K. F. Computer-Assisted Retrosynthesis Based on Molecular Similarity. ACS Cent. Sci. 2017, 3 (12), 1237–1245, DOI: 10.1021/acscentsci.7b00355
- 16. Segler, M. H.S.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610, DOI: 10.1038/nature25978
- 17. Kayala, M. A.; Baldi, P. F. Learning to Predict Chemical Reactions. J. Chem. Inf. Model. 2011, 51 (9), 2209–2222, DOI: 10.1021/ci200207y
- 18. Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52 (10), 2526–2540, DOI: 10.1021/ci3003039
- 19. Segler, M. H. S.; Waller, M. P. Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. Chem. - Eur. J. 2017, 23 (25), 5966–5971, DOI: 10.1002/chem.201605499
- 20. Jin, W.; Coley, C. W.; Barzilay, R.; Jaakkola, T. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA; 2017; pp 2604–2613.
- 21. Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3 (5), 434–443, DOI: 10.1021/acscentsci.7b00064
- 22. Schwaller, P.; Laino, T. “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-Sequence Models. 2017, arXiv:1711.04810. arXiv.org e-Print archive. https://arxiv.org/abs/1711.04810.
- 23. Reizman, B. J.; Jensen, K. F. Simultaneous Solvent Screening and Reaction Optimization in Microliter Slugs. Chem. Commun. 2015, 51 (68), 13290–13293, DOI: 10.1039/C5CC03651H
- 24. Sans, V.; Porwol, L.; Dragone, V.; Cronin, L. A Self Optimizing Synthetic Organic Reactor System Using Real-Time In-Line NMR Spectroscopy. Chem. Sci. 2015, 6 (2), 1258–1264, DOI: 10.1039/C4SC03075C
- 25. Holmes, N.; Akien, G. R.; Savage, R. J. D.; Stanetty, C.; Baxendale, I. R.; Blacker, A. J.; Taylor, B. A.; Woodward, R. L.; Meadows, R. E.; Bourne, R. A. Online Quantitative Mass Spectrometry for the Rapid Adaptive Optimisation of Automated Flow Reactors. React. Chem. Eng. 2016, 1 (1), 96–100, DOI: 10.1039/C5RE00083A
- 26. Reizman, B. J.; Jensen, K. F. Feedback in Flow for Accelerated Reaction Development. Acc. Chem. Res. 2016, 49 (9), 1786–1796, DOI: 10.1021/acs.accounts.6b00261
- 27. Baumgartner, L. M.; Coley, C. W.; Reizman, B. J.; Gao, K. W.; Jensen, K. F. Optimum Catalyst Selection over Continuous and Discrete Process Variables with a Single Droplet Microfluidic Reaction Platform. React. Chem. Eng. 2018, 3 (3), 301–311, DOI: 10.1039/C8RE00032H
- 28. Kamlet, M. J.; Abboud, J. L. M.; Abraham, M. H.; Taft, R. W. Linear Solvation Energy Relationships. 23. A Comprehensive Collection of the Solvatochromic Parameters, π*, α, and β, and Some Methods for Simplifying the Generalized Solvatochromic Equation. J. Org. Chem. 1983, 48 (17), 2877–2887, DOI: 10.1021/jo00165a018
- 29. Struebing, H.; Ganase, Z.; Karamertzanis, P. G.; Siougkrou, E.; Haycock, P.; Piccione, P. M.; Armstrong, A.; Galindo, A.; Adjiman, C. S. Computer-Aided Molecular Design of Solvents for Accelerated Reaction Kinetics. Nat. Chem. 2013, 5 (11), 952–957, DOI: 10.1038/nchem.1755
- 30. Marcou, G.; Aires De Sousa, J.; Latino, D. A. R. S.; De Luca, A.; Horvath, D.; Rietsch, V.; Varnek, A. Expert System for Predicting Reaction Conditions: The Michael Reaction Case. J. Chem. Inf. Model. 2015, 55 (2), 239–250, DOI: 10.1021/ci500698a
- 31. Lin, A. I.; Madzhidov, T. I.; Klimchuk, O.; Nugmanov, R. I.; Antipin, I. S.; Varnek, A. Automatized Assessment of Protective Group Reactivity: A Step toward Big Reaction Data Analysis. J. Chem. Inf. Model. 2016, 56 (11), 2140–2148, DOI: 10.1021/acs.jcim.6b00319
- 32. Segler, M. H. S.; Waller, M. P. Modelling Chemical Reasoning to Predict and Invent Reactions. Chem. - Eur. J. 2017, 23 (25), 6118–6128, DOI: 10.1002/chem.201604556
- 33. Mikie Kanada, R.; Taniguchi, T.; Ogasawara, K. Asymmetric Hydrogen Transfer Protocol for a Synthesis of (+)-Frontalin and (−)-Malyngolide. Tetrahedron Lett. 2000, 41 (19), 3631–3635, DOI: 10.1016/S0040-4039(00)00430-5
- 34. Faroux-Corlay, B.; Clary, L.; Gadras, C.; Hammache, D.; Greiner, J.; Santaella, C.; Aubertin, A. M.; Vierling, P.; Fantini, J. Synthesis of Single- and Double-Chain Fluorocarbon and Hydrocarbon Galactosyl Amphiphiles and Their Anti-HIV-1 Activity. Carbohydr. Res. 2000, 327 (3), 223–260, DOI: 10.1016/S0008-6215(00)00055-0
- 35. Wang, H.; Yu, S. Synthesis of Isoquinolones Using Visible-Light-Promoted Denitrogenative Alkyne Insertion of 1,2,3-Benzotriazinones. Org. Lett. 2015, 17 (17), 4272–4275, DOI: 10.1021/acs.orglett.5b01960
- 36. Li, K.; Zeng, Y.; Neuenswander, B. Sequential Pd(II)-Pd(0) Catalysis for the Rapid Synthesis of Coumarins. J. Org. Chem. 2005, 70 (16), 6515–6518, DOI: 10.1021/jo050671l
- 37. Mavunkel, B.; Xu, Y.; Goyal, B.; Lim, D.; Lu, Q.; Chen, Z.; Wang, D.-X.; Higaki, J.; Chakraborty, I.; Liclican, A. Pyrimidine-Based Inhibitors of CaMKIIδ. Bioorg. Med. Chem. Lett. 2008, 18 (7), 2404–2408, DOI: 10.1016/j.bmcl.2008.02.056
- 38. Lautens, M.; Maddess, M. L. Chemoselective Cross Metathesis of Bishomoallylic Alcohols: Rapid Access to Fragment A of the Cryptophycins. Org. Lett. 2004, 6 (12), 1883–1886, DOI: 10.1021/ol049883f
- 39. Krüger, T.; Vorndran, K.; Linker, T. Regioselective Arene Functionalization: Simple Substitution of Carboxylate by Alkyl Groups. Chem. - Eur. J. 2009, 15 (44), 12082–12091, DOI: 10.1002/chem.200901774
- 40. Palmes, J. A.; Paioti, P. H. S.; De Souza, L. P.; Aponick, A. PdII-Catalyzed Spiroketalization of Ketoallylic Diols. Chem. - Eur. J. 2013, 19 (35), 11613–11621, DOI: 10.1002/chem.201301723
- 41. Liu, J.; Fitzgerald, A. E.; Mani, N. S. Facile Assembly of Fused Benzo[4,5]Furo Heterocycles. J. Org. Chem. 2008, 73 (7), 2951–2954, DOI: 10.1021/jo8000595
- 42. Schaub, C.; Müller, B.; Schmidt, R. R. Sialyltransferase Inhibitors Based on CMP-Quinic Acid. Eur. J. Org. Chem. 2000, 2000 (9), 1745–1758, DOI: 10.1002/(SICI)1099-0690(200005)2000:9<1745::AID-EJOC1745>3.0.CO;2-8
- 43. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Advances in neural information processing systems; NIPS: Lake Tahoe, 2013; pp 3111–3119.
- 44. García-Alonso, C. R.; Pérez-Naranjo, L. M.; Fernández-Caballero, J. C. Multiobjective evolutionary algorithms to identify highly autocorrelated areas: the case of spatial distribution in financially compromised farms. Ann. Oper. Res. 2014, 219 (1), 187–202, DOI: 10.1007/s10479-011-0841-3
- 45. Open-source. RDKit: Open-Source Cheminformatics Software, 2006 (accessed on Sept 28, 2017).
- 46. Schneider, N.; Lowe, D. M.; Sayle, R. A.; Landrum, G. A. Development of a Novel Fingerprint for Chemical Reactions and Its Application to Large-Scale Reaction Classification and Similarity. J. Chem. Inf. Model. 2015, 55 (1), 39–53, DOI: 10.1021/ci5006614
- 47. Williams, R. J.; Zipser, D. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Comput. 1989, 1 (2), 270–280, DOI: 10.1162/neco.1989.1.2.270
- 48. Gobbi, A.; Poppinger, D. Genetic Optimization of Combinatorial Libraries. Biotechnol. Bioeng. 1998, 61 (1), 47–54, DOI: 10.1002/(SICI)1097-0290(199824)61:1<47::AID-BIT9>3.0.CO;2-Z
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscentsci.8b00357.
Description of computational methods, frequency vs rank plots, extended lists of reaction examples, and a comparison of the model's performance against baseline models (PDF)