Current and Future Roles of Artificial Intelligence in Medicinal Chemistry Synthesis

Artificial intelligence and machine learning have demonstrated their potential role in predictive chemistry and synthetic planning of small molecules; there are at least a few reports of companies incorporating in silico synthetic planning into their overall approach to accessing target molecules. A data-driven synthesis planning program is one component being developed and evaluated by the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, comprising MIT and 13 chemical and pharmaceutical company members. Together, we wrote this perspective to share how we think predictive models can be integrated into medicinal chemistry synthesis workflows, how they are currently used within MLPDS member companies, and the outlook for this field.

■ SECTION 1: WHERE CAN COMPUTER AIDED SYNTHESIS PLANNING AID MEDICINAL CHEMISTRY DISCOVERY?
Introduction. Current estimates of the cost to bring a single drug to market are in excess of 2−3 billion dollars. 1−3 A significant portion of this cost may be attributed to two factors: the historically high attrition of candidate molecules going through clinical trials (an attrition rate of over 85% 4) and the complexity of the preceding discovery phase, which requires considerable investments in time and resources. A stronger pipeline of preclinical candidates will have beneficial downstream effects in terms of total approvals. Advances in both computer hardware and in silico methods aim to expedite as well as improve various aspects of medicinal chemistry's quintessential design-make-test-analyze (DMTA) drug discovery cycle (Figure 1). One area of increasing interest is the use of data-driven synthetic prediction tools for the make phase to accelerate and reduce failures in the synthesis of new molecular entities.
Computer aided synthesis planning (CASP) has a rich history that dates back to the 1960s when the Corey group first disclosed LHASA, 5 a rule-based approach to retrosynthetic planning. This seminal publication was key in defining the heuristics of chemical synthesis that might be necessary for a synthesis planning software. Many groups disclosed advances in computer-assisted synthesis planning between the 1960s and 1990s but were largely limited by computational resources and primarily relied on human-encoded reaction rules. 6−9 These early progenitors serve as the inspiration for some of the commercial software packages, such as Synthia (formerly Chematica) and ICSynth, where hand-coded reaction rules are used in conjunction with guiding heuristics to navigate synthetic pathways. 10,11 Only in the past two decades have more automated methods for synthesis planning, such as those that use a subset of AI methods called machine learning (ML) to infer patterns of reactivity from published reaction data, emerged as viable alternatives to "expert" rule-based algorithms. 12 Both expert encoded rules and ML methods can be considered AI approaches: the former as an example of the so-called "first wave of AI" using crafted knowledge and the latter as an example of the "second wave" using statistical learning. Each brings its own distinct advantages to synthesis planning software. Expert-encoded rules have the opportunity to excel in low-data regimes where only 1−4 reactions might be recorded for a particular transformation. Although there is active research on using machine learning for low data, this has not yet been successfully applied to synthesis planning. However, machine learning methods can be easily extended to incorporate new reactions, as they are published due to the automation of extraction/training pipelines, which reduces the burden on experts (Ph.D.-level chemists). 
As more reactions are run within a company, the automated pipeline allows for predictions to become more robust. Ref 13 provides detailed descriptions of and comparisons between available software tools. 13 Both machine learning and rule-based approaches have demonstrated successes in planning synthetic routes that have been executed in the lab or evaluated by chemists as worth attempting. For example, Synthia has been used to find routes to medicinally relevant compounds and has even improved the overall yields compared to expert-developed routes; 14 Segler et al. found that chemists expressed no preference for literature-validated routes over their algorithmically proposed routes in a double-blind evaluation. 15 Automated platforms have been coupled to synthesis planning tools, albeit with varying levels of human intervention. 16,17 Although the field is still in the early stages of using CASP for fully automated synthesis planning, these initial successes demonstrate the utility of the tools in a DMTA cycle.
Starting in May 2018, a team of researchers from MIT has been working closely with 13 pharmaceutical and chemical companies within the context of the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) Consortium. 18,19 Among the many goals of the consortium is the development of machine-learning-based algorithms and tools to accelerate the make phase of the DMTA cycle (Figure 1). This article will encompass the topic of machine learning for synthesis planning through the lens of the activities of the consortium and the open-source tools 18,20 being developed at MIT and adopted by member companies.

Figure 2. (1) Retrosynthesis can be broken into subproblems of (a) the generation of retrosynthetic suggestions one step at a time and (b) the recursive use of the single-step suggestions to identify full, multistep routes. (2) Reaction conditions that will lead to a successful forward reaction must be recommended in order for suggestions to be actionable. (3) Reaction prediction, predicting the possible products from a set of starting materials and conditions, is used to validate the proposed synthetic steps.
More specifically, this perspective will describe the many roles of artificial intelligence in medicinal chemistry synthesis, including those that (1) can be integrated within a medicinal chemistry workflow, (2) are already integrated in certain pharmaceutical companies, and (3) require further development to accomplish even more ambitious tasks. We focus on the three primary tasks of computer-aided synthesis planning (CASP) in Figure 2: retrosynthetic planning, condition recommendation, and forward-reaction prediction.
Use of ML-Based CASP in Retrosynthetic Planning. Identification of Synthesizable Targets and Route Planning. The traditional approach for synthesizing new small molecules in DMTA iterations involves manual planning and manual execution. Expert chemists are tasked with assessing the synthesizability of proposed targets, resulting in a slowdown when evaluating hundreds or possibly thousands of molecules. A particular series of lead compounds might be preferred over another due to synthetic accessibility (SA), as financial resources and time constraints limit the number of compounds that can be pursued or designed in parallel. Retrosynthesis software mitigates the bottleneck of manual synthesis assessment by generating hypothetical synthetic routes that can be used to rapidly prioritize compounds by ease of synthesis, thereby providing chemists a more focused set of compounds as a starting point for expert route planning. Finally, the use of a retrosynthetic planning platform can be beneficial for team members that do not have years of training in synthetic chemistry by providing these individuals with synthetic suggestions that may not have come to mind.
Two categories of methods for scoring compounds by synthesizability are the use of simplified structure-based heuristics 21−23 or full retrosynthetic tree expansions. Heuristics aim to capture broad trends in SA from molecular structures and have traditionally been using expert-defined functions of molecular attributes. Nonlinear regression (e.g., using machine-learning techniques) can instead recapitulate subjective scores assigned by expert chemists 24 or be used in a semisupervised setting to learn from examples of chemical reactions. 25 In reality, however, the ability to synthesize a target is highly dependent on the availability of specific buyable building blocks and is not a smooth function of molecular structure. Since the availability of building blocks depends on the setting (e.g., organization, budget, and discovery versus process development), a more generalizable method for assessing synthesizability is to use retrosynthetic expansion with a custom database of buyable compounds that is tailored to the application. The benefit of an explicit retrosynthetic expansion is the knowledge that transformations to access a target of interest do exist and suitable starting materials are available; however, it comes at a higher computational cost. With access to a retrosynthetic planning tool and enough time and training, however, neural network models can begin to approximate this highly nonlinear function. 26 The two major categories of retrosynthetic planning software are those that use expert-encoded rules or heuristics to generate recommendations and those that learn (or infer) how to generate recommendations. Many retrosynthetic methods rely on the use of reaction templates, which are reaction rules that can be stored in a SMARTS or SMIRKS format. 
A general procedure for the algorithmic extraction of templates from a reaction data set is to (1) identify the reaction center or changing atoms, (2) identify atoms adjacent to the reaction center, and (3) add generalized functional groups that are involved in the reaction. 27,28 This approach captures the local reaction environments but, in most algorithmic implementations, does not capture the global features of molecules that contribute to reactivity. Expert-encoded methods 11 may better describe functional group requirements but cannot be tailored to an individual organization's capabilities. Automated pipelines for extracting reaction templates allow for facile (re)training on proprietary data sets but are also inconsistent with the expert approach.
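The first two steps of the extraction procedure can be illustrated on a toy atom-mapped reaction. The sketch below is a minimal illustration under stated assumptions, not the implementation used by any of the cited tools: molecules are represented only as lists of bonds between atom-map numbers, no cheminformatics toolkit is used, all function names are hypothetical, and step (3), the addition of generalized functional groups, is left out.

```python
# Toy sketch: bonds are ((atom_map_a, atom_map_b), bond_order) pairs.
# All names here are hypothetical; no cheminformatics toolkit is used.

def bond_orders(bonds):
    """Map each unordered atom pair to its bond order."""
    return {frozenset(pair): order for pair, order in bonds}


def reaction_center(reactant_bonds, product_bonds):
    """Step 1: atoms whose bonding changes between reactants and products."""
    before, after = bond_orders(reactant_bonds), bond_orders(product_bonds)
    changed = set()
    for pair in set(before) | set(after):
        if before.get(pair) != after.get(pair):
            changed |= pair
    return changed


def adjacent_atoms(center, bonds):
    """Step 2: atoms bonded to the reaction center but not part of it."""
    shell = set()
    for (a, b), _order in bonds:
        if a in center or b in center:
            shell |= {a, b}
    return shell - center


# Amide formation: acid C1(=O2)(O3)-C4 plus amine N5-C6 gives C1(=O2)(N5)-C4.
reactant_bonds = [((4, 1), 1), ((1, 2), 2), ((1, 3), 1), ((5, 6), 1)]
product_bonds = [((4, 1), 1), ((1, 2), 2), ((1, 5), 1), ((5, 6), 1)]

center = reaction_center(reactant_bonds, product_bonds)
shell = adjacent_atoms(center, reactant_bonds + product_bonds)
```

Here the reaction center is the carbonyl carbon, the departing oxygen, and the amine nitrogen, and the adjacent shell holds their immediate neighbors. A template built only from these atoms describes the local environment, which is why global molecular features are not captured.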
For the actual application of reaction templates to generate reactant molecules from an input product molecule, several machine-learning-based approaches have focused on learning which templates provide the most strategic disconnections, with varying degrees of sophistication. 29,30 An alternative approach is the use of sequence-to-sequence models that treat the one-step retrosynthetic task as a translation between products and reactants. 31−33 A single-step retrosynthetic recommender is sufficient for a chemist to construct routes manually, one step at a time. 34 The single-step retrosynthetic capabilities can be extended to full route design by using a tree search. Each step can produce thousands of precursors, which requires a guided search strategy to prevent combinatorial explosion. Candidate precursors can be filtered by SA heuristics 11 or by a learned expansion policy 15 to have a more tractable list of chemicals to transform in the next cycle. Full pathways can be constructed by recursively suggesting single-step retrosynthetic precursors that become progressively simpler until a stop criterion is met. Different implementations of tree searches have been investigated including depth-first, best-first, and proof-number search and Monte Carlo tree search algorithms; 15,35 a direct comparison of methods is difficult because quantitative scoring remains a challenge. Generally, a retrosynthetic search is terminated once precursors are found that can be purchased. This complicates benchmarking retrosynthesis algorithms because a larger, more diverse database of buyable chemicals will have a higher probability of termination and naturally appear more successful. Other stop criteria such as the number of occurrences in the literature or chemical logic (defining the allowable number of carbon, nitrogen, and oxygen atoms) can be used, the latter of which can provide greater standardization but is less relevant to actual applications. 
Moreover, the ability to identify a pathway does not guarantee its chemical feasibility. Since there are multiple routes that can synthesize the same target, the best method of validation would be to perform the chemistry in the lab; this is obviously prohibitively expensive to undertake for every route generated, time-intensive, and not a scalable approach to validating new methods in retrosynthesis planning.
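The recursive use of single-step suggestions with a buyable stop criterion can be sketched as a best-first search. This is a minimal sketch under assumptions: the one-step model is replaced by a hypothetical lookup table of scored precursor sets, molecule names are placeholders rather than real structures, and the cost function (one minus the suggestion score) is an arbitrary choice made for illustration.

```python
import heapq


def best_first_route(target, one_step, buyable, max_expansions=100):
    """Return a list of (product, precursors) steps, or None if no route is found.

    one_step: dict mapping a molecule to a list of (score, precursors) suggestions.
    buyable:  set of molecules at which the search may stop.
    """
    counter = 0  # tie-breaker so the heap never compares the tuple payloads
    queue = [(0.0, counter, (target,), ())]
    expansions = 0
    while queue and expansions < max_expansions:
        cost, _, open_mols, steps = heapq.heappop(queue)
        # Molecules that can be purchased need no further disconnection.
        open_mols = tuple(m for m in open_mols if m not in buyable)
        if not open_mols:
            return list(steps)
        mol, rest = open_mols[0], open_mols[1:]
        expansions += 1
        for score, precursors in one_step.get(mol, []):
            counter += 1
            new_steps = steps + ((mol, tuple(precursors)),)
            heapq.heappush(
                queue,
                (cost + (1.0 - score), counter, rest + tuple(precursors), new_steps),
            )
    return None


# Hypothetical one-step suggestions (placeholder names and scores).
one_step = {
    "amide": [(0.9, ["acid", "amine"]), (0.4, ["nitrile_precursor"])],
    "acid": [(0.8, ["ester"])],
}

route_large_stock = best_first_route("amide", one_step, {"amine", "ester"})
route_small_stock = best_first_route("amide", one_step, {"amine"})
```

With the larger stock the search terminates with a two-step route; with the smaller stock the same suggestions yield no route, which is the benchmarking complication described above.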
Recommendation and Evaluation of Reaction Conditions. Planning a retrosynthetic route is arguably only one aspect of a full CASP system. To be an actionable suggestion that a chemist can take into the laboratory, we must propose a set of reaction conditions that are able to achieve the desired transformation. Finding the optimal or acceptable set of conditions for a reaction can require time-consuming empirical screening to determine what works best; often, a chemist will employ "typical" conditions for that family of reactions without tailoring their choice to the particular substrate of interest. Biases for choosing reaction conditions can result from individual experience or immediate availability of reagents. In principle, machine-learning models for condition recommendation can more objectively infer suitable conditions if appropriately trained on historical condition data.
In practice, such models are difficult to develop due to a lack of high-quality data. The main data issues that hamper progress are inadequate disclosure of the (1) amounts, volumes, or concentrations; (2) reaction times or kinetics; and (3) order of the addition of reagents and catalysts. Despite these issues, data-driven approaches have demonstrated the ability to suggest conditions for specific reaction classes 36,37 and for more diverse reaction sets. 38,39 These models provide a strong basis for empirical optimization of reaction conditions but still lack the full details necessary for execution. Condition recommendation models would likely be developed to suit the needs of a particular area of chemistry such as medicinal chemistry or process chemistry. The objective of the reaction is different in many cases, such as the importance of yields and side-product formation. One objective might be the prediction of the "best" conditions for a set of reactions that we would like to run in parallel in a single well-plate. More specific predictions may be required for finding the optimal conditions for a single reaction where a new combination of conditions or new catalysts or reagents are designed.
Even though it is hard to escape empirical optimization of reaction conditions, particularly for complex substrates or in tandem catalysis, opportunities for techniques in artificial intelligence to accelerate this process also exist. Reaction optimization is a well-established field, 40−42 and there exist many statistical techniques for selecting experimental conditions to iteratively improve performance (e.g., in terms of yield, turnover number, or throughput). In machine learning parlance, these are active learning frameworks. The most popular methods are model-based techniques, which construct a surrogate model of reaction performance as a function of reaction conditions. Various search strategies (e.g., Bayesian optimization) can be layered on top of these models to help select the next set of conditions to try and refine the model. While these concepts are not new, machine-learning-based models have the potential to provide better estimates of performance and uncertainty to accelerate the search. 43−45
Forward-Reaction Prediction. The third key task of CASP is to ensure that recommendations obtained through algorithmic synthesis design are robust and actionable by anticipating, at least qualitatively, reaction products. A chemist might assess the feasibility of a reaction by searching for similar transformations, reading the literature, and determining if the synthetic method will generalize to the substrates of interest. Data-driven techniques can learn to perform the same generalization when trained on a broad set of reactions. Machine learning methods for reaction prediction include attempting to infer which reaction rules apply from a predefined list of rules or templates, 29,46,47 graph convolutional neural networks that predict atom and bond changes from starting materials to products, 48,49 and sequence-to-sequence models that predict product SMILES. 50−52 Compared to the evaluation of retrosynthesis models, forward synthesis models are more straightforward to evaluate quantitatively, as there is, in principle, only one true answer. In practice, however, the absence of precise concentration, time, and temperature data makes reaction prediction an ill-posed problem.
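Because each recorded reaction has a single recorded product, one simple way to quantify forward-prediction performance is top-k accuracy: the fraction of reactions whose recorded product appears among the model's k highest-ranked predictions. The sketch below assumes that products are compared as already-canonicalized strings; the function name and the placeholder product labels are hypothetical.

```python
def top_k_accuracy(ranked_predictions, recorded_products, k):
    """Fraction of reactions whose recorded product is in the top k predictions.

    ranked_predictions: one list of predicted products per reaction, best first.
    recorded_products:  the recorded major product for each reaction.
    Products are assumed to be canonical strings so equality is meaningful.
    """
    hits = sum(
        1
        for predictions, truth in zip(ranked_predictions, recorded_products)
        if truth in predictions[:k]
    )
    return hits / len(recorded_products)


# Placeholder labels standing in for canonical product SMILES.
ranked_predictions = [["A", "B"], ["C", "D"], ["E", "F"]]
recorded_products = ["A", "D", "X"]

top1 = top_k_accuracy(ranked_predictions, recorded_products, k=1)
top2 = top_k_accuracy(ranked_predictions, recorded_products, k=2)
```

A metric of this kind inherits the caveat noted above: if the recorded conditions are incomplete, the recorded product may not be the only defensible answer.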
These forward-reaction predictors can also be used for side-product prediction. Knowledge of the most probable products helps identify reactions that would produce potentially harmful or difficult-to-separate intermediates. Many reactions can lead to multiple regio- or stereoisomeric compounds. Information about a reaction's selectivity and possible side products is a crucial aspect of prioritizing syntheses and can potentially assist with structure assignment. Once these models are able to make quantitative predictions, they will be indispensable for the consideration and design of purification strategies.
In addition to use in CASP, there are other applications for reaction prediction. Many make-on-demand virtual libraries are enumerated based on expert-defined reaction templates that focus on a limited set of chemistries intended to be as robust as possible. 53 Reportedly, successful delivery of compounds in make-on-demand libraries within 4 weeks is around 85% and within 6 weeks is 93%. 54 This high success rate demonstrates the robustness of rule-based methods using well-established chemistry. Either using heuristically extracted templates or template-free methods, new reaction space (e.g., novel synthetic methods described in new publications) can be included in an automated pipeline in real-time. If targets have been identified and retrosynthetic plans are in place, a search can be performed for all the combinations of available starting materials that could be substituted. For example, if the first reaction is a Suzuki coupling reaction, all the combinations of available boronic acids and aryl halides can be enumerated. A forward predictor is then useful for scoring which combinations will likely lead to successful reactions. By further ranking this set in terms of compounds' properties of interest, a rapid assessment of the accessible chemical space surrounding the target can be made, e.g., for hit expansion in drug discovery. This capability is closely related to integrating the goals of diversity-oriented synthesis into CASP.
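The Suzuki coupling example can be sketched as a combinatorial enumeration followed by scoring. The sketch is illustrative only: the reagent labels are placeholders, and `toy_score` is a hypothetical stand-in for a trained forward predictor whose numbers are invented for the example.

```python
from itertools import product


def enumerate_and_rank(boronic_acids, aryl_halides, score_fn, threshold=0.5):
    """Enumerate all acid/halide pairs and keep those scored as likely to succeed."""
    candidates = []
    for acid, halide in product(boronic_acids, aryl_halides):
        score = score_fn(acid, halide)
        if score >= threshold:
            candidates.append((score, acid, halide))
    # Highest-scoring combinations first.
    return sorted(candidates, reverse=True)


# Hypothetical scores; a real workflow would query a forward-reaction model here.
PLACEHOLDER_SCORES = {"ArI": 0.9, "ArBr": 0.8, "ArCl": 0.3}


def toy_score(acid, halide):
    """Stand-in for a forward predictor; ignores the acid for simplicity."""
    return PLACEHOLDER_SCORES[halide]


ranked = enumerate_and_rank(
    ["PhB(OH)2", "PyB(OH)2"], ["ArI", "ArBr", "ArCl"], toy_score
)
```

The surviving combinations could then be re-ranked by predicted properties of the resulting products, as described above for hit expansion.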

■ SECTION 2: HOW IS CASP CURRENTLY USED IN THE PHARMACEUTICAL AND CHEMICAL INDUSTRY?
There are many ways to envision integration of AI-based CASP tools into the medicinal chemistry workflow, and adoption appears to be on the rise. The discussion below will largely focus on the use of tools within the open-source ASKCOS software suite. 20 However, the applications are general. We will break the use cases into multistep route planning, forward-reaction prediction, and condition recommendation, as outlined in the introduction. Finally, we will briefly discuss how the incorporation of programmatic interfaces can aid the DMTA workflow and the general feedback from MLPDS member companies on ASKCOS functionality and its adoption in their organizations.
Multistep Route Planning. Many of the available commercial and academic synthetic route planning software packages provide a stand-alone graphical user interface (GUI) or web-based interface where users can interact with the suggested routes and predictions. The target users of the software range from nonchemists, without much knowledge of chemical reactivity, to highly trained expert chemists who want to streamline their synthesis workflow. The MLPDS consortium member companies report that the primary users of the software are expert, Ph.D.-level chemists, and adoption is reported to vary from indifference up to enthusiastic and everyday use. At Janssen, many chemists use synthesis planning tools in parallel with traditional database lookups of known reactions to generate ideas more quickly. Other users are computational chemists and chemical engineers who may not have as much practice at retrosynthetic planning but are involved with molecular design or process development. Most companies pilot small rollouts to select expert chemists, who are in the strongest positions to evaluate the capabilities of machine learning CASP tools and identify key limitations. At BASF, experts from different stages in product development (e.g., early phase and process development) provided understanding of the different expectations across business areas. These small rollouts are necessary to understand the obstacles to wider adoption and further integration into synthesis pipelines. Close contact is necessary between the company's beta testers and the developers of retrosynthesis algorithms since the true assessment of performance must be carried out by trained experts who can validate the models' suggestions. The proof of principle for full pathway planning has been established, but further refinement will require the input of chemists who can objectively evaluate retrosynthetic predictions.
Journal of Medicinal Chemistry pubs.acs.org/jmc Perspective
Input from the MLPDS member companies has identified some general trends in which the machine learning algorithms perform well and poorly. Generally, target molecules that are in a similar chemical space to product molecules found in Reaxys or USPTO tend to perform well using the ASKCOS suite of tools. These target molecules can be accessed using well-established chemistries and the models can perform adequately within their domains of applicability.
For example, input of the structure of branebrutinib (BMS-986195, 1, Figure 3) resulted in an ASKCOS-proposed synthesis. While the synthetic route was first reported in 2016, 55,56 the training data for ASKCOS stops before the initial disclosure of this molecule, which demonstrates the ability of the machine-learning models to generalize to new target compounds. The ASKCOS-proposed synthesis begins from commercially available starting materials that are similar to those in the reported route and uses several types of reactions (C−N cross-coupling, heterocycle formation, diazotization/reduction) to arrive at the final product. While the overall synthesis appears plausible, the selective Boc deprotection of 6 to 5 is likely problematic and could be easily substituted with an orthogonal protecting group when deciding on the final route. As previously noted, one area for improvement is better prediction of regioselectivity of reactions where there are two pendant reactive groups. For the proposed C−N bond-forming reaction, ASKCOS suggests Boc protection of the N−H of the alkynamide intermediate (3) while the second coupling partner (4) contains a free carboxamide. In the literature synthesis of 1, the authors note that a carboxamide unexpectedly prevents the Buchwald (C−N) coupling from proceeding. Thus, that team performed the reaction with the nitrile substituted for the carboxamide of 4. While the exact ASKCOS C−N coupling has never been tried, the fact that chemists attempted the C−N coupling with a free carboxamide demonstrates that the ASKCOS prediction is reasonable (i.e., worth trying), but the carboxamide would likely need to be converted to a nitrile. This example is just one small piece of evidence that ASKCOS can reasonably disconnect modern drug-like molecules; it also shows how nuanced experimental results (including failures), not captured in large reaction databases, can negatively impact algorithm performance.
Many different aspects play into the "success" of machine-learning-based path-planning tools. One of the simplest factors in whether these programs are able to find pathways is the coverage of the database of compounds considered to be commercially available; simply put, a larger starting material database increases the odds that a search will terminate successfully. In an effort to better understand how the database of buyable chemicals affects tree search outcomes, GlaxoSmithKline compared the stock ASKCOS database of buyable compounds (138k) and a larger set that was augmented with their internal compounds/vendors (8M). On an internal set of 69 target molecules, using the most liberal path-planner settings, a route was found by ASKCOS for 54% of compounds with the stock database and 67% of compounds with their internal database. These results highlight the dependency of path-planning algorithms on the database used for a stop criterion. The dependence on a buyable database, however, complicates the comparison of CASP tools since every software package uses a different (typically undisclosed) buyable database. This problem may be alleviated by the implementation of straightforward utilities to load and use custom building-block sets in every CASP tool. This requirement is generally useful since all MLPDS corporate members maintain large internal collections of building blocks.
How the availability of starting materials affects the SA can be seen by enumerating methylated variants of compound A 57 followed by an evaluation using ASKCOS's retrosynthesis tree search. While an expert chemist can easily infer that compound A is disconnected by amide formation and C−C cross-coupling, a chemist is less likely to know offhand which methylated starting materials are commercially available. The input of methyl analogues of A results in the expected bond disconnections (Figure 4, representative ASKCOS results shown). Since the stop criterion for the tree search is commercial availability, the algorithm will assess at each disconnection whether the suggested starting materials are purchasable. In this example, the starting material for compound 8 can be purchased after only a few retrosynthetic steps. Compounds 9 and 10 require an extra step compared to 8. Finally, access to indole 11 would necessitate further steps to synthesize, which brings the step count to almost double that of the synthesis of compound 8. Notably, the information is obtained using ASKCOS with one search per compound. This assessment now provides chemists with the information on which analogues are most synthetically accessible and can factor into the decision-making process for prioritization of target molecules.
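Ranking analogues by how many steps separate them from purchasable materials can be sketched as follows. This is a toy illustration, not the ASKCOS tree search: the one-step suggestions are a hypothetical lookup table that is assumed to be acyclic, the analogue and starting-material names are placeholders unrelated to compounds 8−11, and the step count used is the longest linear sequence.

```python
from functools import lru_cache


def make_step_counter(one_step, buyable):
    """Build a function returning the fewest steps from buyables to a molecule.

    one_step: dict mapping a molecule to a list of precursor tuples (assumed acyclic).
    buyable:  set of purchasable molecules.
    Returns None for molecules with no route to buyable materials.
    """

    @lru_cache(maxsize=None)
    def steps(mol):
        if mol in buyable:
            return 0
        best = None
        for precursors in one_step.get(mol, []):
            counts = [steps(p) for p in precursors]
            if any(c is None for c in counts):
                continue  # this disconnection leads to a dead end
            total = 1 + max(counts)  # longest linear sequence through this step
            if best is None or total < best:
                best = total
        return best

    return steps


# Hypothetical analogues whose precursors differ in commercial availability.
one_step = {
    "analog_a": [("sm1", "sm2")],
    "analog_b": [("int_b", "sm2")],
    "int_b": [("sm3",)],
    "analog_c": [("int_c1", "sm2")],
    "int_c1": [("int_c2",)],
    "int_c2": [("sm4",)],
}
buyable = {"sm1", "sm2", "sm3", "sm4"}

steps = make_step_counter(one_step, buyable)
ranking = sorted(["analog_c", "analog_a", "analog_b"], key=steps)
```

Sorting by step count puts the most accessible analogue first, which is the kind of information that can feed into target prioritization.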
An expected feature of machine-learning methods for predictive chemistry is that retraining models on proprietary data ought to allow companies to achieve better predictive ability on chemistries that are used in-house. 58 These in-house chemistries may not be well represented in the public or published data sets on which most CASP systems are trained. Researchers from AstraZeneca and the University of Bern applied a workflow for retrosynthetic template extraction 28 and training/application 29 to several public and proprietary data sets and compared the performance of the different models. 59 They found that Reaxys has the most unique reaction templates, of which 2% are shared between all the data sets used in the study, and only 0.6% are shared between Reaxys and a subset of their proprietary ELN data. Eli Lilly identified a subset of 6k target compounds from approved, experimental, and investigational drugs to represent the chemical space of interest to the company. Using the Lilly database of building blocks and ChemoPrint, an in-house synthetic planning platform, retrosynthetic expansion was performed using a template set from (1) only Lilly ELN data (13297 templates) and (2) Lilly ELN data plus patent data (13297 + 50275 templates). Routes could be found for 40.1% of the 6k
There are still many molecular structures for which retrosynthesis planning fails to find any route. The MLPDS consortium members have identified lack of coverage for several company-specific target molecules or reactions in full-path planning. Commonly identified substructures that are not successful in full-path planning are small, densely functionalized carbon cores with or without many contiguous stereocenters, caged structures where 3D geometry is crucial for selectivity, newly discovered heterocycles, and complicated polycyclics.
Some of these substructures, such as densely functionalized carbon cores, require chemistry that is specific to each core's substructural environment (perhaps with <5 precedents in the literature). Using the conventional template extraction procedures, the model will not be able to generalize due to the high specificity of the template. Conversely, path planning of some target molecules will find numerous pathways but include many poor retrosynthetic suggestions where regio- or stereoselectivity may not be predicted appropriately. To correct the issues of selectivity, further filtering using an accurate forward-prediction model will provide richer route suggestions. Another set of failures is due to the limitation of the search methods for navigating a synthetic tree. Since recursive retrosynthetic expansion has to restrict the search to avoid combinatorial explosion, most implementations cannot yet navigate a search path deeper than 15 synthetic steps. If chemists are using CASP tools for the ideation of routes and pathway planning cannot successfully navigate a synthetic graph to produce a route, another solution is necessary.
When full-path planning fails, a chemist may resort to using single-step retrosynthetic predictions to manually construct a route. Figure 5 is an example where a path to branebrutinib is manually explored. Interestingly, the suggestion of using a nitrile, which was found to be ideal in practice, is in the precursor list but is ranked #37, so a chemist would have to sort through many higher-ranked suggestions. Manually constructing a route from tens to thousands of disconnections is a time-consuming task. A synthesis planning feature that was born from discussions between MLPDS member companies and MIT was the implementation of an interactive path planner using single-step retrosynthetic predictions. The interactive planner addresses the issue of displaying diverse suggestions and having more control over a synthetic plan. When chemists are initially developing a route, the precise choice of leaving groups matters less, and as the routes are refined, specific leaving groups are chosen based on the desired reactivity. Machine-learning models for retrosynthesis generally handle all possible reactants as distinct options. For example, the chloride, bromide, and iodide forms of a halogenated precursor are not normally lumped into a single category. It is inconvenient for chemists to sort through numerous suggestions that have the same fundamental disconnection but different leaving groups. Thus, a clustering algorithm was developed to group similar suggestions (based on a k-means clustering using a reaction fingerprint 61) and expedite the exploration of distinct disconnections. Several routes are displayed using one visualization, which can be downloaded and shared. Although none of the underlying machine-learning models were changed, expert users are much happier with exploring pathways interactively when an automated path-planning job fails. This success demonstrates that tight collaboration between end users and the developers of
Forward-Reaction Prediction.
The purpose of a machine-learning-based forward prediction is to validate routes that are furnished from full-path planning. In our implementation, forward prediction is not carried out automatically during the tree search via the GUI but can be performed on reactions after expansion. In practice, forward-reaction prediction tools are currently used mainly to identify potential side products and impurities rather than for confirming routes. Similar to retrosynthetic planning, the use of company data should improve the quality of predictions for in-house use by aligning the types of chemicals/reactions used for training and prospective prediction. A recent study by Pfizer and the University of Cambridge demonstrates that retraining a sequence-to-sequence forward prediction model on proprietary data does boost accuracy for company-specific chemistry. 58
Condition Recommendation. Of all the MLPDS modules deployed at the member companies, condition recommendation is used least often and receives the least feedback. Previous research has reported recommendations of very specific conditions limited to a single reaction class. 37,62,63 These focused models do not approach the global intuition of reactivity that expert chemists have but may be useful when very specific conditions are necessary. A general model for condition recommendation, such as the one included in ASKCOS, 39 that can provide a good starting point for reaction execution would be preferable for medicinal chemistry workflows. However, these generalized models encounter limitations subject to the training sets' domain of applicability. Chemists currently can use ASKCOS to get a good starting point for planning a reaction, but many reasons may contribute to the lower adoption of condition recommendation. One is that the model suggestions are not specific enough (concentrations, time, order of addition, etc. are missing) to give conditions that are actionable.
The conditions the model provides can be obtained through a literature search of similar transformations, which is the mechanism still preferred by practicing chemists. We find that chemists often use the model to confirm some set of the conditions they have already proposed or simply to evaluate the suggestions and give feedback to the model developers. Long-term, there is an opportunity to impact automated experimentation once it is possible to make quantitative recommendations, but currently, the utility of condition recommendation is limited.
A retrospective analysis was performed of reactions used in the SAR discovery phase of Novartis's LSZ102 (compound 12) and its derivatives. ASKCOS path planning identified routes where LSZ102 can be assembled via two classes of palladium-catalyzed C−C coupling (C−H activation and Suzuki−Miyaura), as shown in Figure 6. Indeed both coupling strategies were widely utilized during the SAR discovery phase toward LSZ102. 64 Further retrospective analysis of the top-rated disconnection, a Pd-catalyzed C−H activation, identified the requirement for both high temperature and polar aprotic solvents (DMF/DMA) in the top 3 conditions proposed. In reality, a screening optimization of the ligand and base was necessary to maximize the performance for the specific scaffold. The optimized conditions were applied to a diverse range of substrate starting materials with yields in the range of 39−97%. The initial temperature and solvent conditions proposed by ASKCOS were not far from those actually employed and would have provided a good starting point for either scouting or screening efforts.
One appealing application for context recommendation models is helping chemists and chemical engineers identify opportunities to leverage specific technologies at the onset of synthesis design. In doing so, more efficient and sustainable conditions aligned with green chemistry principles could be readily identified. One such example is the application of surfactant-based technology at Novartis, which seeks to replace undesirable solvents with a greener micellar water surfactant system. Indeed, in the above example of LSZ102, the Suzuki−Miyaura retrosynthetic analysis identified by ASKCOS could be realized under such surfactant conditions. 65−68 In comparison with the same reaction performed in a standard organic solvent, the generation of the desbromo side product was significantly reduced from 8 to 0.7%. 65 By training ASKCOS with relevant internal data, it is envisaged that context recommendation models will be able to identify and propose conditions that are more favorable than the historical conditions prevalent in the existing literature, using flexible user-provided definitions of "favorable".
Programmatic Interfaces for Incorporation into Company Platforms. Although a graphical user interface is the primary method of use by chemists, computational tools can be straightforwardly integrated with other computational pipelines. A closer integration with in-house tools for molecular design represents an additional value proposition for CASP and could lead to greater adoption. For example, a programmatic interface for sending requests to path-planning software from in-house design modules allows the automation of running retrosynthetic expansions and the accumulation of the necessary data for prioritization of target molecules. Eli Lilly has designed an in-house workflow named Kernel where target compounds from chemists, or screening hits, are submitted and prioritized in an automated fashion. Once Kernel identifies the prioritized compounds, full retrosynthetic path planning is performed on all of the molecules utilizing the ChemoPrint API 69 and the Lilly building block collection; the results are then added to the compound listing, and team members are informed by e-mail. 70 This frees users of the design software from having to open standalone CASP tools. In addition, the use of a centralized data repository to store ideated compounds and their routes facilitates sharing and collaborative prioritization.
Like many companies, Merck & Co., Inc., Kenilworth, NJ, US uses Spotfire to organize designed molecules and their measured or predicted properties. They have initiated incorporation of the programmatic interface of the ASKCOS single-step retrosynthesis into a workflow to triage hits from virtual screening libraries. The results can be presented in many ways, but they analyze hits by availability and price of precursors broken down by each step. This rapidly informs chemists which target molecules may be able to be synthesized in parallel and, with incorporation of forward scoring, allows for consideration of which reactions may be most successful.
BASF has developed an integrated platform that links literature references and internal electronic lab notebooks to the synthesis reaction template suggestions and integrates in-house compound stock databases into the recursive path planning to optimize usage of internal resources. Molecules used in a proposed synthetic pathway are connected to an in-house suite of tools for the prediction of physical and toxicological properties, enabling an in silico assessment of reaction feasibility and safety before undertaking laboratory work.
One could envision that programmatic interfacing could be useful for de novo molecular generation as well. A common complaint from chemists about de novo methods is that the molecules are not synthetically accessible. Calculated SA scores have the benefit of speed, but imposing a bias to generate molecules using full recursive path planning would ensure that routes do exist to the generated molecules. Of course, this limits the chemical space in which the generative model will operate, but the improvement in synthesizability may be worth the trade-off.
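One way to picture the trade-off described above is a two-stage filter, sketched here under stated assumptions: a fast score screens candidates first, and the slower route search runs only on the survivors. The callables `sa_score` and `has_route`, the cutoff value, the convention that a lower score means an easier synthesis, and the molecule identifiers are all hypothetical placeholders rather than the interface of any particular CASP package.

```python
from typing import Callable, Iterable, List


def filter_synthesizable(
    candidates: Iterable[str],
    sa_score: Callable[[str], float],
    has_route: Callable[[str], bool],
    sa_cutoff: float,
) -> List[str]:
    """Keep candidates that pass a fast score cutoff and then a full route search."""
    kept = []
    for molecule in candidates:
        # Cheap first pass (assumption: lower score means easier to make).
        if sa_score(molecule) > sa_cutoff:
            continue
        # Expensive second pass: require that the planner finds at least one route.
        if has_route(molecule):
            kept.append(molecule)
    return kept


# Toy stand-ins so the sketch runs; a real workflow would call a scoring
# function and a path-planning service instead of dictionary lookups.
toy_scores = {"mol_A": 1.5, "mol_B": 2.1, "mol_C": 6.8}
toy_routes = {"mol_A": True, "mol_B": False, "mol_C": True}

accessible = filter_synthesizable(
    candidates=toy_scores,
    sa_score=toy_scores.__getitem__,
    has_route=toy_routes.__getitem__,
    sa_cutoff=4.0,
)
```

In this toy run, `mol_C` is rejected by the fast score before any route search is attempted, and `mol_B` passes the score but is dropped because no route is found, so only `mol_A` is retained.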
Automated Synthesis Platforms. Synthesis planning is a crucial component of a fully automated reaction platform. Research toward automated synthesis platforms has been restricted to a relatively small set of reactions and largely remains in the proof-of-concept phase in both academia and industry. Current automated platforms still require a significant amount of human setup and planning, but the process may become more streamlined with the integration of predictive chemistry tools. One opportunity was demonstrated using ASKCOS synthesis planning software that was coupled to a robotic flow synthesis platform. 16 This is proof-of-principle that machine-learning CASP tools can be useful for recommending routes and conditions for automated execution; however, the route and condition suggestions still needed to be refined (e.g., to specify concentrations and reaction times) and were optimized offline (e.g., to be amenable for flow chemistry) before being executed on the robotic platform. In this case, the requirement for manual intervention is partially attributable to the dearth of training data for flow chemistry compared to the prevalence of batch chemistry results but could have been circumvented by using more traditional batch methods or parallel plate-based methods. Other options for automated systems include a closed-loop DMTA cycle using cyclofluidics, 71−73 automated laboratories, 74 and ultrahigh throughput experimentation. 75−77 Integration of retrosynthesis planning software into closed-loop automation is currently underway at some pharmaceutical companies. At Eli Lilly, ChemoPrint has been successfully integrated into an automated platform for chemical synthesis. 74 Lilly has previously demonstrated as a proof-of-concept that the whole DMTA cycle can be automated and executed with minimal intervention from expert chemists. 17
At present, these examples are limited to single-step synthetic plans and, in initial literature reports, 70 did not have a large impact on driving the project. As a proof-of-concept, this experiment demonstrates the feasibility of coupling CASP and automation to drive a DMTA cycle. Although closed-loop lead optimization has not been fully implemented for multistep synthesis, rapid progress is being made by both academic and industrial researchers.
Adoption by Users. In 2017, a small group of chemists surveyed at three pharmaceutical companies were asked to define the most important features of a synthesis planning platform to encourage adoption. 13 The top 6 important features to respondents were (1) an easy-to-use and intuitive interface for interaction with the routes, (2) a method to explore the literature precedents associated with route suggestions, (3) the ability for the user to define bonds they desire to be broken to guide the search, (4) routes that terminate in purchasable starting materials, (5) identification of functional group incompatibilities and unstable compounds, with protecting group strategies proposed to bypass these complications, and (6) a scoring system for ranking routes (discussed further in Establishment of Success Metrics). In our experience, these desires are shared by end users at most organizations. Many of these features are implemented, to varying extents, in the ASKCOS software package and in many of the companies' in-house tools.
At AstraZeneca, historically, CASP tools have mainly been used in pharmaceutical development to identify alternative routes. 10 However, during 2019, a customized adaptation of the ASKCOS interface was rolled out to all medicinal chemists. The interface includes both the models developed by the MLPDS consortium and internally developed models based on integrating internal ELN data with licensed and public data. 78,79 So far, the uptake of the ASKCOS tool at AstraZeneca has been positive.
As stated previously, the users can range from nonexpert chemists to practicing chemists. Many of the early evaluators at companies are computational chemists and informaticians who are deciding the correct method for integration into workflows. Evaluation by chemists is tricky without first defining what success and failure look like. The natural tendency of expert synthetic chemists is to input a favorite target compound (often a very complicated natural product) into the full pathway search and look for familiar routes. Users may be dissuaded from future use of the tools if known/published routes are not displayed or ranked near the top suggestions. Adoption is anecdotally higher when basic training is provided for chemists to introduce the theory behind the software and examples of how to effectively use the different modules in each package. Importantly, this training should convey that one goal of data-driven programs such as ASKCOS is to go beyond lookups of known routes; the proposed routes are predictions based on generalizing from known reaction data. Instructions on how the models work, what the goals of the methods are, the model limitations, and how to change the inputs to obtain useful information have been noted to greatly increase the engagement of chemists. At many companies, a 1−2 h tutorial and interactive session has been seen as sufficient to train new users and enable them to effectively use CASP tools.
■ SECTION 3: HOW CAN WE, AS A RESEARCH COMMUNITY, MAKE CASP BETTER? Integration of CASP into medicinal chemistry workflows is a work in progress, and there remain many challenges to developing and deploying machine-learning CASP tools in practice. Adoption of synthesis planning software is gaining momentum and is beginning to have an impact on the DMTA cycle by facilitating the "make" portion. Although more chemists are employing CASP tools, progress and reproducibility are hampered by a number of groups who publish advances in synthesis planning without making their code open-source or available upon request. In addition, standardized metrics should be agreed upon and evaluated on publicly available data sets, since proprietary data are usually not, or cannot be, shared.
Establishment of Success Metrics. The most common metric for assessing single-step retrosynthetic model performance is top-k accuracy. This metric is evaluated using a test set of known single-step reactions and is calculated based on an exact match of the true disconnection being in the top-k of the predictions. Although top-1 accuracy is informative for the model development, it is a poor metric since there are always multiple retrosynthetic disconnections that could be successfully executed in the lab. Model evaluation using top-k accuracy for small k (1−3) implies that the published method is one of a few "correct answers", when in reality retrosynthesis is a fuzzier prediction. While multiple answers are not recorded in the database, there may be many correct ground truths, so metrics like top-10 accuracy (or larger k) are more appropriate but can also inflate the accuracy, which might not correctly reflect the model performance. A simple example is if a program chose bromine and chlorine as possible leaving groups for a simple substitution; either of these might be successful in an experiment, depending on the attempted reaction.
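The effect of exact-match scoring described above can be illustrated with a small sketch. It computes top-k accuracy twice on a toy test set: once against the single recorded precursor set, and once against a relaxed set in which an alternative leaving group is also accepted. The precursor labels, the toy data, and the notion of a hand-curated "relaxed" set are illustrative assumptions, not data or definitions from any published benchmark.

```python
from typing import Sequence, Set


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    acceptable: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of test reactions with an accepted answer among the top-k predictions."""
    if len(ranked_predictions) != len(acceptable):
        raise ValueError("one set of acceptable answers is needed per test reaction")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for predictions, truths in zip(ranked_predictions, acceptable)
        if any(p in truths for p in predictions[:k])
    )
    return hits / len(ranked_predictions)


# Toy ranked suggestions for two products (labels are placeholders).
predictions = [
    ["ArCl + amine", "ArBr + amine", "ArI + amine"],  # recorded answer is ranked 2nd
    ["acid + amine", "ester + amine"],                # recorded answer is ranked 1st
]

# Exact match: only the precursor set recorded in the database counts.
recorded = [{"ArBr + amine"}, {"acid + amine"}]
# Relaxed match: the chloride is also accepted as a plausible alternative.
relaxed = [{"ArBr + amine", "ArCl + amine"}, {"acid + amine"}]

top1_exact = top_k_accuracy(predictions, recorded, k=1)
top2_exact = top_k_accuracy(predictions, recorded, k=2)
top1_relaxed = top_k_accuracy(predictions, relaxed, k=1)
```

Here the exact-match top-1 accuracy is 0.5 because the model ranked the chloride above the recorded bromide, whereas both the top-2 exact score and the top-1 relaxed score are 1.0. The same model therefore looks quite different depending on k and on what is accepted as a ground truth.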
Another important but less commonly reported metric is the diversity of predictions. From most chemists' point of view, the top-k accuracy may not always be the most important factor for choosing a retrosynthesis tool. For route planning, a key disconnection that has not been thought of yet is just as important as the feasibility of the suggestions. There is a trade-off when developing models for retrosynthesis, where the suggestions have to be feasible, useful, and sometimes nonobvious for idea generation. An example of a highly feasible, not useful, but obvious suggestion would be simple functional group interconversions, where the complexity is not being built (but is often seen in the historical reaction data). Conversely, a nonfeasible, very useful, and nonobvious suggestion would be one that suggests breaking a bond where no known chemistry has been developed to actually carry out the reaction. One can envision where this would be very useful for idea generation but not practical when planning routes where timelines are of the essence. The holy grail would be to suggest a feasible, useful, and nonobvious disconnection, which is a difficult task without true metrics for each of these categories. Using top-k accuracy to score single-step predictions allows us to achieve feasible suggestions, while employing heuristics guides the models toward useful disconnections. The trade-off can only be assessed by expert chemists who can sort through many suggestions; however, these chemists' scores are subjective and often biased to the chemistries they are familiar with. The difficulty in defining "ideal" metrics for community-wide adoption is to balance the development of accurate models and ones that provide diverse suggestions.
Similar to defining metrics for single-step retrosynthesis predictions, the main obstacle in developing full path-planning algorithms is the difficulty in assessing the predicted pathways. Each individual retrosynthetic step can be evaluated as above, with the efficiency of the path search as an additional criterion. Simple metrics such as the existence of a route, the length of the longest linear sequence, or the price of the starting materials are sometimes used for assessment of route planning software but cannot fully capture the complexity of the many desires from different types of chemists. A question that one might ask is whether the models are able to suggest a route that has been previously published. One would not want only published routes to be suggested because a lookup would be sufficient. Since there is a combinatorial space of options for many disconnections (e.g., chloride or iodide instead of bromide), it is not desirable to penalize route planning based only on whether its suggestions appear in the literature. Another question for path planning is whether the models are able to suggest routes that are chemically feasible. Since scoring the feasibility has yet to be addressed quantitatively and would have its own errors and limitations, the evaluation of different CASP packages, based on synthetic feasibility, is difficult.
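The simple route metrics mentioned above can be computed directly from a route tree, as in the following minimal sketch. It assumes a hypothetical representation in which each node is a molecule, its children are the precursors of the single reaction that makes it, and purchasable leaves carry a price; the class name, field names, prices, and molecule labels are illustrative and not taken from any CASP package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RouteNode:
    name: str
    precursors: List["RouteNode"] = field(default_factory=list)
    price: Optional[float] = None  # set only for purchasable starting materials


def longest_linear_sequence(node: RouteNode) -> int:
    """Number of reaction steps on the longest branch from a leaf to the target."""
    if not node.precursors:
        return 0
    return 1 + max(longest_linear_sequence(p) for p in node.precursors)


def total_steps(node: RouteNode) -> int:
    """Total number of reactions in the route, counted over all branches."""
    if not node.precursors:
        return 0
    return 1 + sum(total_steps(p) for p in node.precursors)


def starting_material_cost(node: RouteNode) -> Optional[float]:
    """Sum of leaf prices, or None if any leaf is not purchasable (no complete route)."""
    if not node.precursors:
        return node.price
    costs = [starting_material_cost(p) for p in node.precursors]
    if any(c is None for c in costs):
        return None
    return sum(costs)


# Toy convergent route: two one-step branches joined in a final coupling.
route = RouteNode("target", [
    RouteNode("fragment_A", [RouteNode("sm_1", price=2.0), RouteNode("sm_2", price=3.0)]),
    RouteNode("fragment_B", [RouteNode("sm_3", price=5.0)]),
])

lls = longest_linear_sequence(route)
steps = total_steps(route)
cost = starting_material_cost(route)
```

For this toy route the longest linear sequence is 2 steps, the route contains 3 reactions in total, and the starting materials cost 10.0 in whatever unit the prices use. None of these numbers says anything about whether the individual steps are chemically feasible, which is the limitation the text points out.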
A major requirement for path planning is the diversity of pathways, but diversity, in terms of full pathways, remains undefined. Route diversity depends not only on the single-step suggestions but also on the method for navigating the full synthetic trees. Diversity could mean the suggestion of many routes, some of which are very similar, but among all the pathways, there are some that employ completely different disconnections. Similar to single-step suggestions, there is no point to diverse route recommendations if they are not feasible, and currently, the only true validation available is to carry out the suggested syntheses. 14,70 The final important factor for path planning is speed. Speed depends on the stop criteria used for the search and on what one considers to be buyable chemicals (in addition to what transformations are allowed, of course). A trade-off between the speed and quality of routes is often observed, but this could be tuned to the needs of the user.
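Because route diversity has no agreed definition, the following is only one possible working definition, sketched under explicit assumptions: each route is reduced to the set of disconnection labels it uses, and diversity is the mean pairwise Jaccard distance between those sets. The choice of Jaccard distance, the set-of-labels representation, and the example labels are all assumptions made for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 minus the shared fraction of disconnections; 0.0 for identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_diversity(routes: Sequence[FrozenSet[str]]) -> float:
    """Average Jaccard distance over all pairs of routes (0.0 if fewer than two)."""
    pairs = list(combinations(routes, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Toy routes described only by the disconnection types they use.
routes = [
    frozenset({"Suzuki coupling", "amide coupling"}),
    frozenset({"Suzuki coupling", "SNAr"}),
    frozenset({"C-H activation", "reductive amination"}),
]

diversity = mean_pairwise_diversity(routes)
```

The first two routes share one of three distinct disconnections (distance 2/3) and the third shares nothing with either (distance 1), giving a mean of 8/9. A score like this captures whether "completely different disconnections" are present but, like the other simple metrics, is silent on feasibility.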
The relative importance of each of these desires is subjective and highly dependent on the area of chemistry in which one is operating. Medicinal chemists might like to see routes that share common intermediates that can be elaborated. This delivers a common route that leads to the highest diversity of target molecules but might not be the optimal route to any individual compound. On the other hand, process chemists might want to see highly convergent routes but would like to visualize many diverse route suggestions for ideation, as they will have more complex considerations that are not captured by computational models. Finally, results need to be available quickly to provide the most value over traditional database searching and manual path planning. Balancing the scoring of retrosynthetic algorithms so that it encompasses accuracy (to assess model performance), diversity (to suit the area of chemistry in which the tool is applied), and convergency is a difficult multiobjective optimization problem.
In their seminal publication, Segler et al. used double blind A/B testing to evaluate whether chemists preferred published routes over computer generated routes. 15 If the participants of A/B testing span many organizations and departments (i.e., process chemistry, medicinal chemistry, etc.), personal biases may average out. This evaluation, however, does not represent a reproducible or scalable approach to obtaining a quantitative score.
Data, Determining Common Benchmarking, and Evaluation Methods. Of course, machine-learning models are thought to benefit from more data and from richer data. The mechanism that companies and universities use to capture and report data is vital for the further advancement of data-driven methods in synthetic planning. An example of data that are not often recorded in databases is alternate reactions or conditions that have been tested en route to a new compound or natural product. A discussion about the evolution of a route is recorded in the literature reports but does not always get captured when being translated to database entries. This information is highly useful for chemists to determine a strategy in planning routes but is not captured when building models. Another consideration is that databases that record literature reports typically only include positive data with higher yields. Most reaction predictors are trained on successful reactions (USPTO and Reaxys data sets), and as a result, they cannot predict whether a reaction will have a low conversion. Additionally, the full characterization of side products or byproducts in reaction mixtures is not often disclosed, due to the time and cost associated with identifying all chemical species. This limits our ability to construct predictive models for reactivity. Finally, there are data that are being captured but not reported such as unpublished catalyst screens or HTE campaigns. However, data capture is increasingly becoming a topic of interest at many companies, and their reporting will hopefully make its way into the public data sets. CASP approaches using expert-encoded rules are less sensitive to data availability than ones using statistical learning because humans can facilitate the generalization of small numbers of reaction precedents to broader rules.
Nevertheless, these methods would still benefit from the availability of richer data (e.g., more complete substrate scope tables and byproduct identification) because the expert that encodes the rules would have a better understanding of the reaction.
Retraining machine-learning models on company data has not been thoroughly investigated by all MLPDS member companies. As discussed previously, Eli Lilly only found a modest benefit to including USPTO with their own internal data when training a retrosynthesis model. These results indicate that internal reaction data sets may contain enough representative examples of the main "workhorse" reactions 80 most often used in medicinal chemistry programs. This brings up the question: will retraining models on company data simply give suggestions that will reinforce 81 the chemistries that are most popular? The answer depends on the chemistry setting where a CASP tool may be employed.
In a medicinal chemistry program, where accessing a chemical space dissimilar to current molecules is desired, new reaction types might be essential to synthesize nontraditional, increasingly complex target molecules. However, if targets can be synthesized through robust chemistries, it is appropriate for CASP to recommend a route with well-established chemistry rather than a creative route with many unknowns. Identifying routes with common chemistry also allows medicinal chemists to more accurately predict timelines to targets by allowing for prioritization of syntheses that can be outsourced versus ones that will need to be executed in-house. Nevertheless, users of CASP tools describe wanting to see more creative recommendations, particularly those working in process chemistry divisions. This is often because there are more complex or subjective considerations of pathway optimality, as discussed above, than the tools can handle. A skilled process chemist may have already considered the obvious disconnections from a step-efficiency perspective and may be looking for a more process-friendly and safer alternative.
Furthermore, if CASP tools are being used in conjunction with automated synthesis platforms, the question of reinforcing reactions 81 may not be as important. For automated synthesis, if a CASP system recommends simple and robust chemistries that have been used frequently within companies, a large burden could be taken off the chemists from having to plan or execute simple chemistry. Even if only a small portion of reactions in a medicinal chemistry program could be automated, a large impact could be made on the timelines for accessing new target molecules. This allows chemists more time to focus on rare chemistries that are key complexity building steps, which facilitates the expansion into new chemical space. As new chemistries are developed, they may be used to further train and refine the CASP models.
Different approaches to synthesis planning exist, and comparisons between different models are not currently standardized. Although current metrics for scoring full synthetic pathways are imperfect, an open access shared benchmarking platform or data set(s) needs to be developed so researchers can compare retrosynthetic software and algorithms. Providing a common test set for investigators to benchmark their systems is a nontrivial task. As the field progresses, molecules provided for a test set will have to evolve, because over time, they will be included in databases for training. The underlying distribution of training data evolves with time as well, so it is likely that the metrics for a common test set will look better for newer models due to a higher representation of new reactions and structures in the training data. Additionally, it would be even better to also provide an open data set for training such that the training and test sets will be common for every data-driven model that is published. Benchmarking retrosynthetic software is also complicated by the fact that some systems incorporate expert-encoded rules in their algorithms. This means that there is likely an overlap between these coded rules and a test set, whereas pure data-driven methods have a clear separation between the training and test data. 82 Even with better or more standardized metrics, chemists ultimately will use a program that fits their definition of useful within the scope of chemistry in which they are operating and that proves its utility by suggesting routes that are successful in the lab.
Additional Opportunities for CASP. Although discussion has focused on the difficulty of evaluating and benchmarking current synthesis planning tools, advances in many other predictive tasks may have a large impact on medicinal chemistry synthesis workflows.
For instance, condition recommendation systems generally focus on predicting known reagent(s) in known or possibly unique combinations. Data for catalytic reactions are very limited in that there are often only a small number of distinct catalysts/ligands proven to be successful. Improvement in machine-learning models will be necessary to handle the many low-data environments that exist in drug discovery. One can envision principles similar to de novo molecular design being applied to the generation of unique catalyst/ligand structures, which might raise the probability of success for a given transformation. Advances for the design of large catalytic systems in materials science 83−85 have already been reported, but fewer have been disclosed in catalysis for the synthesis of small molecule organic compounds. The chemical space of catalytic reactions is often very constrained (in terms of starting materials and reagents), which poses a problem for the generation of data sets that are useful for molecular generation. As with many prediction tasks in medicinal chemistry, the further development of models that can learn from small, constrained data sets is crucial and will likely require new input representations to capture a richer description of the molecular structure. Additional opportunities exist for the prediction of ligands in stereoselective reactions but will require the development of new 3D representations. The final complicating factor to de novo design of catalysts/ligands is that often the synthesis and characterization of new catalysts/ligands can consume a significant amount of time. The addition of a multistep synthesis simply for a catalyst/ligand would be prohibitive in most medicinal chemistry programs but may be of interest to academic chemists or process chemists who are highly focused on optimizing each step of the reaction sequence.
A time-consuming step for all synthetic organic research is the characterization of products and side products and the unequivocal assignment of final target molecule structure. Incorrect assignment of target structures can lead to incorrect data for further structure/activity optimization and can even result in patent disputes. 86 A naive approach for structural assignment would be to use forward-prediction models to identify possible side products in a reaction. These predictions could be verified with mass spectrometry (MS) or IR to confirm side products in a reaction mixture. However, this method would not be able to distinguish between constitutional isomeric or diastereomeric compounds, which limits its use to reactions that give well-defined isomeric products. Standardized data that are required to elucidate small molecule organic structures and are required for publication include MS, ¹H NMR, and ¹³C NMR, and elucidation can often require further experiments using two-dimensional NMR or other NMR-active nuclei. When all these data are combined, the structural features can be determined, and it is feasible to train models to predict structures from their spectra. Learning complex nonlinear patterns between the different pieces of data is a perfect application for machine learning. However, data sets containing all these experiments are rare. The less common analytical methods (e.g., 2D NMR) tend to be more useful for structural determination. Models could conceivably be trained on data sets where each compound does not have every data point necessary, and the model could learn to use the higher quality data when needed.
Finally, this discussion of outlooks, standardization of data and models, and dissemination of code can have a large impact on the full pipeline working toward fully autonomous synthesis. A recent review points out many areas where improvement is needed to realize autonomous chemical synthesis both in terms of data/software and hardware. 87 Among these, the development of data-efficient and interpretable models is discussed. Interpretability of models is important to many users because they want to understand why a machine-learning model makes certain predictions. With the multitude of data that could be generated by automated experimentation, the ability to use that data to build predictive models with a low computational overhead and a short time to produce results will enable constructing rounds of experiments that most efficiently reach the goal. Another consideration that is important for machine learning in synthesis planning and automated experimentation is the improvement of uncertainty estimation, particularly in the low-data regime. Improved uncertainty estimates in active learning will produce richer experiments that will reduce time and cost. Finally, evaluation metrics that are specific to the goals of automated synthesis need to be established and standardized, which can focus on molecular targets that test the ability of the models and hardware to reach a new chemical space.

■ CONCLUSION
The integration of machine-learning models for predictive chemistry into the DMTA cycle is currently underway at companies within and beyond the MLPDS consortium. Companies have begun integration of ASKCOS (and independently developed CASP platforms) in workflows, and computational developers are working closely with synthetic chemists to find the emerging areas where new research will have the greatest impact. For the pace of machine-learning-based CASP research to accelerate, standardized metrics and shared data sets need to be established with a common benchmarking scheme. Fundamental advances to representation, robustness in low-data scenarios, and generalizability will be important for more robust machine-learning-based synthetic tools. Further research into hybrid machine learning and expert-encoded CASP 82 tools may be able to leverage the most useful aspects of each approach. The impact of machine-learning-based predictive chemistry is already being observed at some companies, and adoption by chemists is on the rise.
Many of the current CASP tools are developed for planning routes using robust reproducible chemistry. The aim of these tools is not to suggest only transformations that an experienced chemist could not identify. Rather, particularly with the current machine-learning-based CASP tools, their aim is to enable chemists to lighten the cognitive burden of synthesis planning. If chemists within an organization can each offload planning "easy" routes to computers, even for a relatively small fraction (e.g., 10%) of targets, there is the potential for significant total time savings. With the continued development of machine-learning models for synthesis planning and an increase in chemists' acceptance of using CASP to lighten their workload, the tools will be improved to fit the needs of different fields of chemistry and to handle synthetic challenges of increasing complexity.
Journal of Medicinal Chemistry pubs.acs.org/jmc Perspective