Retooling Asymmetric Conjugate Additions for Sterically Demanding Substrates with an Iterative Data-Driven Approach

The development of catalytic enantioselective methods is routinely carried out using easily accessible and prototypical substrates. This approach to reaction development often yields asymmetric methods that perform poorly using substrates that are sterically or electronically dissimilar to those used during the reaction optimization campaign. Consequently, expanding the scope of previously optimized catalytic asymmetric reactions to include more challenging substrates is decidedly nontrivial. Here, we address this challenge through the development of a systematic workflow to broaden the applicability and reliability of asymmetric conjugate additions to substrates conventionally regarded as sterically and electronically demanding. The copper-catalyzed asymmetric conjugate addition of alkylzirconium nucleophiles to form tertiary centers, although successful for linear alkyl chains, fails for more sterically demanding linear α,β-unsaturated ketones. Key to adapting this method to obtain high enantioselectivity was the synthesis of modified phosphoramidite ligands, designed using quantitative structure–selectivity relationships (QSSRs). Iterative rounds of model construction and ligand synthesis were executed in parallel to evaluate the performance of 20 chiral ligands. The copper-catalyzed asymmetric addition is now more broadly applicable, even tolerating linear enones bearing tert-butyl β-substituents. The presence of common functional groups is tolerated in both nucleophiles and electrophiles, giving up to 99% yield and 95% ee across 20 examples.


■ INTRODUCTION
The development of catalytic asymmetric methods usually begins with the examination of a simple, readily available and prototypical substrate. While this approach is undeniably useful, it also often leads to a reaction protocol that is not widely applicable beyond the simple starting scaffold. Extending the scope of new reactions to include a variety of more complex substrates offers a wider range of potential applications. Reoptimization of reactions is often driven by empirical trial-and-error screening, a process that relies heavily on chance and intuition, making this a formidable challenge. There is a pressing need for a rational and operationally simple process to extend catalytic asymmetric methods to encompass electronically and/or sterically different starting materials to those used during optimization.
The copper-catalyzed asymmetric conjugate addition (ACA) of organometallic species is a powerful tool to synthesize new C−C bonds from α,β-unsaturated carbonyl compounds. 1−8 After tremendous attention for more than 20 years, the ACA is now arguably one of the most useful asymmetric transformations available to synthetic chemists, and has been used in the synthesis of a variety of natural products. 8−15 However, there are still a number of challenges that need to be met to reach its full potential. A lack of robustness in Cu-catalyzed ACAs is well-known, and widely implicated in preventing the approach from enriching mainstream synthetic strategies and methods, 16,17 though it should be mentioned that examples of ACAs to give >50 g of product have recently been reported. 18,19 Another reason for the underutilization of this method stems from method development being carried out with commonly available substrates, 20 so that seemingly obvious extensions to slightly unusual or more highly decorated reaction partners do not display the desired reactivity patterns. 21,22 There is a significant gap in the scope of products theoretically accessible through ACA methods and those that can be produced in practice, and an incomplete understanding of how to address this unmet need limits further applications in complex molecule settings. To truly make Cu-catalyzed ACA a "go-to methodology", 16 several advances are necessary, including an operationally simple reaction setup under convenient conditions, and tolerance to a wider array of substrates.
During our work on Cu-catalyzed asymmetric conjugate additions 23−26 of alkylzirconium species (generated from olefins), we found that asymmetric additions to linear enones bearing linear alkyl chains work well (>90% ee), 27 but additions to electronically or sterically deactivated enones gave poor results (<50% ee). This limitation is not unusual in ACA chemistry. 22,28 Simple linear substrates are also more challenging than their cyclic counterparts, as the population of both s-cis and s-trans conformers of the enone substrate can lower the enantioselectivity. 22,29 The use of such substrates is a long-standing challenge in asymmetric catalysis that motivated us to explore rational approaches to expand the scope of previously optimized catalytic asymmetric reactions.
Here, we report that new phosphoramidite ligands, 8,30,31 developed with the aid of quantitative structure−selectivity relationships (QSSRs), allow highly enantioselective Cu-catalyzed ACAs of alkylzirconium species to linear enones bearing branched substituents or conjugated aromatic rings (Scheme 1). Selection of the best ligand from this series achieved high selectivity and reactivity with linear α,β-unsaturated ketones bearing β-substituents as bulky as tert-butyl groups.
■ RESULTS AND DISCUSSION
Initial Results. Benzylideneacetone 1a was previously found to be a challenging substrate for the Cu-catalyzed ACA 27 and was therefore chosen as a good candidate for examination. Previous conditions for related asymmetric additions were reoptimized, and we subsequently found that the use of copper(I) triflate and a phosphoramidite ligand in the presence of TMSCl was critical to achieve high reactivity. A combination of CH2Cl2 and Et2O in a 1:4 mixture at 0 °C also proved to be optimal for selectivity, consistent with previously reported studies. 24
Structure−Selectivity Relationships. We then examined structurally diverse phosphoramidite ligands (see Scheme S1 in the SI for more details) to explore the Phosphoramidite Ligand Space with the objective of finding a ligand "lead" structure to develop further. This preliminary screen uncovered the initially promising ligand L1, giving 90% yield and 71% ee. Structural diversification of the L1 scaffold provided the qualitative ligand structure−enantioselectivity relationship shown in Scheme 2A.
Several trends in ligand performance are apparent from these data: The aminoindane ring size is relatively unimportant (cf. L1, L2), while the BINOL configuration dictates which is the major product enantiomer. The stereogenic center on the indane provides a matched−mismatched effect (cf. L3, L4), and enantioselectivity can be tuned by variation of the R group, giving results from 67% to 92% ee (cf. L2, L4−L6). However, the variation in enantioselectivity as a function of relatively minor changes to the alkyl group was unexpected. Assuming Curtin−Hammett behavior, 32 the Gibbs energy difference between competing diastereomeric transition states (ΔΔG⧧) for ligand L4 with an isopropyl moiety is 3.8 kJ/mol at 0 °C, whereas a simple replacement of isopropyl with isononyl (L6) more than doubles this value to 7.7 kJ/mol. Nonintuitive effects of ligand structure on enantioselectivity are common in asymmetric transition metal catalysis, 23 usually due to the complexity of the interactions involved and the involvement of several competitive transition structures. As shown by the data collected thus far, qualitative conclusions can be drawn from a structure−selectivity relationship, but they offer limited design guidance beyond intuitively increasing the length of the alkyl chain, without any notion of shape or properties. Ideally, one would prefer to base the choice of the next ligand to synthesize on a predicted ee value with a tight confidence interval.
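The conversion between ee and ΔΔG⧧ used throughout can be sketched as follows. This is a minimal illustration, assuming the usual relation ΔΔG⧧ = RT ln[(1 + ee)/(1 − ee)] with ee expressed as a fraction; the function names are ours and are not taken from the original work. Values computed from the rounded ee figures quoted in the text need not match the reported energies exactly.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def ee_to_ddg(ee_percent, temp_c=0.0):
    """Convert ee (%) to ddG (kJ/mol), assuming Curtin-Hammett behavior."""
    ee = ee_percent / 100.0
    if not 0.0 <= ee < 1.0:
        raise ValueError("ee must lie in [0, 100)")
    enantiomer_ratio = (1.0 + ee) / (1.0 - ee)
    return R * (temp_c + 273.15) * math.log(enantiomer_ratio) / 1000.0


def ddg_to_ee(ddg_kj, temp_c=0.0):
    """Inverse conversion: ddG (kJ/mol) back to ee (%)."""
    enantiomer_ratio = math.exp(ddg_kj * 1000.0 / (R * (temp_c + 273.15)))
    return 100.0 * (enantiomer_ratio - 1.0) / (enantiomer_ratio + 1.0)


if __name__ == "__main__":
    for ee in (67.0, 92.0):
        print(f"{ee:.0f}% ee at 0 C -> {ee_to_ddg(ee):.2f} kJ/mol")
```

Because the relation is logarithmic, a fixed change in ee corresponds to a much larger energy change at high ee than at low ee, which is one reason models are usually fitted to ΔΔG⧧ rather than to ee directly.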
Multivariate Modeling. Inspired by Sigman's development of predictive and mechanistic multivariate linear regression models for reaction development, 33 we recently reported the optimization of Cu-catalyzed ACA to β-substituted cyclopentenones 23 and cyclohexenone 34 with the aid of QSSR. This approach allows one to correlate experimentally observed enantioselectivities against molecular descriptors, quantitative parameters that capture structural and/or electronic differences between the ligands used. These descriptors may be derived from experimental or computed properties even in the absence of detailed mechanistic knowledge, and indeed, the resulting models may then be useful in formulating a mechanistic hypothesis. Computational mechanistic studies (e.g., using density functional theory) have previously aided the optimization of phosphoramidite ligands used in metal-catalyzed asymmetric transformations. 35,36 However, these approaches are significantly more expensive and require prior detailed knowledge of the mechanism and of the competing stereodetermining transition structures. Impressive predictive accuracies of ∼2 kJ/mol have been obtained using QSSR models, which compares favorably with the chemical accuracy of ∼4 kJ/mol attainable by quantum chemical calculations. 37 Furthermore, statistical modeling can accelerate the design of new ligands by prioritizing the most useful syntheses, which remain the principal bottleneck of the design process. 38 Other promising methods also exist. 39

Research Article
Incomplete mechanistic understanding and the absence of quantitative guidelines led us toward the use of QSSR, where our strategy was to conduct rounds of statistical model construction and ligand synthesis in parallel. Iterative refinement of the model could then be accomplished as new data were collected. We aimed to achieve high enantioinduction of 1a and assumed higher reactivity could also be obtained in the process. The assertion that selectivity generally decreases with increasing reactivity is a long-standing myth in organic chemistry, 40 and indeed, we have previously found new ligands to increase both yield and enantioselectivity.
Our ligand design workflow started with the collection and curation of all available data, regardless of the achieved selectivity (Scheme 2B). This was followed by the generation of steric and electronic descriptors for each ligand, optimized after a conformational search. Internal and external validation of the model was a critical step to obtain a statistically valid model. One could finally predict the enantioselectivity of ligands in silico and discard unpromising structures. We only synthesized ligands that would provide useful information to the model or that would likely achieve high enantioselectivity. These synthesized ligands could then be fed to the model such that the QSSR model would gradually become stronger in an iterative way.
Guided by the qualitative structure−selectivity relationship (Scheme 2A), we restricted ligand modification to structural diversification of the aliphatic R-group only. We reasoned that this reduced search space for ligand optimization could be explored more efficiently, while still providing sufficient variation in selectivity values (as discussed above) from which to extract meaningful structure−selectivity trends. The BINOL backbone and indanyl group were not modified further and were retained in a matched configuration. Following these criteria, nine data points were initially used for model building out of sixteen ligands explored in the initial screening (see Scheme S2 in the SI).
Molecular feature descriptors were generated to quantify the steric and electronic properties of the phosphoramidite ligands (see Table S2 in the SI). A statistically significant and validated correlation (p < 0.05; R²(train), R²(test), and R²(CV) > 0.6) 41 was obtained between enantioselectivity (expressed in terms of ΔΔG⧧) and the lipophilicity parameter log P, the logarithm of the n-octanol/water partition coefficient generated with the ALOGPS 42 algorithm (Figure 1). Model I has an R² of 89% and a root mean squared error (RMSE) of 0.66 kJ/mol. There are six ligands in the training set for only one parameter, and the ANOVA test confirmed the statistical significance of the parameter (p < 0.05). An external validation test set formed from a hold-out subset of ligands has a good fit (R² = 87%) and an RMSE of 1.93 kJ/mol. Internal validation with leave-one-out cross validation (LOOCV) also showed the model to be fairly robust, particularly in light of the limited amount of data (R² = 78% and RMSE = 0.96 kJ/mol). All measured values were determined by HPLC on a chiral nonracemic stationary phase and are an average of at least two reaction repeats. Experimental error was found to be within ±3% for yields and within ±1% for ee values. The maximum accuracy achievable with the model is therefore 1% ee, as set by the experimental error.
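The kind of validation described above (training fit, RMSE, and leave-one-out cross validation for a one-descriptor model) can be sketched as follows. This is a minimal illustration using only the Python standard library; the log P and ΔΔG⧧ values are hypothetical placeholders, not the data behind model I, and the function names are ours.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_rmse(observed, predicted):
    """Coefficient of determination and root mean squared error."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def loocv(x, y):
    """Refit with each point held out once and score the held-out predictions."""
    held_out = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        held_out.append(slope * x[i] + intercept)
    return r2_rmse(y, held_out)


# Hypothetical training set: log P descriptor and ddG in kJ/mol.
log_p = [6.0, 6.5, 7.0, 7.6, 8.1, 8.8]
ddg = [3.7, 4.3, 4.9, 5.9, 6.4, 7.3]

slope, intercept = fit_line(log_p, ddg)
r2_train, rmse_train = r2_rmse(ddg, [slope * x + intercept for x in log_p])
r2_cv, rmse_cv = loocv(log_p, ddg)
print(f"train: R2 = {r2_train:.3f}, RMSE = {rmse_train:.2f} kJ/mol")
print(f"LOOCV: R2 = {r2_cv:.3f}, RMSE = {rmse_cv:.2f} kJ/mol")
```

For least-squares fits, the cross-validated error can never be smaller than the training error, so the gap between the two gives a quick indication of how strongly the fit depends on individual ligands.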
ACS Catalysis
We set boundaries for the exploration of Phosphoramidite Ligand Space based on synthetic accessibility. Ligand synthesis currently represents the bottleneck in our approach, and so we considered only those structures accessible from readily available commercial sources or fragments that could easily be synthesized within four well-established synthetic steps. Ligand synthesis and enantioselectivity prediction were carried out in parallel. Although there is no singular definition for the applicability domain (AD) of a statistical model, and the utility of this concept is contested, 43 we only envisaged potential in silico ligands possessing aliphatic R groups. Therefore, no heteroelements were added to the alkyl substituent even if the lipophilicity value could have been improved. As a rule of thumb, the quality of extrapolative predictions deteriorates further away from the area of feature space spanned by the training data. Inside this space interpolative predictions can be made confidently, and so we focused on aliphatic substituents only. 44 An in silico library of twenty-two synthetically accessible ligands (see Scheme S3 in the SI) was developed to satisfy the above considerations. Molecular descriptors were computed for each of these ligands and submitted into the model, represented as gray dots in Figure 1. The predicted levels of enantioselectivity were used to plan the next phase of ligand synthesis. We selected evenly spaced values along the range of predicted selectivities (gray labels), focusing our efforts in the region above 6.0 kJ/mol (>85% ee). Inspired by Bayesian Optimization approaches, 45 for which data acquisition is a trade-off between exploring regions of high uncertainty versus exploring regions of lower uncertainty, but higher expected values, we set out to improve the predictive power of our model while also targeting higher enantioselectivities.
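One simple way to pick evenly spaced candidates above a selectivity threshold is sketched below. This is only an illustration of the selection idea under our own assumptions: the candidate names and predicted values are hypothetical, the 6.0 kJ/mol floor is the threshold quoted in the text, and the greedy nearest-to-target rule is our choice rather than a procedure described by the authors.

```python
def pick_evenly_spaced(predictions, k, floor=6.0):
    """Choose k candidates whose predicted ddG values (kJ/mol) are spread
    as evenly as possible over the range at or above `floor`."""
    pool = {name: value for name, value in predictions.items() if value >= floor}
    if k <= 0 or not pool:
        return []
    if k >= len(pool):
        return sorted(pool, key=pool.get)
    low, high = min(pool.values()), max(pool.values())
    # Evenly spaced target values; a single pick targets the top of the range.
    targets = [low + i * (high - low) / (k - 1) for i in range(k)] if k > 1 else [high]
    chosen = []
    for target in targets:
        remaining = [name for name in pool if name not in chosen]
        chosen.append(min(remaining, key=lambda name: abs(pool[name] - target)))
    return chosen


# Hypothetical in silico library: candidate name -> predicted ddG (kJ/mol).
library = {
    "cand_A": 4.8,
    "cand_B": 6.1,
    "cand_C": 6.4,
    "cand_D": 7.0,
    "cand_E": 7.2,
    "cand_F": 8.0,
}
print(pick_evenly_spaced(library, k=3))
```

Spreading the picks across the predicted range, rather than taking only the top-ranked candidates, is what lets each synthesized ligand both test a high prediction and add coverage to the training data.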
Figure 1. Continuous refinement of the model with new input of data. The model correlates experimentally measured enantioselectivity and predicted enantioselectivity. The gray area represents the standard error at 95% confidence interval and ee's were averaged from at least two reaction repeats.
The enantioselectivity of L12 was predicted between 79% and 93% ee (95% confidence interval) according to the initial model. Experimentally this was determined as 94% ee. This new data point could now be used to refine (i.e., retrain) the statistical model. By expanding the feature space spanned by the training data, predictions for new ligands can be made more confidently. Accordingly, incorporating the newly generated data into model training led to almost identical statistical performance across the training set, but with narrower error intervals. The in silico ligand library was then predicted again, guiding us next to synthesize L13, which was predicted to give between 90% and 98% ee and afforded 92% ee. Slightly narrower confidence intervals were again achieved by feeding the model with more information, and similar model quality was achieved. Synthesis and testing of L14 resulted in 92% ee, within 0.7 kJ/mol of the predicted range of 94−99% ee. This approach is illustrative of how a targeted data-collection strategy can be used to iteratively refine an underlying statistical model and generate more confident predictions. For a (multivariate) linear regression, optimization of the output necessarily involves extrapolation to a previously unexplored region of feature space, so the above approach proves particularly useful. Unlike linear models, the optimal values of nonlinear parametric models (e.g., higher order polynomials, 46 support-vector machines, 44 random forests 47 ) can lie within the bounds of existing feature space, such that extrapolative prediction may not be necessary to accomplish reaction optimization. Nevertheless, predictive performance can still be enhanced by additional data collection in sparsely covered regions of chemical space.
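Why adding data near an extrapolated region tightens predictions there can be illustrated with the leverage of a query point in a one-descriptor least-squares fit. For such a fit, the prediction standard error at a descriptor value x0 scales as s·sqrt(1 + h0), with h0 = 1/n + (x0 − mean)²/Sxx. The sketch below uses hypothetical log P values and is not derived from the authors' data.

```python
def leverage(x_train, x_query):
    """Leverage h0 of a query point for simple linear regression.

    The prediction variance at x_query is s**2 * (1 + h0), where s**2 is the
    residual variance, so a smaller h0 means a tighter prediction interval.
    """
    n = len(x_train)
    mean_x = sum(x_train) / n
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    return 1.0 / n + (x_query - mean_x) ** 2 / sxx


# Hypothetical descriptor values for the initial training ligands.
initial = [6.0, 6.5, 7.0, 7.6, 8.1, 8.8]
# A candidate far outside the sampled range (extrapolation).
query = 10.0

before = leverage(initial, query)
# Retrain after measuring one new ligand close to the region of interest.
after = leverage(initial + [9.5], query)
print(f"leverage before: {before:.3f}, after: {after:.3f}")
```

The leverage of any query point cannot increase when a training point is added, and it drops most when the new point lies near the query, which matches the strategy of synthesizing ligands at the edge of the explored range.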
We hypothesized that the correlation of enantioselectivity and lipophilicity might be due to catalyst solubility, whereby lipophilic R groups could help to either solubilize the active catalyst or disperse inactive aggregates. The concentration of active catalyst was varied by an order of magnitude to test this hypothesis. As shown in Figure 1, both reactivity and selectivity were unaffected by concentration, forcing us to abandon this assumption.
We decided to challenge model I by preparing phosphoramidite ligands with unsymmetric and more branched alkyl groups, with the indane and BINOL moieties unchanged. Even though the predictions were acceptable and allowed for a slight improvement of enantioselectivity, we decided to build more predictive models with tighter confidence intervals through a more widely distributed and uniform sample of data points.
L15, containing a β-cyclocitral derivative in the R group, behaved surprisingly well as it afforded 75% ee with ee values predicted between 75% and 88%. L16−L18 however behaved unexpectedly, and the correlation started to break. L17 gave a striking difference between predicted and measured enantioselectivity and shows how small structural changes can result in large "cliffs" in terms of enantioselectivity. Such cliff-edge effects are unpredictable by nature and have similarities to the so-called "magic methyl effect" encountered in drug discovery. 48 As shown in Figure 2A, we observed a jump in selectivity and reactivity in moving from L4 (93%, 67% ee) to L17 (99%, 92% ee). Our model only focuses on enantioselectivity, but our objective as always is to achieve good selectivity and reactivity with ACA. Thus, ligand L17 afforded a similar level of selectivity as previously achieved with L6, but far better reactivity (99% versus 63% isolated yield). Reaction kinetics were also about an order of magnitude faster, with the reaction now typically complete in 30 min.
This substituent effect was not captured by changes in the lipophilicity descriptors. As shown in Figure 2A, the conformation of L17 differs from that of L2 such that it affects the ΔΔG⧧ by +2.92 kJ/mol. A gauche conformation is preferred by the acyclic 3-pentyl group, which might cause a long-distance change in the active catalyst that leads to better enantioselectivity. Superimposition of L4 and L2 showed identical conformations, whereas L17 and L18 both had similar gauche conformations that avoid destabilizing syn-pentane interactions, consistent with the grouping of the observed enantioselectivities for these four ligands.
We examined whether the inclusion of additional descriptors would allow us to capture the effect of methylation (exemplified by ligands L4, L2, and L17 in Figure 2A). Conformations likely play an important role in enantioselectivity here, as highlighted by the improvement obtained on going from L2 (73% ee) to L6 (92% ee) (Scheme 2A), but steric parameters (e.g., Sterimol) failed to show promise. Molecular descriptors might then fail to grasp the important features responsible for enantioinduction, since flexible chains are often treated statically in a single conformation. For example, Sterimol steric parameters refer to a particular geometry and do not automatically take into account the effects of a conformational ensemble. 33 In contrast to this, weighted Sterimol (wSterimol) 49 parameters report on the Boltzmann average along with minimum and maximum values across the ensemble. Upon examination, wSterimol parameters confirmed the anticipated impact of conformation on the output values and its error (on average ±6 kJ/mol, see Figure S2 in the SI), although no meaningful correlation was obtained using these descriptors.
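The idea behind an ensemble-weighted descriptor (a Boltzmann average reported together with the minimum and maximum over the conformers) can be sketched as follows. This is a generic illustration, not the wSterimol implementation: the conformer energies and descriptor values are hypothetical, and the default temperature of 298.15 K is our assumption.

```python
import math

R_KJ = 0.008314462618  # gas constant in kJ/(mol*K)


def boltzmann_summary(rel_energies_kj, values, temp_k=298.15):
    """Return (Boltzmann-weighted mean, min, max) of a descriptor over a
    conformer ensemble with relative energies given in kJ/mol."""
    if len(rel_energies_kj) != len(values) or not values:
        raise ValueError("need one descriptor value per conformer")
    lowest = min(rel_energies_kj)
    weights = [math.exp(-(e - lowest) / (R_KJ * temp_k)) for e in rel_energies_kj]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, min(values), max(values)


# Hypothetical ensemble: relative energies (kJ/mol) and a steric width (angstrom).
energies = [0.0, 2.0, 5.0]
widths = [4.2, 5.1, 6.3]
mean, low, high = boltzmann_summary(energies, widths)
print(f"weighted mean = {mean:.2f}, range = {low:.1f} to {high:.1f}")
```

A single-conformer descriptor would report only one of these values, so the spread between the minimum and maximum gives a sense of how much information a static treatment can miss for flexible alkyl chains.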
Inspired by Doyle's use of electronic structure calculations to generate atomic and molecular descriptors, 47 we used the Spartan package 50 to generate parameters, from which the highest occupied molecular orbital (HOMO) energy and dipole moment of the global minimum energy conformer of each ligand were found to correlate with reaction outcome (Figure 1). L19 was then predicted at 78% ee and actually afforded 75% ee, so we decided to continue with this new model. The descriptors of synthetically accessible ligands (represented as gray dots) were computed again and were fed to the newly generated model. L20 followed by L21 and L22 were thus predicted and then synthesized.
The final model, model II, possesses a good fit (14 ligands, R² = 84%, RMSE = 0.91 kJ/mol). The external test set also showed satisfactory correlation (six ligands, R² = 86%, RMSE = 0.93 kJ/mol), and LOOCV remains acceptable (R² = 75%, RMSE = 1.16 kJ/mol). There are 14 ligands in the training set for only two descriptors in the model equation, and the ANOVA test confirmed the statistical significance of the descriptors (p < 0.05).
The ligand HOMO energy relates to the nonbonding phosphorus lone pair. Although the classification of molecular descriptors as either electronic or steric is not absolute, 51 a higher HOMO energy is indicative of a more electron-rich σ-donating ligand with a stronger metal−ligand bond. A positive coefficient in the regression model indicates that higher HOMO energies lead to higher levels of selectivity. On the other hand, the dipole moment describes the overall charge distribution in the ligand and also captures the gross molecular shape (e.g., 2.26 D with L4 and 2.09 D with L17, which is an 8% relative difference arising from changing the length of the alkyl chain). This parameter therefore indirectly reflects steric as well as electronic differences and is sensitive to the length and branching of the N-alkyl substituent. The model coefficient is negative, meaning that smaller dipole moments lead to higher levels of selectivity.
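The general form of a two-descriptor linear model consistent with this description can be written as below. The symbols β0, β1, and β2 are our notation for the intercept and coefficients; their fitted values are not given in this section, and only the signs stated in the text are shown.

```latex
\Delta\Delta G^{\ddagger}_{\mathrm{pred}}
  = \beta_0 + \beta_1\, E_{\mathrm{HOMO}} + \beta_2\, \mu ,
\qquad \beta_1 > 0 , \quad \beta_2 < 0
```

Here E_HOMO is the HOMO energy and μ the dipole moment of the global minimum energy conformer of the ligand, so that a higher HOMO energy and a smaller dipole moment both raise the predicted selectivity.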
In total, an in silico library of 24 synthetically accessible ligands was predicted using the final model. As none of the newly predicted selectivities were in excess of previously realized experimental values, ligand optimization was halted at this stage. We had reached a maximum in selectivity based on the structural diversification of aliphatic R groups.
Retrospectively, the value in developing a multivariate model arises from not having to synthesize all of the ligands that were considered as potential candidates. Even for cases where structures could be developed using chemical intuition alone, the overall reduction in non-value-added ligand syntheses is a critical component in the acceleration of the ligand design process. This approach allowed us to systematically discard unpromising ideas and to rationally prioritize the synthesis of the most useful ligands, from both practical and statistical points of view. Additionally, we were also able to gain mechanistic insights into L17, as shown in Figure 2A [...] optimization campaign, based on in silico predictions that no further improvements would be forthcoming.
Multiobjective Ranking. The ultimate objective of asymmetric catalyst development is to achieve high reactivity and enantioselectivity. Accordingly, catalyst selection should address both criteria. Therefore, we ranked our synthesized ligands according to their yield and enantioselectivity to identify the best all-round performance. Plotting yield versus enantioselectivity, the equation in Figure 2B represents the normalized distance to the origin (0% yield, 0% ee). The simultaneous ranking of more than one objective function (e.g., yield and selectivity) produces sets of equally good, nondominated solutions rather than a singular value. 52 The Pareto optimal set 39 contains those ligands for which there are no other examples superior in both yield and selectivity. The analysis showed that L17 was the best ligand in our library, placed equal first with L14. The synthesis of L14 is more tedious due to the need to synthesize the corresponding ketone in three steps with mediocre yields. It was therefore decided to continue with L17 (derived from a commercially available ketone) as the best ligand in our library that gives the largest yield of the product major enantiomer. This ligand quickly proved to have an impact outside this work, giving higher levels of reactivity in other reactions such as the desymmetrization of meso-bisphosphates. 53
Scope. The scope of the reaction was finally investigated with our new ligand L17. As well as varying the nucleophiles used, we also probed the effects of putting substituents in various positions that were not tolerated in our previous system (Scheme 3). A phenyl ring at the R2 position (2) gave the desired product with 72% yield and 92% ee. An isopropyl-bearing electrophile (3) led to similar levels of selectivity. To our delight, even a tert-butyl group in 4, which is well-known to be unsuitably reactive, gave satisfactory yield (71%) and 82% ee. The examination of two other branched and hindered electrophiles at the 4-position provided product with high ee (5 and 6).
Substitution on R1 is well tolerated by the catalyst, providing high ee and excellent reactivity in the case of branched aliphatic or aromatic substituents (14−16). Even chalcone, to give 17, was tolerated, although this was obtained with a lower selectivity (96% yield, 78% ee).
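The multiobjective ranking used above to select L17 can be sketched as follows. The yield and ee pairs are those quoted in the text for L1, L4, L6, and L17. The distance formula is an assumption on our part (Euclidean distance from the origin, scaled so that 100% yield and 100% ee give 1), since the equation in Figure 2B is not reproduced here, and the dominance test uses the standard Pareto definition (at least as good in both objectives and strictly better in one).

```python
import math


def pareto_front(ligands):
    """Return the names of ligands not dominated in (yield, ee)."""
    front = []
    for name, (yld, ee) in ligands.items():
        dominated = any(
            other_yld >= yld and other_ee >= ee and (other_yld > yld or other_ee > ee)
            for other, (other_yld, other_ee) in ligands.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


def normalized_distance(yld, ee):
    """Assumed metric: distance from (0% yield, 0% ee), scaled to [0, 1]."""
    return math.hypot(yld / 100.0, ee / 100.0) / math.sqrt(2.0)


# (yield %, ee %) pairs quoted in the text.
ligands = {"L1": (90, 71), "L4": (93, 67), "L6": (63, 92), "L17": (99, 92)}

ranked = sorted(ligands, key=lambda n: normalized_distance(*ligands[n]), reverse=True)
print("Pareto front:", pareto_front(ligands))
print("Ranking by distance:", ranked)
```

A single scalar score such as the distance gives a convenient ordering, whereas the Pareto front makes explicit which ligands cannot be improved in one objective without losing in the other; practical criteria such as synthetic accessibility can then break ties.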

■ CONCLUSIONS
In conclusion, an iterative protocol has guided the development of a new ligand for transition-metal-catalyzed asymmetric reactions (L17). The addition onto linear α,β-unsaturated ketones possessing bulky or aromatic groups, chosen as a challenging case study, now proceeds satisfactorily even with bulky tert-butyl β-substituents. Key to selectivity was the fine-tuning of phosphoramidite ligands, designed with the aid of quantitative structure−selectivity relationships. The QSSR approach allowed us to quickly discard unpromising potential ligand structures, which easily justifies the time spent generating models, as ligand synthesis remains the bottleneck of the design process. A key lesson from this work is that one should aim for tighter confidence intervals and not just statistically significant models, as this allows for a more useful ranking of the in silico ligands. Selectivity optimization using multivariate linear regression is fundamentally and inescapably an exercise in extrapolative prediction: the targeted collection of new data in unexplored areas of chemical space should be prioritized. In the end, we improved our understanding to reach higher levels of enantioinduction, and the method now achieves up to 99% yield and 95% ee on a broader range of substrates. We hope that this work will be used as an example of how to "fix" an asymmetric reaction, but we also showcase how copper-catalyzed ACA is becoming a more robust reaction potentially capable of enriching mainstream synthetic methodologies.

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
A.V.B. is grateful to the EPSRC Centre for Doctoral Training in Synthesis for Biology and Medicine (EP/L015838/1) for a studentship, generously supported by AstraZeneca, Diamond Light Source, Defence Science and Technology Laboratory, Evotec, GlaxoSmithKline, Janssen, Novartis, Pfizer, Syngenta, Takeda, UCB, and Vertex. R.S.P. acknowledges computational resources from the RMACC Summit supercomputer supported by the National Science Foundation (ACI-1532235 and ACI-1532236), the University of Colorado Boulder and Colorado State University, and the Extreme Science and Engineering Discovery Environment (XSEDE) through allocation TG-CHE180056 (Theory of New Organic Reactions). XSEDE is supported by the National Science Foundation (ACI-1548562).
