Uncertainty-Aware First-Principles Exploration of Chemical Reaction Networks

Exploring large chemical reaction networks with automated exploration approaches and accurate quantum chemical methods can require prohibitively large computational resources. Here, we present an automated exploration approach that focuses on the kinetically relevant part of the reaction network by interweaving (i) large-scale exploration of chemical reactions, (ii) identification of kinetically relevant parts of the reaction network through microkinetic modeling, (iii) quantification and propagation of uncertainties, and (iv) reaction network refinement. Such an uncertainty-aware exploration of kinetically relevant parts of a reaction network with automated accuracy improvement has not been demonstrated before in a fully quantum mechanical approach. Uncertainties are identified by local or global sensitivity analysis. The network is refined in a rolling fashion during the exploration. Moreover, the uncertainties are considered during kinetically steering of a rolling reaction network exploration. We demonstrate our approach for Eschenmoser–Claisen rearrangement reactions. The sensitivity analysis identifies that only a small number of reactions and compounds are essential for describing the kinetics reliably, resulting in efficient explorations without sacrificing accuracy and without requiring prior knowledge about the chemistry unfolding.


Introduction
If chemical compounds react in a flask in the laboratory, there will be a large number of reaction paths conceivable, leading to a complex network of elementary reaction steps with potentially many products.Detailed knowledge of such a reaction network is required for any kind of rational reaction optimization in order to prevent the formation of side products while promoting a desired reaction path.Constructing such reaction networks is facilitated by automated reaction network exploration protocols based on quantum chemical calculations (see Refs. 1-7 for reviews).These protocols construct large reaction networks with automated algorithms, therefore reducing the amount of manual work and the chance of overlooking essential reaction channels compared to manual investigations.After calculating the free energies for all compounds and rate constants in the network, these networks can be directly subjected to microkinetic modeling to predict products, key intermediates of the reaction, and concentration profiles.
Since the objective of a reaction network exploration is to derive a quantitative highfidelity model of a chemical reaction in experiment, the emerging chemical reaction network should focus on the chemistry of the reactive system under experimental conditions.This means that the automated exploration must be autonomously steered toward the kinetically relevant part of the network.To address this challenge, we proposed an automated kinetics-interlaced exploration algorithm 8 (KIEA) that achieves this through analysis of concentration fluxes obtained from microkinetic modeling during the generation of the network.][15] The algorithm in RMG follows a greedy strategy during the exploration, focusing on an in-depth exploration of single reaction paths 16 rather than on a broad exploration, as facilitated by KIEA.Sumiya and Maeda 17 suggested an alternative approach to steer automated explorations by only analyzing the rate constant matrix of the reaction network and avoiding explicit microkinetic modeling.However, their approach is restricted to a single potential energy surface, implying that the atom composition of every compound in the network must be the same.Apart from these approaches, a shortest-path analysis, [18][19][20] such as provided by Pathfinder, 20 which takes kinetic information of the reaction network into account, can also quantify how accessible a compound in the reaction network is and, hence, steer the exploration of reaction networks.
All these steering approaches depend crucially on the accuracy of the kinetic and thermodynamic parameters of the underlying reaction network.However, accurate quantum chemical methods require tremendous computational resources, making a large-scale ex-ploration of tens of thousands of reactions challenging, if not impossible.2][23][24] Because of the high computational cost of accurate quantum chemical calculations, the refinement is generally executed after the exploration and is limited only to a small set of reactions and compounds that dominate the overall kinetics. 8,25[27][28][29] Since autonomous steering of an automated reaction network exploration should depend on the kinetics of the network, reliable kinetic parameters are crucial during exploration.Therefore, we propose to explicitly interweave (i) an unfolding exploration of the reaction network with (ii) the identification of kinetically relevant reactions and compounds and (iii) the refinement of the kinetic parameters in one algorithm.
Our algorithm combines KIEA to steer the exploration with an integrated refinement of structures and energies (IRES) that identifies important reactions and compounds through local one-at-a-time (OAT) or Morris sensitivity analysis 30 of the microkinetic modeling output.IRES then refines structures, reaction paths, and energies in the network fully automatically.The Morris sensitivity analysis not only identifies important parameters in the microkinetic model, it also quantifies the uncertainty in the predicted concentrations.We exploit this fact and demonstrate how the uncertainties can be directly included in KIEA.
This work is structured as follows: First, we develop the IRES algorithm in Section 2, detailing our microkinetic modeling and sensitivity analysis approaches.In Section 3, we provide technical details and introduce the Eschenmoser-Claisen reaction, which serves as an example for developing our exploration approach.We then demonstrate the IRES-KIEA in Section 4 and conclude in Section 5.

Conceptual Considerations Microkinetic Modeling
For microkinetic modeling, the ordinary differential equations describing the mass-action kinetics of a chemical reaction network are integrated to obtain the concentration trajec-tories c n (t) for each species n.The forward (+) and backward (−) reaction rates f +/− I of the reaction I are given as with the forward and backward reaction rate constants k + I and k − I , respectively, and the stoichiometric coefficients S +/− nI of the species in reaction I Accordingly, the differential equation describing the change of concentration of species n is given by and the total concentration flux passing through reaction I by where t 0 and t max denote the start and end times of the microkinetic modeling simulation, respectively.The concentration flux passing through species n thus reads We approximate the reaction rate constants k + I by Eyring's absolute rate theory 31,32 k where G ‡ I is the free energy of activation of reaction I, h is Planck's constant, T the temperature, k B Boltzmann's constant, and Γ the transmission coefficient (assumed to be Γ = 1 in the following).To ensure that the reaction is thermodynamically balanced, the reverse rate constant k − I is then expressed with the equilibrium constant K I as The equilibrium constant K I is defined as usual with the free energies G n of the species on the reaction's right-hand side (RHS) and left-hand side (LHS):

Sensitivity Analysis
The calculation of the parameters G n and ∆G ‡ I required for the microkinetic modeling [see Eqs. ( 5) and ( 8)] will always be subject to various approximations, leading to an uncertain microkinetic modeling output.To reduce the uncertainty in the microkinetic modeling output, our IRES approach identifies the most influential parameters (G n and ∆G ‡ I ) through sensitivity analysis and refines them by carrying out more accurate calculations in a fully autonomous fashion.The objective of IRES is to increase the accuracy of the continued reaction network exploration driven by KIEA, which relies on the concentration fluxes c flux for compounds with zero starting concentration, it is the key output of the microkinetic modeling simulation and, therefore, analyzed by sensitivity analysis.
In local OAT sensitivity analysis, the relevance of an input parameter on the model output is calculated by changing one input parameter x i at a time from the baseline parameters X X X base (such as the most accurate free energies available) and evaluating the model output.Therefore, only one parameter differs from the baseline parameters during model evaluation.To provide an upper limit for the error of c max n , the maximum effect of the parameter uncertainty on c max n is crucial.For realistic variations of the parameters, we vary the free energies in the microkinetic modeling within their uncertainty bounds.We can expect the effect of this variation to be the largest if we change the parameter by its uncertainty, i.e., to the edge of the range of likely values.Therefore, we define the modification of the input parameters as where u(x i ) is the uncertainty we expect for parameter x i , and x u i and x l i denote the most extreme upper and lower parameter values of i, respectively.Care must be taken when modifying the free energies to avoid negative backward barriers.In such cases, the forward reaction barrier is increased to give a zero backward barrier.
To derive a sensitivity measure δc max i , we collect the maximum concentrations c max n (X X X l/u i ) from the OAT model evaluations and calculate their absolute maximum change compared to the baseline model's maximum concentrations c max n (X X X base ), where X X X l/u i = (x 1 , . . ., x i−1 , x l/u i , x i+1 , . . .x k ) are the modified parameters from the OAT procedure and k is the total number of parameters.
Because KIEA disregards any compound with negligible concentration flux in following microkinetic modeling steps, 8 refinement of these compounds cannot affect the exploration.Therefore, the sensitivity analysis can be accelerated by (i) varying free energies only if the associated species shows a concentration flux c flux n > τ kin flux , and (ii) by varying free energies of activation only if the reaction exhibits a flux F I > τ kin flux .
The baseline parameters X X X base can be understood as one point in the possible input space given by all possible values within the input's uncertainty.Because local OAT sensitivity analysis samples only a tiny part of this input space close to the baseline point, it is often criticized for being unreliable in identifying essential model parameters and may fail to provide the correct picture of the sensitivities and model output uncertainties. 33,34computationally affordable alternative to local sensitivity analysis is Morris sensitivity analysis, 30 where a grid of equally spaced input values is formed for each parameter from the range of possible values.This range is given as the interval between the values of x i in Eq. (9).Afterward, the model is evaluated for a set of N samples X X X r = (x r,0 , ...x r,k ), drawn at random from initially selected parameter values, where x r,0 , ...x r,k are the k model parameters for sample r.Then, each parameter value of X X X r is changed one-at-a-time in random order to a neighboring value x ′ r,i on the parameter grid.The parameters x ′ r,i are not returned to their initial values x r,i .Therefore, this algorithm creates a trajectory X X Xr through the input space starting at X X X r .By this procedure, Morris sensitivity analysis covers a significantly larger part of the input space than local OAT analysis.It is able to identify crucial parameters in the model with a relatively small number of samples N , typically in the range between 10 and 20. 33 quantify the maximum effect of an input parameter on the maximum concentrations, we define a sensitivity measure as where µ * ni is the expectation value of the absolute elementary effect 35 for parameter i, and maximum concentration c max n (X X X r ) Here, ∆ is the difference between the values for parameter i on its parameter grid, and the tilde (i.e., xr,j instead of x r,j ) highlights that these parameters may have been changed before because of the random order in the parameter modification during the sensitivity analysis.
Since Morris sensitivity analysis provides an adequate sampling of the input parameter space, the spread in the microkinetic modeling output provides an uncertainty measure for the concentrations.This allows us to define an uncertainty-aware version of KIEA.Instead of exploring unimolecular and bimolecular reactions based on the criteria c flux n > τ flux and c max n c max m > τ max , respectively, 8 we include the concentration spread by reformulating these criteria as and Here, cflux n and σ(c flux n ) are arithmetic mean and standard deviation of the concentration flux of compound n, and cmax n and σ(c max n ) are arithmetic mean and standard deviation of the compound's maximum concentration, respectively.Mean and standard deviation are calculated over the ensemble of microkinetic modeling simulations in the Morris sensitivity analysis.

Computational Methodology The Eschenmoser-Claisen Rearrangement
To demonstrate our IRES-KIEA approach, we chose the Eschenmoser-Claisen rearrangement 36 of allyl alcohol a1 and of furfuryl alcohol f1.The rearrangement of furfuryl alcohol was first reported in 1969 37 in dimethylformamide at 160 • C after 24 h.However, there is no experimental report on the Eschenmoser-Claisen rearrangement of allyl alcohol.Still, allyl alcohol represents the main reactive moiety in the reaction, making it an ideal model reactant for a general Eschenmoser-Claisen rearrangement.A sketch of the reaction mechanisms is shown in Fig. 1.The elevated reaction temperature is required for the rate-limiting initial alcohol exchange and methanol elimination to form the intermediates a3 and f3, respectively, before the Claisen rearrangement step occurs. 38In the case of the furfuryl-based rearrangement [Fig.1(b)], the product of the Claisen-rearrangement (f4) step undergoes an H-shift to re-establish aromaticity in the furan moiety and form the final product f5.The Eschenmoser-Claisen rearrangement reaction is an E stereo-selective, The notation with multiple arrows indicates that the reaction may not be a single elementary reaction step.[3, 3] sigmatropic rearrangement of allyl alcohols and N, N-dimethylacetamide-dimethyl acetal a at reduced temperatures of around 150 • C compared to other Claisen-type rearrangements. 39[42][43]

Reaction Network Exploration
In our Scine software framework, 44 reaction networks are encoded in terms of structures, which are local minima on Born-Oppenheimer potential energy surfaces, and elementary steps, which represent transitions between local minima on a potential energy surface. 4hese transitions proceed either through a transition state or are barrier-less processes (e.g., in the case of the association of two molecules to form a weakly interacting complex).Several structures (typically conformers) are grouped into compounds according to their charge, spin multiplicity, and the abstract molecular graph and structure representation determined by our software module Molassembler. 45,46A structure containing multiple molecules is grouped into so-called flasks, in which reactive complexes are formed.Elementary steps are grouped into reactions so that compounds or flasks associated with the structures that are connected by the elementary steps can be related.

Microkinetic Modeling
The mass-action kinetics were integrated at the level of compounds and flasks as kinetic species and reactions describing the transition between these species.Because we did not perform exhaustive conformer searches for every compound, flask, and transition state, we approximated G n by the minimum of the harmonic-oscillator/particle-in-a-box/staticrotor free energy approximation G HPS i calculated for any structure i of the compound or flask n where G HPS ni is given by Here, E elec i , δG vib i , δG rot i , δG trans i , δG solv are the electronic energy, the harmonic vibrational free energy correction, the free energy correction from the static rotor model, the translational free energy correction from the particle-in-a-box model, and the solvation free energy correction, respectively.We calculated the translational free energy contribution for a concentration of 1.0 mol L −1 to account for the typical standard state free energy correction in solution. 47milar to Eq. ( 15), we calculated the free energies of activation ∆G ‡ I as i.e., as the difference between the minimal G HPS i approximation for a transition state of the reaction and LHS's free energy.In the case of barrier-less reactions, where transition states are not available, the free energy approximation for the transition state G ‡ I = min i∈I (G HPS Ii ) was replaced by the maximum of the free energies of RHS and LHS

Electronic Structure Models
To reduce the number of single-point calculations, we refined electronic energies by coupled cluster calculations 48,49 only for a subset of structures.Either the structures were discovered as part of an elementary step with a barrier lower than 250.0 kJ mol −1 or during the sensitivity-based refinement.
To maximize the efficiency of the exploration and achieve sufficient accuracy for the microkinetic modeling, we calculated the electronic energy contribution to the free energy with a different electronic structure method than employed for the reaction exploration, structure optimization, and harmonic frequency calculations.These model combinations will be denoted as electronic energy model//structure optimization and frequency model.We applied the following three ranks for our refinement-based exploration strategy: Here, we denote the exchange-correlation hybrid functional by Adamo and Barone 50 as PBE0 and the pure functional by Perdew, Burke, and Ernzerhof's as PBE. 51Both functionals were corrected for long-range dispersion by Grimme's D3 correction, 52 including Becke-Johnson damping. 53GFN2-xTB denotes the semi-empirical tight binding method developed by Bannwarth et al., 54 and DLPNO-CCSD(T) refers to domain-based local pair natural orbital coupled cluster with singles, doubles, and perturbative triples excitations 55,56 with tight pair natural orbital (PNO) thresholds.PBE0-D3 and DLPNO-CCSD(T) calculations were carried out with the def2-TZVP basis set 57 and PBE-D3 calculations with the def2-SV(P) basis set. 57Furthermore, the conductor-like screening model 58 represented the solvent in the DFT calculations (dielectric constants (ϵ), and solvent radii (r solv ): toluene: ϵ = 2.38, r solv = 3.48 au., acetonitrile: ϵ = 37.5, r solv = 2.76 au.), whereas the generalized Born and surface area 59,60 model described the solvent in the GFN2-xTB calculations, and the conductor-like polarizable continuum model 61 (toluene: ϵ = 2.4, r solv = 1.3 au., acetonitrile: ϵ = 36.6,r solv = 1.3 au.) represented the solvent in the DLPNO-CCSD(T) calculations.
Free energies G n for the microkinetic modeling were calculated according to the first two ranks of the electronic structure model hierarchy, i.e., the electronic energies were always calculated with PBE0-D3 to ensure comparable energies.The hierarchy was implemented as follows: If the free energy calculated with PBE0-D3//PBE-D3 was available in the database, it was preferred over a PBE0-D3//GFN2-xTB free energy approximation.The free energies of activation ∆G ‡ I were calculated similarly, including all three hierarchy ranks.
All IRES-based explorations were performed with PBE0-D3//GFN2-xTB as the initial electronic structure method.During the exploration, free energies found to be important by the sensitivity analysis were refined with PBE0-D3//PBE-D3 and free energies of activation with DLPNO-CCSD(T)//PBE-D3.
The initial transition state GFN2-xTB structures were refined by double-ended reaction path optimizations with PBE-D3 (basis set and solvent models as detailed above) for the ten energetically most favorable elementary steps within 20.0 kJ mol −1 of the lowest PBE0-D3//GFN2-xTB free energy transition state, as described in Ref. 8. In this double-ended reaction path optimization, the minimum energy path is obtained by curve optimization, 62 the transition state is optimized, and the reactants and reaction products are obtained from an intrinsic reaction coordinate scan.Then, we calculated the electronic energy for each newly optimized stationary point with DLPNO-CCSD(T) and the vibrational harmonic frequencies with PBE-D3.To increase the number of successfully refined reactions, we restarted any unsuccessful transition state optimization with a lowered trust radius (to 0.05 Bohr instead of the original 0.1 Bohr) and increased the maximum number of iterations (to 250 instead of the original 100).The accuracy of the free energies of the compounds and flasks was increased by optimizing the ten structures with the lowest value of G HPS i with PBE-D3.These structures were chosen to be at most 20.0 kJ mol −1 higher in energy (PBE0-D3//GFN2-xTB) than the most stable structure.Then, PBE0-D3 electronic energies were calculated for the re-optimized structures, and the vibrational harmonic frequencies were calculated with PBE-D3.

Exploration Protocols
The reaction network was explored with the programs of the SCINE software suite.
Chemoton 63,64 was employed to sort structures and elementary steps and create the input for the individual electronic structure calculations.The exploration calculations were then performed by Puffin 65 and ReaDuct. 62,66,67The electronic structure calculations were performed by external programs: Electronic energies and nuclear gradients were provided by Turbomole 68,69 (version 7.4.1)and xTB 70 (version 6.5.1) for all DFT models and for GFN2-xTB, respectively.The DLPNO-CCSD(T) electronic energies were calculated with Orca 71 (version 5.0.2). Specific reaction conditions for the Eschenmoser-Claisen rearrangement of allyl alcohol were not reported in the literature.We assumed a temperature of 150 • C and toluene as a solvent for our exploration because these conditions are close to the conditions re-ported in the original publication of the Eschenmoser-Claisen rearrangement 36 and for Eschenmoser-Caisen rearrangements in general. 39Furthermore, the reaction network of the rearrangement of furfuryl alcohol was explored at 160 • C and acetonitrile as a solvent instead of dimethylformamide as reported in Ref. 37. Acetonitrile was assigned a dielectric constant of ϵ = 37.5, which is similar to that of dimethylformamide (ϵ = 37), but, in contrast to dimethylformamide, solvent parameters were available for all electronic structure methods employed.
We explored the reaction networks of both reactions combined with the local OAT sensitivities for IRES-KIEA with the thresholds τ max = 1•10 −3 mol 2 L −2 and τ flux = 1•10 −2 mol L −1 to select compounds for the exploration of bimolecular and unimolecular reactions, respectively.The maximum time for the microkinetic modeling simulations was set to t max = 24 h to match the experimental reaction conditions.We set the starting concentrations for both reactants to 1 mol L −1 to avoid biasing the exploration to unimolecular kinetics of a (note that a is commonly used in excess of 1.3 (Ref.In addition to the local OAT-based IRES strategy, we explored both Eschenmoser-Claisen reactions with the uncertainty-aware algorithm based on Morris sensitivity analysis and the KIEA exploration conditions given in Eqs.(13) and (14).The Morris sensitivity indices were calculated with four levels in the parameter grid and N = 20 samples.This definition of the exploration criteria in Eqs. ( 13) and ( 14) explicitly includes a measure of the uncertainty of the maximum concentrations and concentration fluxes through their standard deviation.Therefore, we chose the thresholds τ max = 1 • 10 −2 mol 2 L −2 and τ flux = 1 • 10 −1 mol L −1 significantly higher than in the local OAT-based explorations.We refined parameters i if µ * max i exceeded a refinement threshold τ ref = 5 • 10 −2 mol L −1 , which means that a small modification of the parameter is expected to change at least one maximum concentration by 5 • 10 −2 mol L −1 .Similar to the threshold choice for the local OAT sensitivities (vide supra), we chose the value of τ ref such that it was close to τ flux , therefore reducing the uncertainty in concentration fluxes and maximum concentrations for compounds which were close to being considered for further exploration.
For the Morris sensitivity analysis, we select the sampling trajectories X X Xr of the microkinetic models with up to 1000 parameters (elements of X X X r , that is free energies G n or activation energies ∆G ‡ I ) through a variant of Morris sensitivity analysis proposed by Saltelli and coworkers 35 that maximizes the input space covered by the sensitivity analysis, instead of relying on an initially small number of random points, as discussed in Section 2. This modified Morris approach became prohibitively slow for large microkinetic models with more than 1000 parameers, for which we relied on random trajectories, as proposed originally by Morris. 30Furthermore, we applied a variant of the flux-based screening procedure from the local OAT sensitivities in the case of microkinetic models with more than 1000 parameters.In such cases, we restricted the Morris sensitivity analysis to parameters associated with compounds and flasks with c flux n > 1•10 −9 mol L −1 , and reactions with F I > 1 • 10 −9 mol L −1 in the baseline microkinetic modeling simulation.We chose this screening procedure as a compromise to prevent the tens of thousands of microkinetic model evaluations from becoming the bottleneck of the exploration.We chose the screening threshold as 1•10 −9 mol L −1 , and hence, significantly lower than for the local OAT sensitivities.Note that our uncertainty-aware exploration protocol also considers the variance in the concentration flux, which is only available after the sensitivity analysis.The Morris sensitivity analysis and sampling were performed through an interface to the Sensitivity Analysis Library. 72,73All microkinetic modeling simulations in this work were executed by an interface to the program Reaction Mechanism Simulator. 11,74

Elementary Step Searches
The reaction network exploration was based on single-ended reaction trial calculations run with the second-generation Newton-trajectory-type algorithm detailed in Ref. 63.For these calculations, the number of bond modifications was limited to two, with at least one intermolecular bond formation for bimolecular reactions.Furthermore, the reaction trials were restricted by a set of element-specific rules that were chosen to reflect the general textbook-known reactivity of functional groups involved in the mechanism: • Oxygen and nitrogen atoms were always considered reactive.
• Hydrogen atoms were considered reactive if part of an ammonium group or at a distance of two bonds to an sp 2 -hybridized carbon atom or acetal group.
• Carbon atoms were considered reactive if sp 2 -hybridized or neighbors of an sp 2hybridized carbon atom.Furthermore, reaction coordinates were restricted in such a way that they always involved different polarized atoms in bond formation and breaking processes.Atoms were assigned positive and negative polarization identifiers according to their Pauling electronegativities, as described in Ref. 8.Moreover, we always assigned positive identifiers to hydrogen atoms and both positive and negative identifiers to sp 2 -hybridized carbon atoms.

Uncertainty Estimates
For both sensitivity analysis approaches considered in this work, we required estimates for the uncertainties of G n and ∆G ‡ I for PBE0-D3//GFN2-xTB, PBE0-D3//PBE-D3, and DLPNO-CCSD(T)//PBE-D3.For this, we compared the reaction networks for the Eschenmoser-Claisen rearrangement explored with PBE0-D3//GFN2-xTB and DLPNO-CCSD(T)//PBE-D3 by matching flasks, compounds, and reactions that are accessible from the starting compounds by crossing reaction barriers of less than 400.0 kJ mol −1 .We then calculated the differences ∆G n of the free energies and the differences ∆∆G ‡ I of the activation free energies Note that we calculated ∆∆G ‡ I for forward and backward reactions whereas the ∆G ‡ I parameters in the microkinetic modeling are defined with respect to the LHS of the reaction.
The differences ∆∆G ‡ I and ∆G n [see Eqs.To be consistent with the MAD of ∆∆G ‡ I , we chose the uncertainty of ∆G ‡ I with PBE0-D3//GFN2-xTB to be a constant value of u(∆G ‡ I ) = 15.0 kJ mol −1 .Furthermore, we chose the uncertainty bounds for G n and PBE0-D3//GFN2-xTB as u(G n ) = 10.0 kJ mol −1 , as a compromise between the MAD and the fact that ∆G n is significantly smaller for small molecules.
Even for our most accurate electronic structure model combination DLPNO-CCSD(T)//PBE-D3, there remain a large number of error sources, such as the approximations intrinsic to local coupled cluster, errors in the solvation-free energy approximation, anharmonicities in the vibrations, and significant contributions from the conformational entropy, which all contribute to the uncertainty of ∆G ‡ I .Quantifying all these uncertainty sources would be highly desirable but exceeds the scope of this work.Therefore, we restricted our investigation to the uncertainty of the approximations from the DLPNO ansatz by calculating ∆G ‡ I with normal (pair truncation threshold t pair = 1 • 10 −4 E h , PNO truncation threshold t PNO = 3.33 • 10 −7 ) and tight PNO (t pair = 1 • 10 −5 E h , t PNO = 1 • 10 −7 ) settings and taking the absolute differences δ PNO ∆G ‡ I .Accuracies for relative energies of 1 kcal/mol and 1 kJ/mol were reported previously for normal and tight PNO settings, respectively, compared to canonical CCSD(T). 75We defined the uncertainty as We chose a minimum uncertainty of 5.0 kJ mol −1 to account for the other error sources that we did not quantify in this work.
Because G n are absolute energies in our model, there is no clear approach to quantify the uncertainty in the electronic energy contribution from PBE0-D3 in the PBE0-D3//PBE-D3 method combination.Apart from the electronic energy uncertainty, the same uncertainty sources are present for DLPNO-CCSD(T)//PBE-D3.Therefore, we chose a constant uncertainty of 5.0 kJ mol −1 .An overview of our uncertainty estimates is given in Table I.

Local Sensitivity Analysis
To analyze the efficiency of the local OAT sensitivities-based IRES exploration for the Eschenmoser-Claisen rearrangement of allyl alcohol, we compared the microkinetic model extracted from the IRES exploration to the models obtained from the PBE0-D3//GFN2-xTB and DLPNO-CCSD(T)//PBE-D3 explorations.The concentration trajectories of the main product a4, methanol, the allyl alcohol a1, N, N-dimethylacetamide-dimethyl acetal a, and the mixed acetal a2 (sum of the concentrations for both enantiomers) are shown in Fig. 3 (a)-(c).The microkinetic model extracted from the reaction network explored with our IRESbased approach [Fig.3(b)] shows the fastest product formation, reaching a concentration of more than 0.9 mol L −1 within 24 h.The product formation predicted by the PBE0-D3//GFN2-xTB model [Fig.3(a)] is slower, showing only 0.76 mol L −1 after 24 h, while the product concentration predicted by the model based on the DLPNO-CCSD(T)//PBE-D3 [Fig.3(c)] is only 0.25 mol L −1 after 24 h, and therefore significantly slower than both other models.
The disagreement between DLPNO-CCSD(T)//PBE-D3 and the IRES-based model is somewhat surprising since the refinement-based approach should systematically improve the parameters from PBE0-D3//GFN2-xTB to DLPNO-CCSD(T)//PBE-D3.The difference between both models is due to the significantly lower free energy of activation of the methanol-catalyzed methanol elimination from the initial acetal a for the IRES-based model compared to DLPNO-CCSD(T)//PBE-D3, shown in Fig. 4. To illustrate the effect of this favorable transition state, we removed it from the reaction network.After removing it from the network, the resulting concentration trajectories agree qualitatively with the DLPNO-CCSD(T)//PBE-D3 concentrations, as shown in Fig. 3(d).Because the lower reaction barrier for the methanol-catalyzed methanol elimination is a result of the refinement with the DLPNO-CCSD(T)//PBE-D3 model combination, the refined reaction network [concentration plots in Fig. 3(b)] is a better model for the reaction than the pure DLPNO-CCSD(T)//PBE-D3 network, which failed to find this transition state.It is likely that the pure DLPNO-CCSD(T)//PBE-D3 did not discover this transition state because it relied exclusively on the Newton-trajectory-type approach to locate transition state guesses.By contrast, the IRES-based strategy employed a double-ended curve optimization to locate transition state guesses for the refinement, which was more successful in this case.
Furthermore, the IRES-based reaction network exploration required significantly fewer high-cost calculations, as shown in Table II The computational savings are smaller for the DLPNO-CCSD(T) single-point calculations because, in the DLPNO-CCSD(T)//PBE-D3 exploration, electronic energies were only refined for elementary steps with a barrier lower than 250.0 kJ mol −1 s.Nevertheless, the IRES-based exploration required more than a factor 5 fewer DLPNO-CCSD(T) calculations than the full DLPNO-CCSD(T)//PBE-D3 exploration (468 vs. 2561 calculations).
The concentration trajectories calculated with the microkinetic modeling parameters from the local OAT sensitivity-based IRES exploration of the Eschenmoser-Claisen rearrangement of furfuryl alcohol are shown in Fig. 5 (a).The microkinetic model predicts only very slow product formation.Most of the reactants are converted to the post-Claisen compound f4 only, and significant concentrations of furfuryl aldehyde and N,N-dimethyletheneamine (see Fig. 5(d) for the Lewis structures) are produced, effectively leading to a deactivation of the reactants.However, for this reaction, the experimental yield after 24 h starting from 42 mmol furfuryl alcohol and 84 mmol 1-methoxy-N,N-dimethylethen-1-amine was reported to be 70 %-80 %. 37 This experimental observation suggests that the free energy of activation for the rearomatization (f4 → f5) of the post-Claisen compound f4 is overestimated, and that f4 is formed too slowly in our model.
To better understand the disagreement of our model with the experimental observation and to estimate the uncertainty in the concentrations, we performed a Morris sensitivity analysis with the same settings discussed in Section 2. The mean concentrations of all model evaluations, the 90 % percentiles, and the concentration trajectories calculated with the baseline (best) parameters are shown for the post-Claisen compound f4 and the product f5 in Fig. 5(b), and for the side-products furfuryl aldehyde and N,Ndimethyletheneaminin in Fig. 5(c).The mean concentrations predicted for the product f5 and the post-Claisen intermediate f4 are significantly higher than the concentrations predicted by the baseline model.This clearly shows that a faster formation of the post-Claisen intermediate f4 and the product f5 is possible within the uncertainty assumed for the microkinetic modeling parameters.However, the experimental yields are not covered by the 90 % percentile of the product f5, suggesting that we may have underestimated the error in the parameters.
The side-products furfuryl aldehyde and N,N-dimethyletheneaminin remain at moderate concentrations even if we consider their concentration's uncertainty [see Fig. 5(c)].Therefore, our model is qualitatively correct as it predicts the experimental product f5 and the post-Claisen intermediate f4 as the main reaction products.

Uncertainty-aware Explorations
The mean concentration trajectories for the product a4 of the rearrangement of allyl alcohol, methanol, the mixed acetal a2, and the reactants a/a1 calculated with our uncertainty-aware exploration approach are shown with their counterpart from the local OAT-based exploration in Fig. 6.The mean trajectories show slower formation of the product a4 and, in turn, slower reactant consumption than the results from the local OAT-based exploration.However, in all cases, the local OAT-based concentration trajectories are within the 90 % percentiles of the uncertainty-aware exploration trajectories, i.e., the uncertainty-aware exploration and local OAT-based exploration agree in their predictions.
The concentration trajectories of the product f5, intermediate f4, and the side products furfuryl aldehyde and N,N-dimethylethenamine for the uncertainty-aware exploration of the Eschenmoser-Claisen rearrangement of furfuryl alcohol are shown in Fig. 7.The concentration trajectories for f4 and f5 are similar to the trajectories calculated based on the Morris sensitivity analysis of the local OAT-based exploration presented in Fig. 5. vs. 5119) compared to the local OAT-based exploration.This significant increase in discovered compounds and flasks can be attributed to the increased number of bimolecular reaction trials, which were 28486 for the local OAT-based exploration and 42011 for the uncertainty-aware exploration.However, most of the newly found compounds, flasks, and reactions did not contribute significantly to the uncertainty of the concentration prediction and were, therefore, not refined, as is evident from the only moderately increased number of elementary step refinement calculations between the explorations (local OATbased 199, uncertainty-aware 256).Furthermore, the only compound explored in addition to the uncertainty-aware exploration was propanol, originating from the disproportionation of allyl alcohol into propanol and prop-2-en-1-al, which did not reach any significant concentrations even if the uncertainty was considered.
The number of exploration trials and refinement calculations required to converge the exploration of the Eschenmoser-Claisen rearrangement of furfuryl alcohol are very similar between the uncertainty-aware and the local OAT-based approaches.The uncertaintyaware exploration ansatz required roughly 10,000 fewer bimolecular reaction trial calculations (163,354 vs. 153,192) and 1,268 fewer unimolecular reaction trial calculations, while the number of double-ended elementary step refinement calculations increased by 30 from 218 to 248.
Table III: Overview of the number of compounds, flasks, and reactions in the networks, the number of unimolecular and bimolecular single-ended reaction trials, and double-ended refinement calculations required to explore the networks.The numbers in parenthesis denote the number of flasks/compounds fulfilling the exploration criteria of KIEA.Databases containing the full reaction networks are available on Zenodo.

Conclusions
We presented a fully automated first-principles exploration approach, KIEA-IRES, that combines automated reaction network exploration, microkinetic modeling-based exploration steering, sensitivity analysis, and refinement of kinetic parameters for reactions, compounds, and flasks.
We explored the reaction network of the Eschenmoser-Claisen rearrangement containing tens of thousands of reactions and compounds with KIEA-IRES.KIEA-IRES correctly predicted the product of the rearrangement of furfuryl alcohol known from experiment and predicted the product of the rearrangement of allyl alcohol (not reported experimentally so far), as expected based on experimental studies for similar molecules. 36e exploration approach requires no prior knowledge of the chemistry that is explored.The only remaining input of general chemistry knowledge in our approach is the restriction of the reaction trial calculations by a small set of rules applicable to organic chemistry, as discussed in Section 3.[80] Our approach effectively exploits the fact that, out of the thousands of reactions and compounds in the network, only a small subset determines the kinetics.These reactions and compounds were automatically identified by global or local sensitivity analysis.The kinetic parameters encoded for them were refined with more accurate but computationally more costly quantum chemical methods.This refinement-driven approach led to significant computational savings compared to a full exploration with accurate but costly methods without loss in accuracy.For instance, IRES-KIEA required almost a factor of 100 fewer computationally costly DFT-based exploration trial calculations than a full DLPNO-CCSD(T)//PBE-D3-based KIEA reference exploration.
Furthermore, we compared the activation energies and free energies calculated for GFN2-xTB structures with the same quantities calculated for PBE-D3 structures.We found a significant spread in the error for the activation energies and a correlation between the error in the free energies and the absolute free energy value.The large spread for the activation energies highlights the importance of considering the uncertainty in the kinetic parameters in microkinetic modeling simulations and even in qualitative discussions of reaction mechanisms based on activation energies.
Our local OAT-based explorations and the uncertainty-aware exploration protocol relying on Morris sensitivity analysis both predicted the same products and kinetics for the example reactions.Nevertheless, the uncertainty-aware exploration approach is conceptually more appealing since it directly provides meaningful uncertainties for the concentrations and considers the microkinetic modeling parameters as distributions rather than as fixed values, which may prove crucial if the initial exploration method (here PBE0-D3//GFN2-xTB) turns out to be qualitatively wrong by favoring an incorrect reaction path.The local OAT-based sensitivity analysis can be considered a low-cost alternative for reaction networks in which the microkinetic model is extraordinarily large and the flux-based screening procedure cannot reduce the number of model parameters.

n
and maximum concentrations c max n encountered during the microkinetic modeling.Because c max n is a lower bound for c flux n

Figure 1 :
Figure 1: Sketch of the reaction mechanisms of the Eschenmoser-Claisen rearrangement reactions of allyl alcohol (a) and furfuryl alcohol (b).The notation with multiple arrows indicates that the reaction may not be a single elementary reaction step.

> 1 •
36) to 2 (Ref.37)equivalents in the experiment).For comparison, we explored the reaction network of the Eschenmoser-Claisen rearrangement of allyl alcohol with PBE0-D3//GFN2-xTB and DLPNO-CCSD(T)//PBE-D3 with the same KIEA settings as in the local OAT-based explorations.Note that we calculated the free energies for the microkinetic modeling in the DLPNO-CCSD(T)//PBE-D3 exploration with PBE0-D3//PBE-D3 and only the free energies of activation with DLPNO-CCSD(T)//PBE-D3.The sensitivity measures δc max i were calculated after each microkinetic modeling simulation in KIEA with a truncation threshold of τ kin flux = 1 • 10 −5 mol L −1 .Refinement calculations were started for reactions, compounds, and flasks if δc max i 10 −2 mol L −1 for their associated free energy of activation or free energy parameter i.We chose a threshold of 1 • 10 −2 mol L −1 for the maximum concentration change to match the threshold τ flux , as this choice reduced the uncertainty in c flux n and c max n for compounds that are either significantly populated during the exploration or at the edge of being explored further by KIEA.

Figure 3 :
Figure 3: Concentration trajectories of reactants, products, and main intermediates for the Eschenmoser-Claisen allyl rearrangement of the allyl alcohol.The trajectories were calculated based on the reaction networks explored with PBE0-D3//GFN2-xTB (a), the local OAT sensitivity-based IRES exploration (b), DLPNO-CCSD(T)//PBE-D3 (c), and the IRES-based exploration without a favorable transition state for the MeOH catalyzed MeOH elimination from a.

Figure 4 :
Figure 4: Mechanistic sketch of the methanol-catalyzed methanol elimination from a.
. While the overall number of reaction trial calculations (single-ended or double-ended transition state searches) with GFN2-xTB for the IRES-based exploration is 29150, and therefore higher than the number of reaction trial calculations required for the pure DLPNO-CCSD(T)//PBE-D3 exploration(19763  TableII: Number of DFT and GFN2-xTB reaction trial calculations, DLPNO-CCSD(T) single point calculations (sp.), and DFT geometry optimizations (opt.)required for the DLPNO-CCSD(T)//PBE-D3 exploration and the IRES based on local OAT sensitivities.DLPNO-CCSD(T)//PBE-D3 local OAT IRES these calculations are multiple orders of magnitude faster and contribute only little to the required computational resources.Compared to the high number of 19763 PBE-D3-based exploration trials for the DLPNO-CCSD(T)//PBE-D3 exploration, only 199 PBE-D3-based trials were needed for the IRES-based exploration, reducing computational demands by nearly a factor of 100.These savings by two orders of magnitude are significantly higher than the computational time spent on less demanding additional 434 PBE-D3 structure optimization for the G n refinement.The structure optimizations require only few computational resources compared to an exploration trial calculation because each reaction trial calculation consists of several structure optimizations, a transition state search, and intrinsic reaction coordinate scans.63

2 Figure 5 :
Figure 5: Concentration trajectories simulated for the reaction network explored with the local OAT sensitivity-based IRES.(a) Concentration trajectories for the most populated compounds were calculated with the best available parameters (baseline).(b, c) Uncertainty estimation based on the Morris sensitivity analysis model evaluations.90 % of trajectories are within the shaded area."Baseline" and "Mean" denote the trajectory calculated with the baseline (best available) parameters and the mean of the simulation ensemble, respectively.(d) Lewis structures and trajectory color coding.The black dashed lines denote the experimental yield of 70 %-80 % after 24 h.

Figure 6 :Figure 7 :
Figure 6: (a) Concentration trajectories for the main products (a4, MeOH), intermediates (a2), and (b) reactants (a, a1) calculated based on the reaction network of the Eschenmoser-Claisen rearrangement of allyl alcohol with the uncertainty-aware exploration approach.90 % of trajectories are within the shaded area."Mean" denotes the mean trajectory of the simulation ensemble and "Local OAT" denotes the trajectories from the exploration based on local OAT sensitivities. 76,77