Organic Reactivity Made Easy and Accurate with Automated Multireference Calculations

In organic reactivity studies, quantum chemical calculations play a pivotal role as the foundation of understanding and machine learning model development. While prevalent black-box methods like density functional theory (DFT) and coupled-cluster theory (e.g., CCSD(T)) have significantly advanced our understanding of chemical reactivity, they frequently fall short in describing multiconfigurational transition states and intermediates. Achieving a more accurate description necessitates the use of multireference methods. However, these methods have not been used at scale due to their often-faulty predictions without expert input. Here, we overcome this deficiency with automated multiconfigurational pair-density functional theory (MC-PDFT) calculations. We apply this method to 908 automatically generated organic reactions. We find 68% of these reactions present significant multiconfigurational character in which the automated multiconfigurational approach often provides a more accurate and/or efficient description than DFT and CCSD(T). This work presents the first high-throughput application of automated multiconfigurational methods to reactivity, enabled by automated active space selection algorithms and the computation of electronic correlation with MC-PDFT on-top functionals. This approach can be used in a black-box fashion, avoiding significant active space inconsistency error in both single- and multireference cases and providing accurate multiconfigurational descriptions when needed.

1 Diels-Alder Reaction We present the reproduction of the hand-selected Diels-Alder results by the APC (12,12) method applied to the RGD1 dataset.

RGD1 Database
We discuss the distribution of multiconfigurational character between the different datasets.
We then show the absolute and signed deviations from the single-reference limits of each methods.We also discuss the surprising robustness of tPBE0 towards active space inconsistency error (ASIE), the use of C 2 0 as a multireference diagnostic, and give more detail about the case studies.Figure S1 shows the calculated distributions of multiconfigurational character found with the automated approach in the three subsets of data from RGD1, for both the energy difference between the reactants and products (∆E) and the energy difference between the end points and the transition states (E a ).The ∆E results provide an intuitive distribution of multiconfigurational character, with the dataset of small systems having significantly smaller multiconfigurational character than the B2F2 and B2F1 datasets, due to the loss of ground-state multiconfigurational nature such as aromaticity.Likewise, the B2F1 dataset has a significantly larger tail of multiconfigurational systems due to a number of reactants and products with unstable binding conformations.Further in line with one's intuitive understanding of multiconfigurational character is the much larger proportion of transition states that exhibit significant static correlation than the products or reactants.However, the distribution of multiconfigurational character across the subsets of data is very similar in the transition states.Consistent with the picture of all bond dissociations engendering some multiconfigurational character, all subsets of data have long tails of reactions with large M-diagnostics.

Absolute and Signed Deviations from the Single Reference
Limit

Discussion of Hybrid PDFT
The hybrid PDFT energy expression is given by a weighted average of the CASSCF energy and the tPBE energy: For tPBE0, we define X = 0.25, equivalent to the percent of exchange mixing in PBE0.S4 Thus the single reference limit is given by and the deviation from the single-reference limit is given as a weighted average of the SRL deviations of tPBE and CASSCF: Because of this, it would well be expected that the robustness of tPBE0 in the singlereference limit would be diminished by the poor performance of CASSCF shown in figure S2.However, this is not the case -surprisingly, we find that tPBE0 is slightly more robust towards ASIE than tPBE with mean absolute deviations of 1.1/2.4kcal/mol for M < 0.05 for ∆E/E a , respectively.The only explanation is that the deviations from the SRL for tPBE and CASSCF are inversely correlated, resulting in an unexpectedly robust method against ASIE.The correlation between the signed ASIE of tPBE and signed ASIE of CASSCF (those shown in Figure S3) is given in Figure S4.For ∆E, the inverse correlation is particularly clear, though a weaker correlation remains for E a .This correlation produces favorable error cancellation and thus on average slightly smaller deviations for tPBE0 from the SRL.S-7 2.4 Absolute and Signed Deviations from B3LYP-D3

Discussion on the Robustness of MC-PDFT Towards ASIE
As shown in several examples throughout the manuscript and supporting information, MC-PDFT appears to have a dramatically increased robustness towards ASIE as compared to CASSCF and NEVPT2.The theoretical reasoning for why this is the case is worth discussing in greater detail here.As presented in the introduction, we hypothesize that the greater ASIE of CASSCF and NEVPT2 is due to the unequal energetic contributions of inconsistent orbitals to the 2-RDM between geometries.As MC-PDFT excludes the 2-RDM from its energy expression (except indirectly through the on-top density Π), it is significantly more robust to these orbital inconsistencies.
Additionally, NEVPT2 can often be more prone to ASIE than CASSCF (as shown for MR 3361 1 below) as the perturber states also change as a function of the active orbitals.
While it is true that NEVPT2 should converge appropriately in the FCI limit,making the perturbative treatment redundant, the CAS active space size is very far from this limit even for the small-to-medium-sized molecules studied here.For example, the molecules in Figure 3 contain a total of 25 and 38 valence orbitals, respectively, significantly larger than the (12,12) space treated multiconfigurationally.As such, larger APC active spaces can often perform much worse with these active spaces due to ASIE.If larger active spaces are imbalanced or not converged, significant changes in NEVPT2 results relative to smaller, balanced active spaces can occur, highlighting the strong dependence of NEVPT2 on the active space.Comparatively, MC-PDFT is more robust to the active space choice, making it an advantageous method for multireference reactivity calculations.
Below, we provide studies of the ASIE in CASSCF, NEVPT2, and tPBE with increasing active space size for the reactions shown in Figure 3, as well as orbitals and cc-pVTZ results for all case studies shown in the manuscript.Table S4: Reaction energies for reaction MR 3361 1 as a function of APC active space size with the cc-pVDZ basis.The NEVPT2 result is shown to converge to the correct singlereference limit as the active space size is decreased.Table S6: Reaction, forward activation, and reverse activation energies for reaction MR 619998 2 as a function of APC active space size with the cc-pVDZ basis.The NEVPT2 result is shown to converge to the single-reference limit as the active space size is decreased.

Active Space
Method          In the larger basis, increased ASIE is found, which can be greatly reduced by recalculating energetics with only the most correlated orbitals in the active space.This is seen in the improved agreement between (4,4)-tPBE results across basis sets when compared to APC(12,12)-tPBE results, which deviate significantly between cc-pVDZ and cc-pVTZ.

Figure S1 :
Figure S1: Distribution of MR character, defined by the M diagnostic, in the reactant and product states (left) and the transition states (right).

Figure S2 :
Figure S2: Whisker plots of absolute deviations from single reference limits (right: E a , left: ∆E) of APC-tPBE, APC-NEVPT2, APC-CASSCF, and APC-tPBE0, stratified by the M diagnostic.Surprisingly, we find that tPBE0 demonstrates similar robustness compared to tPBE against ASIE in the weakly correlated systems (M < 0.05).

Figure S3 :
Figure S3: Whisker plots of signed deviations from single reference limits (right: E a , left: ∆E) of APC-tPBE, APC-NEVPT2, APC-CASSCF, and APC-tPBE0, stratified by the M diagnostic.No systematic error is observed except for the somewhat systematic underestimation of CASSCF in the multireference cases (M > 0.1), due to the larger amount of static correlation captured by CASSCF in the transition state.

Figure S4 :
Figure S4: Correlation plots (right: E a , left: ∆E) of CASSCF signed error (relative to HF) and tPBE signed error (relative to HF-PBE).The correlation coefficient is given for each plot.Pearson's r = -0.42 and -0.20 for ∆E and E a respectively.

Figure S27 :
Figure S27: APC(12,12) selected active orbitals for the product in reaction MR 673407 0 with the cc-pVDZ (left) and cc-pVTZ (right) basis sets.The biradical nature of the product is well-described.

Table S2 :
Energies of each state in the CTS and biradical reaction pathways relative to the reactants using PBE-D3, B3LYP-D3 and CCSD(T).

Table S3 :
M Diagnostic for each state examined in the two Diels-Alder reaction pathways.M diagnostics are shown with each active space selection method: hand selected(6,6)S1 and  APC(6,6)andAPC(12,12).The APC M-diagnostics reproduce those of the hand-selected active spaces in all cases.

Table S5 :
Reaction energies for reaction MR 3361 1 with the larger cc-pVTZ basis set.

Table S7 :
∆E E af E ar Reaction and forward activation energies for reaction MR 619998 2 with the larger cc-pVTZ basis set.

Table S8 :
Reaction, forward, and reverse activation energies for reaction MR 186317 0 with all methods used with the cc-pVDZ basis.Here SR are single reference methods and APC(12,12) refers to results using APC selected active spaces of size(12,12).

Table S9 :
Reaction, forward, and reverse activation energies for reaction MR 186317 0 with the larger cc-pVTZ basis set.

Table S10 :
Reaction, forward, and reverse activation energies for reaction MR 673407 0 with all methods used in the cc-pVDZ basis.Here SR are single reference methods and APC(12,12) refers to results using APC selected active spaces of size(12,12).(4,4) and (6,6) are energetics resulting from selecting the 4 or 6 most correlated orbitals from the APC(12,12) active space and carrying out a CASCI calculation.

Table S11 :
Reaction, forward, and reverse activation energies for reaction MR 673407 0 with the larger cc-pVTZ basis set.(4,4) and (6,6) are energetics resulting from selecting the 4 or 6 most correlated orbitals from the APC(12,12) active space and carrying out a CASCI calculation.The (4,4)-tPBE results with the cc-pVTZ basis align well with the (4,4) results with the cc-pVDZ basis.This supports the conclusion that the deviation between the two basis with the APC(12,12) active spaces is largely due to different lowly correlated orbitals being selected to the active space with each basis set.

Table S12 :
Reaction, forward, and reverse activation energies for reaction MR 673407 0 with the cc-pVTZ basis set as a function of N, the number of virtual orbital removal steps in the APC scheme.N Method ∆E E af E ar