The MOBH35 Metal–Organic Barrier Heights Reconsidered: Performance of Local-Orbital Coupled Cluster Approaches in Different Static Correlation Regimes

We have revisited the MOBH35 (Metal–Organic Barrier Heights, 35 reactions) benchmark [Iron, Janes, J. Phys. Chem. A, 2019, 123 ( (17), ), 3761−378130973722; ibid. 2019, 123, 6379–6380] for realistic organometallic catalytic reactions, using both canonical CCSD(T) and localized orbital approximations to it. For low levels of static correlation, all of DLPNO-CCSD(T), PNO-LCCSD(T), and LNO-CCSD(T) perform well; for moderately strong levels of static correlation, DLPNO-CCSD(T) and (T1) may break down catastrophically, and PNO-LCCSD(T) is vulnerable as well. In contrast, LNO-CCSD(T) converges smoothly to the canonical CCSD(T) answer with increasingly tight convergence settings. The only two reactions for which our revised MOBH35 reference values differ substantially from the original ones are reaction 9 and to a lesser extent 8, both involving iron. For the purpose of evaluating density functional theory (DFT) methods for MOBH35, it would be best to remove reaction 9 entirely as its severe level of static correlation makes it just too demanding for a test. The magnitude of the difference between DLPNO-CCSD(T) and DLPNO-CCSD(T1) is a reasonably good predictor for errors in DLPNO-CCSD(T1) compared to canonical CCSD(T); otherwise, monitoring all of T1, D1, max|tiA|, and 1/(εLUMO – εHOMO) should provide adequate warning for potential problems. Our conclusions are not specific to the def2-SVP basis set but are largely conserved for the larger def2-TZVPP, as they are for the smaller def2-SV(P): the latter may be an economical choice for calibrating against canonical CCSD(T). Finally, diagnostics for static correlation are statistically clustered into groups corresponding to (1) importance of single excitations in the wavefunction; (2a) the small band gap, weakly separated from (2b) correlation entropy; and (3) thermochemical importance of correlation energy, as well as the slope of the DFT reaction energy with respect to the percentage of HF exchange. Finally, a variable reduction analysis reveals that much information on the multireference character is provided by T1, IND/Itot, and the exchange-based diagnostic A100[TPSS].


■ INTRODUCTION
Reactions of transition metal (TM) complexes are of the greatest importance for progress in sustainable catalysis, 1 energy storage, 2 biological chemistry, 3 drug development, 4 and many other research areas. For their computational modeling and design (see a recent review), 5 density functional methods are widely used, but wavefunction ab initio methods are increasingly used for calibration of DFT approaches.
The single-reference CCSD(T) [coupled cluster with all singles and doubles, augmented with quasiperturbative triple excitations 6,7 ] has a well-deserved reputation as a "gold standard" for systems where electron correlation is predominantly dynamic, i.e., where the Hartree−Fock solution is a reasonable zero-order approximation. To a large extent, it benefits from error cancellation 8−12 between neglect of (usually antibonding) higher-order connected triple excitations (as accounted for in full CCSDT 13 ), which tend to be repulsive, and complete neglect of (universally bonding) connected quadruple excitations, as treated in the CCSDT(Q) 14 or full CCSDTQ 15,16 methods. (For an alternate perspective from perturbation theory, see Stanton's work. 17 ) CCSD(T) is comparatively "black-box" (requiring zero or minimal operator "judgment calls") and has "only" n 3 N 4 asymptotic CPU time scaling (with n the number of electrons and N the size of the basis set), compared to n 3 N 5 for CCSDT and n 4 N 6 for CCSDTQ. Additionally, however, a number of localized orbital approximations to CCSD(T) have been developed 18−23 (see below) that in principle scale linearly with system size and hence could be applied directly to the catalysts of interest. For main-group elements, we recently showed 24,25 that when local wavefunction methods are embedded into composite wavefunction protocols, their accuracy is comparable to the reference energies in the GMTKN55 database 26 of 1500 reactions for species of maingroup elements.
There has been a number of recent benchmark studies using coupled cluster methods on dissociation energies of transition metal systems, such as a recent update and extension 27 of the 20 metal−ligand bond energies dataset, 28 3dMLBE20, or the copper, silver, and gold clusters' benchmarks. 29,30 It has been argued 28 that any differences between experimental and CCSD(T) dissociation energies of diatomic 3d TMs lie in the expected range once the experimental energies are revised. 31 However, in catalysis, one is more interested in complexation energies and barrier heights of "real-life" organometallic systems. To this end, Dohm et al. 32 developed the MOR41 (41 metal− organic reactions) benchmark, and Iron and Janes 33,34 the MOBH35 benchmark of forward and reverse barrier heights for 35 representative metal−organic reactions. In both cases, reference data were obtained by complete basis set extrapolation of DLPNO-CCSD(T) data (domain localized pair natural orbital coupled cluster 18,19,35,36 ); Iron and Janes additionally considered 34 the improved DLPNO-CCSD(T 1 ) approach. Both sets of authors then proceeded to assess the performance of a large number of density functional and approximate wavefunction approaches. Most recently, the work of Dohm et al. was followed up by a similar study 37 on open-shell transition metal reactions, where DLPNO-CCSD(T 1 )/CBS (complete basis set extrapolated) benchmarks were presented and used for the assessment of many DFT and semiempirical methods.
In a recent study on polypyrroles 38 (extended porphyrins where Huckel and bicyclic structures are mostly "singlereference", but Mobius structures have pronounced static correlation), we, however, found that the performance of different localized approaches depends strongly on the degree of static correlation in the system. In particular, we found that the local natural orbital coupled cluster, LNO-CCSD(T), 21,39−42 proved to be considerably more resilient to static correlation than the DLPNO-CCSD(T 1 ) and PNO-LCCSD(T) 20,43,44 approaches. At least a subset of the MOBH35 reactions can be expected to exhibit significant static correlation.
In the present work, we will carry out canonical CCSD(T) calculations for a large subset of MOBH35, evaluate the performance of different localized methods, and attempt to rationalize this by means of a large number of diagnostics for static correlation. As a byproduct, updated reference data for MOBH35 are obtained. It is shown that, by and large, the DLPNO-CCSD(T 1 )-based reference data are adequate for evaluating all but the most accurate DFT functionals. However, for one particular system (the product of reaction 9), static correlation is severe enough that it causes a catastrophic breakdown of localized approaches. To be fair, that case is arguably well beyond the safe use margin of CCSD(T) itself.

■ COMPUTATIONAL METHODS
Geometries for reactants, transition states, and products of reactions 1−9, 11−16, and 21−34 were taken verbatim from the Supporting Information (SI) of Dohm et al., 45 except that the product geometry of reaction 5 (which erroneously is the same as the reactant geometry in the said SI) was taken instead from Iron and Janes, 33 as were the structures for the bimolecular reactions 10 and 35. The bimolecular reactions 17−20 involved transition states too large for canonical calculations and were hence omitted.
The Weigend−Ahlrichs basis set family 46 of def2-SV(P), def2-SVP, def2-TZVP, def2-TZVPP, and def2-QZVPP was used throughout, as it was designed as a compromise between the requirements of wavefunction and DFT calculations. (We note that for reactions 1−9, which involve first-row transition metals (TMs), they do not contain ECPs; for the second-and third-row TMs, they contain Stuttgart RECPs. 47 ) Where relevant, the standard def2-JK fitting basis set 48 was used for Coulomb and  exchange integrals, and the standard def2-nZVPP-RI 49,50 basis  sets for RI-MP2 and RI-CCSD calculations. Where possible, conventional canonical CCSD(T) 6,7 calculations with the def2-SVP basis set were carried out using MOLPRO 2020 and 2021, 51 devoid of any fitting basis sets. For the largest species, involved in reactions 1, 2, 5, 8, and 9, RI-CCSD(T) was instead carried out using the algorithm 52 in PSI4 version 1.4. 53 For verification purposes, we also reran the smaller species with this algorithm and compared them with the conventional results as well as with a newer algorithm 54 in MRCC 2020. 55 (For reactions 3−4 and 6−7, energies obtained with PSI4 and MRCC perfectly match, while differences with the canonical answers were negligible. For some second-and third-row TMs, there were subtle differences between PSI4 and MOLPRO that eventually could be narrowed down to the SCF reference energy and hence to subtle differences in the ECP parameters; no such issue was found for MRCC.) Convergence to the lowest closed-shell singlet SCF solution was verified by comparing the SCF energies from each program with those obtained using Gaussian 16 56 in the same basis set. SCF thresholds of 10 −9 E h were used in MOLPRO and ORCA, while the default thresholds of 10 −6 E h in the SCF energy and 10 −7 for the RMS change in the density matrix were employed in MRCC; in practice, the energies are converged 2−3 decimal places better than the energy threshold because of the stringent density criterion.
Three different families of localized orbital approaches were considered: (a) MOLPRO was used for the PNO-LCCSD(T) localized orbital coupled cluster calculations, 20,44,57,58 both using built-in "default" and "tight" cutoff parameter ensembles (see Table 1 in ref 20 for details). (b) For LNO-CCSD(T), 41,42,59 the implementation in MRCC 2020 55 was employed using the Normal, Tight, and vTight cutoff parameter ensembles, as detailed in Table 1 of ref 59. In addition, we considered "Tight+", which combines the vTight value of 10 −6 E h for wpairtol with Tight values for the other parameters: in a recent study on polypyrrole isomerization, 38 we found this to perform almost as well as vTight. (c) DLPNO-CCSD(T) 60 and the version with improved iterative triples, DLPNO-CCSD(T 1 ), 36 were calculated using ORCA 61  Unless stated otherwise, ORCA "chemical cores" were frozen throughout, i.e., valence electrons only were correlated except for the (n − 1)sp electrons on the metal. There are 10 inner shell electrons for the 3d elements, and the associated ECPs were employed for the 4d and 5d elements as mentioned above (see Table S3 in the SI for the specific numbers of frozen core electrons for each species in this study).
Complete basis set extrapolation (CBS) parameters for the Weigend−Ahlrichs basis sets were taken from Table 3 in Neese  and Valeev. 65 For CCSD(T), this is the well-known partialwave-like formula 66 A number of diagnostics for nondynamical correlation have been considered. One group relates to the single excitation amplitudes. First of all, there is the well-known T 1 diagnostic of Lee and Taylor, 78 which is defined as the Euclidean vector norm (2-norm) of the CCSD single amplitudes vector, divided by the square root of the number of correlated electrons, as follows: (The denominator ensures that T 1 is size-intensive.) The D 1 diagnostic 79−81 instead is based on a matrix norm, namely, (max λ[t 1 ·t 1 T ]) 1/2 the square root of the largest eigenvalue of the outer product. It can easily be seen from an example like BN...noctadecane that the large T 1 diagnostic of singlet boron nitride will be "quenched" by the aliphatic chain, while D 1 will be that of the worst fragment. (The ratio D 1 /T 1 , or rather its deviation from 2 1/2 , has been proposed as a measure for homogeneity of the system. 81 ) Finally, max|t 1 | or the largest single excitation amplitude in absolute value can also be seen as |t 1 | ∞ the infinitynorm of t 1 .
From the double excitation amplitudes, the D 2 diagnostic 79 has been defined analogously to D 1 , and of course, we have The pitfalls of relying on just a single diagnostic have been discussed at length in ref 82. It is sufficient to say here, for example, that the F 2 molecule has deceptively low T 1 = 0.011 and max|T 1 | = 0.02 but large D 2 = 0.23 and max|T 2 | = 0.17 on account of a prominent double excitation.
−T e (SCF) is minus the first excitation energy from a STABLE calculation using Gaussian 56 16. This number will be positive for a closed-shell molecule that has an RHF-UHF instability (which is itself an indicator of type A static correlation 83 a.k.a. absolute near-degeneracy correlation, such as in a molecule stretched past its Coulson−Fischer point 84 a.k.a. the RHF-UHF bifurcation point). Alternatively, it is straightforward to perform the stability analysis in PSI4 or considering CIS or TD-HF that are more or less equivalent. For RHF-UHF instabilities, the same result can be obtained, slightly less conveniently, by separate RHF and UHF with a HOMO−LUMO mixed guess (or Fermi smearing guess 85,86 ).
In the "W4 theory" paper, 87 a number of energetics-based diagnostics were proposed. One was %TAE[(T)], the percentage of the total atomization energy accounted for by the connected triple excitations: it was found repeatedly 87−89 that this is a very good indicator for the importance of post-CCSD(T) contributions. The other was %TAE[SCF], the percentage of the atomization energy recovered at the Hartree− Fock level, which is about 70−80% for molecules dominated by dynamical correlation but drops as static correlation becomes more important and actually becomes negative (molecule metastable in the absence of correlation) for such cases as F 2 and O 3 . We could of course instead look at its converse, % TAE corr = 100% − %TAE[SCF], which will increase with increasing levels of static correlation.
A number of diagnostics are derived from natural orbital occupation numbers n i , such as the Truhlar M diagnostic, 90 which for closed-shell molecules reduces to simply (2 − n HOMO + n LUMO )/2, Matito's nondynamical correlation index 91,92 I ND , and the size-intensive variation that we proposed, 93  In terms of DFT-based diagnostics, we considered the fractional orbital density (FOD), obtained from the fractional orbital occupations generated by a Fermi smearing calculation at a high finite temperatures. 95 (They were evaluated using ORCA's built-in procedure with default settings.) Additionally, we considered the A diagnostic proposed by Fogueri et al., 82 which is effectively the slope of the DFT total atomization energy with respect to the percentage of Hartree− Fock exchange. We also evaluated the recently proposed % TAE[ΔX,TPSS] diagnostic, 96  Finally, we should mention diagnostics based on the coefficient of the HF determinant (in a method with regular normalization, like CASSCF or CI) or the sum of squares of all the excitation amplitudes (in intermediate normalization, like in coupled cluster theory). However, as pointed out by a reviewer to ref 38, these quantities are not size-intensive: as a remedy, we used here 2 ln(A)/N, which remains constant for an assembly of any number of identical systems at infinite separation.

■ RESULTS AND DISCUSSION
Our best estimated forward and reverse barrier heights were obtained by a three-layer extrapolation process: calculations on all systems other than the bimolecular reactions 17−20.
The final computed results are given in Table 1, compared with the previous computed reference values of Iron and Janes. The latter were obtained at what those authors term an approximation to W1 theory, namely, DLPNO- For the most part, differences between old and new reference values are fairly modest, with a 1.70 kcal/mol RMS (root mean square). However, there are two outliers: reactions 8 and, especially, 9, which are really two consecutive steps of the same reaction (Scheme 1).
For reaction 9, discrepancies between old and new values reach a startling 5.3 and 7.0 kcal/mol on the forward and reverse barriers, respectively; for reaction 8, the 2.7 kcal/mol value for the forward reaction is paralleled by 1.8 kcal/mol for reaction 13, and 1.7 kcal/mol for the reverse reaction comes quite close to the largest errors of reverse reactions 15, 16, and 32.
If we delete these two reactions, then the RMSD (root-meansquare difference) between the two sets of best estimates (Iron− Janes and present work) drops from 2.41 to 1.68 kcal/mol; if we only delete reaction 9, then the RMSD is 1.73 kcal/mol. For overall reaction energies ΔE r = ΔE # fwd − ΔE # rev , the errors in reaction 9 mutually amplify, leading to an alarming 12.3 kcal/ mol discrepancy between the two sets of numbers. (In reaction 8, this drops to just 1.3 kcal/mol owing to a felicitous mutual cancellation.) Why is such a thing happening? In a previous study by Sylvetsky et al. 38 on isomerism in polypyrroles, it was reported that the accuracy of the DLPNO-CCSD(T) and, to a lesser extent, DLPNO-CCSD(T 1 ) methods suffers for systems that have significant static correlationin that case, the Mobius rings and the transition states leading to them. (The Huckel and figure-eight isomers, in contrast, were much easier for all localized approaches.) Iron and Janes best est. in this work c  ] difference in a def2-TZVP basis set, but we will also make use of the less confusing T 1 −T 0 symbolism for that term; T 1 stands for an iterative treatment of the triples terms in DLPNO-CCSD(T 1 ), whereas T 0 refers to a semicanonical perturbative treatment of the same term in DLPNO-CCSD(T 0 ). b Modified values at revised geometries from Dohm et al. 45 These energies come from the PWPB95-D4/ def2-{T,Q}ZVPP level of theory from the modified MOBH35 article. c Best estimated energies calculated in this work (see the text for discussion).

Scheme 1. Consecutive Reactions 8 and 9 in the MOBH35 Database
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article Table 2 presents a large number of diagnostics for static correlation for selected structures presently under study: the full set can be found in the SI.
For product 9, T 1 = 0.05, D 1 = 0.52, D 2 = 0.30, and max|t i A | = 0.19 all indicate severe static correlation. Moreover, these indicators monotonically increase from reactant 9 via TS 9 to product 9; hence, no error compensation can reasonably be hoped for. (We also note that the HOMO−LUMO gap steadily narrows from 10.6 via 9.3 to 7.4 eV.) In fact, in the earliest communication on DLPNO-CCSD(T 1 ) states, it was expected then that the T 0 approximation would fail in rare cases of small ΔΕ H-L gaps. 36 In contrast, these indicators, while elevated, are similar for reactant 8 and product 8 (≡reactant 9), which may explain the observed error compensation in the reaction energy. (However, ΔΕ H-L alone clearly does not tell the whole story, as evidenced by the even smaller ΔΕ H-L in reactant 10, where, however, T 1 , D 1 , and max|t i A | are much smaller than those for product 9. We note that in other situations where both the singles-and doubles-based diagnostics are elevated, like in reaction 16, the same problem is experienced in a milder form.) We also ought not to lose sight of the fact that the CCSD(T) approximation itself will become problematic for sufficiently strong static correlation, and hence, its accurate reproduction under such circumstances may be a somewhat misguided target.
A standard Tukey box plot 99 of errors in all MOBH35 forward and reverse barriers, as well as reaction energies, is given in Figure 1; see also Table S2 in the SI for the percentile summaries in a numerical form. As always, the box indicates the IQR or interquartile range, the distance between the 25th and 75th percentile, and the dividing line in the box the median of the distribution. The whiskers were defined in the most commonly used standard manner: from the box to the last data point that is still within 1.5 IQR from it. Points outside are indicated as outliers (if between 1.5 and 3 IQR from the box) and extreme outliers (if further than 3 IQR away). For a normal distribution, the whiskers would be symmetric and encompass ±2.698σ or 99.3% of all data points. Reactant species (R), products (P), and transition states (TS) in the MOBH35 database. Energy differences are all in a def2-SV(P) basis set. Heat mapping for diagnostics, within each column, is from green for the lowest to red for the highest, while for energy differences, it is from blue for the most negative via white for zero to red for the most positive. DLPNO-CCSD(T 1 ) is within TightPNO settings, LNO-CCSD(T) is based on standard tight thresholds, and PNO-LCCSD(T) is also on tight settings. The very large errors for reaction 9 in DLPNO-CCSD(T 1 ) are obvious and become even larger with the common DLPNO-CCSD(T 0 ) approximation. Yet, the boxes for both approaches are comparatively narrow: the distribution is strongly leptokurtic (long-tailed), and reactions 9, 8, and to a lesser extent 16 are "extreme outliers", outliers beyond 3 IQR.
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article Our largest canonical calculations (reactions 8 and 9) took several weeks each on Intel Skylake and Cascade Lake machines with 36 and 40 cores, respectively, and 384 GB of RAM. Clearly, repeating this for def2-TZVPP is not going to be a realistic option with the available hardware. However, would canonical calculations with the even smaller def2-SV(P) basis have been equally informative? As can be seen in Figure S1 in the Supporting Information (SI), the box plot looks extremely  As we can see in Figure 1, this is not purely a problem with the connected triples: reaction 9 is an outlier for the DLPNO-CCSD vs CCSD difference as well. The boxes have similar widths between DLPNO-CCSD and DLPNO-CCSD(T 1 ), while that of DLPNO-CCSD(T 0 ) is about double the width. This implies that the more economical T 0 approximation comes at a significant price in accuracy even for "nonoutlier" systems. Ma and Werner's PNO-LCCSD with default cutoffs has a much wider box than DLPNO-CCSD with tight cutoffs, but interestingly, the outliers (reaction 9 and the reverse barrier of 8) are much less far off. The boxes of PNO-LCCSD and PNO-LCCSD(T) (the latter both with and without the T 0 approximation) have similar widthsjust the outliers more spread out with than without the triples. Still, PNO-LCCSD(T) with tight criteria does very well for systems other than the outliers.
LNO-CCSD even with "normal" cutoffs appears to be still more resilient to outliers than PNO-LCCSD "tight"; for LNO-CCSD(T), performance with normal criteria is about the same quality as PNO-LCCSD(T) "tight".
In ref 38, the best results were obtained for LNO-CCSD(T) with "tight" criteria, and in addition, wpairtol tightened further to 10 −6 (the value from "vTight")we denote this combination "tight+", which yielded almost the same accuracy as "vTight" but at considerably reduced cost. However, in that regard, the very extended π systems (polypyrroles) of ref 38 are very different from the systems at hand, where "tight+" and standard "tight" yield fundamentally the same error distribution for CCSD(T). (For CCSD, tight+ has a narrower IQR, 0.10 kcal/mol, than tight, 0.13 kcal/mol.) At the expense of increasing CPU time by a factor of about 3−4, "tight" yields a quite narrow error distribution (IQR = 0.15 for LNO-CCSD(T)); we still have two outliers for reaction 9, but they are now in the 1 kcal/mol range, while the whisker span is comparable to PNO-LCCSD(T) tight.
Would tightening LNO criteria further eventually lead to convergence to the canonical results? One could set all cutoffs to zero, but then, one would essentially have a clumsy way to do canonical CCSD(T). Instead, we used the "vTight" or "veryTight" cutoffs throughout, which increase our computational cost by about a further factor of three for the largest systems. However, the payoff was an IQR for LNO-CCSD vTight of just 0.05 kcal/mol (!) and for LNO-CCSD(T) vTight of just 0.07 kcal/mol. True, LNO-CCSD(T) vTight has an "extreme" outlier of 0.56 kcal/mol, but that is more of a testimony to the very small IQR than to any sort of breakdown of the approximation.
We conclude that of all the localized orbital-based approaches, LNO-CCSD(T) vTight is the most resilient to systems with severe static correlation and can be used as an alternative to canonical CCSD(T) if the latter is simply beyond available computational resources. This is especially true for larger basis sets where the canonical calculation would be completely intractable.
Would tightening wpairtol by itself to the "Tight" value, and otherwise using "Normal" cutoffs, be adequate? From Figure 1a, this seems to rein in some outliers in LNO-CCSD, but from Figure 1b, it is clear that this does not help for LNO-CCSD(T).
One of the original purposes of MOBH35 was to use the data to evaluate DFT functionals for organometallic and catalysis applications. Given the way that localized coupled cluster methods struggle to cope with the static correlation in reaction 9, one might legitimately wonder if canonical CCSD(T) is itself adequate for this reaction and whether it can still be considered a reasonable test for any DFT method.
We checked the effects of deleting reaction 9 on the performance statistics of various functionals in Iron & Janes. The comparison is given in the SI for a number of DSD3 and other common functionals and is based on the revised reference energies reported in this work; for some newer functionals not covered in Iron & Janes, we would like to refer the reader to Figure 3 in both refs 100 and 101. The bottom line is that while MAD drops slightly for many functionals, it does not affect any conclusions about their relative accuracy; for the best double hybrids, however, one may wish to exercise caution. (For reasons related to basis set superposition error rather than static correlation, one may wish to eliminate the bimolecular reactions 17−20, unless one is prepared to carry out at least a {T,Q} basis set extrapolation or apply the counterpoise method.) It may be argued that def2-SVP is too small as a basis set for the various methods to be at their best. Unfortunately, a comparison against canonical CCSD(T)/def2-TZVPP is simply not feasible for the systems of greatest interest, but an approximate set of CCSD(T)/def2-TZVPP reference values can be obtained by applying the additivity approximation and, similarly, This graph is presented in Figure 2a,b for CCSD(T) and CCSD, respectively, and the corresponding numerical values are shown in Table S1 in the SI. With some quantitative changes, the same basic conclusions apply, and the box plot widths and outliers are similar, though errors of PNO and DLPNO approaches appear to be narrowed.
This implies that our observations about the performance of the various localized CCSD and CCSD(T) approaches are fairly basis set-independent. This would actually make sense if the differences are largely driven by type A static correlation (as Hollett and Gill 83 denote absolute near-degeneracy correlation). However, as shown by Karton et al., 88 the higher one goes in the coupled cluster hierarchy beyond CCSD(T), the more rapidly their contributions converge with the basis set to the point that, for example, quintuple excitations are fully converged even with an unpolarized (!) double-zeta basis set (as we showed there, e.g., for ozone and C 2 ).
As even with normal cutoffs, LNO-CCSD(T) performs surprisingly well for the TZVPP basis set; we have elected to use LNO-CCSD(T) Normal /def2-{T,Q}ZVPP for the CBS extrapolation of the correlation energy. The HF component was extrapolated separately. (See the Computational Methods section for extrapolation details.) For the (T) component, on the other hand, we carried out LNO-CCSD(T) Tight+ /def2-{SVP,TZVPP} extrapolation.
As the third tier of our composite "best estimate", we use the difference between canonical CCSD(T)/def2-SVP and LNO-CCSD(T) Tight+ /def2-SVP.
A comparison between our best estimates thus obtained, and the older values of Iron and Janes, can be found in Table 1. Only for two reactions, 8 and 9, are there discrepancies significantly in excess of 2 kcal/mol; additional discrepancies at or near 2 kcal/ mol are found for reactions 11, 13, 16, 22, and 32. Let us now consider the diagnostics for these reactions in more detail.
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article For reaction 8, we see D 1 to be similar between the reactant and the product but elevated for the transition state, and likewise for T 1 . The FOD(TPSS) is likewise elevated for the transition state. This is consistent with the observation that the forward and reverse barriers differ quite significantly both between the two sets of best estimates, and between DLPNO-CCSD(T 1 ) and canonical CCSD(T), but that for the reaction energy, the errors largely cancel.
Conversely, for reaction 9 (the reaction of which is the product of 8), we see a steady increase in T 1 , D 1 , and indeed in max|t i A | from the reactant via the transition state to the product. Moreover, the reciprocal HOMO−LUMO gap increases in tandem; all of this reflects steadily increasing type A static correlation. Not only are there large discrepancies between the two sets of best estimates for both barrier heights, but the errors actually amplify each other in the reaction energy.
Reactions 16, and to a lesser extent 15, have milder versions of the same scenario but in reverse: the T 1 , D 1 , and max|t i A | diagnostics all decrease from the reactant to the transition state to the product, in tandem with the reciprocal HOMO−LUMO gap. Here too, discrepancies for forward and reverse barriers amplify each other in the reaction energy.
Furthermore, while reaction 32 has less multireference character, we see the same progression as in reactions 15 and 16 and hence similar observations about discrepancies.
At the request of a reviewer, an examination of the percentage of correlation energy recovered was carried out, and for DLPNO-CCSD(T 1 ), the following additional accuracy settings aside from NormalPNO and TightPNO were considered: VeryTightPNO as detailed in the Computational Methods section, TightPNO with TCutPNO = 10 −6 E h , and TightPNO with 10 −8 E h (10× looser and tighter, respectively, than the default). The latter was also employed in the 2-point {6,7} and {7,8} TCutPNO extrapolation formulas for the DLPNO-CCSD(T 1 ) correlation energy proposed by Altun et al. 102 The percentages of correlation energy recovered (E corr ) are depicted in Figure 3, and the corresponding RMSD values are listed in Table S6.
First, the percentage of E corr recovered is clearly sensitive to the degree of static correlation in all cases, especially for DLPNO-CCSD(T 1 ), where even veryTightPNO is not immune to significant errors for reactions 8 and 9.
Second, TightPNO DLPNO-CCSD(T1) and Tight PNO-LCCSD(T) both have a median %E corr recovery of around 99.8%, although the former has both a wider "box" (larger interquartile range, IQR) and more outliers. With {6,7} extrapolation, which aside from TightPNO DLPNO-CCSD-(T 1 ) requires an additional calculation at roughly NormalPNO cost, the IQR is much reduced and the median lies near 100%, similar to the LNO-CCSD(T) calculations in both aspects. Significant "under-recovered" outliers remain, however, particularly for reactions 8 and 9. {7,8} Extrapolation, which requires calculations at TightPNO and VeryTightPNO cost, mitigates the outliers but does not improve the IQR. Even {6,7} extrapolation appears to be superior to VeryTightPNO on its own for this application.
For the LNO-CCSD(T) series, even Normal has a remarkably narrow spread, aside from the outliers for reactions 8 and 9; the 1.5 IQR "whiskers" span a range comparable to {6,7} DLPNO extrapolation. LNO-CCSD(T) Tight has whiskers of similar width to Normal but no more outliers outside them. It is not clear that further gains from vTight are commensurate with the great additional computational cost for the present application. (This latter situation is apparently quite different for accurate benchmarks of noncovalent interactions. 103

)
A Remark on Static Correlation Diagnostics. In order to get our bearings concerning static correlation diagnostics, we obtained them for a set of small closed-shell molecules, with all systems under investigation here being closed-shell as well and open-shell versions of some diagnostics not being uniquely defined. Lee 81  We took the subset of 160 closed-shell species in the W4−17 dataset as our reference sample. Many of the diagnostics were extracted from the SI of ref 104; others were newly obtained for this work.
Subsequently, principal component analysis (PCA) on the correlation matrix and variable clustering analysis were carried out in JMP Pro 16. The first four principal components are given in Table 3; variables are grouped by the four clusters from the cluster analysis.
The separation between the two clusters designated as 2a and 2b is somewhat weak, but clusters 1 and 3 behave quite distinctly from each other and from 2a/2b. In a PCA on the correlation matrix, the eigenvalues should add up to the total number of variables; we can thus see that just the first two PCs contain the same information as 14.45 variables out of 18, and the first four the same as 16.61 variables. In PC1 all four clusters move in parallel, while in PC2 clusters 1 and 3, they move opposite to clusters 2a and 2b.
Diagnostics based on fractions of the total atomization energy are really intended for small molecules and become a bit unwieldy for, say, a molecule with 67 atoms (like in reactions 8 and 9). Moreover, post-CCSD(T) corrections are quite simply unrealistic here. The results of a PCA (and cluster analysis) on such diagnostics as we were able to evaluate for MOBH35 are given in Table 4. (The correlation matrix can be found in Table  S7 in the SI.) There are some differences with the behavior of the diagnostics for W4−17, but a few things stand out: notably, both the W4−17 and MOBH35 diagnostic collections have a clear "principal component of strong correlation", and the difference between DLPNO-CCSD(T 1 ) and DLPNO-CCSD-(T 0 ) is closely tied to the degree of static correlation. Furthermore, as shown in Figure 4, that same difference appears to be correlated with the error relative to canonical CCSD(T).
In the above PCA, we identified the intercorrelations among diagnostics and grouped them into clusters of variables, but could we select a subset of k "principal variables" that would span a space similar to the first k principal components? This statistical problem is addressed by the "subselect" module developed by Cadima et al. 105 for the R statistical environment 106 (version 4.1.2). Using the "eleaps" algorithm within that module with the GCD (generalized coefficient of determination) as the objective function, we obtained the following subsets of increasing size: While we truncated the PCA after 4 PCs, we will list k = 5 for the sake of completeness: Let us now inspect the variables of the generated subsets and consider the clusters from the variable clustering analysis of the W4−17 diagnostics (Table 3). D 2 belongs to the cluster 2b subblock for a single variable, while for two variables, T 1 and FOD(PBE0) belong to clusters 1 and 2b, respectively. For k = 3, I ND /I tot and A 100 [TPSS] are representatives of clusters 2a and 3, respectively, while T 1 and D 1 both belong to cluster 1. For k = 4, we have T 1 , I ND /I tot , FOD(PBE0), and A 100 [TPSS] as one representative each of all four clusters: in that order, 1, 2a, 2b, and 3, respectively. Intriguingly, in the second solution, the reciprocal HOMO−LUMO gap (ε LUMO − ε HOMO ) −1 belongs to cluster 2a.
Timing Comparisons. The calculations being reported above are run on a highly heterogeneous cluster, and de facto clock speeds vary because of differing workloads on the nodes and server cooling fluctuations. Therefore, we carried out separate timing calculations under controlled conditions. Table 5 presents timings for a single system (the product of reaction 16), all run on the same type of 16-core Intel Haswell servers with 256 GB RAM and 3.5 TB SSD scratch disk arrays. These servers were otherwise idle during the timing measurements. (The total memory and hard disk requirements (both in GB) can be found in Tables S4 and S5 in the SI, respectively.) For such a small system, of course, canonical calculations are clearly faster except for the largest basis sets; needless to say, of course, for larger systems, canonical CCSD(T) CPU times will    Every notch in accuracy that LNO-CCSD(T) is turned up, from Normal to Tight to vTight, roughly triples CPU times. The same applies to both DLPNO-CCSD(T) and DLPNO-CCSD(T 1 ). For PNO-LCCSD(T), the timings indicate that having tight thresholds takes ∼1.5 times as long compared to default ones.
We note that for smaller basis sets, LNO Normal, which yields 99.98% of the CCSD(T) correlation energy, comes at a computational cost comparable to DLPNO-CCSD(T 1 ) Tight-PNO with TcutPNO = 10 −6 E h ; the latter recovers 99% of E corr . However, for larger basis sets, LNO becomes more economical.
Adequacy of def2-nZVP Basis Sets for the Treatment of (n − 1)sp Correlation. As the present paper was finalized for publication, a reviewer of ref 107 raised an issue that might potentially have some bearing on the present work.
Given the very small gaps between subvalence (n − 1)sp and valence d orbitals in many of these transition metals, Bistoni et al. 108 have argued for the inclusion of these "subvalence" orbitals in "chemical cores" even with the Weigend−Ahlrichs basis sets, and this is indeed the default in the ORCA program system from that group.
As stated above, this practice was adopted also in the present work. While this issue is orthogonal to the relative performance of various localized orbital methods, technically, the def2 basis sets do not include core−valence correlation functions for the transition metals (they do for heavy alkaline and alkaline-earth metals). This raises the question whether this might introduce a significant error in our reference data and whether it would not have been preferable to employ Dunning−Peterson cc-pwCVnZ-PP basis sets instead. 109−113 A direct comparison between def2-nZVP and cc-pwCVnZ-PP basis set extrapolation would introduce another, less obvious, source of discrepancy: the cc-pwCVnZ-PP basis sets for certain elements use newer-generation relativistic pseudopotentials, which we found can cause discrepancies reaching 1 kcal/mol even at the Hartree−Fock level. However, the differential (n − 1)sp contributions should be directly comparable between the two basis set families. A comparison between def2-QZVPP and cc-pwCVQZ-PP can be found in Table 6, while a similar comparison between the smaller def2-TZVPP and cc-pwCVTZ-PP is shown in Table S8, and for {T,Q} extrapolations, the differences are listed in Table S9 of the SI.
The mean absolute differences between the two sets are 0.29 and 0.27 kcal/mol, respectively, for forward and reverse barriers; the RMS differences are 0.41 and 0.35 kcal/mol. (For {T,Q} extrapolation, these statistics slightly increase to MAD of 0.36 and 0.32 kcal/mol and RMSD of 0.53 and 0.43 kcal/mol, respectively.) The largest single discrepancy with the QZ basis sets is found for reaction 10, which involves Na in addition to the early second-row transition metal niobium: this reaction is somewhat anomalous, as the (2s, 2p) subvalence orbitals of Na and the (4s, 4p) subvalence orbitals of Nb are nearly degenerate with each other, and the valence−"subvalence" orbital energy gap is a measly 0.2 hartree (!). These numbers are in fact smaller than what was found by Efremenko and Martin 107 for Ru ligand coordination reactions but similar to their statistics for ligand replacement. The broader question of the proper treatment of subvalence correlation in transition metals will be investigated in greater detail, and for broader datasets, in a forthcoming paper.
It is sufficient to say for now that our conclusions on the performance of localized coupled cluster methods are unlikely to be affected by the impact of using def2 basis sets on the    Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article description of metal (n − 1)sp correlation, as it impacts barrier heights and reaction energies.

■ CONCLUSIONS
We have revisited the MOBH35 (Metal−Organic Barrier Heights) benchmark for realistic organometallic catalytic reactions, using both canonical CCSD(T) and localized orbital approximations to it. For low levels of static correlation, all of DLPNO-CCSD(T), PNO-LCCSD(T), and LNO-CCSD(T) perform well; for moderately strong levels of static correlation, DLPNO-CCSD(T) and (T 1 ) may break down catastrophically, and PNO-LCCSD(T) is vulnerable as well. In contrast, LNO-CCSD(T) converges smoothly to the canonical CCSD(T) answer with increasingly tight convergence settings (Normal, Tight, and vTight). Admittedly, in the kind of case where DLPNO breaks down, the CCSD(T) approximation itself may be questionable, although for this size of a system, we have no computationally tractable way of estimating the impact of connected quadruple excitations. The only two reactions for which our revised MOBH35 reference values differ substantially from the original ones are reaction 9 and to a lesser extent 8, both involving iron. For the purpose of evaluating DFT methods for MOBH35, it would be best to remove reaction 9 entirely as its severe level of static correlation is just too demanding for a test of any DFT or low-cost wavefunction method. Also, reaction 8 is another electronically challenging case, and we thus advocate its removal as a means of having reference energetics with weak multireference character. The magnitude of the difference between DLPNO-CCSD(T) and DLPNO-CCSD(T 1 ) is a reasonably good predictor for errors in DLPNO-CCSD(T 1 ) compared to canonical CCSD-(T); otherwise, monitoring all of T 1 , D 1 , max|t i A |, and 1/(ε LUMO − ε HOMO ) should provide adequate warning for potential problems. Our conclusions are not specific to the def2-SVP basis set but are broadly conserved for the larger def2-TZVPP, as they are for the smaller def2-SV(P): the latter may be an economical choice for calibrating against canonical CCSD(T). The diagnostics for static correlation are statistically clustered into groups corresponding to (1) importance of single excitations in the wavefunction; (2a) the small band gap, weakly separated from (2b) correlation entropy; and (3) thermochemical importance of correlation energy, as well as the slope of the DFT reaction energy with respect to the percentage of HF exchange. As a final remark, a variable reduction analysis using the subselect algorithm reveals that the variables that contain much information on the multireference character in MOBH35 are T 1 , Matito's ratio I ND /I tot , and the exchange-based diagnostic A 100 [TPSS].
On a broader note, given a choice between a more rigorous coupled cluster method with a small basis set and a cruder coupled cluster approximation with a more extended basis set, our study indicates that, at least for barrier heights, the former may be the lesser of the two evils. However, both approaches may be combined in a composite scheme along the lines familiar from main-group thermochemistry and noncovalent interactions work.
Box-and-whisker plot for the energy deviations of local CC methods ( Figure S1) and MAD statistics for xDSD and ωDSD functionals ( Figure S2); data for box plots with quartiles and medians (Tables S1 and S2), frozen core electrons per molecule in MOBH35 (Table S3), total memory requirements (Table S4) and disk usage requirements (Table S5) for local CC single-point calculations, correlation energy recovered from local coupled cluster methods and RMSD values (Table S6), correlation matrix of the MR character diagnostics (Table  S7), and effects of core−valence correlation on forward and reverse barriers (Tables S8 and S9