Canonical and DLPNO-Based G4(MP2)XK-Inspired Composite Wave Function Methods Parametrized against Large and Chemically Diverse Training Sets: Are They More Accurate and/or Robust than Double-Hybrid DFT?

The large and chemically diverse GMTKN55 benchmark was used as a training set for parametrizing composite wave function thermochemistry protocols akin to G4(MP2)XK theory (Chan, B.; Karton, A.; Raghavachari, K. J. Chem. Theory Comput. 2019, 15, 4478–4484). On account of their availability for elements H through Rn, Karlsruhe def2 basis sets were employed. Even after reparametrization, the GMTKN55 WTMAD2 (weighted mean absolute deviation, type 2) for G4(MP2)-XK is actually inferior to that of the best rung-4 DFT functional, ωB97M-V. By increasing the basis set for the MP2 part to def2-QZVPPD, we were able to substantially improve performance at modest cost (if an RI-MP2 approximation is made), with WTMAD2 for this G4(MP2)-XK-D method now comparable to the better rung-5 functionals (albeit at greater cost). A three-tier approach with a scaled MP3/def2-TZVPP intermediate step, however, leads to a G4(MP3)-D method that is markedly superior to even the best double hybrids ωB97M(2) and revDSD-PBEP86-D4. Evaluating the CCSD(T) component with a triple-ζ, rather than split-valence, basis set yields only a modest further improvement that is incommensurate with the drastic increase in computational cost. G4(MP3)-D and G4(MP2)-XK-D have about 40% better WTMAD2, at similar or lower computational cost, than their counterparts G4 and G4(MP2), respectively: detailed comparison reveals that the difference lies in larger molecules due to basis set incompleteness error. An E2/{T,Q} extrapolation and a CCSD(T)/def2-TZVP step provided the G4-T method of high accuracy and with just three fitted parameters. Using KS orbitals in MP2 leads to the G4(MP3|KS)-D method, which entirely eliminates the CCSD(T) step and has no steps costlier than scaled MP3; this shows a path forward to further improvements in double-hybrid density functional methods. None of our final selections require an empirical HLC correction; this cuts the number of empirical parameters in half and avoids discontinuities on potential energy surfaces. G4-T-DLPNO, a variant in which post-MP2 corrections are evaluated at the DLPNO-CCSD(T) level, achieves nearly the accuracy of G4-T but is applicable to much larger systems.


I. INTRODUCTION
Among applied computational chemists, density functional theory (DFT) is presently the most widely used electronic structure approach, followed by wave function ab initio theory (WFT). WFT has a precisely defined Hamiltonian and a clear "road map" for systematic refinement, but in unmodified form hits an "exponential scaling wall" that limits application to small molecules. DFT "tunnels" through the scaling wall by reducing the many-electron Schrodinger equation to a set of coupled one-particle equations for an approximate exchangecorrelation functional. 1 The accuracy of DFT stands or falls with the functional.
One well-established approach for reducing the computational cost of WFT methods has been the introduction of composite WFT (cWFT) protocols such as the following: Gaussian-n theory (Gn) 2−8 by the Pople group (see ref 9 for a review); The CBS-QB3 10,11 and related methods 12 by Petersson and co-workers; Multicoefficient correlation methods of Zhao and coworkers; 13−15 In a higher accuracy regime, the ccCA approach 16−18 of Wilson and co-workers; In still higher accuracy regimes, the Weizmann-n approaches 19−24 of the present group and lower-cost modifications thereof (e.g., refs 25−27); the HEAT-n approaches of an international team around Stanton; 28 Especially the first two types of methods, being built into the popular Gaussian program system, 35 are applicable in a fairly black-box fashion, and hence are in fairly widespread use by nonspecialists. For recent reviews of cWFT methods and their performance for the thermochemistry and thermochemical kinetics of organic molecules, see Karton 36 and Chan; 37 for a recent general review of both experimental and computational thermochemistry, see Ruscic and Bross. 38 One feature that cWFTs all share is the additivity approximation of the following form: with nearly 2500 main-group molecules, we found that the best DHDFT functionals, ωB97M(2) by Mardirossian and Head-Gordon 49 and revDSD-PBEP86-D4 by our group, 45 have WTMAD2 (weighted mean absolute deviation) statistics around 2.2 kcal/mol, competitive with or superior to the cWFT methods we tested. Needless to say, double hybrids are computationally much more economical, especially if RI (resolution of the identity) approximations 50−52 are applied. An additional impetus for the present work was the recent publication of the G4(MP2)-XK method, 53 which employs Weigend−Ahlrichs 54 def2 basis sets rather than Pople basis sets, and is thus applicable to the first five rows of the Periodic Table (H−Rn). Somewhat to our surprise, its WTMAD2 proved inferior to the best DHDFT functionals.
Deeper inspection of performance for the subsets revealed that, while the cWFT methods yielded better performance for heats of formation of small molecules (i.e., the W4-11 benchmark 55 ), this was outweighed by degraded performance for larger-molecule subsets (see below and Table 1). As essentially all cDFT methods are parametrized against smallmolecule training sets, this prompts the question whether this result was not an artifact of parametrization biasif overall performance hadn't been "sacrificed on the altar of smallmolecule thermochemistry", so to speak.
The purpose of the present paper is twofold. The first aim is to investigate whether superior, or simply more transferable, cWFT methods can be obtained by employing a large and diverse training set like GMTKN55. We shall show below that this is the case and shall present three options of ascending accuracy and computational cost.
The second objective is to see if such methods still have an edge in accuracy over the best DHDFT methods. We shall show that the answer is "yes, but not as big as you might expect".

II. COMPUTATIONAL DETAILS
All calculations were carried out on the ChemFarm HPC cluster of the Faculty of Chemistry at the Weizmann Institute. CCSD(T) calculations, 56,57 as well as standard G4, 4 G4-(MP2), 5 and CBS-QB3 10,11 composite methods calculations, were performed using Gaussian 16, revision C.01; 35 mediumbasis set MP3 and large-basis set MP2 (the latter using the RI-MP2 approximation 50,52 ) calculations were performed using Q-CHEM 5.2; 58 the DLPNO-MP2 59 and DLPNO-CCSD-(T) 60 calculations were carried out using ORCA 4.2.1. 61 The conventional calculations benefited from 4TB SSD (solid state disk) arrays on our dedicated nodes, although some of the largest MP3 calculations exceeded that limit; for them we resorted to a 40TB shared-over-InfiniBand storage server custom-developed for us by Access Technologies of Ness Ziona, Israel.
For the original G4(MP2)-XK approach, 53 we employed the "minimally augmented" Karlsruhe basis sets ma-TZVP, ma-QZVP, ma-TZVXP (the prefix indicates addition of a single shell of diffuse valence exponents obtained by dividing the smallest s and p exponents by a factor of 3). In addition, the CCSD(T) step was carried out in a def2-SVSP basis set, which corresponds to def2-SVP with the polarization functions on hydrogen removed. Calculations were performed and results processed with the script provided in the Supporting Information to ref 53. Where not already available internally in the respective codes, basis sets were retrieved from the Basis Set Exchange. 63 Auxiliary basis sets for RI-MP2 and DF-HF were used as stored in the Q-CHEM internal library; see refs 52, 64, 65, and 66 for the original references.
As in the original G4(MP2)-XK approach and the standard G4 and CBS-QB3 methods, all open-shell species were treated using UHF (unrestricted HF) orbitals. We mention in passing the existence of ROHF-based variants of G4(MP2) that are more suitable to radical research. 67 For the alkaline and alkaline earth metals heavier than Ne, we correlated the (n −1)sp subvalence electrons, as it is wellknown that these orbitals intrude into the valence band, especially of {O, F, Cl} compounds. 45 For the heavy p-block elements, we additionally correlated the (n − 1)d subvalence electrons for similar reasons. As standard in def2 basis sets, small-core energy-consistent relativistic pseudopotentials 68 were used for elements heavier than Kr.
Dispersion corrections were evaluated using the Grimme et al. D3 model 69 with the Becke−Johnson damping function. 70 This combination is conventionally denoted D3(BJ). Based on our experience with double-hybrid functional parametrization, 45 the damping function's shape parameters were held fixed at a 1 = s 8 = 0, a 2 = 5.5, leaving the R −6 overall scaling parameter s 6 as the only adjustable parameter.
Our training set was the GMTKN55 benchmark; 48 as in our previous study on the revDSD functionals, 45 reference energetics, geometries, and charge/multiplicity information were obtained via the ACCDB database of Morgante and Peverati 76 and used as-is. (For the G4, G4(MP2), and CBS-QB3 implementations in Gaussian, "NoOpt" was specified to suppress the geometry optimization at these composite methods' standard levels of theory.) The calculations for the subsets C60ISO (isomerization energies of C 60 molecules) 77 and UPU23 (relative energies of uracil dinucleotides) 78 were currently not within reach for MP3 and canonical CCSD(T) methods. While the reported WTMAD2 values of DH-DFT and MP2-based WFT methods, i.e., SCS-MP2, are based on analysis of all subsets, excluding C60ISO and UPU23 does not appreciably affect WTMAD2 owing to the small (UPU23) and very small (C60ISO) weights these two subsets have in eq 1. For instance, C60ISO and UPU23 contribute only 0.3% and 2.5%, respectively, to the WTMAD2 for the ωB97M(2) double hybrid. Hence, despite excluding these two subsets, our WTMAD2 results are de facto equivalent to all-GMTKN55 WTMAD2 values, and we shall hence refer to them as GMTKN55 throughout. Statistical analysis was automated using a Fortran code developed in-house and available upon request. (We note that, as the reference data are complete basis set limit CCSD(T) calculations in the hypothetical motionless state, zero-point and thermal corrections need not be evaluated.) The main performance metric for the GMTKN55 database is the WTMAD2, 48 which takes into account the different sizes and energy ranges of each subset; it is obtained according to the following equation: where ΔE̅ corresponds to the mean signed deviation from the reference reaction energies for the ith subset, N i is the number of systems in the subset, and the MAD i is the mean absolute deviation between calculated and reference energies for the ith subset.
At one extreme, interaction energies |ΔE| for the RG18 subset 48 of rare-gas complexes range from 0.1 to 1.5 kcal/mol. At the other extreme, decomposition reaction energies |ΔE| for the MB16-43 subset 48 of artificial molecules run the gamut from −363 to 1290 kcal/mol. Thus, while a deviation of a few kcal/mol for an MB16-43 reaction would not materially affect the overall statistical error, a deviation of 1 kcal/mol at the RG18 subset would make it "shoot" up. Through trial and error, the application of numerous statistical schemes across the GMTKN24, 79 GMTKN30, 80 and GMTKN55 48 databases demonstrated in these papers that the balanced metric should encompass the number of reactions per subset Ν i , the mean absolute deviation MAD i between the calculated and reference relative energies, and the total mean absolute deviation per subset |ΔE̅ | i . That metric was dubbed the weighted mean absolute deviation, type 2, conventionally abbreviated WTMAD2. 48 In prior papers 45,46,48 (as in the present work), MAD was favored over the RMSD (root-mean-square deviation) as MAD is a more "robust" measure 81 in the statistical sense of being less sensitive to one or a few outlier data points. For a pure normal distribution without systematic error, it can easily be shown 82 that RMSD MAD = 2 π = 1.25331 ... ≈ 5 4 , which may be helpful to readers more attuned to RMSD error measures. The MAD, RMSD, and the MAD/RMSD ratio values for each subset are included in the Supporting Information for the final selected methods.
The optimization of the parameters for each method was accomplished by the late Michael Powell's BOBYQA 83 (Bound Optimization BY Quadratic Approximation) derivative-free constrained optimizer. Different initial guesses were provided to ensure a global minimum was obtained for each optimization.
The reference energies of the GMTKN55 database are obtained from high-level ab initio methods including CCSD-(T)/CBS, CCSD(T)-F12/CBS, or Weizmann-n theories such as W1-F12 or W2-F12. 84 These energies are nonrelativistic and ZPVE exclusive where all electrons were correlated for most calculations. For example, the references energies for the subset of rare-gas complexes RG18 were calculated at CCSD(T)/CBS(aug-cc-pVTZ/aug-cc-pVQZ), while for the subset of protonated isomers "PAREL", they were calculated at CCSD(T)/CBS(def2-TZVPP/def2-QZVPP). 48 The subsets of GMTKN55 are described in ref 48 as well as in the Supporting Information of the present paper.

III. DESCRIPTION AND NOMENCLATURE OF THE
THEORETICAL METHODS III.A. Description of the Theoretical Methods. The original G4(MP2)-XK method has the following energy expression: where each energy term is given by the following equations. The Hartree−Fock energy extrapolated to the complete basis set limit may be written as The The "high-level correction" HLC, like in G4 and G4(MP2) theory, is defined as The variables n α and n β (by convention n α ≥ n β ) correspond to the number of α and β valence electrons, respectively, according to the conventional largest noble-gas-core definition.
The parameters A, A′, B, C, D, and E were established through parameter optimization based on the chosen training set of reference energies. The HLC was originally incorporated in the Gaussian-n theories and the recent G4(MP2)-XK method to account for residual basis set incompleteness error; its oldest incarnation was based on correcting the total energies for the hydrogen atom and the hydrogen molecule. 7 Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article The inexpensive two-tier methods considered here share the G4(MP2)-XK energy expression, albeit with different parameters.
In all of the above, varying the HF extrapolation parameter (originally from ref 4) had only insignificant effect, and we have hence held it fixed. Detailed equations and parameters for all individual composite methods considered in this work are given in the Supporting Information. A minor but significant practical detail about the HLC needs to be clarified here. When introducing separate coefficients for molecules and separate atoms in the HLC, the authors of the original G4 papers probably never envisaged an application to noble gas complexes: hence, the unmodified HLC yields an abnormally large relative error in the RG18 subset, which through RG18's large weight in WTMAD2 has an effect of several kcal/mol there. The problem can be solved by the simple expedient of treating the closed-shell rare gas atoms as if they were molecules: we have done so throughout the present paper. The original G4(MP2)-XK yields a discouraging WTMAD2 of 6.31 kcal/mol; a closer inspection revealed a contribution of 1.58 kcal/mol to the WTMAD2 from the RG18 subset. After treating the "rare gas atoms" as molecules, the RG18 contribution to WTMAD2 plummets to 0.17 kcal/ mol, bringing the total down to 3.71 kcal/mol.
A few examples will further clarify the nomenclature. When we use the formalism of the original G4(MP2)-XK, the methods' names retain the "XK" label, and suffixes contain information about the HLC and the basis set used for the CCSD(T) step. G4(MP2)-XK-T-H6-v1 has an energy expression similar to the original G4(MP2)-XK; the letter "T" stands for the CCSD(T)/def2-TZVP step and "H6" for the six-parameter HLC, while "v1" denotes the variant with dispersion correction included. One primary goal was to limit the empirical parameters of the composite methods, and the "XK" label is omitted for the minimally empirical methods where we did not scale the E2 correlation energy of the smaller basis set compared to the original G4(MP2)-XK. G4-T-v2 is based on a CCSD(T)/def2-TZVP step, without HLC, and the suffix "v2" stands for the second variant of the G4-T. As we proceed toward the final choice of the minimally empirical methods, we will drop the variant label "vn" for the best overall methods.

IV. RESULTS AND DISCUSSION
Full statistics and parameters for all the different empirical methods considered are given in the Supporting Information. The component breakdown of WTMAD2 for selected methods is presented in Table 1. A selection of the most relevant data is summarized by the rows in Table 2, and the final selected composite methods are briefly presented in Table  3. Figure 2 depicts WTMAD2 values for several DFT functionals and cWFT methods taken from refs 45 and 46 as well as for the cWFT methods developed in the present work.
The best approximation has 13 empirical parameters. Can we reduce this number? Setting A = A′ comes at a minimal cost of just 0.02 kcal/mol. Instead eliminating the dispersion correction while retaining the HLC has a slightly greater cost, 0.07 kcal/mol, but then also setting A = A′ (leaving 11 parameters) barely affects statistics. (Conveniently, setting A = A′ eliminates a problem for radical reactions first identified by Chan, Coote, and Radom. 95 ) However, deleting the HLC entirely and leaving just seven parameters yields a result of the same quality, 1.42 kcal/mol, which increases marginally to 1.43 kcal/mol if we remove the dispersion correction. Inspection of the WTMAD2 component breakdown reveals that dropping the HLC slightly degrades the thermochemistry and largemolecule reaction components but (as expected) leaves the intermolecular and conformer sets unaffected and actually slightly improves the barrier heights.
Extrapolating E2/{T,Q} instead [G4-T-v1 and G4-T-v2] gives WTMAD2 = 1.48 and 1.51 kcal/mol, respectively, with and without the dispersion correction: the latter especially could be regarded as a def2-based version of ccCA, albeit with the addition of three empirical parameters.
These latter statistics can be marginally lowered (about 0.02−0.03 kcal/mol) through separate extrapolation of E 2αβ and E 2ss ≡ Ε 2αα + Ε 2ββ , at the expense of introducing one additional parameter.
IV.B. Using Three Basis Set Tiers. Obviously, CCSD(T)/ def2-TZVP is computationally quite costly for larger systems, and in fact proved out of reach for the very largest species in GMTKN55. Hence, we considered a three-tier approach akin to, but different from, G4 theory. The (scaled, see below) MP3 component was evaluated using the def2-TZVPP basis set, while for the CCSD(T)-MP3 difference, we fell back to the smallest basis set used in G4(MP2)-XK, namely, def2-SVSP, which is just def2-SVP but omitting the p polarization functions on hydrogen. For MP2, we used the same basis sets as in the previous paragraph.
Our best result here, WTMAD2 = 1.63 kcal/mol (G4-(MP3)-D-H6-v1), at first sight, represents a degradation of 0.3 kcal/mol over our best result in (III.A)however, the comparison is not entirely fair as the 1.63 result is weighted over almost 200 additional (larger-molecule) reactions. If we re-evaluate for the same 1257 systems as above, then we find Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article WTMAD2 = 1.55, which represents about a 0.2 kcal/mol degradation for a vast reduction in CPU time. By way of comparison, standard G4 theorywhich employs a similar three-tier basis set schemeachieves 2.52 kcal/mol on 1273 reactions. (It should be noted that this required tweaking the HLC for the rare gas atoms, as detailed at the end of the Methods section.) We note that the 2.52 kcal/mol number is in fact inferior to several of the best double-hybrid density functionals, notably 46 ωB97M(2) of the Head− Gordon group (2.18 kcal/mol) and revDSD-PBEP86-D4 (2.33 kcal/mol) and xrevDSD-PBEP86-D4 (2.23 kcal/mol) of the Weizmann group. In contrast, our best approach here has comfortably lower WTMAD2 than all of these.
CBS-QB3, which is likewise a three-tiered method, but with MP4(SDQ) as the middle level, reaches WTMAD2 = 3.10 kcal/mol. This figure would be considerably lower were it not for the poor performance of the intermolecular interactions subsets (which at 1.55 kcal/mol accounts for half the total WTMAD2); detailed inspection reveals this to be due to basis set superposition error. Numerous DH-DFT methods are in the 2.3−2.7 kcal/mol WTMAD2 range, and they show significant overall improvement over CBS-QB3; i.e., ωB97M(2) and xrevDSD-PBEP86-D4 are at 0.49 and 0.47 kcal/mol for the intermolecular interactions, respectively. At this point, we can correlate our CBS-QB3 findings with the conclusions of the detailed theoretical study of Karton and Goerigk on pericyclic reactions, 96 where dispersion was shown to play an important role (stabilizing the transition states). These authors compared the performance of WFT, cWFT, and DH-DFT methods with W1-F12 reference energies and found CBS-QB3 to be inferior (RMSD = 2.6 kcal/mol) to the DH-DFT methods DSD-BLYP-D3, B2GP-PLYP-D3, and B2K-PLYP-D3 (RMSD = 1.4 kcal/mol for all three).
Our best value requires 12 empirical parameters. Can we winnow this down at an acceptable cost in accuracy? Setting A = A′ costs a negligible 0.01 kcal/mol; excising the dispersion correction costs a still smallish 0.04 kcal/molbut eliminating the HLC, thus cutting our number of parameters in half, actually achieves a slightly lower WTMAD2 at 1.65 kcal/mol. (This has the additional benefit of eliminating discontinuities in HLC as a bond is broken.) Once again, that small cost is confined to thermochemistry and large-molecule reactions.
Further reduction from six to five parameters can be achieved by eliminating the dispersion term, but now this comes at a cost of 0.11 kcal/mol. (Note the negative coefficient of −0.19 for the dispersion term, which appears to indicate it compensates for overbinding due to BSSE in the noncovalent interaction subsets. For detailed discussions of that issue, see refs 97 and 98.) Using E2/{T,Q} extrapolation instead of scaling does come at a small cost in accuracy; however, the dispersion correction can then be eliminated at the expense of just 0.02 kcal/mol, leaving us with 1.84 kcal/mol (G4(scsMP2.X)-D-v8) for four parameters.
One additional refinement, at zero computational cost, we can consider would be to carry out separate extrapolations for E 2αβ and E 2ss . Thus, introducing an additional parameter turns out to yield only negligible benefits across the board, and we have hence abandoned this.
The conventional MP3/def2-TZVPP turned out to be something of a bottleneck for the largest systems. This could obviously be mitigated with an RI-MP3 algorithm (e.g., such as reported in refs 99 and 100), but none was available to us. This prompted the question at what price the middle tier could be eliminated.
IV.C. Eliminating the Middle Tier. Our best result then (G4(MP2)-XK-D-H6-v1) is 2.52 kcal/mol, which represents a substantial degradation of over 1.63 kcal/mol above. A component breakdown (Table 1) reveals that the said degradation happens across the board for all subsets.
On the one hand, that implies the middle MP3-like tier (incidentally, inspired by the MP2.X method 101 ) is clearly beneficial; on the other hand, that means that without it we are, in fact, at a slight disadvantage compared to the best double hybrids.
The costs of setting A = A′ in the HLC, and of eliminating the HLC outright, follow the same trends as in the previous section. This leaves us with 2.56 kcal/mol for seven parameters; additionally, dropping the dispersion correction brings us up to 2.73 kcal/mol.
The G4(MP2)-XK paper applies separate scaling coefficients to the overall CCSD correlation energy and the same-spin and opposite-spin CCSD components. Setting c CCSD-MP2 = c (T) , i.e., treating all post-MP2 terms as a single correction, incurs a comparatively low cost. Somewhat surprisingly, perhaps, when eliminating (T) altogether (if dispersion is left in), WTMAD2 still stays below 3 kcal/mol. For perspective, however, we are now approaching the performance territory of the ωB97M-V range-separated hybrid meta-GGA 91 (WTMAD2 = 3.29 kcal/mol), 46 the best rung-4 functional! To put this in context, let us consider the original G4(MP2) and G4(MP2)-XK methods. For a smaller sample of 1208 molecules where we have G4(MP2) results, the WTMAD2 of G4(MP2) was equal to 2.96 compared to our best method, 2.31 kcal/mol.
(A short note is in order about some of the DFT functionals. WTMAD2 statistics for ωB97X-V and ωB97M-V were first published by Najibi and Goerigk, 102 and for B3LYP-D3 and M06-2X-D3(0) in the original GMTKN55 paper. These authors use the def2-QZVP basis set, augmented with diffuse s and p functions for the anionic subsets G21EA, AHB21, and IL16, while we, in the present work and in refs 45 and 46, employ def2-QZVPP as our baseline and def2-QZVPPD for the three aforementioned subsets plus RG18 and WATER27. Part of the difference could be traced to our larger basis sets, and the remainder is likely due to the superfine integration grids used here: meta-GGAs are wellknown to display strong grid sensitivity, 103

even though
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article mitigation measures had been taken in the development of wB97M-V. For MP2-D3, the larger basis sets used here impart WTMAD2 improvements for the anionic species, but also for ACONF and RG18, which have large weights in the WTMAD2 formula.) IV.D. Reducing the MP2 Basis Set to TZ. If we do so, we effectively obtain a reparametrized version (rev-G4MP2XK-H6-v1) of G4(MP2)-XK. 53 The refitted WTMAD2, 3.53 kcal/ mol, is not much lower than the 3.71 kcal/mol obtained with the original G4(MP2)-XK parameters. This relatively small reduction suggests that the training set used in ref 53 was fairly representative. Both the original and refitted WTMAD2 values are, in fact, inferior to ωB97M-V and represent a marked deterioration over G4(MP2)-XK-T-H6-v1 in the previous section; a component breakdown of WTMAD2 (Table 1) reveals that this happens across the board. Considering the fairly low cost of an RI-MP2/def2-QZVPPD calculation if an RI-MP2 capable code is available (be it MOLPRO, Q-Chem, PSI4, ORCA, ...), there is really no valid reason not to "walk that additional tenth of a mile".
If we include the dispersion correction, WTMAD2 drops to 3.28 kcal/mol (note the sizable coefficient c Disp = −0.43). Without D3(BJ), eliminating HLC comes at a price of about 0.5 kcal/mol, but with D3(BJ), no such large price need be paid: WTMAD2 = 3.35 kcal/mol, with c Disp = −0.81. As above, the physical meaning of the large negative coefficient is clearly to compensate for basis set superposition error, particularly in the noncovalent interaction subsets.
IV.E. Using the E2 Correlation Energy from a Double-Hybrid Calculation. The good performance of double hybrids, inspired by GLPT2 (second-order Gorling−Levy perturbation theory), 104 elicits the question: could we improve performance by using KS orbitals in the MP2 for the largebasis steps?
We have investigated this possibility both for the two-tier scenario from subsection IV.C and for the three-tier scenario from subsection IV.B. The E2 energy components were taken from a ωrevDSD-PBEP86-D3BJ calculation with x = 0.72, ω = 0.16.
In the two-tier scenario, we see no noticeable improvement, but an intriguing phenomenon can be observed in the triple excitations coefficient c T , which takes on relative small negative values. In the three-tier scenario (G4(MP3|KS)-D-H5-v1), we can get down to WTMAD = 1.89 kcal/molbut c T ≈ 0, and in addition c CCSD−MP3 ≈ 0. This offers the tantalizing possibility of eliminating the costly CCSD(T) calculation step entirely (i.e., c CCSD-MP3 = c T = 0): doing so [G4(MP3|KS)-D-H5-v5 and G4(MP3|KS)-D-H5-v6] yields WTMAD = 1.89 with or without c Disp = 0. This offers a performance superior to double hybrids but requiring no steps more expensive than MP3/def2-TZVPP and with only three parameters (plus the six of the HLC). Can we cut out the HLC as well? Doing so [G4(MP3| KS)-D-v5 and G4(MP3|KS)-D-v6] only slightly increases WTMAD2 to 1.93 kcal/mol (with dispersion) and 1.96 kcal/mol (without dispersion): these last two options only have four and three parameters total, respectively, i.e., fewer than the double hybrids.
We note that code limitations precluded carrying out MP3 calculations in a basis of KS orbitals; hence, the MP3 term could only be considered in a basis of HF orbitals. From another perspective, that of fifth-rung density functional methods, this points toward a path for further improving on ωB97M(2) and revDSD-PBEP86-D4 at a relatively modest computational cost. Past attempts involving post-MP2 methods by Chan, Goerigk, and Radom, 105 using correlation ranging from MP3 to CCSD(T), showed no improvement: however, these authors for obvious cost reasons used the G2/ 97 small-molecule thermochemistry data set of 148 small molecules (about one-tenth the size of the present training set) and also restricted themselves to TZ quality basis sets. In view of our findings about the importance of a QZ quality basis set for the MP2-like component, it is quite possible that any improvements from post-MP2 methods in ref 105 were masked by residual basis set incompleteness error in the KS-MP2 contribution. The proverb about a chain being no stronger than its weakest link comes to mind.
The benefits of using KS orbitals in the MP2 for the largebasis steps can be illustrated from a different angle: if we use instead (spin-component-scaled) MP2, MP3, and a dispersion correction and optimize the four adjustable parameters (MP2.X-Q), we obtain WTMAD2 = 3.28 kcal/mol, which stays the same if we cut out the dispersion correction. Comparison with the MP2|KS cognates of these two methods (1.93 and 1.96 kcal/mol, see above) reveals that substituting MP2|KS for MP2 imparts a benefit of about 1.2−1.3 kcal/mol. A component breakdown shows that the difference lies in the small-molecule thermochemistry, barrier heights, and largemolecule reaction energies, while the two noncovalent interaction subsets taken together stay approximately unchanged in quality.
For this "empirical SCS-MP2x" approach, basis set extrapolation does offer a slight improvement (on the order of 0.05 kcal/mol).
IV.F. Performance of the DLPNO-CCSD(T)-Based Composite Methods. As the O(n 7 ) CPU time scaling with a system size of canonical CCSD(T) will present an insurmountable obstacle for application to larger systems, we considered its replacement by the DLPNO-CCSD(T) approach, which is linear-scaling with system size. Specifically, for two-tier approaches, we replaced the post-MP2 correction,  Recently, Neese and co-workers 106 found for the GMTKN55 data set and the aug-cc-pVDZ basis set that DLPNO-CCSD(T 1 ) with TightPNO has a WTMAD2 from the canonical results of 0.87 kcal/mol (1.58 kcal/mol for NormalPNO). Intriguingly, the present results show a much smaller difference between corresponding canonical and DLPNO approaches, implying a degree of error compensation between the MP2 and post-MP2 components. Moreover, replacing DLPNO-CCSD(T) by DLPNO-CCSD(T 1 ) yields a marginal improvement of statistics for the G4-DLPNO-T1 type methods in this framework (see the Supporting Information).
Gas-phase thermochemistry of small organic molecules has previously been studied employing DLPNO-CCSD(T) electronic energies by Paulechka and Kazakov, 107 who compared experimental gas-phase formation enthalpies with those at DLPNO-CCSD(T)/def2-QZVP//RI-MP2/def2-QZVP. They considered four variants, the most costly scheme being singlepoint energies at DLPNO-CCSD(T)/def2-QZVP and a geometry optimization at RI-MP2/def2-QZVP. Their data set consisted of 45 small organic molecules, ranging in size from benzene and pyridine to biphenyl. That implementation was available only for C, H, O, and N containing closed-shell molecules. Mielczarek et al. 108  IV.G. Final Selected Methods. It is clear from the discussion above that the WTMAD2 improvement from including the HLC is not commensurate with having to include an extra six parameters. This winnows our choices to the non-HLC ones.
We then find that the benefit of the dispersion corrections is negligible for the QZ/TZ two-tier approaches but rather less negligible for the QZ/TZ/DZ three-tier approaches and the Q/D two-tier. This leaves us with the following hierarchy of three methods; the variant label is omitted for the final selected methods, and the original method's name is in brackets: GOOD If one considers DLPNO-CCSD(T) as a gentler-scaling alternative to canonical CCSD(T), then G4-T-DLPNO becomes an attractive option, with WTMAD2 = 1.66 kcal/ mol, no step costlier than DLPNO-CCSD(T)/def2-TZVPP, and just three fitted parameters. Somewhat superior accuracy can be obtained (WTMAD2 = 1.52 kcal/mol) at the G4-Q-DLPNO level, where DLPNO-CCSD(T)/def2-QZVPP is the costliest step.
In Table 3, we present the final selected cWFT methods. They combine the highest accuracy for the molecular energies across the GMTKN55 database with a minimal number of empirical parameters. The energy expressions that we employed eliminate any redundant parameters (HLC) and render the listed methods transferable to various chemical systems, other than those of the 2459 molecules of GMTKN55. Extensive benchmark sets and the survival of the fittest approach, where the proper energy expressions eliminate empirical parameters, improved the DFT performance significantly within chemical accuracy for energetics. Representative cases include ωB97M(2) and revDSD-PBEP86-D4 that have been successfully employed to expanded porphyrins 109 or transition metal reaction barrier heights of the MOBH35 database. 110,111 Example input and output files for the popular Gaussian 16 electronic structure system, as well as postprocessing scripts, have been given in the Supporting Information. The geometry optimization and frequencies steps (and scale factors) are the same there as in the original G4(MP2)XK specification. 53 When perusing the CPU and wall clock times in these outputs, it should be kept in mind that this code has no RI-MP2 implementation, and hence the largest basis set MP2 step will weigh much heavier on the total than it would with another code like ORCA or Q-CHEM.
Further performance assessment of the presently proposed cWFT methods is especially desirable for transition metal (and heavy p-block) energetics and barrier heights, particularly in

V. CONCLUSIONS
We have attempted to develop a hierarchy of composite WFT computational thermochemistry protocols based on Weigend− Ahlrichs basis sets but parametrized against the very large and chemically diverse GMTKN55 benchmark. The original G4(MP2)-XK approach of Chan, Karton, and Raghavachari could be considered the lowest tier of our approach, using no larger basis set than def2-TZVPP for the MP2 correlation and def2-SVSP for the post-MP2 corrections. For GMTKN55, with the original parameters (but including a fix for rare gas atoms), G4(MP2)-XK is intermediate in performance between the ωB97X-V and ωB97M-V combinatorially optimized range-separated hybrid functionals. Refitting lowers WTMAD2 somewhat, but not spectacularly. Adding an empirical dispersion correction brings G4(MP2)-XK-D in the WTMAD2 = 3.3 kcal/mol range (equal to ωB97M-V, the bestperforming rung-4 functional) and additionally allows us to eliminate the "high-level correction".
If an RI-MP2 code is available, expanding the MP2 basis set to def2-QZVPPD is relatively inexpensive in both CPU time and storage requirements. The resulting G4(MP2)-XK-D method performs in the same range as revDSD double hybrids but is still inferior to the best among them like revDSD-PBEP86-D4, as well as to the combinatorially optimized ωB97M(2) range-separated double hybrid. Especially if a dispersion correction is included, the HLC turns out to be superfluous, bringing the total number of empirical parameters down to just six. A significant improvement is achieved with the three-tier G4(MP3)-D method, which entails RI-MP2/def2-QZVPPD, MP3/def2-TZVPP, and CCSD(T)/def2-SVSP calculations. We can then achieve WTMAD2 = 1.65 kcal/mol, markedly superior to even the best double hybrids.
A further enhancement is possible as a two-tier approach with RI-MP2/def2-QZVPPD and CCSD(T)/def2-TZVP, but at WTMAD2 = 1.43 kcal/mol, the substantial premium in computational cost is hard to justify by a performance benefit of just 0.2 kcal/mol over G4(MP3)-D. Still, further improvements probably will require going to quintuple-zeta basis sets for the MP2 step, including inner-shell correlation, considering post-CCSD(T) correlation effects, or a combination thereof.
While G4(MP2)-XK-D and G4(MP3)-D are comparable in computational cost to G4(MP2) and G4, respectively, their WTMAD2 metrics over GMTKN55 are markedly superior. A component breakdown of WTMAD2 reveals that the advantage of G4(MP2)-XK-D and G4(MP3)-D resides primarily in the large-molecule, intramolecular NCI (noncovalent interactions), and intermolecular NCI subsets: we note that the HLC as defined for G4(MP2) and G4 cancels exactly between the reactant and product sides of these reactions (as for any reactions that only involve closed-shell molecules), and hence it is unable to correct for basis set incompleteness.
The combination of (MP2|KS)/def2-QZVPPD correlation from a double-hybrid calculation with scaled MP3/def2-TZVPP correlation energy yields very promising results, WTMAD2 = 1.96 kcal/mol, at a comparatively low computational cost. This points toward an avenue for further improving the performance of empirical double-hybrid density functionals.
The DLPNO-CCSD(T)-based variants of the two-tier family represent the lowest-cost approach due to their minimal requirements compared to the canonical CCSD(T)-based cWFT. (By way of illustration: the most time-consuming step of G4-T-DLPNO on a melatonin conformer took about 2 h on eight cores of a Skylake machine, compared to about 2 days for the canonical G4-T. This gap will only widen for larger molecules.) G4-D-DLPNO was found to have WTMAD = 2.38 kcal/mol, i.e., 1.33 kcal/mol lower that the original G4(MP2)-XK at a fraction of its cost and with nine fewer parameters. The G4-Q-DLPNO and G4-T-DLPNO had the lowest WTMAD2 = 1.52 and 1.65 kcal/mol for the GMTKN55 database, respectively. Out of the different DLPNO-based approaches, G4-T-DLPNO appears to offer the best price−performance ratio.
Overall, we can conclude that composite WFT methods still offer advantages over the best double hybridsbut that the gap has narrowed substantially. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding
Research was supported by the Israel Science Foundation (grant 1358/15) and by the Yeda-Sela-SABRA program (Weizmann Institute of Science). The work of E.S. on this scientific paper was supported by the Onassis Foundation Scholarship ID: F ZP 052-1/2019-2020.

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
J.M.L.M. acknowledges Dr. Mark Vilensky (scientific computing manager of ChemFarm) and Mr. Jacov Wallerstein (CEO and CTO of Access Technologies) for development of the bscratch shared high-bandwidth scratch storage server, without which some of the largest conventional calculations would have been impossible. Peer reviewers are thanked for helpful suggestions. We also thank Mr. Golokesh Santra and Dr. Mark A. Iron for helpful discussions.