Canonical and DLPNO-Based Composite Wavefunction Methods Parametrized against Large and Chemically Diverse Training Sets. 2: Correlation-Consistent Basis Sets, Core–Valence Correlation, and F12 Alternatives

A hierarchy of wavefunction composite methods (cWFT), based on G4-type cWFT methods available for elements H through Rn, was recently reported by the present authors [J. Chem. Theor. Comput.2020, 16, 4238]. We extend this hierarchy by considering the inner-shell correlation energy in the second-order Møller–Plesset correction and replacing the Weigend–Ahlrichs def2-mZVPP(D) basis sets used with complete basis set extrapolation from augmented correlation-consistent core–valence triple-ζ, aug-cc-pwCVTZ(-PP), and quadruple-ζ, aug-cc-pwCVQZ(-PP), basis sets, thus creating cc-G4-type methods. For the large and chemically diverse GMTKN55 benchmark suite, they represent a substantial further improvement and bring WTMAD2 (weighted mean absolute deviation) down below 1 kcal/mol. Intriguingly, the lion’s share of the improvement comes from better capture of valence correlation; the inclusion of core–valence correlation is almost an order of magnitude less important. These robust correlation-consistent cWFT methods approach the CCSD(T) complete basis limit with just one or a few fitted parameters. Particularly, the DLPNO variants such as cc-G4-T-DLPNO are applicable to fairly large molecules at a modest computational cost, as is (for a reduced range of elements) a different variant using MP2-F12/cc-pVTZ-F12 for the MP2 component.


■ INTRODUCTION
Composite wavefunction theoretical (cWFT) methods continue to be a mainstay for reaching kcal/mol level "chemical accuracy" for reaction energies. Some of the well-established approaches include the Gaussian-n (Gn), 1−7 CBS-QB3, 8,9 multicoefficient correlation methods (MCCM), 10−12 the correlation-consistent composite approach (ccCA), 13−15 and, in sub-kcal/mol accuracy regimes, the Weizmann-n variants, 16−23 the HEAT-n methods, 24−26 and the Feller− Peterson−Dixon (FPD) 27−29 approach. All of these share a canonical coupled-cluster CCSD(T) 30,31 component. One step toward the pursuit of accurate low-cost cWFTs was a recent DLPNO-CCSD(T)-based method (DLPNO-ccCA) 32 suitable for the elements of the first and second rows of the PTE; it was parametrized to the small G2/97 training set 33,34 of 148 small closed-shell species, the largest organic molecule in it being benzene.
The above methods, in their original form, focused on light elements. Very recently, Chan, Karton, and Raghavachari (CKR) 35 extended the applicability of G4(MP2) to the entire spd blocks of H-Rn through a switch to Weigend−Ahlrichs/ Karlsruhe/def2-type basis sets. 36 When we applied this G4(MP2)-XK to the larger and more chemically diverse GMTKN55 benchmark suitegeneral main-group thermochemistry, kinetics, and noncovalent interactions, with 55 problem sets 37 entailing with almost 2500 unique calculations on systems as large as 81 atomswe were astonished to find 38 WTMAD2, weighted mean absolute deviation type 2, values inferior to the best available doublehybrid 39 (see refs 40−43 for reviews) density functional theory (DFT) functionals, 43,44 which reach WTMAD2 values in the 2.2−2.3 kcal/mol range.
As it turned out, by refitting to GMTKN55 and carefully monitoring statistical significance of empirical parameters, we were able to develop 38 a new family of cWFT methods using def2 basis sets: in particular, G4-T and G4-T-DLPNO methods reached WTMAD2 values of just 1.51 and 1.66 kcal/mol, respectively. Particularly, G4-T-DLPNO is an economical option.
In these papers, inner-shell correlation was generally excluded, as the def2 basis sets are not designed for core− core (CC) and core−valence (CV) correlation. Yet it cannot be taken for granted that its neglect does not compromise accuracy. Addressing this question is the subject of the present study.
Now doing so would require switching to a different family of basis sets that have the required radial and angular flexibility in the core−valence, i.e., high-exponent, region, such as the cc-pwCVnZ (correlation-consistent, core−valence weighted, ntuple ζ basis sets) of Peterson and Dunning. 45 But such basis sets would likely have a different convergence behavior for the valence contribution as well, and hence require reparametrization. In fact, there is evidence 21 that additional radial flexibility such as offered by core−valence basis sets benefits valence properties as well.
Basis set convergence for core−core (CC) and core−valence (CV) correlation contributions to atomization energies was recently studied in great detail in ref 46. The importance of inner-shell correlation will not be homogeneous across the GMTKN55 test suite; while it is well known (see, e.g., ref 47) that small-molecule atomization energies have inner-shell correlation components of several kcal/mol, their contribution to noncovalent interactions between firstand second-row compounds will generally be very small 48 owing to the longdistance nature of dispersion and the fairly short-range nature of CV correlation. Correlation from the (n − 1)d subvalence shells of Br and I is rather more important in halogen bonding. 49 For conformer equilibria and large-molecule isomerization reactions, one can expect a large degree of CC and CV cancelation between reactants and products. However, at least in principle, the inclusion of CV correlation should improve the overall GMTKN55 performance.
In this paper, we will present a hierarchy of cc-G4-and cc-G4-DLPNO-type approaches. Core−valence correlation will only be considered at the MP2 level. We will show that we can actually reduce WTMAD2 below the 1 kcal/mol threshold. Somewhat surprisingly, we find that this improvement in accuracy is due much less to core−valence correlation itself than to basis set expansion. The correlation-consistent methods deliver an attractive compromise between accuracy and computational cost for systems dominated by dynamic correlation. For systems with severe static correlation, 50 CCSD(T) is inadequate anyhow and one needs to resort to approaches such as W4, 51,52 W4-F12, 21 HEAT-III, 26,53 or FPD. 27−29 ■ COMPUTATIONAL DETAILS All calculations were performed on the ChemFarm HPC cluster of the Faculty of Chemistry at the Weizmann Institute. Valence-only CCSD(T) calculations 30,31 were carried out using Gaussian 16, revision C.01. 54 The MP3 and RI-MP2 55,56 calculations were performed with Q-CHEM 5.2, 57 and for the DLPNO-MP2, 58 DLPNO-CCSD(T), 59 and DLPNO-CCSD(T 1 ) 60 calculations, we utilized ORCA 4.2.1. 61 Most RI-MP2-F12 calculations likewise relied on ORCA, but for technical reasons, we employed MOLPRO 62 version 2019.2 for the heavy p-block systems. The 4TB SSD (solid-state disk) arrays on the "heavyio" nodes of ChemFarm benefited the canonical MP3 and CCSD (T) calculations, even though they were insufficient for a few of the largest MP3/ def2-TZVPP jobs; for those, we turned to a 40TB shared-over-InfiniBand storage server custom-developed for us by Access Technologies of Ness Ziona, Israel.
In MP2 calculations, where all electrons were correlated, we employed the following basis sets: for H and He, the correlation-consistent aug-cc-pVmZ 63,64 along with the corresponding RI auxiliary basis sets; 65 for the first-row (Li−Ne) and second-row (Na−Ar) atoms, the weighted core−valence basis sets aug-cc-pwCVmZ 45,63,64 along with the corresponding auxiliary RI ones. 66,67 The third-row alkali and alkaline earth metals K and Ca were also treated using aug-cc-pwCVmZ, but for molecules containing K or Ca, the RI approximation was not considered due to lack of the corresponding auxiliary basis sets. In all-electron MP2 as well as in valence-only MP2 calculations, we correlated all of the electrons of the alkaline and alkaline earth metals. For the heavy p-block elements, Ga-Kr, In-Xe, and Tl-Rn, we resorted to aug-cc-pwCVmZ-PP 68 (where PP stands for pseudopotential) associated with smallcore multiconfigurational Dirac−Hartree−Fock (MCDHF) relativistic pseudopotentials. 69,70 As auxiliary basis sets for the heavy p-block elements, we employed cc-pVmZ-PP-F12/ MP2FIT. 71 The cardinal number m refers to T or Q, and in this text, we shall denote the core−valence basis sets used as aug-cc-pwCVmZ(-PP). We have also carried out a series of valence-only MP2 calculations for the species in GMTKN55 using the same aug-cc-pwCVmZ(-PP) basis sets. Finally, at the request of a reviewer, for comparison, we also considered the simple cc-pVnZ basis sets in the valence-only MP2 step, with some slight ad hoc modifications: for the anion-containing subsets AHB21, G21EA, IL16, RG18, WATER27, and BH76, we employed aug-cc-pV{T,Q}Z, and (to accommodate pseudohypervalent molecules like SO 3 ) cc-pV(n+d)Z for the second-row atoms in W4-11. 72 For technical reasons, aug-cc-pwCV{T,Q}Z was applied for MB16-43, on account of the multiple alkali and alkaline earth metals appearing in these species.
For the valence-only CCSD(T), we applied the frozen-core (FC) approximation as in the G4-type methods along with the def2-SVSP (standard def2-SVP without the polarization functions on hydrogen) and def2-TZVP basis sets of the Weigend−Ahlrichs/Karlsruhe def2 family. 36 Hence, we were able to repurpose the CCSD(T) total energies from our previous study. 38 Similarly, we repurposed MP3/def2-TZVPP along with the def2 small-core energy-consistent relativistic pseudopotentials 73 for elements heavier than Kr. The augmented def2 basis sets are available for the elements H− La and Hf−Rn; the nonaugmented ones are additionally available for Ce−Lu. We retrieved the core−valence basis sets and their auxiliary variants from the Basis Set Exchange 74 and the ccRepo basis set repository. 75 We have provided all basis sets files used in this work in the Supporting Information (SI).
The open-shell cases were treated with unrestricted HF orbitals (UHF), analogously to the reported G4-type approaches, CBS-QB3, and G4 methods. A few G4(MP2) variants use ROHF determinants, 76 which might be more appropriate for radicals prone to severe spin contamination. The dispersion model considered was that of Grimme et al., 77 with the Becke−Johnson damping function 78 denoted as "D3(BJ)". In our previous work on double-hybrid functional parametrization, 44 we arrived at {a 1 = s 8 = 0, a 2 = 5.5} as reasonable compromise values for the damping function's shape parameters; we retained these parameters in the present work, leaving the R −6 overall scaling parameter s 6 as the single fitted parameter.
For the DLPNO-CCSD(T) and DLPNO-CCSD(T 1 ) calculations, the details are the same as in our previous work. 38 The "TightPNO" option 79 was applied throughout, and "chemical cores" were kept frozen as in ref 38, while, as with all "def2" basis sets, the deepest core electrons of elements heavier than Kr were modeled utilizing the Stuttgart−Cologne relativistic energy-consistent pseudopotentials. 73,80 The SCF convergence threshold was equal to 10 −9 au; the RIJCOSX chain-of-spheres-exchange approximation 81,82 for constructing the Fock matrices was applied, both with the default GRIDXS2 integration grid and with the most stringent GRIDX9 grid. GRIDXS2 corresponds to angular Lebedev-110 angular and radial Gauss-Chebyshev with IntAcc = 4.01, where the number of radial points is given by 83 n rad = 15IntAcc + 5n A − 40, and n A = 30 + 5n rowPTE,A , with n rowPTE,A being the number of the row in the periodic table that atom A belongs to; in GRIDX9, the angular grid is Lebedev 302 and IntAcc = 4.67. We employed the def2-SVPD, def2-TZVPP, and def2-QZVPP basis sets along with the auxiliary versions of def2/J (see ref 84) and def2-SVP/C, def2-TZVPP/C, and def2-QZVPP/C (see ref 85), as stored in ORCA's internal basis set library. For the subsets AHB21, G21EA, IL16, RG18, and WATER27, we similarly applied the diffuse-function augmented def2-TZVPPD and def2-QZVPPD, 86 inspired by ref 44. For the avoidance of doubt, in the DLPNO-CCSD(T)-based or DLPNO-CCSD(T 1 )-based cWFTs discussed here, the E CCSD-MP2 term is calculated by subtracting the DLPNO-MP2 energy from separate single-point calculations in the same basis set, and not from the "semi-local (SL) MP2" energy reported at the post-SCF stage of a DLPNO-CCSD(T) or DLPNO-CCSD(T 1 ) run.
For the explicitly correlated RI-MP2-F12 calculations, 87 the computational details largely follow those in ref 49. We consider here the cc-pVTZ-F12 and cc-pVQZ-F12 basis sets 88 along with the corresponding auxiliary basis sets aug-cc-pVnZ/ JK, 89 cc-pVnZ-F12-MP2-FIT, and cc-pVnZ-F12-OPTRI 90 (n = T or Q for cc-pVTZ-F12 or cc-pVQZ-F12, respectively), as implemented in ORCA. For molecules involving heavy p-block elements (e.g., halogen-bonded species involving bromine and iodine), the subvalence (n − 1)d shell has been correlated, analogous to ref 49. For these elements, we employed the cc-pVQZ-PP-F12 basis set 71 along with the various auxiliary basis sets set from the same reference, as stored in the internal basis set library of MOLPRO 2019.2. 62 The fixed amplitude ansatz 91 is considered throughout, and the geminal exponent (β) was set equal to 1.0, as recommended in ref 92 for the RI-MP2-F12 calculations.
The primary standard for training the presented cWFT methods was the GMTKN55 benchmark; 37 as in ref 44 for the minimally empirical double hybrids, and in our previous paper on G4-T-type methods, 38 the reference data, geometries, and charge/multiplicity information were extracted from the ACCDB database of Morgante and Peverati 93 and reused verbatim (without geometry optimization). See ref 37 for details of all of the reference data, which were either CCSD(T)/CBS (i.e., extrapolation to the complete basis set limit) or higher level, many taken from previous benchmark studies in our group. These reference data are in the hypothetical motionless state without ZPE and thermal corrections. The most computationally demanding subsets C60ISO (isomerization energies of fullerene C 60 molecules) 94 and UPU23 (relative energies of uracil dinucleotides) 95 were currently not within reach for MP3/def2-TZVPP and canonical CCSD(T)/def2-TZVP. As these subsets have comparatively small weights in the WTMAD2 formula, their omission does not significantly affect WTMAD2, as has been explained at some length in ref 38. Likewise, for the standard G4, G4(MP2), and CBS-QB3 calculations carried out for comparison, these reference geometries were not reoptimized.
The calculation of the reaction energies from total energies and a reference data file, and the evaluation of WTMAD2 and associated statistics, were performed using a Fortran program developed in-house, which is available upon request. In addition, we employed the BOBYQA 96 (Bound Optimization BY Quadratic Approximation) gradient-free deterministic optimizer to optimize the energy coefficients. Numerous initial guesses were evaluated, and reoptimizations ensured that a global minimum was indeed reached.
The performance of the presented cWFT methods for the GMTKN55 database was quantified utilizing the weighted mean absolute deviation, type 2 (abbreviated WTMAD2), as defined in eq 2 of the GMTKN55 paper. 37 (1) where N i is the number of systems of subset i, MAD i is its mean absolute deviation from the reference values, and the average absolute reference reaction energy |Δ | E i ensures that errors are weighted proportionally to their importance on the varying energy scales of the different subsets.
WTMAD2 is based on MAD, a more "robust" metric in the statistical sense 97 of the word that it is less prone to perturbation by outliers, than the root-mean-square deviation (RMSD), and more suitable for the GMTKN55 37 database. Besides, WTMAD2 was the key metric in developing the double-hybrid DFT 43,44 and G4-type cWFT 38 methods.
Finally, at the request of a reviewer, we carried out a computational cost comparison with W4 theory for a small illustrative sample of molecules. For consistency, all of these calculations were carried out on identical hardware platforms, namely, two 18-core Intel Xeon Gold 6240 processors (2.30 GHz). We employed ORCA throughout the timing comparisons of cc-G4-T-and DLPNO-CCSD(T)-based methods, while for the composite method W4, we utilized MOLPRO 62 for CCSD and CCSD(T) and the MRCC 98−100 arbitrary-order coupled clyster program for the post-CCSD(T) steps.
Description and Nomenclature of Correlation-Consistent cc-G4-Type Methods. The standard G4-type methods share similar energy expressions with the correlation-consistent cc-G4-type ones: post-MP2 terms, particularly valence CCSD (T), are evaluated with the same def2 basis sets, in part to permit recycling the most CPU-intensive parts of the calculation from the previous work. This leaves the extrapolated Hartree−Fock reference and E 2 correlation energies as the key differences; both of them are calculated using the aug-cc-pwCV{T,Q}Z(-PP) basis sets. The commonly used shorthand notation cc-pV{T,Q}Z for "extrapolation from cc-pVTZ and cc-pVQZ basis sets", and analogously for aug-cc-pV{T,Q}Z, def2-{T,Q}ZVP, etc., is employed throughout the present paper.
We note that the coefficients of the E corr,MP2,OS and E corr,MP2,SS terms are adjustable parameters obtained together with the other parameters through minimization of WTMAD2 for the GMTKN55 database, and should not be misconstrued as identical to the original SCS-MP2. 104,105 For the three-tier basis set methods, we similarly follow the pattern of the G4(MP3)-type variants from ref 38 but substitute HF/aug-cc-pwCV{T,Q}Z(-PP) and E 2 /aug-cc-pwCVQZ(-PP). Both valence-only MP3 and CCSD (T) energies are retained with the same basis sets as in ref 38.
The five-parameter "high-level correction" (HLC), as defined in eq 7 of ref 38 following CKR, was originally introduced in ref 3 as a correction for residual basis set incompleteness. In its original, simplest, two-parameter form introduced as part of G1 theory, 6 the two parameter values were fixed from the hydrogen atom and hydrogen molecule's exact total energies. 6 In the present work (see below), like in our previous study, 38 we found that the addition of HLC did not significantly enhance statistics, especially not at any level that would justify the introduction of five additional parameters. It indeed introduces discontinuities on bondbreaking surfaces and might otherwise jeopardize transferability to other chemical systems. Hence, none of our final recommended levels include an HLC term.
The naming of the correlation-consistent cWFTs is analogous to the original one for the standard G4-type methods ( Figure 1). 38 The extrapolation of the total E 2 or of the individual E 2,OS and E 2,SS , all using aug-cc-pwCV{T,Q}Z(-PP), determines the method's name. In the first case, combining the total E 2 /CBS extrapolated in Schwenke style 102 (see ref 103 for equivalence relations with other twopoint extrapolations) with CCSD(T) leads to cc-G4-n or otherwise to cc-G4-DLPNO-n if DLPNO-CCSD(T) is used instead of CCSD (T). Inserting an MP3 step for an intermediate basis set size leads to the cc-G4(MP2.X)-n variants. If we similarly scale the same-and opposite-spin E 2 terms, we will denote this as cc-G4-scs-n or, if an MP3 step is added, cc-G4(scsMP2.X)-n. The cardinal number n = D, T, or Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article Q refers to the basis set employed in the CCSD(T) step, i.e., def2-SVPD, def2-TZVPP, and def2-QZVPP, respectively.

■ RESULTS AND DISCUSSION
The error statistics and the WTMAD2 component breakdown for selected methods (in kcal/mol) appear in Table 1. The respective abbreviations "Thermo", "Barrier", "Large", "Confor", and "Intermol" refer to the five basic subdivisions of GMTKN55: basic thermochemistry, barrier heights, reactions of large molecules, conformer equilibria, and intermolecular interactions. The table is grouped into four blocks: presently obtained "correlation-consistent" cWFT methods; cWFT from the literature; simple WFT; and the better-performing and most commonly used DFT methods. Table 2 are cc-G4-T-v1 and cc-G4-T-v2, both with WTMAD2 of just 0.87 kcal/mol. cc-G4-T-v1 has four adjustable parameters, of which the fourth is the coefficient of the dispersion correction, being just −0.006. If we set it to zero instead (i.e., eliminate the dispersion correction), we obtain cc-G4-T-v2 with just three adjustable parameters (see Table S2 for the WTMAD2 contributions per subset).

Effects of Basis Set Expansion and of CV Correlation Energy Inclusion. The best approximations in
This represents a substantial improvement over the WTMAD2 statistics of 1.46 and 1.49 kcal/mol, respectively, for G4-T-v1 and G4-T-v2 from ref 38. Breakdown by the five top-level subdivisions of GMTKN55 (Table 1) reveals that all five of them benefit, least so the large-molecule reactions and most so the intermolecular interactions, for which the WTMAD2 component is cut in half, from 0.63 to 0.32 kcal/ mol.
Particularly for intermolecular interactions, which are a longrange phenomenon where one would intuitively expect the impact of core−valence correlation to be negligible, rationalizing this improvement entirely in terms of core−valence correlation seems implausible. But in truth, we are making two major changes at once: basis sets and core−valence correlation. Disentangling these two requires carrying out an additional set of calculations in which the same frozen-core approximation as in G4-T-v2 is applied to cc-G4-T-v2. Somewhat surprisingly, perhaps, such a cc-G4-T-v2(FC) ["frozen core"] recovers the lion's share of the improvement, at WTMAD2 = 0.94 kcal/ mol.
A more detailed inspection of the individual subsets reveals that RG18 and HAL59 are the two largest contributors to the difference, with MADs reduced by 2/3. Next are BSR36 (MAD reduced by 3/4), HEAVY28 (MAD reduced by 4/10), following by a string of subsets like BH76, W4-11, G21EA, MB16-43, and the conformer subsets BUT14DIOL and AMINO20X4, for which improvements of 40−60% are seen. Because of the way subsets are weighted in WTMAD2, the small reaction energies in HAL59 and HEAVY128, and especially RG18, have an outsize contribution: the same is true to a lesser extent of BSR36 and BH76. However, while the improvement in W4-11 atomization energies does not weigh greatly in WTMAD2, the RMSD for this subset is cut in half: for small-molecule thermochemistry, an improvement from RMSD = 2.9 to 1.6 kcal/mol is significant, as is an RMSD improvement for electron affinities from 0.08 to 0.05 eV. Nevertheless, for the AMINO20X4 and BUT14DIOL conformer sets, the larger and more flexible basis sets prove very useful, although arguably, we already have quite small errors to begin with.
The difference between WTMAD2 = 0.94 kcal/mol with frozen cores, and WTMAD2 = 0.87 kcal/mol without frozen cores, implies that MP2 inner-shell correlation accounts for just 0.07 kcal/mol on WTMAD2. Half of that fairly meager improvement comes from just RSE43, where the error is cut in half. But incremental improvements for many other sets are outweighed by a deterioration in BSR36, where the valence calculation fortuitously leads to outstanding results. Still, for W4-11, RMSD drops from 1.57 to 1.27 kcal/mol, which smallmolecule thermochemists would likely regard as a nontrivial improvement. It was found previously 21 in the context of the W4-F12 paper that the additional radial flexibility in core− valence basis sets is beneficial even when only valence electrons are correlated.
We considered a similar breakdown for several additional cases and consistently found that the improvement in WTMAD2 from the larger basis sets is about an order of magnitude more important than the effect of including those additional core electrons. Therefore, is their inclusion computationally wasteful? For the largest calculations in our sample, the CPU time for the largest basis set RI-MP2 calculation is, in any case, still dominated by the SCF step. The including of the additional core electrons in that step will only insignificantly add to total computational overhead; hence, we have decided to include them throughout in what follows below.
The three fitted parameters of cc-G4-T-v2 (Table 2) Figure 2 depicts the contribution to the WTMAD2 (kcal/mol) of each subset for the best two-tier methods.
We might indeed go one step further and set c CCSD-MP2 = c T = 1.0, leaving just a single adjustable parameter. The resulting method is arguably akin to the ccCA approach without relativistic corrections. With c MP2/CBS = 0.642, this "inspired by ccCA" composite approach with inner-shell correlation, cc-G4-T-v7 has WTMAD2 = 0.922 kcal/mol. Its counterpart cc-G4(FC)-T-v7 without core−valence correlation has WTMAD2 = 0.993 kcal/mol for c MP2/CBS = 0.727. However, we could take this one final step and replace the one-parameter two-point extrapolation with the parameter-free three-point extrapolation combo in ccCA-PS3. 14 For such a "quasi-ccCA", we obtain WTMAD2 = 1.01 kcal/mol without inner-shell correlation, and for quasi-ccCA(noFC) without frozen cores, WTMAD2 = 0.92 kcal/mol. It is quite intriguing from a scientific point of view that in the guise of cc-G4-T-v2, we obtained something not dissimilar from ccCA from a completely different angle and the comparatively small improvement in WTMAD2 obtainable by introducing the adjustable parameters represents a "feather in the cap" of the original designers of ccCA.
At this stage, we will examine whether, for the frozen-core MP2, the larger correlation-consistent aug-cc-pwCVnZ(-PP) basis sets have a significant edge over the smaller nonaugmented cc-pVnZ basis sets. We found WTMAD2 = 0.993 kcal/mol in cc-G4(FC)-T-v7 and c MP2/CBS = 0.728 as the single adjustable parameter for MP2/aug-cc-pwCV{T,Q}Z(-PP) (note that c CCSD-MP2 = c T = 1.0). When reducing the MP2 basis sets to cc-pV{T,Q}Z while retaining all post-MP2 components of cc-G4(FC)-T-v7, WTMAD2 more than doubles to 2.007 kcal/mol. The most notable improvements when using larger basis sets lie in the HAL59, HEAVY28, RG18, and W4-11 subtests, and to a lesser extent in S66. Now for noncovalent interaction sets, inner-valence flexibility (i.e., the pwC component of aug-cc-pwCVnZ) will not be very beneficial since we are dealing with long-range interactions. It is very well known, however (see, e.g., ref 115 and references therein), that diffuse functions, i.e., the "aug-" component, significantly reduces basis set superposition error provided the underlying basis set is not too small. In W4-11, on the other hand, we are dealing with short-range covalent bonds, and it was previously shown in the W4-F12 paper 21 that the valence correlation contribution to total atomization energies benefits from additional radial flexibility in the basis sets. All individual contributions to the WTMAD2 per subset are listed in Table  S10 along with the relative energies per reaction for the two G4-T-type methods with aug-cc-pwCVnZ or cc-pVnZ basis sets in MP2(FC).  Table S3 in the Supporting Information).
By scaling the E 2,SS , E 2,OS , and E CCSD-MP2 terms, we can eliminate two semiredundant parameters at the expense of ΔWTMAD2 = 0.14 kcal/mol, attaining 2.35 kcal/mol for cc-G4(MP2)-D-v1. The switch from def2 to cc basis sets, and the incorporation of the CV correlation energy, together lower WTMAD2 by 0.33 kcal/mol for cc-G4(MP2)-D-v1 relative to G4(MP2)-D-v1 (see Table S4 in the Supporting Information), the same subsets as above being most affected.
For the top-performing methods, deviations from reference reaction energies for individual data points are available in the Supporting Information. The most considerable deviations occur in the MB16-43 subset, where MB stands for the selfdescribed "mindless benchmarking" dataset of Korth and Grimme,116 i.e., machine-generated artificial structures (a handful of which suffer from severe spin contamination). Their reference energetics had been reevaluated in the GMTKN55 article, 37 at the W1-F12 level of theory.
Including the CV Correlation Energy in the Three-Tier Methods. Reducing the basis set of CCSD(T) from def2-TZVPP to split-valence greatly reduces the overall computational cost, and when a scaled MP3/def2-TZVPP correction was added as a middle tier, G4(MP3)-D was obtained, with WTMAD2 = 1.65 kcal/mol and six parameters. 38 Said threetier method employed the post-HF components from MP2/ def2-QZVPPD, a scaled MP3−MP2 difference with the def2-TZVPP basis set, and the E CCSD(T)-CCSD and E CCSD-MP3 differences from CCSD(T) using the smallest basis set def2-SVSP (i.e., def2-SVP without p polarization functions on hydrogen atoms). The present three-tier methods substitute HF and MP2 with aug-cc-pwCVQZ(-PP) basis sets and retain  Table 2 of the original D3(BJ) paper. 78 f The WTMAD2 component breakdown is for 1320 reactions (+63 more than in our previous work). 38 Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article   the other components, with energy expressions of the correlation-consistent methods following the previously reported three-tier approaches (aside from the inner-shell correlation energy being included in the MP2 part). This leads to cc-G4(MP3)-D-v1 with WTMAD2 = 1.37 kcal/mol and six parameters. The overall WTMAD2 improvement relative to the standard G4(MP3)-D-v1 is 0.28 kcal/mol, when using larger basis sets and including the CV energy in cc-G4(MP3)-D-v1; it is an accumulative amelioration rising from most subsets as they slightly benefit from the larger basis sets and the CV inclusion (Table S5 in  Correlation-Consistent DLPNO-CCSD(T)-Based cWFT Methods. One way to improve computational efficiency would be to substitute DLPNO-CCSD(T) for CCSD (T), leading to two-tier cWFTs that combine RI-MP2 for the larger basis sets with a smaller basis set CCSD(T)-MP2 correction computed at the DLPNO-CCSD(T) level. While the latter asymptotically scales linearly with system size, RI-MP2 still has O(n 5 ) scaling; however, as discussed in Section 5 of Weigend et al., 56 and demonstrated in Table 5 there, the prefactor of the O(n 5 ) term is small enough that it does not dominate until molecules of the size of 44-alkane are reached.
These correlation-consistent cWFT approaches follow the previously reported energy expressions ( Table 2 of  We obtain the lowest WTMAD2, 1.001 kcal/mol, for cc-G4-Q-DLPNO-v1; in view of the very small c Disp = 0.04, omitting the dispersion correction unsurprisingly leads to an essentially identical WTMAD2 = 1.005 kcal/mol for cc-G4-Q-DLPNO-v2, compared to 1.52 kcal/mol for G4-Q-DLPNO-v2. According to Table S6 in the Supporting Information, the half-dozen subsets that present the largest reduction in WTMAD2 are RG18, PCONF21, HAL59, AMINO20X4, HEAVY28, and BH76, the main difference from our observations for cc-G4-T-v2 above being the presence of PCONF21. (These latter calculations are computationally feasible for DLPNO-CCSD(T)/def2-QZVPP but proved too demanding for canonical CCSD(T) even with the smaller def2-TZVPP basis set.) Once again, the improvement for W4-11 only contributes −0.022 kcal/mol to the change in WTMAD2, but an improvement of RMSD from 2.9 to 0.9 kcal/mol is most definitely significant for small-molecule thermochemistry.
In the DLPNO-CCSD(T)-based variants, we had applied the RIJCOSX approximation to avoid a scenario for large molecules where the SCF step would dominate overall CPU time. Thus far, we had only applied the default GridXS2 grid in RIJCOSX. Did this fairly coarse grid introduce significant error? To elucidate this point, we repeated the DLPNO-CCSD(T)/def2-TZVPPD and DLPNO-MP2/def2-TZVPPD calculations using GRIDX9 (the most stringent built-in option) for the RIJCOSX step. By way of illustration, for phenol and for melatonin, this added just 7.0 and 3.  Table S8). Clearly, one can set c CCSD-MP2 = 1.0 with impunity, reducing the number of empirical parameters to just two. In light of the very small cost penalty, we recommend using GRIDX9 throughout if one avails oneself of RIJCOSX.
Regarding the explicitly correlated RI-MP2-F12-based approximations, we begin by considering additivity approximations of the following form  Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article molecules (see Table 2). A detailed comparison by subsets can be found in Table S9 of the SI.
Reducing the RI-MP2-F12 basis set to TZ only marginally increases WTMAD2 by 0.014 kcal/mol [cc-G4-F12- T-v1]. This is a testament to the ability of explicitly correlated methods 119−122 to drastically speed up basis set convergence, typically by two or three angular momentum steps over their orbital counterparts. 123 We attempted RI-MP2-F12/cc-pV-{T,Q}Z-F12 extrapolation using the Schwenke coefficient c E2/CBS = 0.446336 from Table 6 of Hill et al. 92 {E 2 /CBS = E 2 (L + 1) + c E2/CBS [E 2 (L + 1) − E 2 (L)]}, but found that WTMAD2 slightly increases to 1.048 kcal/mol. Likely, seeing an advantage to larger basis sets in the F12 step would require tightening the post-MP2 steps as well. Thus, TZ quality basis sets in RI-MP2-F12 and the post-MP2 terms represent a "sweet spot" for cc-G4-F12-T-v1.
Next Final Selected Methods. The hierarchy of the cc-G4-type cWFT closely parallels that of the standard G4-type cWFT methods, especially for the two-tier approaches. When a CCSD(T)/def2-TZVP or DLPNO-CCSD(T)/def2-TZVPP component is present in these cc-cWFTs, WTMAD2 values below 1 kcal/mol can be reached for GMTKN55.
The cc-G4-type methods provide a low-cost and quite accurate approximation to the CCSD(T) electronic energy at the complete basis set limit. As such, an inherent limitation is the shortcomings of the CCSD(T) method itself for species with strong static correlation; post-CCSD(T) approaches are   Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article currently out of reach for the larger species in GMTKN55. This may limit the applicability of these cWFT methods to transition metals, although they may still be valuable for applications on second-and third-row precious-metal catalysts (see, e.g., refs 124, 125 for reviews), where static correlation effects are milder. 126,127 This issue will be investigated in future work. Suffice to say for now that for main-group systems such as those in the GMTKN55 dataset, correlation-consistent cWFT approachesboth ccCA and the parametrized approaches offered hererepresent felicitous combinations of moderate cost and fairly high accuracy, with WTMAD2 values less than half what can be achieved by the best empirical double hybrids. 43,44,106 Timing Comparisons of Selected Methods. At the request of a reviewer, we now briefly compare the computational cost of the top-performing cWFT methods with each other and with the high-accuracy W4 approach. To keep the playing field level, identical hardware is used for each calculation in these comparisons. Some timing data are presented in Table 4. The post-CCSD(T) steps in W4 particularly CCSDT(Q), which scales as O(n 4 N 5 ) with system size, and CCSDTQ, which scales as O(n 4 N 6 )quickly become the dominant factor in the W4 CPU time, and it is hence not surprising that cc-G4-T and cc-G4-T-DLPNO are 1−2 orders of magnitude faster. (Their costliest steps are CCSD(T)/def2-TZVP and DLPNO-CCSD(T)/def2-TZVPPD, respectively.) For such small molecules (e.g., those in the W4-11 test set), cc-G4-T-DLPNO can actually be more expensive than the canonical counterpart, cc-G4-T. However, for larger molecules that are no longer amenable to W4 calculations with presentday hardware, cc-G4-T-DLPNO does offer an increasing speedup over its canonical sibling as molecules grow larger (illustrated in Table 4 for the n-alkane dimer series), owing to the nearly linear CPU time scaling of DLPNO-CCSD(T). Additionally, replacing RI-MP2 by RI-MP2-F12 offers a very substantial further speedup. For example, for n-heptane dimer, the wall clock time is just 2.5 h for cc-G4-F12-T-DLPNO, compared to 8.6 h cc-G4-T-DLPNO and 13.9 h for cc-G4-T. This overall improvement is primarily attributed to the accelerated basis set convergence of RI-MP2-F12 in cc-G4-F12-T-DLPNO, allowing us to stop at cc-pVTZ-F12 for that step. Execution times for other W4-11 species are listed in the Supporting Information.
Recommendations Concerning ZPE, Thermal, and Relativistic Corrections. All GMTKN55 reference values we considered here and in ref 38 are free of relativistic corrections and at the bottom of the well.
For "turnkey" computational thermochemistry, G4, G4-(MP2), G4(MP2)-XK, and the like include automatic computation of the zero-point vibration energy (ZPVE) and thermal corrections, typically at the same level as used for the geometry optimization. All frequencies are then scaled by a uniform scale factor appropriate for the zero-point vibrational energy: see refs 128−130 for detailed discussions. Suffice to say that, as first pointed out in refs 131, 132, scaling factors for matching experimental infrared and Raman spectra are too small for zero-point vibrational energies, which require less "scaling down" as the anharmonic corrections in them are proportionally only 25% as large. As shown in ref 128, CCSD(T) near the basis set limit has a scaling factor of essentially 1.0 for harmonic frequencies, but about 0.987 for ZPVEs: that such a rudimentary approach as uniform scaling works at all is due to anharmonicity constants being roughly proportional to the corresponding harmonic frequencies. 128 In G4 and G4(MP2), the level of theory chosen is B3LYP/ 6-31G(2df,p), frequencies scaled by 0.9854: in ref 128, this level was found to lead to an RMSD of 0.10 kcal/mol on the ZPVE part of the HFREQ2014 database. G4(MP2)-XK employs the BMK functional 133 instead (frequencies scaled by 0.9766 for ZPVE, 0.9647 for entropy, and 0.9791 for enthalpy corrections): BMK by design 133 will be more reliable for locating transition states but leaves something to be desired in terms of harmonic frequencies.
One might instead consider the most recent range-separated hybrid functionals ωB97X-V 110 or ωB97M-V, 109 which for many properties are best-in-class. 43 Another alternative, however, would be to employ double-hybrid functionals, for which multiple codes have analytical firstand even secondderivative implementations. For codes that lack RI-MP2, double hybrids come at a premium; the most widely used such code, Gaussian 16, in any case cannot evaluate ωB97X-V or ωB97M-V. If RI-MP2 is available, however, double hybrids may actually be faster than range-separated hybrids, except for quite large molecules. The original DSD-PBEP86 functional 107,134 was shown 128 to be capable of reproducing the HFREQ2014 harmonic frequencies database 128 to an RMSD of just 10 cm −1 , which translates into a ZPVE contribution of with RMSD values of 0.048, 0.046, and 0.044 kcal/mol, respectively. Clearly, revDSD-PBEP86-D3(BJ)/def2-TZVP is a felicitous compromise between accuracy and computational cost, and we recommend it as our geometry optimization and harmonic frequencies level of theory. For spin−orbit coupling, like for G4, G4(MP2), and G4(MP2)XK, we recommend using the atomic first-order splitting corrections commonly applied, which are obtained from experimental fine structures (e.g., the Harvard atomic spectra database 135 ): compilations can be found in the Supporting Information of the present paper as well as of ref 136. This leaves the scalar relativistic contribution. For applications like noncovalent interactions and conformer equilibria of biomolecules, this can safely be omitted, while for heavy-element compounds, one may need to use relativistic Hamiltonians from the ground up. In between these two regimes, we know that at least for some of the W4-11 atomization energies (e.g., SO 3 , SiF 4 ), scalar relativistic corrections can reach the 1 kcal/mol regime. 72 (We recently proposed 137 a very simple additive model to rationalize and semiquantitatively predict the magnitudes of relativistic corrections in terms of changes in s population.) One fairly inexpensive workaround would be to apply the same relativistic correction as ccCA, namely, E[DKH2-MP2/ cc-pVTZ-DK] − E[MP2/cc-pVTZ], where cc-pVTZ-DK is a recontraction of the cc-pVTZ basis set for the second-order Douglas−Kroll−Hess (DKH2) 138 Hamiltonian. We will benchmark relativistic correction schemes in greater detail in the near future, in the context of studies on transition-metal compounds and catalysts.

■ CONCLUSIONS
We have extended the hierarchy of the composite wavefunction methods by (a) considering inner-shell correlation in the second-order Møller−Plesset step and (b) replacing the Karlsruhe basis sets by augmented correlation-consistent core−valence basis sets of triple and quadruple ζ quality. The resulting cc-G4-type methods reach WTMAD2 statistics below 1 kcal/mol for the large and diverse GMTKN55 benchmark suite.
Somewhat to our surprise, the lion's share of the improvement did not come from core−core and core−valence contributions, but from enhancements in the basis sets. Nevertheless, the extra cost of including the additional electrons in the RI-MP2 step is such a small component of the overall CPU time that there is no downside to including them.
A thorough investigation of each subset's contribution showed that the statistical improvement for the two-tier methods lies in the larger molecules, e.g., improved energies of amino acid conformers, barrier heights of pericyclic reactions, binding energies in halogenated dimers, relative energies in the tri-and tetrapeptide conformers, binding energies of noncovalently bound dimers, and relative energies in tautomers. In contrast, amelioration in the three-tier methods is seen across the board and is not concentrated in specific subsets.
The minimally empirical cc-G4-T breaches the 1 kcal/mol threshold, with WTMAD2 = 0.90 kcal/mol and only two fitted parameters for the chemically diverse GMTKN55 database. The two-tier cc-G4-T-v7 (inspired by ccCA) reaches WTMAD2 of 0.92 kcal/mol; it is available for the spd block of H-Rn in the PTE, and c MP2/CBS is the single parameter since c CCSD-MP2 = c T = 1.0. As post-HF corrections, the extrapolated E 2 /aug-cc-pwCV{T,Q}Z ( The three-tier cc-G4(MP3)-D method is in the same accuracy range as cc-G4-T-DLPNO. Said method has an MP2/aug-cc-pwCVQZ(-PP) term, an MP3/def2-TZVPP component, and the lower-cost CCSD(T)/def2-SVPD; this combination yields a WTMAD2 of 1.37 kcal/mol. An efficient RI-MP3 algorithm will render cc-G4(MP3)-D more efficient and dramatically reduce its cost and I/O overhead.
All in all, the cc-G4-type approaches offer a material improvement in terms of accuracy over G4-T and similar cWFT approaches, at a comparable computational cost.