MP2-F12 Basis Set Convergence near the Complete Basis Set Limit: Are h Functions Sufficient?

We have investigated the title question for the W4-08 thermochemical benchmark using l-saturated truncations of a large reference (REF) basis set, as well as for standard F12-optimized basis sets. With the REF basis set, the root-mean-square (RMS) contribution of i functions to the MP2-F12 total atomization energies (TAEs) is about 0.01 kcal/mol, the largest individual contributions being 0.04 kcal/mol for P2 and P4. However, even for these cases, basis set extrapolation from {g,h} basis sets adequately addresses the problem. Using basis sets insufficiently saturated in the spdfgh angular momenta may lead to exaggerated i function contributions. For extrapolation from spdfg and spdfgh basis sets, basis set convergence appears to be quite close to the theoretical asymptotic ∝ L–7 behavior. We hence conclude that h functions are sufficient even for highly demanding F12 applications. With one-parameter extrapolation, spdf and spdfg basis sets are adequate, aug-cc-pV{T,Q}Z-F12 yielding a RMSD = 0.03 kcal/mol. A limited exploration of CCSD(F12*) and CCSD-F12b suggests our conclusions are applicable to higher-level F12 methods as well.


■ INTRODUCTION
Explicitly correlated quantum chemistry methods (see refs 1−3 for reviews) get their name from the inclusion of basis functions that involve explicit interelectronic distances (socalled "geminal" functions, as distinct from "orbital" functions that only involve a single electronic position).
The de facto standard at this point are the F12 geminals introduced by Ten-no: 4 γ γ = − F r r ( ) 1 exp( ) 12 12 (1) which in the limit for small r 12 corresponds to adding r 12 , and hence to satisfying the Kato cusp condition. 5 (For reasons of computational convenience, the Slater function is typically approximated by a linear combination of Gaussians, which actually is reminiscent of the Gaussian geminal approach of Persson and Taylor 6 a decade earlier.) Exigencies for the underlying orbital basis set are quite different from those in a pure orbital calculation, as not so much effort needs to be invested in describing correlation near the interelectronic cusp. Hence basis sets specifically optimized for F12 calculations, such as the cc-pVnZ-F12 (n = D,T,Q,5) family by Peterson and co-workers, 7,8 or their anion-friendly aug-cc-pVnZ-F12 variants, 9 perform much better in an F12 setting than similar-sized basis sets for orbital calculations (such as the correlation consistent family of Dunning and coworkers). Indeed, both their contraction patterns and their exponents are markedly different 7 from those for the corresponding orbital-optimized basis sets. In fact, it has been shown 10 that non-F12 basis sets in thermochemical applications lead to erratic, nonmonotonous basis set convergence due to elevated basis set superposition error.
It is well-known that F12 calculations exhibit fairly rapid basis set convergence in terms of the maximum angular momentum L in the basis set: for two-electron model systems, Kutzelnigg and Morgan 11 showed ∝ L −7 for explicitly correlated calculations, compared to L −3 for singlet-coupled pair correlation energies in pure orbital calculations.
Hence, few F12 practitioners go beyond L = 4 (i.e., g functions), that is, beyond cc-pVQZ-F12 or aug-cc-pVQZ-F12 basis sets. In accurate thermochemical applications, one may find (e.g., refs 10 and 12) cc-pV5Z-F12 or aug-cc-pwCV5Z 13−15 applied, both of which go up to h functions. At least one major electronic structure code often used for F12 calculations, MOLPRO, 16 has a hard limit of i functions overall, and hence (because of the need for one extra L step in the RI-MP2 auxiliary basis set) in practice the orbital basis set in F12 calculations tops out at h functions. Two other codes, Turbomole 17 and the most recent versions of ORCA, are capable of going up to at least i functions in an F12 context.
i functions are routinely employed in accurate orbital-only calculations, usually with basis set extrapolation as in the W4, 18−20 HEAT, 21,22 and FPD 23−25 thermochemical protocols. One study in our group 12 on the W4−17 thermochemical benchmark 26 went up to k functions.
In the present note, we will investigate basis set convergence at the MP2-F12 level for the total atomization energies (TAEs) in the W4-08 subset 27 of W4-17. We will show that, while i functions may still make some contributions to the atomization energies of some molecules (particularly, those featuring multiple bonds between second-row elements), this can be adequately addressed through ∝ L −7 basis set extrapolation, and de facto F12 basis set convergence is achieved with h functions.

■ COMPUTATIONAL METHODS
All calculations were carried out using ORCA 5, 28−30 with density fitting for HF and MP2 disabled through the NoCoSX and NoRI keywords. This leaves only the CABS (complementary auxiliary basis set for F12) as a fitting basis set, thus eliminating the Coulomb-exchange and RI-MP2 fitting basis sets as possible "confounding factors". For technical reasons, UHF references were used for open-shell species.
The largest spdfgh basis set we considered was the reference basis set from Hill et al. 31 which was also used in refs 8 and 10, for calibrating the V5Z-F12 basis set. The sp part of this is the aug-cc-pV6Z (AV6Z for short) basis set from which two (first row) or one (second row) additional primitives were decontracted; the dfgh part is made up of large even-tempered sequences. This basis set we denote REF-h. For its CABS, we used the large uncontracted "reference-ri" basis set associated with it. REF-f and REF-g basis sets were generated by simple truncation of REF-h at f and g functions, respectively. (The linear dependency threshold for CABS was left at its default value of 10 −8 throughout. The MP2-F12 ansatz used in ORCA corresponds to version D, which is a slight simplification 32 of ansatz C, 33,34 together with fixed geminal amplitudes 4,31 determined from the Kato cusp condition. This is basically equivalent to the default of "3C(Fix)" in MOLPRO. 16 ) In addition, for a large subset of molecules, we considered an even larger REF-i basis set to which four i functions have been added. Here, for the CABS basis set, we added a k function with the same exponent to every i function.
Aside from VTZ-F12, VQZ-F12, and V5Z-F12 basis sets, we considered aug-cc-pV(n+d)Z (n = T,Q,5,6; AVnZ+d for short), 35−38 as well as the core−valence correlation basis sets aug-cc-pwCVnZ (n = T,Q,5; awCVnZ for short), 15 and aug-cc-pCVnZ (n = Q,5,6; ACVnZ). 15 (We are only correlating valence electrons here, but it has been shown 8,10,12 that the additional radial flexibility of core−valence basis sets is beneficial for F12 calculations as well.) For VnZ-F12 (n = D,T,Q; VnZ-F12 for short) and AVnZ (n = T,Q,5), standard CABS "OptRI" basis sets 39,40 are available. For AV6Z+d we carried out two sets of calculations; in the first, we repurposed Haẗtig's unpublished DF-AV6Z basis set from the Turbomole library (downloaded from the Basis Set Exchange 41 ) as the CABS; in the second, we employed the reference-RI from ref 31 instead. For the ACV6ZnoI basis set, which is a simple truncation of ACV6Z at h functions, we employed reference-RI as the CABS.
As for the geminal exponent, for VTZ-F12 and VQZ-F12 we set γ = 1.0, and for V5Z-F12 γ = 1.2, as recommended in the literature for these respective basis sets. 7,8 For the REF-n and ACVnZ, a fixed γ = 1.4 was used throughout as per common Throughout this paper, we will refer to two-point basis set extrapolation using the braces notation, for example, V{T,Q}Z-  Table 1. We were able to obtain TAEs for all 96 species in W4-08 through REF-h, plus REF-i for all species except six: B 2 H 6 , C 2 H 6 , CH 3 NH 2 , CH 2 NH 2 , CH 3 NH, and NCCN (dicyanogen).
As mentioned in the introduction, according to Kutzelnigg and Morgan, 11 the l-saturated basis set convergence behavior in an R12 calculation, for a closed-shell pair energy, should asymptotically be ∝ L −7 . Is this indeed the case here? One simple test would be to consider the RMS difference between MP2-F12 atomization energies obtained by A + B. L −7 extrapolation from the REF-{g,h} basis set pair with those obtained in the same manner from the REF-{h,i} pair. As can be seen in Table 1, the RMS difference between these two columns is just 0.006 kcal/mol. We obtain the same value to three decimal places if we minimize RMSD with respect to the REF-{g,h} extrapolation exponent, for which we find 6.514 as the optimum value. Both observations indicate that for these large l-saturated basis sets we are essentially in the ∝ L −7 regime (note that even for REF-{f,g} we already obtain 6.232 as the optimum extrapolation exponent). Incidentally, they also suggest that the MP2-F12/REF-{h,i} values are as close as we can reasonably hope to get to a CBS limit reference. The largest individual differences are for cyclic S 4 , 0.020 kcal/mol, and Si 2 H 6 , −0.020 kcal/mol, followed by 0.017 kcal/mol for AlCl 3 , 0.0152 kcal/mol for N 2 , and 0.012 kcal/mol for CS 2 , SiH 4 , and N 2 O. Convergence for some representative molecules is summarized in Table 2 Using The RMS Δh = 0.046 kcal/mol, but individual values can reach as high as 0.23 kcal/ mol for P 4 , 0.18 kcal/mol for P 2 , and 0.13 kcal/mol for S 4 . Δi is much smaller, 0.011 kcal/mol RMS, but for P 4 and P 2 it reaches 0.042 and 0.039 kcal/mol, respectively. The latter are definitely amounts of interest for high-accuracy electronic structure calculations (as the MP2-F12 basis set incompleteness error is at least a semiquantitative indication of what the corresponding CCSD(T)-F12 error would be). For some perspective, however, the corresponding numbers for orbitalonly MP2 are RMS(Δh) = 1.01 and RMS(Δi) = 0.43 kcal/mol, with the largest individual values (again for P 4 ) being 3.30 and 1.57 kcal/mol, respectively, followed by S 4 , 2.63 and 1.13 kcal/ mol. It is thus clear that MP2-F12 Δi values are 1−1.5 orders of magnitude smaller, and that basis sets convergence has essentially been achieved by the time one gets to REF-h. For the anomalous cases of P 2 and P 4 , the unusually slow basis set convergence has been documented in detail by Persson and Taylor 47 (see also Karton and Martin 48 ). And even for those, L −7 basis set extrapolation can clearly "fill in the cracks": the REF-{h,i} − REF-{g,h} difference for P 4 is just 0.011 kcal/mol, while that for P 2 vanishes entirely. We conclude that, at least for REF-l l saturated basis sets, there appears to be no compelling need to go beyond h functions in F12 calculations. This is good news, of course, for users of F12 correlation codes in program suites such as MOLPRO that do not permit going beyond i functions in the auxiliary basis sets (i.e., beyond h functions in the orbital basis set).
The slower basis set convergence for multiple-second-row species like P 2 , P 4 , S 3 , and AlCl 3 compared to their isovalent first-row counterparts N 2 , N 4 , O 3 , and BF 3 , respectively, can be rationalized to some degree in terms of the lower-lying 3d (and, to a lesser degree, 4f orbitals). This is probably best The Journal of Physical Chemistry A pubs.acs.org/JPCA Article illustrated by considering the d, f, and g populations in a natural population analysis (NPA), 50 presented in Table 3 for selected molecules at the HF/AVQZ+d and CCSD(T)/AVQZ +d levels. One sees a high d population already at the HF level for cases like SO 3 , which is a paradigmatic example of "inner polarization" in which the oxygen lone pairs back-donate into the empty 3d of sulfur (and similarly for other second row elements in high oxidation states; see ref 51 and references therein), which thereby becomes an 'honorary valence orbital of the second kind'. 52 It should be noted that this is primarily an SCF-level effect, and hence the increase in 3d population upon introducing correlation is quite modest in comparison. For cases like P 2 and P 4 , however, this effect is clearly not operativeone does see a significant d population even at the HF level, but it is much enhanced by correlation, especially when compared to the isovalent first-row species N 2 and N 4 . Moreover, this is not just limited to the d orbitals: natural orbital occupations for f and g shells clearly decay more slowly for the second-row species than for their first-row cognates. Similar observations can be made for Cl 2 vs F 2 , S 3 vs O 3 , and the like. cc-pVnZ-F12 and Related Basis Sets. How does the cc-pVnZ-F12 series perform compared to the REF-{h,i} basis set limit estimate? RMSDs are 1.44, 0.33, 0.084, and 0.044 for n = D,T,Q,5, respectively. The latter is close to the "no extrapolation required" goal, but there is still some room for improvement in high-accuracy thermochemistry applications, where one would like a 3σ = 0.1 kcal/mol error bar. AVTZ-F12 performs only marginally better (0.31 kcal/mol) than the underlying VTZ-F12, while AVQZ-F12, at 0.063 kcal/mol, does represent a modest gain over VQZ-F12. It was previously shown 8,12 that the aug-cc-pwCV5Z core−valence basis set (awCV5Z for short) performs remarkably well for F12 calculations, despite not being developed for this purpose at all: the RMSD = 0.030 kcal/mol we find here is consistent with that observation. (Expanding the hydrogen basis set from AV5Z to AV6Z yields the same performance to within statistical noise, RMSD = 0.033 kcal/mol.) The underlying valence-correlation basis set, aug-cc-pV5Z or AV5Z, has a higher RMSD = 0.055 kcal/mol; adding some radial flexibility by instead using AV6Z(h), that is, aug-cc-pV6Z with the highest angular momentum i removed, reduces RMSD slightly to 0.041 kcal/mol. This however is cut in half when the i functions are restored, RMSD = 0.022 kcal/mol for AV6Z. By comparing the two columns of numbers, we find several molecules where i function contributions deceptively seem to  Table 4 below, as are the RMSDs for total atomization energies with them. The TAE-optimized exponent V{Q,5}Z-F12 appears anomalously small, but this is an artifact of the very flat surface there, which precludes a "clean" optimization. E total gives a somewhat better defined minimum, though even there the surface is quite flat. In terms of RMSD, for V{Q,5}Z-F12 all three possible exponents yield the same error. For AV{T,Q}Z-F12, both optimized values from the present work yield superior RMSDs to the value from ref 55; for VTZ-F12 the same holds, although the difference here is quite modest. Finally, for the V{D,T}Z-F12 pair, the two optimized values from the present work are clearly superior to the one from Hill et al. 31 For reasons of numerical stability (notably because it eliminates the anomaly for V{Q,5}Z-F12), we favor the set optimized from total energies. The difference in RMSD is negligible; however, this is good news, since ideally one wants to be in a scenario where basis set extrapolation is only a minor component of the final result and is relatively insensitive to details of the extrapolation procedure. Note that, while all basis set pairs considered here are clearly some distance away from the asymptotic L −7 regime, the exponents do increase in that direction as L gets larger.
For some perspective, we add the RMSDs for regular MP2 with aug-cc-pV(L−1+d)Z and aug-cc-pV(L+d)Z basis sets. It is sobering to see that even AV{5,6}Z+d only reaches the accuracy level of V{D,T}Z-F12, and that V{T,Q}Z-F12 already markedly exceeds it.
What Do These Results Imply for CCSD-F12 and CCSD(T)-F12? At the request of a reviewer, we will address what we may infer for the basis set convergence of higher-level methods like CCSD(T)(F12*) or CCSD(T)-F12b.
First of all, the treatment of parenthetical triples (T) does not benefit in any way from the F12 treatment, 56 so the convergence of that contribution is effectively the same as in an orbital calculation. The latter has been addressed in great detail in a recent book chapter by one of us: 57 suffice to say here that basis set convergence of (T) with angular momentum is actually fairly rapid, and that radial flexibility of the basis set is actually more important than angular flexibility.
This leaves us then with CCSD-F12, or rather with the various practical approximations to it such as CCSD-F12b, 58 CCSD F12 , 59,60 and CCSD(F12*). 61 Of these, CCSD(F12*) has been shown 62 to be the most rigorous approximation that still is computationally feasible, while CCSD-F12b is widely used owing to its implementation for both closed-shell and open-shell cases in the MOLPRO 16 program system.
The basis set convergence of differences between different CCSD-F12 approximations has been studied in great detail in ref 12. In a nutshell: if correlation is predominantly dynamic then these differences decay quickly with increasing basis sets, but even moderate amounts of static correlation can cause nontrivial differences (as large as 0.3 kcal/mol) to persist even for spdfgh basis sets.
As additional complications, all of the available (to us) implementations of approximate CCSD-F12 methods require DFMP2 and JKFit auxiliary basis sets (further complicating comparisons), and the only CCSD(F12*) implementation at our disposal with which we were able to get enough calculations in REF-h basis sets to converge, i.e., that in MOLPRO, is limited to closed shell. Hence we resorted to calculating a number of closed-shell reaction energies instead. The relevant basis set increments are reported in Table 5. Auxiliary basis sets were taken from the Supporting Information of ref 31.
The basis set increments given there for g and h layers are obtained directly. For the i layer, the ORCA MP2-F12 values are calculated directly while the MOLPRO values are obtained by L −α extrapolation. (It can be easily shown that an estimate for the next basis set increment after E(L) − E(L − 1) is given by the following formula:)  For the sake of the estimate, we considered α = 7 (the asymptotic convergence rate) as a best-case scenario, and α = 5 as a worst-case scenario (for VQZ-F12 and V5Z-F12, we have previously found 54 the intermediate value α ≈ 6). The average of both extrapolated values, plus or minus half the difference between them, is taken as our approximate estimate.
As can be seen in Table 5, basis set convergence of especially CCSD(F12*) actually seems, depending on the reaction, comparable to or faster than that of MP2-F12. And for those reactions where Δi is still somewhat significant at the MP2-F12 level, the corresponding estimated CCSD-F12b and especially CCSD(F12*) increments are actually up to 3 times smaller.
We hence conclude that our conclusions about the lack of necessity of i functions in MP2-F12 calculations also apply to CCSD-F12b, CCSD(F12*), and other approximations to CCSD-F12.
As an aside, we note that the CCSD(F12*) − CCSD-F12b differences, while nontrivial, are conspicuously smaller than what was observed for cc-pVQZ-F12, cc-pV5Z-F12, and augcc-pwCV5Z in ref 12. Clearly, here too, the greater radial flexibility of the REF-n basis sets puts them at an advantage, even as they are unwieldy for practical production calculations.

■ CONCLUSIONS
We may conclude that for basis sets that are adequately saturated in the angular momenta represented, the contribution of i functions to total atomization energies is minimal (about 0.01 kcal/mol RMS): the largest individual contributions we have found here only reach 0.04 kcal/mol (for P 4 and P 2 ), and even that is adequately captured by L −7 extrapolation from REF-{g,h}, which is closer than 0.01 kcal/mol RMS to the CBS limit. (In this L region, basis set convergence behavior for MP2-F12 is found to be quite close to the asymptotic 11 L −7 .) Comparison of AV6Z with and without i functions indicates more significant i contributions, owing to insufficient radial saturation. The h contribution is more significant, reaching 0.23 kcal/mol for P 4 . REF-{g,h} extrapolation is within less than 0.01 kcal/mol RMS of REF-{h,i}, the largest single difference being 0.02 kcal/mol for S 4 . This means that the REF-{g,h} L −7 extrapolation is an acceptable proxy for the F12 basis set limit, and that h functions are sufficient even for highly demanding F12 applications.
If one-parameter extrapolation with an adjustable exponent is permissible, spdf and spdfg basis sets are adequate, with augcc-pV{T,Q}Z-F12 yielding the RMSD = 0.03 kcal/mol. Finally, the somewhat slower basis set convergence in the second rowespecially when there are multiple adjacent such elementscan be rationalized to some degree in terms of lower-lying 3d and 4f orbitals.
A limited investigation for closed-shell reactions indicates that basis set convergence of CCSD(F12*) and CCSD-F12b is comparable to or faster than that of MP2-F12, and that hence our conclusions about the relative insignificance of i-function contributions apply to F12 methods more broadly. Values that do not include an uncertainty interval in parentheses were calculated directly. Values that do include such an uncertainty interval were estimated as the average between L −7 (ideal case) and L −5 (pessimistic scenario) extrapolation, with one-half the distance between the two values taken as the uncertainty interval.