The Importance of Tight f Basis Functions for Heavy p-Block Oxides and Halides: A Parallel With Tight d functions in the Second Row

It is well-known that both wave function ab initio and DFT calculations on second-row compounds exhibit anomalously slow basis set convergence unless the basis sets are augmented with additional “tight” (high-exponent) d functions, as in the cc-pV(n+d)Z and aug-cc-pV(n+d)Z basis sets. This has been rationalized as being necessary for a better description of the low-lying 3d orbital, which as the oxidation state increases sinks low enough to act as a back-donation acceptor from chalcogen and halogen lone pairs. This prompts the question whether a similar phenomenon exists for the isovalent compounds of the heavy p-block. We show that for the fourth and fifth row, this is the case, but this time for tight f functions enhancing the description of the low-lying 4f and 5f Rydberg orbitals, respectively. In the third-row heavy p block, the 4f orbitals are too far up, while the 4d orbitals are adequately covered by the basis functions already present to describe the 3d subvalence orbitals.


INTRODUCTION
When, in the early 1990s, the G1 and G2 computational thermochemistry approaches 1 were extended to second-row elements, 2,3 SO 2 was found to be a significant outlier. The G1 team found that adding a third set of d functions to the basis set increased the atomization energy of SO 2 by 8 kcal/mol; they ascribed this to hypervalence. Indeed in G4 theory, 4 an additional layer of d functions is placed on Al-Cl.
Bauschlicher and Partridge 5 studied basis set convergence for SO 2 in detail for both CCSD(T) and the B3LYP density functional approach, and at both levels found hypersensitivity to high-exponent d functions. Martin,6 in 1998, showed that this is the case also for other properties such as vibrational frequencies, as well as that nearly the entire contribution in CCSD(T) can be ascribed to the HF-SCF component of the energy. Moreover, this persisted when the inner-shell orbitals were replaced by an effective core potential, refuting claims that core polarization might be involved.
Later benchmark studies on second-row molecules found more severe examples in SO 3 (40 kcal/mol) 7 and in perchloric acid and perchloric anhydride (50 and 100 kcal/mol, respectively). 8 It was also found (see ref 8 for a discussion) that the strength of the effect was roughly proportionate to the formal oxidation state of the central second-row atom: NBO (natural bond orbital) analysis 9 revealed 8 that, as the central atom becomes more positively charged, the 3d orbitals sink lower and become ever more available for back-donation from chalcogen and halogen lone pairs.
The textbook concept of hypervalence, e.g., d 2 sp 3 hybridization in SF 6 which violates the octet rule, has been comprehensively refuted by Reed and Schleyer 10 and by Cioslowski and Mixon 11 (see Norman and Pringle 12 for a recent review, as well as section 7.4 of Schwerdtfeger, Frenking, and co-workers 13 ). One of us has, however, referred 14 to 3d orbitals in such molecules as "honorary valence orbitals of the second kind". (The "first kind" in that paper refers to subvalence orbitals that are energetically so close to the valence shell that freezing them in correlated calculations may cause catastrophic errors. 15,16 ) In response to the findings described above, the correlation consistent basis sets for second-row elements have been revised 17 to include additional "tight" (i.e., high-exponent) d functions, giving rise to cc-pV(n+d)Z and aug-cc-pV(n+d)Z basis sets.
The authors have both been asked whether this situation is unique to second-row elements, and if yes, why. We shall show below that, in fact, a similar phenomenon occurs for 4f orbitals in the fourth-row heavy p-block elements�the last p-row of the periodic table before the lanthanides�as well as for 5f orbitals in the fifth-row heavy p-block�the last p-row before the actinides. To the best of our knowledge, the first work to refer to the need for tight f functions was the 2005 Weigend−Ahlrichs paper defining the "def2" basis sets. 18 Studies by Dixon et al. 19 on iodine fluorides and their ions, and by Peterson 20 on IOO − , both reference this same necessity, while Hill and Peterson,21 in the paper introducing the cc-VnZ-PP-F12 basis sets for the heavy pblock elements, add 1d1f tight functions for the same reason as well as for describing outer-core correlation. However, while the need for tight f functions on heavy p-block elements has been pointed out prior to this work, it is safe to say it is not broadly known in the theoretical chemistry community and has not been investigated systematically.

METHODS
In the present work, we investigate the effect of tight d and f functions on the total atomization energies of heavy p-block oxides and halides at the HF, DFT, and CCSD(T) 22,23 level of theories. All geometries are optimized at the PW6B95 24 -D3(BJ) 25 /def2-QZVPP 18 level of theory, and XYZ coordinates are reported in the Supporting Information. All calculations are carried out using the Gaussian 16 26 and MOLPRO 27 Program suites, running on the Faculty of Chemistry HPC facility "ChemFarm" at the Weizmann Institute. We considered the correlation consistent basis sets of Dunning and co-workers. 28,29 In this paper, we considered aug-cc-pVnZ 30 and aug-cc-pV(n +d)Z 17 basis sets for second-row elements and aug-cc-pVnZ-PP 31,32 for heavy p-block elements, where PP stands for the Stuttgart−Cologne energy-consistent relativistic pseudopotentials. 33 We also considered weighted core−valence basis sets: aug-cc-pwCVnZ 30,34 for second-row and aug-cc-pwCVnZ-PP 31,32,35 for heavy p-block elements. For brevity, aug-cc-pVnZ(-PP), aug-cc-pV(n+d)Z, and aug-cc-pwCVnZ(-PP) basis sets are denoted as aVnZ(-PP), aV(n+d)Z, and awCVnZ(-PP), respectively. The high-exponent (d,f) functions that are added to the aVnZ(-PP) basis set are taken from the core−valence awCVnZ(-PP) basis sets for the respective elements. All calculations are carried out within the frozen core approximation with Gaussian's int(grid = ultrafine), which corresponds to the pruned direct product of a 99-point Euler-Maclaurin radial and a 590-point Lebedev angular grid or its MOLPRO counterpart. The default tight SCF convergence criterion was used, which corresponds to the norm of the density matrix update being <10 −8 and the largest individual element change being <10 −6 in absolute value. We also carried out natural bond orbital (NBO) population analysis 9 at the Hartree−Fock and DFT levels using the NBO7 program 36 interfaced to both Gaussian 16 26 and MOLPRO. 27 As Gaussian has no CCSD(T) analytical derivatives and hence cannot write out first-order reduced density matrices at the CCSD(T) level, we assessed the importance of (T) for the NBO populations using MOLPRO, both using its built-in NBO implementation and using NBO7 via the interface. The differences between CCSD and CCSD(T) NBO populations were found to be negligible. The BFGS algorithm 37 as implemented in MOLPRO 2022 is used for the exponent optimizations, with the gradient threshold set to 10 −5 . Table 1 presents the difference between the atomization energies calculated with the aVTZ(-PP) basis set and those obtained by adding a progressively large set of (d,f) functions to the basis set at the Hartree−Fock level. As it was previously found (e.g., Section 2.2 in ref 38) for the second-row elements that core−valence basis functions are already in the right exponent range, we assumed that the tight d and f exponents from the cc-pwCVnZ basis sets were a reasonable starting point.

Effect of Tight d and f Functions on the Hartree− Fock Component.
First, we analyze the XO 4 − series (where X = Cl, Br, I, and At). The effect of adding tight d functions to the aVTZ basis set is quite significant for ClO 4 − , as well-documented: 8 for instance, adding one d function to the aVTZ basis set increases the atomization energy of ClO 4 − (i.e., ΔTAE SCF ) by 20.29 kcal/mol. The corresponding values for adding progressively larger sets of tight (d,f) functions to the aVTZ basis set are 23.57 (+2d), 0.72 (+1f), and 24.34 (+2d1f) kcal/mol. Therefore, the large TAE increase seen for aug-cc-pwCVTZ (24.49 kcal/mol) is almost entirely due to the effect of d functions on the Cl atom. Table 1 also lists the overall population of the d and f Rydberg orbitals taken from the NBO population analysis of the HF determinant. (This can be obtained either by modifying the NBO7 source code to print more than two decimal places in the Natural Electron Configuration section of the output or by means of a simple shell and awk script that searches for the Rydberg NAO occupations of a given angular momentum and sums them up.) An NBO analysis of the wave function reveals that the natural population of chlorine d orbitals (q 3d ) increases as much as 0.05 when tight d functions are added. Specifically, δq 3d values, that is, the changes in the NPA populations relative to the aVTZ basis set, are 0.041, 0.040, and 0.047, respectively, for aV(T+d)Z, aVTZ+1d, and aVTZ+2d. As discussed in the Introduction, the chemical significance of the tight d functions to the aVTZ basis set is that they increase the ability of the chlorine 3d orbitals (in ClO 4 − ) to act as back-bonding acceptors. The chlorine 4f orbitals are too far up in energy to participate significantly. These findings are consistent with those reported in ref 8 for Cl 2 O 7 and HClO 4 .
The effect of high-exponent d functions is drastically reduced for third-row pseudohypervalent molecules (e.g., BrO 4 − ).
Adding two high-exponent d functions to the aVTZ-PP basis set affects the SCF component of the total atomization energy by a measly 0.73 kcal/mol, compared to 23.57 kcal/mol for ClO 4 − . Likewise, q 4d is affected by only 0.003. Furthermore, inclusion of tight f functions (i.e., aVTZ-PP+2f) has a negligible effect, and the HF-SCF component of the total atomization energy is affected by just 1.62 kcal/mol, which is considerably less than other sources of basis set incompleteness error for aVTZ-PP. Furthermore, the 4f pop population is essentially unchanged. In the case of BrO 4 − , the bromine 3d orbitals are filled. Bromine 4d orbitals can act as back-bonding acceptors, but enough highexponent d functions are already present in the underlying basis set (for the purpose of describing the 3d subvalence orbital); hence, the additional high-exponent d functions have a negligible contribution. The 4f orbitals are still too far up; hence, they do not benefit from the additional tight f functions.
Next − , the addition of extra highexponent f functions on iodine increases the ability of low-lying virtual 4f orbitals to act as back-bonding acceptors from chalcogen and halogen lone pairs.
The effect of high-exponent f functions is even more pronounced in perastatate, AtO 4 − . For instance, TAE[AtO 4 − ] is increased by appreciable amounts of 8.38 (aVTZ-PP+1f) and 9.98 (aVTZ+2f) kcal/mol. The extra f functions increase q 5f on the At atom by ≈0.02. As expected, only trivial increases are seen for aVTZ+1d (0.07 kcal/mol) and aVTZ+2d (0.20 kcal/mol) basis sets. While the valence 5d orbitals are filled, 6d and 5f orbitals have significant occupations. Again, while there are enough decontracted functions for 6d orbitals from the valence shell of the basis set, extra high-exponent f functions are needed for the description of 5f orbitals.
We can summarize the above findings by saying that, for pblock elements in higher oxidation states, the second row requires tight d functions, while the fourth and fifth rows require tight f functions, and the third row requires neither. The fact that the second-row heavy p-block is approaching the first-row transition elements, and the fourth and fifth row heavy p-blocks the lanthanide and actinide series, respectively, is not a coincidence but directly linked to the energetic proximity of the d and f Rydberg orbitals.
For systems having lower oxidation states on the central atom, proportionally smaller effects are observed. For ClO 3 − and ClO 2 − , the HF-SCF component of the total atomization energy is affected by 12.909 and 5.113 kcal/mol, respectively, whereas for IO 3 − , AtO 3 − , IO 2 − , and AtO 2 − , the effect of the tight f functions ranges from 2.665 to 7.250 kcal/mol (Table 2).
What happens to the equilibrium distances and harmonic frequencies when extra (d,f) functions are added? Table 3 Table: from 0.49 (SeO 3 ) to 0.34 kcal/mol (AsF 5 ). Note also that adding a second hard d function contributes negligibly (0.03 kcal/mol). For TeO 3 , PoO 3 , and SbF 5 , as advocated in the previous section, the high-exponent f functions' contribution increases rapidly for the third and fourth-row pseudohypervalent molecules. The addition of one high-exponent f function increases the total  − , little effect is seen (see Table  1) from addition of either tight d or f functions to the Kr aVTZ-PP basis set. For XeF 6 in aVTZ-PP, on the other hand, the first added tight f function accounts for 3.95 kcal/mol at the HF level. (As expected, we find that additional tight d functions have no impact worth speaking of.) Adding the next f function increases TAE SCF by 2.68 kcal/mol: clearly, this effect is not specific to halogens.
In order to verify that our conclusions are not due to an artifact of our choice of exponents (i.e., tight d and f functions taken from the core−valence basis sets for the respective elements), we also reoptimized them at the HF level in the respective perhalides ClO 4 − , BrO 4 − , IO 4 − , and AtO 4 − . The exponents we found are reported in Table 4. Our main point remains unchanged; however, for higher angular momentum basis sets, such as aVQZ-PP and aV5Z-PP, a single optimized f exponent recovers nearly as much energy as two fixed ones from  The Journal of Physical Chemistry A pubs.acs.org/JPCA Article the awCVQZ(-PP) and awCV5Z-PP basis sets, respectively, especially for AtO 4 − . For aVTZ-PP basis set, the exponents we found are actually, by and large, close to those obtained from the awCVTZ-PP basis set.

Correlation Components.
It has been known since the late 1990s (e.g., ref 6) that the lion's share of the "tight d function effect" in second row elements is recovered at the Hartree−Fock level and that the correlation energy is only weakly affected.
For the XO 4 − series, Table 5 sheds some light on the matter. We present there not only the contributions to the total atomization energy (in kcal/mol) but also the cumulative d-and f-type NAO (natural atomic orbital) populations on the central halogen atom. These latter populations were obtained using the built-in implementation of NBO (natural bond orbital) theory 9 in MOLPRO 2022. 27 In Table 6, a breakdown of the correlation contribution into CCSD and (T) can be found.
The largest correlation contribution is for the d orbital in ClO 4 − , 1.9 kcal/mol, and even that is an order of magnitude less than the HF component. Out of that, 0.3 kcal/mol is accounted for by the triples contribution. These correlation contributions are nearly cut in half for the next basis set in the series, aug-cc-pVQZ vs aug-cc-pV(Q+d)Z, and again the SCF contribution exceeds that of correlation by an order of magnitude.
While in computational thermochemistry protocols like W4 theory, 41 which covers first-and second-row molecules, we have always striven to include tight d functions also for the CCSD(T) correlation steps, the post-CCSD(T) steps always omit them (in part because of the steep computational cost scaling of CCSDT, CCSDT(Q), and CCSDTQ). A very recent study by Karton 42 reconsiders this aspect and indicates that the inclusion of tight d functions even in these steps may have enough of an effect to be significant in high-accuracy computational thermochemistry (see Karton 43 for a very recent review).
For periodate and perastatate, the tight f contribution is in the 8 kcal/mol range, albeit especially for perastatate with a more noticeable correlation contribution. Still, the HF contribution dominates by far.
As a rule of thumb, going up from aVTZ to aVQZ appears to cut the tight d or tight f contribution in half. Going one step further up the hierarchy to aV(5+d)Z and aV(6+d)Z, the tight d contribution for ClO 4 − definitely tapers off smoothly, but contributions in excess of 3−4 kcal/mol are still seen with aV5Z-PP basis sets for IO 4 − and AtO 4 − . − , respectively. The contribution of correlation is a factor of 3−7 smaller, and out of that, (T) accounts for a negligible fraction. The latter means that CCSD will be generally enough for differential NBO analysis of this type�which is quite convenient, as CCSD densities are both much more economical than CCSD(T) and available in more electronic structure programs (notably, Gaussian).
For DFT calculations on heavier elements, the popular Weigend−Ahlrichs "def2" basis sets 18 are widely used owing to their availability for all elements through radon. Weigend and Ahlrichs 18 mention in passing that two additional tight f functions are needed for heavier p-block elements due to what they deem to be "core polarization". Now if this interpretation were correct, then the need for the tight f functions would go away if the subvalence electrons were all replaced by a large-core ECP. We thus repeated the IO 4 − calculations using Martin and Sundermann's large-core SDB-aug-cc-pVTZ and QZ basis sets, 44 which for practical applications were superseded by the small-core "official" (aug-)cc-pVnZ-PP basis sets. 31,45 Despite iodine now having all 46 core electrons, [Kr](4d) 10 , "rolled into" the ECP, it turns out we need the tight f functions just as much, which rules out core polarization. Repeating a numerical experiment that was "left on the cutting room floor" of the final published version of the 1998 SO 2 paper, 6 we replaced the sulfur core electrons in SO 2 and SO 3 with an ECP10MWB pseudopotential (while decontracting the s and p functions of the sulfur aug-cc-pVnZ basis set to avoid contraction mismatch with the ECP) and found that the need for tight d functions was likewise essentially the same as for the all-electron calculation. (See also the very recent large-core ccECP-cc-pV(n+d)Z basis sets of Hill and co-workers. 46 ) 3.3. Density Functional Theory. It stands to reason that the same observations made above for Hartree−Fock would also apply to other independent-particle models, specifically to DFT. For the tight d functions in the second row, this was indeed first spotted 5 at the B3LYP and CCSD(T) levels and later confirmed 6 at the HF level.
For the sake of completeness, we have repeated our analysis for the PW6B95 hybrid meta-GGA functional. As can be seen in the Supporting Information, our observations at the PW6B95 level are fundamentally the same as at the HF level.

CONCLUSIONS
We have examined the effect of tight d and f functions in aug-cc-pVnZ (or aug-cc-pV(n+d)Z) basis sets on SCF and post HF contributions to the total atomization energies and vibrational frequencies for p-block fluorides and oxides. From the present study, we can conclude that the need for added high-exponent d functions to the second-row p-block elements (as done in the (aug-)cc-pV(n+d)Z basis sets 17 ) has a direct (albeit milder) parallel in the fourth and fifth row, but now in terms of highexponent f functions.
The effect is linked to the 3d, 4f, and 5f virtual orbitals of second, third, and fourth row elements approaching the valence shell as one approaches, respectively, the first-row transition metals, the lanthanides, and the actinides. Additionally, with increasing oxidation states, these orbitals will sink still lower and become still better back-donation acceptors from halogen and chalcogen lone pairs.
In the third row p-block, the 4f is still too remote, while the 4d is adequately covered by the basis functions needed to describe the 3d subvalence orbital.
An alternative explanation in terms of core polarization was refuted by means of large-core ECP calculations in which there are no core orbitals left to polarize.