Extrapolation to the Limit of a Complete Pair Natural Orbital Space in Local Coupled-Cluster Calculations

The domain-based local pair natural orbital (PNO) coupled-cluster DLPNO-CCSD(T) method allows one to perform single point energy calculations for systems with hundreds of atoms while retaining essentially the accuracy of its canonical counterpart, with errors that are typically smaller than 1 kcal/mol for relative energies. Crucial to the accuracy and efficiency of the method is a proper definition of the virtual space in which the coupled-cluster equations are solved, which is spanned by a highly compact set of pair natural orbitals (PNOs) that are specific for each electron pair. The dimension of the PNO space is controlled by the TCutPNO threshold: only PNOs with an occupation number greater than TCutPNO are included in the correlation space of a given electron pair, whilst the remaining PNOs are discarded. To keep the error of the method small, a conservative TCutPNO value is used in standard DLPNO-CCSD(T) calculations. This often leads to unnecessarily large PNO spaces, which limits the efficiency of the method. Herein, we introduce a new computational strategy to approach the complete PNO space limit (for a given basis set) that consists in extrapolating the results obtained with different TCutPNO values. The method is validated on the GMTKN55 set using canonical CCSD(T) data as the reference. Our results demonstrate that a simple two-point extrapolation scheme can be used to significantly increase the efficiency and accuracy of DLPNO-CCSD(T) calculations, thus extending the range of applicability of the technique.


INTRODUCTION
The coupled-cluster method with singles, doubles, and perturbatively included triples excitations, i.e., CCSD(T), 1 can be used to compute relative energies with errors compared to experimental reference data that often fall within the experimental uncertainty. 2 Therefore, it is generally regarded as the "gold standard" of quantum chemistry. Unfortunately, its computational cost scales as the seventh power of the system size, and hence CCSD (T) calculations are only affordable for very small benchmark systems. To overcome this limitation, linear scaling local CCSD (T) variants that exploit the rapid decay of electron correlation with the interelectronic distance have been developed. 3−5 In particular, the domain-based local pair natural orbital CCSD(T) method [DLPNO-CCSD(T)] 5−13 has allowed for the calculation of single point energies for entire proteins. 14 To achieve such an efficiency, the coupled-cluster equations are solved in an extremely compact virtual space, which is tailored for each electron pair and spanned by a small set of pair natural orbitals (PNOs). 6 Only the PNOs with an occupation number greater than a threshold, denoted as T CutPNO , are included in the correlation space of a given electron pair.
When TightPNO settings 5,13 are used (in this case, T CutPNO is set to 10 −7 by default), DLPNO-CCSD(T) typically retains around 99.9% of the canonical CCSD(T) correlation energy. Thus, as shown on many benchmark data sets, 13,15−18 it provides essentially the same accuracy and reliability of the parent canonical method. On the 1505 reactions of the GMTKN55 superset, 19 DLPNO-CCSD(T)/TightPNO results showed mean absolute errors (MAEs) compared to canonical CCSD(T) below 0.2 kcal/mol on 39 subsets, between 0.2 and 0.4 kcal/mol on 14 subsets and larger than 0.4 kcal/mol for two subsets (RC21 and MB16−43). 18 Even more accurate results could in principle be achieved by including more PNOs in the virtual space, i.e., by tightening the T CutPNO threshold to 10 −8 or 10 −9 . Unfortunately, the computational cost of such calculations often approaches that of canonical CCSD (T).
Herein, we suggest an alternative computational strategy to reduce the PNO truncation error. In particular, DLPNO-CCSD(T) results obtained with different T CutPNO thresholds are extrapolated to the complete PNO space limit using a twopoint extrapolation scheme. Two different approaches are thoroughly tested, namely T CutPNO = 10 −5 /T CutPNO = 10 −6 (denoted hereafter as 5/6 extrapolation) and T CutPNO = 10 −6 / T CutPNO = 10 −7 (denoted hereafter as 6/7 extrapolation). Our results demonstrate that the 5/6 extrapolation approaches the TightPNO accuracy at a fraction of its computational cost. Moreover, the 6/7 extrapolation significantly improves the accuracy of TightPNO calculations, roughly halving the MAE with respect to canonical CCSD (T) for all the tested benchmark sets.
The paper is organized as follows. The PNO extrapolation scheme is described in Section 2. The accuracy of the method is tested on the GMTKN55 superset in Section 3.1. The basis set dependence of the extrapolated results and the efficiency of the extrapolation scheme are assessed in Section 3.2. The accuracy and efficiency of the proposed approach is further discussed in Section 3.3 on a particularly challenging application for standard computational methods, i.e., the quantification of dispersion-dominated interactions. Section 4 is devoted to the concluding remarks of this study.
Overall, the present study involves 883 molecules and 499 reactions.
2.2. PNO Extrapolation. To investigate the dependence of the DLPNO-CCSD(T) correlation energy on the dimension of the PNO space, we initially considered the benzene dimer, in which the two monomers are arranged in a parallel shifted configuration (the geometry was taken from reaction 24 of the S66 set). For this system, the dependence of the DLPNO-CCSD(T)/TightPNO/aug-cc-pVDZ-DK correlation energy on the T CutPNO threshold 10 −X (X = 5, 6, 7, 8, and 9) is shown in Figure 1.
Consistent with earlier PNO studies, 20,21 the correlation energy converges smoothly by tightening the T CutPNO threshold. The best fit is obtained with the following functional form where E X is the correlation energy obtained with T CutPNO = 10 −X and E is the target energy (for a given basis set) at the complete PNO space limit; A and β are constants (A = 84.92; β = 5.55). The smooth convergence of the correlation energy suggests that a simple two-point extrapolation procedure could be used to extrapolate the results obtained for small X values to the complete PNO space limit. By defining Y = X + 1 and E X and E Y as the correlation energies obtained with the corresponding T CutPNO = 10 −X and T CutPNO = 10 −Y values, one obtains the following expression for the correlation energy at the complete PNO space limit Using the following substitution Equation 2 can be written in the compact form is the basis of the PNO extrapolation scheme proposed in this work. Hence, all that is required to perform the extrapolation is the calculation of E X and E Y , together with the knowledge of the single parameter F. Importantly, E X and E Y must be obtained from DLPNO-CCSD(T) calculations that use exactly the same computational settings (e.g., the same basis set and DLPNO settings) with the only exception being the T CutPNO threshold. For the benchmark sets investigated in this work, the optimal F value, i.e., the one that minimizes the error with respect to canonical CCSD(T), deviates within the 1.5 ± 0.2 range. It is important to note that the dependency of the MAE on the F value is very small within this range. Hence, for the sake of simplicity, the recommended F value is 1.5 for both 5/6 and 6/7 extrapolations.
Note that eq 4 does not make any strong assumption on the functional form that better describes the convergence of the correlation energy with respect to the T CutPNO threshold. For example, an exponential functional form of the type would provide the following expression for the two-point extrapolated energy However, this is equivalent to eq 4 for an F value of It is worth mentioning here that eq 4 (as well as eqs 2 and6) is analogous to that commonly used in two-point extrapolation schemes to the complete basis set (CBS) limit. 22−25 In these schemes, E is the CBS extrapolated energy and E X and E Y are the energies obtained with two basis sets of consecutive X and Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article Y cardinality. In the extrapolation of HF energies, 22 it is usually assumed that E X −E is proportional to e −α(X)γ , where γ is typically 1 or 0.5, and α is a constant obtained from fits to accurate benchmark energies. Correlation energies are usually extrapolated by considering that E X −E is proportional to X -β (with β ≈ 3). 23 Importantly, when the β value (or equivalently F) is obtained from a root mean square (RMS) fit to benchmark data, the extrapolated energies were shown to improve notably. 24,25 2.3. Efficiency Considerations. In the present work, E X and E Y were obtained from separate DLPNO-CCSD(T)/ TightPNO calculations using T CutPNO = 10 −X and T CutPNO = 10 −Y , respectively (Y = X + 1). As T CutPNO = 10 −X settings are significantly more efficient than T CutPNO = 10 −Y settings, the overall X/Y extrapolation procedure is only slightly more expensive than the corresponding T CutPNO = 10 −Y calculation. For example, the computational cost of T CutPNO = 10 −6 calculations is typically 30−40% that of T CutPNO = 10 −7 calculations, as shown in Sections 3.2 and 3.3. As it will be demonstrated numerically below for a large data set, X/Y extrapolation calculations are typically 2 times faster than the corresponding T CutPNO = 10 −(Y+1) calculations. This indicates that the PNO extrapolation scheme provides a cost-effective alternative to tightening the T CutPNO threshold.
It is worth mentioning here that a different computational strategy was also tested as an attempt to further increase the efficiency of the extrapolation approach proposed here. As the T CutPNO = 10 −X virtual space is a subset of the T CutPNO = 10 −Y virtual space if X < Y, an approximate estimate for the T CutPNO = 10 −X energy can be obtained from the T CutPNO = 10 −Y amplitudes. This could potentially save computer time by avoiding redundant operations. From a technical point of view, this was achieved by exploiting the fact that the correlation energy can always be written as a sum of "pair correlation energies" ε ij , i.e., contributions from pairs of localized occupied orbitals ij. Within the PNO approximation, the strong-pair 11 where a ij and b ij are the PNOs belonging to the ij pair, τã b ij are the contravariant amplitudes are the cluster amplitudes. For each ij pair, upon entering the PNO generation for the 10 −Y calculation, the PNOs for a threshold of 10 −X were also automatically generated. This provides us with two sets of perturbative weak-pair and PNO incompleteness corrections, namely ΔE wp (10 −Y ) and ΔE wp (10 −X ). The overlap matrix between the PNOs in the T CutPNO = 10 −Y and T CutPNO = 10 −X spaces S ij was also computed and stored.
Then, the solution of the DLPNO-CCSD equations for T CutPNO = 10 −Y provided us with the corresponding τ a ij b ij ij (10 −Y ) amplitudes and hence with pair correlation energies (eq 8).
The overall DLPNO-CCSD correlation energy associated with T CutPNO = 10 −Y was computed by adding ΔE wp (10 −Y ) to the correlation energy from the strong pairs. In the next step, S ij was used to project the τ a ij b ij ij (10 −Y ) amplitudes from the This projection is necessary since the PNOs for a given pair are recanonicalized before entering the cluster amplitude iterations. The projected amplitudes were transformed back to the original T CutPNO = 10 −Y space and used to estimate the strong-pair component of the correlation energy associated with T CutPNO = 10 −X . ΔE wp (10 −X ) was added to this energy to compute the overall DLPNO-CCSD correlation energy in the T CutPNO = 10 −X space. Note that in this approach we generate exactly the same PNOs as in a separate 10 −X calculation. However, the resulting amplitudes are not identical to the "genuine" 10 −X amplitudes, since the relaxation of the amplitudes after the projection is neglected. The triples correction contribution can be computed as usual 14 starting from converged doubles amplitudes for T CutPNO = 10 −Y and from projected doubles amplitudes for T CutPNO = 10 −X .
This procedure was implemented in ORCA and tested on the S22 and WATER27 sets. Although this scheme allows us to skip the DLPNO-CCSD part for T CutPNO = 10 −X calculations, which accounts for ∼25% of the overall computational time, it also introduces a non-negligible error in the T CutPNO = 10 −X amplitudes that deteriorates the quality of the extrapolated results (see AutoExtr-S22 and AutoExtr-WATER27 sheets of the Supporting Information). Given the need to reconverge the amplitudes and the fact that the triples amplitudes must be iterated too with the reconverged doubles amplitudes, the computational savings that can be realized over two separate calculations are very modest. This is particularly true, since the tighter threshold calculation needs to be performed in either scheme and dominates the overall computation time. Hence, for all intent and purposes, the extrapolation procedure discussed in Section 2.2 is recommended. Its accuracy and efficiency are discussed in the following section.  29 For open-shell molecules, the reference energy was obtained at the quasi-restricted orbital (QRO) level. 30 The perturbative triples contributions were calculated using the recently published iterative (T 1 ) algorithm for both closed-shell 31 and open-shell 32 systems. Since here we use the improved (T 1 ) algorithm for the open-shell system, 32 the present open-shell DLPNO-CCSD(T) results are not exactly the same as the previously reported ones. 18 The variations are, however, very small. Integral evaluations in DLPNO-based correlation energy calculations need an auxiliary basis set. This was generated using the automated auxiliary basis set construction module of ORCA (the so-called "autoaux") with the maximum possible angular momentum. 33 Canonical CCSD(T) calculations were performed without any resolution of identity (RI) approximation. In all cases, DLPNO-CCSD(T) calculations were carried out using TightPNO settings (e.g., the T CutPairs threshold was set to 10 −5 ) in conjunction with different T CutPNO thresholds.
Unless stated otherwise, both canonical and DLPNO-CCSD(T) calculations were performed with the relativistic second-order Douglas−Kroll−Hess Hamiltonian (DKH2) 34,35 in conjunction with the appropriate aug-cc-pVDZ-DK basis set, 36−38 consistent with the previous benchmark study on the GMTKN55 superset. 18 Although correlation energies obtained with double-ζ quality basis sets might suffer from the severe Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article basis set incompleteness error, the comparison of DLPNO-CCSD(T) and CCSD(T) results obtained with the same basis set makes it possible to assess the error inherent to the DLPNO approximation. 18 For the W4−11, 39 YBDE18, 40 and S66 41 sets, DLPNO-CCSD(T) and CCSD(T) results were also compared for larger basis sets, as discussed in Section 3.2. For the W4−11 set, the aug-cc-pVTZ-DK and aug-cc-pVQZ-DK basis sets 36−38 were used with the same settings described above. For the YBDE18 and S66 sets, canonical CCSD(T) calculations are not possible with quadruple-ζ quality basis sets. Therefore, previously published 40,42 explicitly correlated CCSD(T) energies were used as reference data in these cases. DLPNO-CCSD(T) interaction energies were computed using aug-cc-pVTZ and aug-cc-pVQZ basis sets. In the correlation part, the auxiliary basis sets constructed with the autoaux module (see above) were used for the YBDE18 set, while the matching aug-cc-pVTZ/C and aug-cc-pVQZ/C auxiliary basis sets were used for the S66 set. Consistent with the reference data, 42 the interaction energies of the S66 set were corrected for the BSSE. CBS(3/4) extrapolation of the correlation energies was carried out as described previously. 43 Hence, eq 2 was used with β = 3.05 (or, equivalently, eq 4 with F = 1.712).

RESULTS AND DISCUSSION
In this section, the accuracy and efficiency of the PNO extrapolation (eq 4 with F = 1.5) is discussed on the GMTKN55 superset. As shown in the "SUMMARY" sheet of the Supporting Information, PNO extrapolation improves the accuracy of both the CCSD and the (T) components of the correlation energy. For the sake of simplicity, only the accuracy of the extrapolated DLPNO-CCSD(T) energies is discussed in the following.
In contrast, the 5/6 extrapolation (blue bars in Figure 2) provides MAEs within chemical accuracy, i.e., less than 1 kcal/ mol, for all benchmark sets. The only exception is the challenging MBE16−43 set, for which an MAE of about 3 kcal/mol was observed. These figures are similar to that obtained for the more expensive T CutPNO = 10 −7 calculations (gray bars in Figure 2). As mentioned above, T CutPNO = 10 −7 is the default for TightPNO calculations and it is often recommended for the study of noncovalent interactions. 44 For noncovalent interactions (the last five subsets in Figure 2), the 5/6 extrapolation provides results that are more accurate than those from T CutPNO = 10 −7 calculations when the aug-c-pVDZ-DK basis set is used. However, in general, both approaches provide similar accuracy. Therefore, the 5/6 extrapolation should be considered as a cost-effectivebut still reasonably accuratealternative to TightPNO calculations on large molecular systems.
Excluding the challenging MB16−43 subset, standard TightPNO calculations give MAEs less than 0.6 kcal/mol, whilst 5/6 extrapolated reaction energies show MAEs less than 0.8 kcal/mol. For these subsets, the 6/7 extrapolation (yellow bars in Figure 2) is significantly more accurate, providing a near sub-kJ/mol accuracy for all subsets (with MAEs of less than 0.27 kcal/mol). For example, on the S66 set for noncovalent interactions, the 6/7 extrapolation (MAE = 0.11 kcal/mol) reduces the TightPNO error (MAE = 0.20 kcal/ mol) by a factor of two. Interestingly, despite the significant increase in the computational cost when tightening T CutPNO to 10 −8 and 10 −9 , the associated 7/8 (MAE = 0.08 kcal/mol) or 8/9 (MAE = 0.06 kcal/mol) extrapolations bring only a slight improvement to the 6/7 extrapolated results (see the S66 sheet of the Supporting Information). Hence, the 6/7 extrapolation already provides almost converged results with respect to the T CutPNO threshold, and the residual error is likely to be associated with the other approximations used in DLPNO-CCSD(T) calculations.
Finally, it is worth mentioning that the relatively large error in the reaction energies of the MBE16−43 set does not change significantly if all the electron pairs are included in the coupledcluster treatment (T CutPairs = 0) (see the MBE16−43 sheet of the Supporting Information). Hence, the main source of error in this case is the PNO truncation error, as demonstrated by the excellent performances of the extrapolation schemes.
3.2. Basis Set Dependence and Efficiency. In this section, the basis set dependence and the efficiency of the PNO extrapolation scheme have been investigated on the W4−11, YBDE18, and S66 subsets.
For the W4−11 set (Figure 3a), reaction energies computed using the 5/6 extrapolation are more accurate than those obtained using T CutPNO = 10 −6 settings for all basis sets. Analogously, the 6/7 extrapolation results are generally more accurate than T CutPNO = 10 −7 results. Importantly, the accuracy For the YBDE18 set (Figure 4a), the 5/6 extrapolation provides an accuracy between that of T CutPNO = 10 −6 and of T CutPNO = 10 −7 . The 6/7 extrapolation provides results that are very close to those obtained using T CutPNO = 10 −8 . However, since the results obtained with T CutPNO = 10 −7 and 10 −8 are also very close to each other, the 6/7 extrapolation reduces the MAE with respect to T CutPNO = 10 −7 by only 0.01 kcal/mol.
The timing associated with the various DLPNO-CCSD(T) calculations for the W4−11, YBDE18, and S66 subsets is shown in Figures 3b, 4b, and 5b, respectively. In average, the computational cost of the 5/6 extrapolation is only 46% higher than that of T CutPNO = 10 −6 calculations. Compared to T CutPNO = 10 −7 calculations, the 5/6 extrapolation is faster by a factor of 2.2. Analogously, the computational cost of the 6/7 extrapolation is only 38% higher than that of T CutPNO = 10 −7 calculations. Again, the 6/7 extrapolation calculations are 2.0 times faster than those of T CutPNO = 10 −8 . In summary, X/Y extrapolation calculations are about twice more efficient than the corresponding 10 −(Y+1) calculations.
These computational experiments demonstrate that the suggested PNO extrapolation scheme is valid irrespective of the basis set size and provides a cost-effective but still very accurate alternative to tightening the T CutPNO threshold.
3.3. Dispersion-Dominated Interactions. Herein, we discuss the accuracy and efficiency of the extrapolation scheme on three representative examples of dispersion-dominated interactions, i.e., the stacking interaction in the uracil dimer (reaction 26 of the S66 set, see Figure 6), the interaction of   Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article two pentane molecules (reaction 34 of the S66 set, see Figure  7), and the bowl to cage isomerization reaction of C 20 (reaction 2 of the DC13 set, see Figure 8).
For these systems, the 5/6 extrapolation does not generally provide a significant improvement over T CutPNO = 10 −6 because the T CutPNO = 10 −5 results are dominated by the perturbative contribution, which largely overestimates the strength of dispersion interactions. The default TightPNO calculations also feature relatively large errors, i.e., 0.31, 0.12, and 2.41 kcal/mol for the uracil dimer interaction, pentane dimer interaction, and the C 20 isomerization reaction, respectively. In contrast, the 6/7 extrapolation reduces the error to 0.01, 0.01, and 1.05 kcal/mol, respectively. With a ∼40% smaller computational cost, the 6/7 extrapolation is even slightly more accurate than T CutPNO = 10 −8 . As the latter becomes unaffordable already for medium-sized systems (also due to its huge memory requirements), the 6/7 extrapolation provides a cost-effective alternative to tightening the T CutPNO threshold, yielding essentially canonical CCSD(T) accuracy even for such challenging reactions.

CONCLUSIONS
We proposed a computational strategy to approach the complete PNO space limit in DLPNO-CCSD(T) calculations. Results obtained using different T CutPNO values were extrapolated using a two-point extrapolation scheme, while keeping all the remaining parameters of the calculations fixed to the TighPNO default. Our scheme was validated on the most challenging subsets of the GMTKN55 superset, including a total of 883 molecules and 499 reactions.
Typically, the 5/6 extrapolation provides an accuracy between that of T CutPNO = 10 −6 and T CutPNO = 10 −7 calculations at a small fraction of the T CutPNO = 10 −7 computational cost. It is especially recommended as a costeffective alternative to TightPNO calculations for very large systems, for which standard TightPNO calculations are computationally too demanding.   In all cases, the 6/7 extrapolation provides results that are more accurate than those obtained from standard TightPNO calculations. Its accuracy is comparable to that obtained using T CutPNO = 10 −8 but at a substantially lower computational cost. The residual error is associated with the remaining approximations made in DLPNO-CCSD (T)