Toward Pair Atomic Density Fitting for Correlation Energies with Benchmark Accuracy

Pair atomic density fitting (PADF) has been identified as a promising strategy to reduce the scaling with system size of quantum chemical methods for the calculation of the correlation energy like the direct random-phase approximation (RPA) or second-order Møller–Plesset perturbation theory (MP2). PADF can however introduce large errors in correlation energies as the two-electron interaction energy is not guaranteed to be bounded from below. This issue can be partially alleviated by using very large fit sets, but this comes at the price of reduced efficiency and having to deal with near-linear dependencies in the fit set. One posibility is to use global density fitting (DF), but in this work, we introduce an alternative methodology to overcome this problem that preserves the intrinsically favorable scaling of PADF. We first regularize the Fock matrix by projecting out parts of the basis set which gives rise to orbital products that are hard to describe by PADF. After having thus obtained a reliable self-consistent field solution, we then also apply this projector to the orbital coefficient matrix to improve the precision of PADF-MP2 and PADF-RPA. We systematically assess the accuracy of this new approach in a numerical atomic orbital framework using Slater type orbitals (STO) and correlation consistent Gaussian type basis sets up to quintuple-ζ quality for systems with more than 200 atoms. For the small and medium systems in the S66 database we show the maximum deviation of PADF-MP2 and PADF-RPA relative correlation energies to DF-MP2 and DF-RPA reference results to be 0.07 and 0.14 kcal/mol, respectively. When the new projector method is used, the errors only slightly increase for large molecules and also when moderately sized fit sets are used the resulting errors are well under control. Finally, we demonstrate the computational efficiency of our algorithm by calculating the interaction energies of large, non-covalently bound complexes with more than 1000 atoms and 20000 atomic orbitals at the RPA@PBE/CC-pVTZ level of theory.


INTRODUCTION
There is great scientific and commercial interest to successfully predict the electronic structure of molecules and materials. Toward this aim, density functional theory (DFT) 1−3 and the Hartree−Fock (HF) method 4−8 are indispensable tools, but often they capture exchange−correlation effects only insufficiently. For instance, dispersion and polarization effects which derive mainly from mid-to long-range electron correlation 9 are not accounted for by standard DFT functionals. 10,11 While empirical dispersion corrections have been highly successful in describing some of these aspects of electron correlation, 9,12 methods based on coupled cluster (CC) 13−17 theory are generally considered to be the most precise class of correlation methods and have been the workhorse for many high-precision calculations. 18 However, given their large computational cost, 19 their usage is in practice still limited to relatively small systems. Using massively parallel implementations 20,21 and/or local approximations, 22−25 it is in principle possible to treat larger molecules. 26,27 Massively parallel calculations do, however, require large computational resources, while the errors introduced by the local approximations can give rise to uncertainties of the order of several kcal/mol in some cases. 26,28,29 An attractive alternative is provided by double-hybrid (DH) density functionals, 30,31 which usually offer a good compromise between accuracy and computational effort. 32−35 DHs combine DFT with methods which treat correlation effects explicitly, mostly second-order Møller−Plesset perturbation theory (MP2) 36 but (more recently) 37−39 also the random-phase approximation (RPA) 40,41 has been considered for this task. While MP2 is not necessarily very accurate and limited in its applicability, RPA is gaining popularity. 42−50 RPA is applicable to a wider class of systems than MP2, as it is unaffected by some of the drawbacks of MP2, namely, divergences for metals, small band gap systems, 40,51 and large molecules. 52 Without increasing the computational effort one can also greatly improve upon the accuracy of RPA by using σ-functionals. 53−55 Another popular alternative is the inclusion of screened exchange, which comes essentially at MP2 cost. 56−59 RPA and MP2 are typically implemented with N ( ) 4 60 and N ( ) 5 scaling with system size using global density fitting (DF) 61−64 (DF-RPA 60 and DF-MP2, 64,65 respectively). Efficient implementations of these methods enable applications to systems with a few hundred of atoms even at the quadruple-ζ level, 50 but larger systems are out of reach on standard hardware. For this reason, more efficient algorithms and approximate implementations have been developed to improve the scaling of both RPA and MP2. Common strategies are the usage of localized orbitals, 66−70 cluster-in-molecule (CIM) approaches, 71,72 or implementations which rely on sparsity in the atomic orbital basis. 73−90 In the latter class of methods, implementations using local DF approximations have gained increasing popularity. 80,86,87 While they do not achieve linear scaling with systems sizes, they typically come with a very small prefactor 87 and are believed to only introduce minor errors compared to canonical, molecular orbital based implementations. 87,91 One particular flavor of local DF approximations is pair atomic density fitting (PADF), 62,87,92−94 also known as pair atomic resolution of the identity (PARI), 87,95 concentric atomic density fitting, 96,97 or RI-LVL. 98 PADF had originally been introduced to speed up the construction of the Hartree contribution in nonhybrid DFT calculations but was later generalized to accommodate also the formation of the exact exchange matrix in HF or hybrid DFT calculations. For a comparison of PADF-HF to other approximate exact exchange algorithms, see ref 99. PADF can also be used to reduce the asymptotic scaling of RPA and spin-opposite-scaled (SOS)-MP2 100 to formally cubic. 87 However, quadratic scaling is often observed in practice since the prefactor of the cubic steps is small. 87 This speedup comes at the cost of errors which can in principle cause variational collapse of HF calculations to solutions corresponding to artificially low energies. 94 In practice, this is usually only an issue when insufficiently large fit sets are employed. It is often more problematic that PADF also leads to an artificial increase of the magnitude of correlation energies, as evidenced by numerical results (also see the Supporting Information). 101 Unless unrealistically large fit sets are used, it is difficult to avoid these errors. This can then also affect the precision of relative energies 102 that are the typical target of quantum chemical calculations.
In the Amsterdam modeling suite (AMS), 103,104 the issues of PADF are mitigated by applying a projector to the exact HF exchange matrix in order to prevent variational collapse. The same projector can also be applied to orbital coefficients in order to reduce errors in post-SCF methods. This strategy has already been successfully employed in the past for many-body perturbation theory (MBPT) based calculations. 59,105 However, systematic benchmarks against other codes using the same basis sets were not yet performed. For this purpose, we report here an implementation of PADF-MP2 and PADF-RPA in the numerical atomic orbital (NAO) based code BAND. 106−109 This implementation allows us to use Gaussian type orbitals (GTO) as basis sets and therefore to systematically investigate the accuracy of the PADF-MP2 and PADF-RPA implementations in AMS for relative correlation energies with respect to global DF-based implementations (DF-MP2, DF-RPA) in Psi4 110 and TURBOMOLE. 111 Similar benchmarks of PADF-based correlation energies have already been reported by Ihrig et al. using the FHI-AIMS code 112−115 who found excellent  agreement of PADF-MP2 and PADF-RPA with DF-MP2 and  DF-RPA 98 and also by Tew. 102 However, these authors focused on small-and medium-sized molecules only. To assess whether this accuracy also holds for larger systems and large basis sets, we herein report benchmarks for non-covalently bound dimers with up to 200 atoms and for large GTO type basis sets up to quintuple-ζ (5Z). In our benchmarks, we focus exclusively on non-covalent interactions. This is mostly due to the availability of accurate reference values for data sets containing large molecules, like the L7 116 or the S30L 117 compilations.
This work is organized as follows: In Section 2 we review the PADF method and introduce the projector method (PM). We also sketch how PADF can be used to achieve low-order scaling implementations of RPA and SOS-MP2. For more details, we refer to previous work. 87,118 After an outline of our computational details in Section 3, we assess the accuracy of relative PADF-MP2 and PADF-RPA correlation energies in Section 4. Our calculations show that PADF-SOS-MP2 is in excellent agreement with DF-SOS-MP2. Our interaction energies for the S66 data set 119 show maximum absolute deviations for PADF-MP2 and PADF-RPA with respect to the reference results of 0.07 and 0.14 kcal/mol, respectively, irrespective of the chosen basis set. For much larger molecules, we observe only a negligible loss in accuracy for SOS-MP2 and we find the PM to be decisive to obtain accurate results. The loss in accuracy is more pronounced for PADF-RPA, but errors are smaller than errors due to basis set incompleteness or due to local correlation approximations for large molecules 26,27 To showcase the efficiency of our implementation, we calculate PADF-RPA interaction energies of eight large non-covalently bound complexes at the triple-ζ (TZ) level, with up to 1000 atoms and more than 20000 AOs. Finally, Section 5 summarizes and concludes this work. are the electron−electron interaction, = | | r r v ( , ) r r c 1 , which is a key ingredient in HF and post-HF methods, and the non-interacting polarizability P(r,r′) (for a certain value of imaginary frequency or time), which appears for instance in RPA or SOS-MP2. The symbol χ μ refers to an atomcentered basis function which belongs to a basis set of N bas functions {χ μ (r) ∈ X ∀ 1 ≤ μ ≤ N bas }. We assume these functions to be composed of an angular part Y l m (θ, ϕ) and a radial function R n (|r A |),

THEORY
The radial part only depends on the distance from atom A, r A . The angular part Y l m is a spherical harmonic function with angles defined in the local coordinate system of atom A.
Representing in this way, the memory required to store all the integrals defined by (1) grows as N ( ) 4 with system size. Furthermore, since no (computationally feasible) analytical expressions are available, evaluating these electron−electron interaction integrals explicitly is very time-consuming when sets of STOs or NAOs are chosen as primary basis. It is therefore convenient to look in more detail at the set of functions F p = {χ μ (r)χ ν (r) ∀ 1 ≤ μ ≤ N bas and 1 ≤ ν ≤ N bas } that is obtained by gathering all unique products of two basis functions and investigate whether this function set can be represented in a more compact form via density fitting.
We first define formally 120 as a linear operator in a Hilbert space The set of fit functions that is defined in (PA)DF forms the basis F f and spans a subspace X f of the full Hilbert space K . Given Likewise we have that the basis F p spans the space X p K , with the integrals taking the form (1) or more concisely (χ κ χ λ |χ μ χ ν ). If X p ⊆ X f , χ μ (r)χ ν (r) can be expressed exactly in terms of fit functions and we can consequently use the compact expression 4 instead of eq 1. In practice we find that part of the product space is not spanned by the fit functions (X p \X f ≠ Ø). To characterize errors made by fitting basis functions products with the fit set F f , we therefore write members of the product basis F p as where e μν (r) is an error function which accounts for the fact that X f does not completely span X p . To keep the notation short, we only indicate explicitly the dependence of this error function of the basis function pair indices μ and ν and omit the dependencies on the choice of fit set and the optimization criterion used to determine the fit coefficients c μνα . The exact representation of in X p can then be written as Using the scalar product notation (3), we also have Keeping in mind that the function e μν depends implicitly on the fit space and on the kernel used to define the scalar product, we can define this function to lie entirely in X e = X p \X f so that we have Note that integrals over f α and e μν with other kernels are in general nonzero. Then, assuming invertible and considering the symmetry of scalar products, we may write This choice of fit coefficients in (9) can also be viewed as minimizing the Lagrangian for every e μν . Minimizing guarantees a reasonable precision also for off-diagonal terms in the residuals matrix since, due to the Cauchy−Schwartz inequality, we have With the fit-coefficient definition (8), the cross-error terms in (6) vanish. This property arises naturally because the metric used in fitting is defined through the same kernel we aim to fit. Using = v c and functions of the form (2), the theoretical framework just presented is known as (global) robust density fitting (DF). [61][62][63][64]121,122 It has the advantage of reducing the storage complexity of the matrix elements of the kernel and the amount of integrals to be evaluated from N ( ) 4 to N ( ) 3 . DF is not an approximation if the expansion is complete, and in this case a compression would only be achieved for exact linear dependencies in X p . In practice the compression is obtained at the price of an approximation since, for reasons of computational efficiency, F f does not span the complete space of products of primary basis functions. Considering what is left out, we note that a product set defined as χ μ χ ν is strongly non-orthogonal. Orthogonalization of such a basis to span as much as possible the full Hilbert space would result in linear combinations of χ μ χ ν with large coefficients. Given the finite precision of computer operations, the calculation of matrix representations of these parts of K is likely to be numerically unstable. In addition, we can expect such combinations of pair functions to be of minor physical relevance for a quantum chemical calculation. For this reason it is also numerically favorable to work with a fit set that is better behaved in terms of orthogonality than an orthogonalized product set.
DF reduces the asymptotic scaling of the evaluation of RPA and direct MP2 correlation energies from N ( ) 6 and N ( ) 5 , respectively, to N ( ) 4 . 60 However, the asymptotic scaling of methods involving exchange terms is not automatically reduced. For instance, using (6), the exchange contribution to the Fock matrix can be expressed as 123 which is evaluated with the same asymptotic scaling of N ( ) 4 as the variant without density fitting when no further approximations are made. This is also generally true for post-HF methods where exchange terms profit less from the compression of 4-index tensors. 76,87 Therefore, it is advantageous to eliminate the exchange terms entirely and introduce empirical scaling factors, as for instance in the SOS-MP2 100 or SOS-CC2 methods. 124 One can however also greatly improve upon the efficiency of global DF by constraining eq 5 in such a way that the number of nonzero elements in c only grows linearly with system size. For instance, instead of using the Coulomb kernel directly, 61,63 one can avoid defining the scalar product on the operator and can define another metric with more suitable properties. This could for instance be a local kernel, like the overlap kernel 63 (also known as RI-SVS) or an attenuated Coulomb kernel. 64,125 These kernels have been used successfully to lower the complexity of, for instance GW, 91,126 RPA, 80,81 MP2, 88,89 and CC2 127 calculations. The price to be paid is the loss of robustness; eq 8 is then not fulfilled so that the cross-terms in (6) become nonzero. An alternative approach which introduces the desired sparsity in the fit-coefficient tensor more directly is PADF. In PADF, the density fit is restricted to pairwise sums only and subsequently distance cut-offs are introduced. Using Latin uppercase superscripts to denote the atomic centers of functions, the PADF expansion of products of basis functions is replacing the simpler expansion (5). The notation α ∈ A indicates that the summation is restricted to fit functions centered on atom A.

Fit Set Generation. 2.2.1. Fit Sets from Products of Basis Functions.
It is easily understood that the choice of F is of key importance in a PADF code. Ideally, the fit set should be generated on-the-fly, tailored to the primary basis at hand, and the precision of the expansion (13) should be adjustable in a systematic way using only a single parameter. Many algorithms for this task have been developed for global DF. 128−131 An alternative way to generate fit sets on-the-fly is Cholesky decomposition, 130,132−134 but this approach is computationally demanding for basis sets for which three-center integrals involving the Coulomb potential cannot be evaluated analytically. We are only aware of two algorithms which have been developed specifically for PADF. 98,135 We here adopt the one presented by Ren et al. 114 and Ihrig et al., 98 which we recapitulate for completeness.
From all unique combinations of AOs centered on atom A (denoted by X A ) we build an atom-specific trial fit set F̃A of functions of form (2),

and ( ) and ( )
We then regroup F A in subsets with the same angular momentum where R̃A ,lm , the set of radial components centered on the same atom A, is multiplied element-wise to the same spherical harmonic. We then compute the eigenvectors of the matrix (R̃α| R̃β) for R̃α and R̃β ∈ R̃A ,lm and we keep only the ones with eigenvalue greater than a specific threshold ϵ fit . This threshold can be seen as a parameter tuning the fit quality. Setting it to zero does not imply exact fitting, as it only solves the one-center part exactly. In addition, as mentioned above, choosing a too small parameter will likely introduce numerical instabilities. The set of remaining eigenvectors is called R A,lm . Our final fit set is then Algorithm 1 (see Chart 1) shows a pseudocode of the algorithm. In Algorithm 1 (Chart 1), R μ (r) × {Y ld u m (θ, ϕ)} denotes basis functions with the same radial part and different spherical harmonic. Our approach to calculate the eigendecomposition of (R̃α|R̃β) is slightly different from the one by Ihrig et al. and Ren et al., who employ Gram−Schmidt orthogonalization instead. 98,114 Lehtola argued that a pivoted Cholesky decomposition of (R̃α|R̃β) should be used 130 since this results in an optimal auxiliary basis set. 136,137 In this work, we do not consider these more sophisticated strategies. However, we note that especially the approach by Lehtola 130 might result in auxiliary basis sets that are more compact than the ones produced by our algorithm.
For small basis sets, the fit set generated through such a procedure sometimes does not lead to sufficiently accurate results. To overcome this problem, Ihrig et al. 98 artificially enlarged the fit set. This is achieved by adding a new function to each X A . In our implementation, the function is a Slater type orbital with angular momentum l max A + 1, where l max A is the maximum angular momentum present in X A , and the arbitrarily chosen exponent α is equal to its angular momentum.
In the following, we will refer to this procedure as Lenhancement (L-e).

Slater Type Orbital Fit Sets.
As an alternative to the algorithm just described, we herein also test the use of handoptimized STO type fit sets. These come with the disadvantage that they are not systematically improvable through the adjustment of a single parameter ϵ fit . They are however more compact and therefore more suitable for large-scale applications. In all of the cases, the number of functions used depends on the atomic charge of the element considered. In this work we use three different thresholds which we refer to as Normal, Good, and VeryGood. The former two contain STO type functions with angular momentum up to l = 4, while the latter one contains functions with exponents up to l = 6. The main difference between the Normal and the Good fit sets is that the latter is much larger for hydrogen (107 vs 62 fit functions per hydrogen atom). It also contains more s and p functions for second-row elements. The VeryGood set is about two times larger than the Normal set. The reader may refer to ref 87 for details. We also refer to the Supporting Information for a detailed description of these sets, where we report the exponents of all fit functions of all elements relevant for this work.

Projection Methods.
The improved algorithmic scaling deriving from PADF comes with downsides which need to be handled carefully in order to retain sufficient numerical precision. Therefore, we use two related projection methods, which we describe in the following:

Projection Method for the Basis Set.
To prevent instabilities due to near-linear dependencies in the AO basis set itself, this basis set size is often reduced by modifying the Loẅdin orthornomalization 138,139 step. The Loẅdin transformation matrix S −1/2 to an orthonormal set follows from the eigensystem of the overlap matrix Here we can choose to ignore eigenvectors with eigenvalues below a threshold ϵ bas . The simplest way to achieve this is to set the corresponding eigenvectors in U to zero (thus introducing artificial states). This is done in the ADF implementation. In the BAND implementation, we define a smaller orthonormal basis by introducing the regularized Loẅdin transformation where Ũis the nonsquare (tall) matrix obtained by removing the eigenvectors columns with eigenvalues smaller than ϵ bas from U. Doing so, fewer orbitals are obtained and appearance of artificial states is avoided. The elimination of problematic orbitals that are expressed in the original basis with large coefficients also helps to prevent problems later on when considering the product basis and can therefore be beneficial to mitigate errors resulting from the density fitting.

HF Projection Method.
A known issue of PADF is that contributions to the electron repulsion energies can become negative which can lead to variational collapses of HF SCF calculations. 95,140 As we will argue below, this problem is to a large extent due to integrals over product functions that are difficult to describe by the fit set. A way to avoid this problem would be to Cholesky decompose the matrix (1) and keep only the most important Cholesky vectors, but this is not practical in calculations with Slater or numerical type orbitals. Instead, the PADF implementation in AMS uses a simple projector in the original AO space: where we use the eigensystem of the AO overlap matrix = S UDU T (22) with D being a diagonal matrix with the eigenvalues on the diagonal, and U having the eigenvectors stored as columns. The diagonal matrix R is obtained from D by taking Note that for ϵ K = 0 we get R = 1 and hence T = 1. Heuristically, the action of T on a vector or matrix is to remove components of the space parallel to eigenvectors of the overlap matrix with eigenvalues smaller than the specified threshold ϵ K . At the SCF stage, the projector is applied both on the left and on the right of the exact exchange matrix K While the regularized Loẅdin orthonormalization removes a subspace for all energy terms, the HF projector method neglects only the (small) stabilizing action of the exchange energy, shifting energies upward. Physically, this can be interpreted as screening of the exchange force which results in a more diffuse electron distribution. The default value for ϵ K = 10 −3 in BAND can therefore be much higher than the one for the Loẅdin orthonormalization projector ϵ bas = 10 −8 .
We also use the same projector (21) to calculate correlation energies. For this we redefine the matrix b which transforms from the AO to the MO basis as Here, the use of the projector is supposed to improve the accuracy of correlation energies by removing a subspace leading to AO products which can only be represented poorly by the fit set. The usefulness of the application of the PM-K to correlation energies can be rationalized as follows: If we consider an eigenvector v o (r) = ∑ μ s oμ χ μ (r) of the overlap matrix, we can notice that ∑ o s μo † s oν = D μν −1 . From this we understand that the average order of magnitude of the coefficients is at least of the order of s o component. We then expect that products involving such linear combinations are the most difficult to fit. This is mostly because of the diffuse products of basis functions from distant atoms and from the consequent difficulties in using the PADF approximation to express such products. 118 2.4. RPA and SOS-MP2 Correlation Energies. We now briefly discuss how PADF can be used to speed up the evaluation of SOS-MP2 and RPA correlation energies, summarizing the more detailed discussions in refs 87 and118. In the basis of fit functions, the RPA correlation energy can be expressed as which follows directly from the corresponding real-space representation of the RPA. 141 The integration is performed over the imaginary frequency axis for which either modified Gauss−-Legendre grids or, more efficiently, 142 minimax grids of size N ω can be used. ω k , σ k denote points and corresponding integration weights on the imaginary axis. Z is and is obtained through the non-interacting polarizability P (0) and the electron−electron interaction v in the basis of fit functions. Since matrix logarithms are difficult to calculate, we use that (assuming 1 − Z can be diagonalized with eigenvalues λ j ) (28) and evaluate the determinant |1 − Z| instead. The imaginary frequency representation of P (0) is obtained from its discrete imaginary time representation using nodes {τ k } j=1,...,Nd τ . The transformation is achieved by the discrete cosine transform (since P (0) is bosonic) where the γ kj (c) are the matrix elements of the kernel of the discrete cosine transform. The imaginary time representation of P (0) is given as with greater and lesser components of the time-ordered Green's functions being defined as Notice that b′, as defined in (25), appears in these equations so that the PM enters the PRA correlation energies through G > and G < . As previously discussed in some detail by us, 118 the evaluation of (30) scales asymptotically as N ( ) 2 with system size in the PADF approximation, since the number of elements in c only scales as N ( ) with system size. For detailed working equations we refer to our previous work. 87,118 In practice, the efficiency of this approach depends on the possibility to represent the imaginary time and frequency dependencies of the polarizability as compactly as possible. We follow Kaltak et al. 142,143 and use non-uniformly spaced minimax and leastsquares grids as described in ref 144 for imaginary time and ref 145 for the imaginary frequency domain.
Alternatively, the imaginary frequency polarizability P (0) (iω k ) can be evaluated directly in the MO basis as for instance described in ref 114, Using the series expansion of the logarithm in (26), one obtains the direct term of the MP2 correlation energy, E c (2) as its second-order term in v c . E c (2) can be evaluated directly in imaginary time and is given by where {α k } k=1,Nd τ denote the integration weights corresponding to the points {τ k } k=1,Nd τ . In the spin-polarized case we need to calculate When working in the AO basis, we are only interested in the contribution to E c (2) from electrons with unpaired spins which is used for instance in spin-opposite-scaled (SOS) MP2 100 or in DOD-DHs. 146 In that case, only the terms with σ ≠ σ′ contribute and the resulting correlation energy expression is scaled by an empirical factor.
While PADF can also be used to lower the time complexity of the exchange contribution to MP2 from N ( ) 5 to N ( ) 3 , 87 the resulting working equations can only be implemented with a very high prefactor and are therefore not useful in practice. Instead, the full MP2 correlation energy is evaluated in the MO basis: The expression is evaluated as in typical DF-MP2 codes. 147 The necessary fit coefficients are again transformed to the MO basis using relation 34.
For a detailed analysis of the asymptotic scaling of these algorithms with system size we refer to our earlier work. 87,118 In the Supporting Information, we compare the scaling of the AObased PADF-SOS-MP2 algorithm to its canonical counterpart Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article and discuss scaling with respect to the sizes of single-particle basis and auxiliary fit set.

Summary of the Projection Method Thresholds.
To improve the numerical accuracy, we apply at various stages of the calculation related techniques which we loosely call projector methods. Although the eigenvectors of the overlap matrix form an orthogonal basis set, some of the functions thus constructed are in an Orwellian sense more equal than others: those with a small eigenvalue are less valuable and may even be numerically harmful. In this spirit the first threshold ϵ bas comes from the regularized Loẅdin orthonormalization transformation (20) and is about removing completely a subspace from the basis set. Next we have the projector (21) associated with an eigenvalue threshold ϵ K . We can independently apply this to the exchange matrix K through the similarity tranformation (24) for Hartree−Fock calculations and to the orbital coefficients like in eq 25, entering the MP2 and RPA energy expressions. We do not consider the possibility of using different values for ϵ K for Hartree−Fock, MP2, and RPA, other than completely bypassing the projector. Finally, when constructing an automatic fit set for an atom type, again a regularized Loẅdin orthonormalization is used, based on the eigensystem of the overlap matrix of the fit functions now in the Coulomb metric, controlled by the eigenvalue threshold ϵ fit . This last threshold controls the number of fit functions; see Algorithm 1 (see Chart 1). The same threshold is also employed for the pseudoinverse to obtain the PADF fit coefficients.

COMPUTATIONAL DETAILS
All calculations have been performed with modified development versions of the ADF 104 and BAND 109 modules of AMS2022. 103 All Psi4 110 calculations have been performed using version 1.6.1.

AMS Calculations.
For all calculations using GTOs we used correlation consistent Dunning basis sets of double-ζ (DZ), triple-ζ (TZ), quadruple-ζ (QZ), and 5Z quality 148 from the basis set exchange library. 149 For comparison with ADF, we used the triple-ζ plus double-polarization (TZ2P) basis set. 150 In all BAND calculations we used the PADF scheme for the Hartree−Fock exchange operator, while the Hartree potential fitting procedure is based upon a partitioning of the density in atom-specific subspaces which contain a well-defined portion of the electronic charge density, each of which is expanded in products of radial splines and spherical harmonics. 151 This procedure does not rely in any way on the PADF fit functions. Throughout this work we performed tests varying the threshold ϵ fit to control the size of the fit set. The thresholds used for particular calculations will be indicated in the next section. The same holds for the threshold for the canonical orthogonalization of the primary basis. If not indicated otherwise, we set the numerical quality to VeryGood, which controls the accuracy of the numerical integration 152 and the quality of the ZLMfit, 151 as well as of the threshold controlling distance effects in HF, MP2, and RPA calculations. 87 The same settings have been used in all ADF calculations. In order to make the basis functions more compact, in BAND the radial part of the basis functions is multiplied by a Fermi−Dirac (FD) function by default. We disabled this behavior in all calculations. In all RPA calculations for the S66 data set we calculated the polarizability directly in imaginary frequency using (33) and modified Gauss−Legendre grids as described in ref 54 with 50 integration points.
In all calculations for the L7 116 and S30L 117 data sets we used the imaginary time based algorithms for RPA and SOS-MP2 algorithm. In all SOS-MP2 calculations we used 12 imaginary time points which ensures μHartree convergence of correlation energies of organic systems with large HOMO−LUMO gaps. 75,79 In all RPA calculations 59,118 we used 24 imaginary frequency and imaginary time points each and used PBE orbitals as input (RPA@PBE).
We calculated the interaction energies of the dimers in the CIM8 data set 71 at the RPA@PBE level of theory. If not indicated otherwise, we used correlation consistent Dunning basis sets of DZ and TZ quality. We then extrapolated the final correlation energies to the complete basis set limit using the relation 153 where x = 3 for TZ, x = 4 for QZ, and so on. We set the numerical quality to Good and set the threshold controlling distance effects for the RPA calculation to Normal. Also here we used various settings for the quality of the fit set. For details we refer to the next section. For reasons discussed in the next section, if not indicated otherwise, we set ϵ K = 10 −2 and ϵ bas = 5 × 10 −4 for calculations on the CIM8 data set. Detailed input settings for all calculations can be found in the Supporting Information.

Psi4 Calculations.
We performed Psi4 calculations for the S66 database using Dunning double-ζ (DZ), triple-ζ (TZ), quadruple-ζ (QZ), to quintuple-ζ (5Z) basis set, to perform global DF-MP2 calculations (in the following simply referred to as DF-MP2). We used default settings for all calculations, We used the default fit sets for each basis set, i.e., cc-pvxZ-RI 123,154,155 for the primary basis cc-pvxZ.

RESULTS
In this section, we assess the accuracy of the algorithms described herein. We proceed as follows: In Section 4.1 we first illustrate the effect of the threshold chosen for the regularized Loẅdin orthogonaliazation (basis set reduction, ϵ bas ) as well as for the HF projector method (ϵ K ), on the exchange matrix, for a simple molecule. In the subsequent sections, we compare our results for molecules of increasing size, starting with the S66 database in Section 4.2 and moving on to the L7 and S30L databases which contain molecules with more than 200 atoms in Section 4.4. Finally, in Section 4.5 we showcase the capabilities of our PADF based algorithms by calculating the interaction energies of 7 large non-covalently bound complexes in the CIM8 set by Neese and co-workers with up to 910 atoms (4500 electrons). 71

Effect of the HF Projector Method.
We start the discussion of our results by illustrating the effect of the HF projector method on the PADF-MP2 correlation energy of the fluorobenzene molecule for varying size of the fit set. In the heatmaps in Figure 1A,B, we show the MP2 bonding energy and MP2 correlation energy of fluorobenzene for different thresholds for the HF PM ϵ K and fit quality ϵ fit . In particular, in Figure  1A we report the bonding energy and highlight three interesting zones in the heatmap. On the right side, the PM threshold ϵ K is large enough to completely remove the exchange energy, thus reducing to the Hartree limit of HF. In the lower left corner where the quality of the fit set is poor, the PM threshold does only have a small effect and the PADF-MP2 bonding energy collapses to unphysically low values. The rest of the heatmap shows a more stable behavior but still a broad range of bonding energies. To check the accuracy of the PADF-MP2 implemen-tation in BAND, in Figure 1B, we show values deviating less than 0.1% from the reference DF-MP2 correlation energy (Psi4), and we blur in gray the ones outside of the range. Thus, we observe that increasing the threshold ϵ K allows one to use smaller fit sets while still maintaining a good precision in the result.

Deviation BAND-Psi4.
We now compare our PADF-MP2 and PADF-RPA results to DF-MP2 and DF-RPA reference values for the non-covalent interaction energies of the S66 database. We calculated the DF-MP2 reference values using Psi4. As reference values for PADF-RPA we use the TURBOMOLE results by Furche and co-workers 50 who used the frozen core approximation. Comparing our DF-MP2 results obtained with Psi4 to the DF-MP2 results of TURBOMOLE, we found the error introduced by the frozen core approximation to be between 0.0 and 0.1 kcal/mol for relative energies. Therefore, to allow for a better comparison of our PADF-RPA to the DF-RPA results, we corrected the latter ones by the frozen core error for MP2, assuming the impact of the frozen core approximation to be the same for MP2 and RPA. We stress that this procedure might not remove the frozen core error completely, and therefore the DF-RPA reference values have certainly higher error bars than the DF-MP2 ones.
In our calculations the relevant thresholds are the ones for the regularized Loẅdin orthonormalization ϵ bas , the projector method ϵ K , and the size of the fit set ϵ fit . For the MP2 calculations, we used ϵ bas = 10 −8 . We set the threshold for the projector method (PM) to ϵ K = 10 −3 , and we used ϵ fit = 10 −12 except for some of the 5ζ calculations for which we used 10 −10 instead (see Supporting Information for details). The L-e was enabled only for 3ζ calculations. This corresponds to an (unrealistically sized) fit set which is around 15 times larger than the primary basis.
The absolute deviations (ADs) for all PADF-MP2 interaction energies in the S66 database with respect to DF-MP2 are shown in Figure 2 for Dunning basis sets of 3ζ to 5ζ quality. In the same plot, we also show the deviations of the PADF-MP2 implementations in ADF and BAND using STOs. We see that, in all cases, the deviations are smaller than 0.1 kcal/mol, irrespective of the basis set. The small deviations between BAND and Psi4 should primarily be due to errors introduced by the PADF approximation. Given that the two implementations differ also in other aspects, we can, however, not exclude the possibility that differences in other technical parameters might play a role as well, for instance differences in the definitions of the numerical integration grids. Also incompleteness of the fit sets used in Psi4 might play a role. The deviations of BAND to ADF are mostly due to slightly different integration grids. In any case, for all practical purposes the agreement between the codes is excellent. Therefore, we did not investigate the precise origin of these small discrepancies further. The results of these calculations are summarized in Table 1.   In Table 1, we show that both maximum error and MAD of PADF-RPA relative to DF-RPA are about twice as large as for PADF-MP2. The reason for this might be that the fit errors are more pronounced due to the presence of the higher powers of Z (see (27)). There is also a slightly larger uncertainty in the reference values due to our approach to subtract to the frozen core errors in the reference calculations. We also assessed the effect of using a higher threshold of ϵ bas = 10 −5 . This did however not change our results at all, demonstrating the numerical stability of our method.
The l-e procedure is of key importance for both the GTO and STO triple-ζ basis sets. This is illustrated in Figure 3 where we show the deviations of the PADF-MP2 interaction energies with and without l-e. For the cc-pVTZ basis set the l-e causes differences of the order of ∼1.0 kcal/mol and of ∼0.1 kcal/mol for TZ2P. The larger deviation in cc-pVTZ from using l-e can be traced back to the fact that an STO is added to a GTO basis set, thus adding functions with a different radial behavior to the fit. For the TZ2P basis set the only difference comes from the higher angular momentum in the basis. The same comparison of PADF-RPA correlation energies gave similar results. Figure 4 we compare BAND results using several combinations of the abovementioned thresholds and show absolute deviation with respect to Psi4 for the S66 data set. The names for the different parameters combinations are specified in the table at the bottom of the same figure. We compare results obtained without any projector (None), without PM-K but using a tighter threshold for the regularized Loẅdin orthonormalization in the basis set (Bas), using PM-K only at the HF or at the MP2 stage, and finally using PM-K in both cases (mentioned following the order of the table). The best results are obtained applying PM-K to MP2 only, followed by applying it to both. This shows that the usage of PM-K at the HF stage does not contribute much to the overall accuracy and in fact can even worsen it. This supports our hypothesis from Section 2 since correlation energies are dependent on virtual orbitals which have more nodes and are more diffuse than the occupied ones. Hence, they are more difficult to represent in terms of an atomic-centered basis and consequently their products more difficult to express in terms of fit functions. We emphasize however that we use here a very large fit set. When fewer fit functions are used, PM-K also needs to be used at the SCF stage to prevent variational collapse. The calculations using a larger value of the Loẅdin orthonormalization threshold lead to improvements with respect to the None and HF settings since unstable components of the basis are projected out from all terms of the Fock matrix. This is decisive for the accuracies of MP2 and RPA correlation energies as we have already seen in the comparisons above.

Interaction Energies for L7 and S30L.
After having demonstrated the excellent agreement of our PADF-MP2/RPA results with DF-MP2/RPA also for large basis sets up to 5Z quality, we now discuss the accuracy of PADF-MP2 for the much larger molecules in the L7 and S30L databases, for which Furche and co-workers have recently published DF-MP2 reference values. 50 We focus on the direct contribution to the MP2 correlation energy only and calculate SOS-MP2 interaction energies using our imaginary time based PADF-SOS-MP2 implementation. 87 To represent the imaginary time dependence, we chose here 12 integration points in the interval [0, ∞) which is sufficient to achieve μH precision in absolute correlation energies. The reference values by Furche and coworkers have been calculated with the frozen core approximation. 50 We have however seen, for the S66 database, that the maximum error of this approximation is of the order of 0.1 kcal/ mol for relative energies, and we do not expect this to change for larger molecules. Finally, Furche and co-workers only calculated correlation energies using Dunning basis sets while they used basis sets of the Kahlsruhe type to calculate their HF or KS energies. 50 Therefore, we do not compare the non-covalent interaction energies for these sets but only the correlation contributions to them.
4.4.1. L7. For the fit sets generated from the products of basis functions, the results of this comparison for the L7 database can be found in Table 2. We first focus on the upper table. The results here have been obtained using a value of ϵ K = 10 −3 as threshold for PM-K (the same value as for S66) and without the l-e method. The results for the different values of ϵ fit controlling the size of the fit set ranging from 10 −6 to 10 −12 in Table 2 show a slow convergence of the relative SOS-MP2 correlation energies to the DF-SOS-MP2 reference values. However, even with the already very large fit set corresponding to ϵ k = 10 −10 (the number of fit functions is around 10 times larger than the number of primary basis functions), the maximum deviation is still 0.89  . Absolute differences in non-covalent MP3 interaction energies between Psi4 and BAND for the S66 database using the cc-pVQZ basis set and varying parameters for the different projector methods. Each line corresponds to a different combination of the three parameters in the table. Symbol × means that the respective method has not been used. All values are in kcal/mol. kcal/mol, and only for the smallest of the systems (GGG) in L7, the deviation to the TURBOMOLE results reaches an acceptable value of 0.17 kcal/mol. Moreover, the results for ϵ k = 10 −12 start to worsen, which can be related to the occurrence of linear dependencies in the fit set. For the Phe system in L7 we even obtained a completely unreasonable value for one of the correlation energies.
Turning to the second section in Table 2, we find that the l-e reduces the error with respect to DF-SOS-MP2 compared to the value obtained with the same thresholds. However, we already

Journal of Chemical Theory and Computation
pubs.acs.org/JCTC Article observe numerical instabilities for fit sets corresponding to ϵ k = 10 −10 . Notice, however, that the total size of this fit set is already larger than the one corresponding to ϵ k = 10 −12 without the l-e. Therefore, also the l-e method does not resolve the issues of PADF for the molecules in L7.
We now turn to the third section in Table 2 which shows results obtained without the L-e method but with a larger threshold of the HF projector method, ϵ K = 5 × 10 −3 , instead of ϵ K = 10 −3 which has been used in the two previous tables. This changes the PADF-SOS-MP2 correlation energies drastically and brings them into much better agreement with the TURBOMOLE results. Already for a moderate size of the fit set corresponding to ϵ k = 10 −6 , the maximum deviation is reduced to 0.53 kcal/mol, and for a value of ϵ k = 10 −8 , for 6 out of 7 systems the agreement with the reference values is better than 0.1 kcal/mol. Notice again, that 0.1 kcal/mol is of the order of uncertainty in the reference values due to the frozen core approximation. The CBH complex is the only system for which the deviation to the reference is still relatively large. Only with ϵ k = 10 −12 , the deviation to the TURBOMOLE results reduces to an acceptable value of 0.22 kcal/mol. However, first this fit set is very large and therefore not very useful in applications to large molecules, and second, we can still observe quite large deviations to TURBOMOLE data for other systems, most drastically for PHE.
STO Type Fit Sets. Out of all systems in L7, CBH contains the largest number of atoms, but it is not the system with the most electrons. Therefore, it can be considered as the most spatially extended system. As we have already discussed, such systems are expected to be most problematic for PADF based methods since many products of diffuse functions will occur. Since GTOs decay much faster than STOs from their atomic centers, it is natural to ask whether GTOs are the best choice to fit such products of diffuse atomic orbitals. Therefore, we investigate the accuracy which can be achieved by fitting the products of GTOs with STO type functions in Table 3. In the second section of Table 3 we show some results for ϵ K = 10 −2 . Additionally, we have reduced the integration quality to Good and the quality of the threshold controlling distance effects in the SOS-MP2 method to Normal. Both settings together drastically speed up the PADF-SOS-MP2 calculations.
Using theNormal fit set, relatively large errors are obtained. The results using the Good and VeryGood fit sets are in relatively good agreement with TURBOMOLE. Especially for the CBH complex, the deviation is much smaller than for the GTO type fit sets. Increasing ϵ K to 10 −1 improves agreement with the TURBOMOLE results further, and using the Good fit set results in perfect agreement with the reference values (given their uncertainties due to the frozen core approximation) are obtained.
Independently of the value of ϵ K , the errors of the relative RPA correlation energies are of the same order of magnitude as for SOS-MP2. However, as already observed for the S66 benchmark, the errors tend to be slightly larger. We also observe that relative energies tend to be less negative, indicating that smaller values of ϵ K than for PADF-SOS-MP2 are beneficial for PADF-RPA results. Also here, using the Normal fit set, errors hardly exceed 1 kcal/mol. The system c3gc (A GC base pair absorbed on a circumcoronene molecule) is particularly problematic with errors of the order of 3 kcal/mol for the VeryGood fit set. We show in the Supporting Information that lowering of ϵ K leads to much better agreement with experiment. This shows that the optimal settings for the projector methods are system-specific and further research is needed to understand these patterns.
PADF-SOS-MP2 Results Using the CC-pVQZ Basis Set. Lastly, we examine the quality of the PADF approximation at the QZ level. As for the S66 database, the results shown in Table 4 demonstrate that the quality of the interaction energies is not deteriorated compared to the TZ level. Already with the rather moderate threshold of ϵ k = 10 −6 , the maximum deviation is −0.37 kcal/mol for CBH. This value is reduced to −0.16 kcal/ mol when the threshold is decreased to ϵ k = 10 −8 . Only using the VeryGood Slater type fit set leads to worse results than at the TZ level. This is due to the fact that the VeryGood fit set contains functions with l ≤ 6, while the product basis contains functions with l ≤ 8.

S30L.
To further assess the quality of PADF-SOS-MP2, we also calculate the SOS-MP2 and RPA contributions to the interaction energies in the S30L set, which contains 30 noncovalently bound complexes with up to 205 atoms, out of which 8 are charged. 156 The deviations to the respective TURBO-MOLE results are shown in Figure 5 and confirm the observations for the L7 database in Table 3. When the Good fit set is used, the SOS-MP2 and RPA errors never exceed 0.5 and 1.0 kcal/mol, respectively. These results also demonstrate that the errors in relative energies do not seem to depend much on the sizes of the complexes any more after a certain system size is reached. This is in fact the expected results, since AOs centered on two atoms very far apart form each other will not overlap and therefore not contribute to the PADF errors.

RPA Interaction Energies of Large Complexes.
After having demonstrated the relatively good accuracy of the PADF approximation also for the correlation energies of large molecules, we now calculate the interaction energies in the CIM (cluster-in-molecule)8 set by Neese and co-workers 71 at the RPA@PBE level of theory. It comprises 8 large non-covalently bound complexes ranging in size from 200 to 1027 atoms, with

Journal of Chemical Theory and Computation
pubs.acs.org/JCTC Article interactions dominated by either σ−σ dispersion or hydrogen bonding. 71 Given the size of these systems, it is clear that these interaction energies can only be calculated if certain approximations are introduced. Neese and co-workers used an approach in which they decomposed the complex into smaller clusters and calculated the correlation energies using a cluster expansion of the general form  (39) where I denotes a subset of localized occupied and virtual molecular orbitals. 71 They proposed to evaluate the correlation energy by two RI-MP2 calculations using the (aug-)cc-pvDZ and the (aug-)cc-pVTZ basis sets and by a DLPNO−CCSD(T) correction using the smaller basis. They used the augmented basis sets only for the two smallest systems, while for the other systems they used the nonaugmented basis sets. Basis set incompleteness errors aside, there are major sources of inaccuracies with this approach, which could potentially lead to large errors. First, both the CIM and the DLPNO approximations can introduce errors in relative energies of several kcal/mol for non-covalent interactions, even when "tight" truncation thresholds for the DLPNO settings are used. 28,29 Second, the chosen extrapolation scheme assumes that the CC basis set error can be faithfully estimated at the MP2 level. This is however not necessarily the case since the major basis set errors in CC calculations arise from the direct MP2 contribution, 157−159 an observation termed as interference effect by Petersson et al. 160 Especially since MP2 correlation energies will be rather inaccurate for the large molecules in CIM8, the strategy to estimate the basis set incompleteness error at the MP2 level will be error-prone. Finally, we mention the recently observed disagreement of well-converged CCSD(T) interaction energies with quantum diffusion Monte Carlo methods for large non-covalently bound complexes. 26 For all of these reasons, the values by Neese and co-workers are certainly not of quantitative accuracy. Despite all this, they are the most accurate reference values which are available for these large systems and certainly serve as a useful frame of reference for our RPA@PBE calculations.
We have calculated all RPA@PBE interaction energies using (38) using cc-pvDZ and cc-pVTZ basis sets with and without counterpoise corrections calculations. We have extrapolated the RPA correlation energies only, while the TZ results has been used for the remaining components of the interaction energies. We have verified the accuracy of this strategy by comparison to results using a (T,Q) extrapolation with the cc-pVTZ and cc-pVQZ basis sets. As shown in Table S1 in the Supporting Information, the results of both extrapolation schemes differ by about 6.5 kcal/mol. This also indicates that, for systems as large as the ones in CIM8 for which QZ calculations are out of reach, basis set incompleteness errors are typically much larger than the errors introduced by the PADF approximation.
The results of our RPA@PBE calculations are shown in Table  5. The counterpoise corrected results are the most accurate RPA@PBE interaction energies we can calculate for these large systems. Given the good accuracy of RPA@PBE for large noncovalently bound complexes, 50 they might serve as reference values for more approximate methods. However, they cannot be compared directly to the values by Neese and co-workers since they did not correct for basis set superposition errors. For this purpose, we also calculated interaction energies which are not counterpoise corrected.  Table 3. All deviations are in kcal/mol. The number of basis functions refers to the full complex using cc-pVTZ. The reference (ref) energy in the second to last column has been taken from Neese and co-workers 71 and has been calculated at the CIM-DLPNO-CCSD(T)-CIM-RI-MP2(D,T)Z level of theory. All interaction energies are in kcal/mol and have been extrapolated using (38) with cc-pVDZ and cc-pVTZ. The counterpoise (cp) correction used is indicated by a percentage (0% is no cp).
Overall, reasonable agreement of our RPA@PBE results with the reference values by Neese and co-workers 71 is observed. With a MAD of 4.44 kcal/mol, the deviations are of the same order of magnitude as popular (dispersion corrected) density functionals 71,161,162 (e.g., ωB97-X-D with a MAD of 5.06 kcal/ mol or B3LYP-D4 with a MAD of 4.81 kcal/mol) and smaller than the ones for other ab initio methods like SCS-MP2. 71 We emphasize again that even though the agreement to the CIM-DLPNO−CCSD(T) reference is satisfactory, these noncounterpoise corrected interaction energies come with large basis set superposition errors and are therefore likely to be incorrect (see also Tables S5 and S6 in the Supporting Information).

CONCLUSION
By comparison to DF-MP2 and DF-RPA, we demonstrated the accuracy of PADF-MP2 and PADF-RPA for the S66, L7, and S30L sets of non-covalently bound complexes ranging from 6 to more than 200 atoms in size. Especially for the small to medium molecules in the S66 database, PADF comes with negligible loss of accuracy compared to global density fitting.
The main advantage of PADF over global DF is that is leads to very fast algorithms for RPA and SOS-MP2. We have shown that the PADF approach is suitable to calculate the interaction energies of large molecules. In particular, we calculated the PADF-RPA@PBE interaction energies of eight large noncovalently bound complexes at the cc-pVTZ level with more than 1000 atoms and more than 20000 AOs in size on a single compute node.
The choice of fit set is decisive for precise correlation energies with PADF. We tested two different types of fit sets. In a first variant, the fit set is generated directly from products of basis functions. 98 In a second variant, fit sets consisting of an even tempered series of STOs are used. 87 Despite being much smaller, we found the second kind of fit set to be suitable to express products of GTOs. This might be due to the slow decay of the radial part of the STOs, making them more suitable to fit delocalized products of AOs.
To improve the precision of the PADF approach further, we introduced a projector which acts directly on the Fock matrix and removes the attractive component of the excact exchange. Most importantly, we also use this method to project out subspaces of AOs from the orbital coefficients matrix which can only be represented poorly by the fit set. Especially when smaller fit sets are used, we showed the PM-K to be of key importance for accurate interaction energies.
While the PM reduces the correlation energy errors arising from PADF, it cannot completely eliminate them. This is especially true for large molecules, for which a compromise between accuracy and computational efficiency is required. For very large systems like the molecules in the CIM8 data set, it becomes mandatory to use smaller fit sets which might introduce errors in interaction energies which can exceed 1 kcal/mol. This is however also true for many approximations to high-level methods for the calculation of correlation energies, for instance CC methods based on localized orbitals using the DLPNO 28,29 or LNO approximations. 26 At the moment, the HF projector method is not used in a system-specific way. A computationally efficient way to do so could be to check the definiteness of the Hartree-exchange matrix at runtime and to use this information to adjust ϵ K during the SCF.
In practice, the errors stemming from the PADF approximation will often be of only minor relevance. Especially MP2 is typically not used by itself but rather in double-hybrids functionals which typically use a fraction of around 30−60% MP2 correlation energy, 30,35 scaling the error by the same amount. Therefore, the already small PADF errors will be negligible for those functionals. Furthermore, when small fit sets are used, the fit set incompleteness error always leads to too low interaction energies while basis set incompleteness errors lead to too high interaction energies. Especially for large molecules with several hundreds of atoms for which QZ calculations are not feasible, the fit set incompleteness errors will be much smaller than basis set incompleteness errors.
On a more general note, it has recently been recognized that methods which agree well with each other for small and medium molecules might give very different results for larger systems. 26 Also, approximate (dispersion corrected) GGAs or hybrid functionals which work well for smaller systems are much more error-prone for large molecules. 161,162 In order to understand the reasons for this discrepancy, it is mandatory to push the boundaries of first-principle methods to much larger systems. At the moment, this comes at the price of numerical errors and additional research is needed to develop techniques to mitigate these errors further.
Analysis of scaling of AO-based and canonical SOS-MP2 algorithms and convergence of interaction energies in the CIM8 set with respect to the size of the single-particle basis (PDF) Exponents of all functions in the STO fit sets used in this work, results of all calculations reported in this work, example input files, and additional information for some of the calculations in the S66 set (ZIP) ■ ACKNOWLEDGMENTS We thank Erik van Lenthe for fruitful discussions and the idea behind the projector methods and acknowledge the use of the computing facilities of Vrije Universiteit Amsterdam and SCM. E.S. acknowledges funding from the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 956813 (2Exciting). ■ REFERENCES