Static Polarizabilities at the Basis Set Limit: A Benchmark of 124 Species

Benchmarking molecular properties with Gaussian-type orbital (GTO) basis sets can be challenging, because one has to assume that the computed property is at the complete basis set (CBS) limit, without a robust measure of the error. Multiwavelet (MW) bases can be systematically improved with a controllable error, which eliminates the need for such assumptions. In this work, we have used MWs within Kohn–Sham density functional theory to compute static polarizabilities for a set of 92 closed-shell and 32 open-shell species. The results are compared to recent benchmark calculations employing the GTO-type aug-pc4 basis set. We observe discrepancies between GTO and MW results for several species, with open-shell systems showing the largest deviations. Based on linear response calculations, we show that these discrepancies originate from artifacts caused by the field strength and that several polarizabilies from a previous study were contaminated by higher order responses (hyperpolarizabilities). Based on our MW benchmark results, we can affirm that aug-pc4 is able to provide results close to the CBS limit, as long as finite difference effects can be controlled. However, we suggest that a better approach is to use MWs, which are able to yield precise finite difference polarizabilities even with small field strengths.


INTRODUCTION
Molecular electronic structure calculations are a widespread tool in chemistry, biology, and materials science. Such a diffusion across disciplines has been enabled by Kohn−Sham density functional theory (KS-DFT, hereafter just "DFT") 1 which brought about calculations with accuracy comparable to coupled cluster with singles and doubles (CCSD) at the computational cost of a single-determinant method like Hartree−Fock (HF). A large part of the current development of theoretical methods is concerned with obtaining accurate energies, which are essential to interpret and predict chemical reactivity.
Molecular properties constitute another important area of method development. Electric dipole polarizabilities are related to important processes in chemistry; for example, they hold a key role in our understanding of intra-and intermolecular interactions such as dispersion, 2,3 they are at the foundation of techniques such as Raman spectroscopy and Raman optical activity, 4 and they are employed in the development of accurate force fields for molecular simulations. 5,6 It is therefore highly relevant to assess the accuracy of polarizability predictions within the density functional theory (DFT) framework.
The quality of a given DFT calculation depends on two factors: the density functional approximation (DFA) and the basis set. In order to fairly assess the performance of functionals and basis sets, we must distinguish between these two sources of error. While an ideal (nonexact) functional should be accurate and yield a result as close as possible to the corresponding full configuration interaction (FCI) calculation, an ideal basis should be precise and minimize the error with respect to the complete basis set (CBS) limit. Most functionals and basis sets are developed so as to provide the best possible energies, which sometimes rely on fortuitous error cancellation. The assessment of accuracy (functional) and precision (basis set) for molecular properties, such as polarizabilites, is therefore challenging.
Hait and Head-Gordon 7 benchmarked the accuracy of electric dipole polarizability predictions for a large number of DFAs against coupled cluster with singles, doubles, and perturbative triples (CCSD(T)) calculations, for a set of 132 species. They employed the aug-pc4 basis set 8−11 for all DFT calculations, assuming that the obtained quantities were close to the CBS limit, although they noted that this assumption may not hold for certain DFAs. We refer the reader to their excellent paper for details. 7 Even the largest practical Gaussian-type orbital (GTO) basis sets are far from complete in the mathematical sense. In general, it must be assumed that GTO basis sets deliver results close to the CBS limit, even for the large aug-pc4 basis set: one cannot know how close to the limit a given basis set is without a reference value, and simply comparing with a larger GTO basis set does not in general guarantee that one converges to the CBS limit. For energies, the variational principle serves as a guide, but quantifying the basis set incompleteness error (BSIE) for other molecular properties is not a trivial task, and two issues lie at the heart of the challenge: (i) atomic orbital (AO) bases are generally developed by minimization of the total energy as the guiding principle and may therefore not be optimal for molecular properties, and (ii) hierarchical constructions of AO bases do not guarantee any rate of convergence of the molecular properties. While the Hylleraas− Undheim theorem 12 proves that the polarizability for nondipolar molecules is a lower bound of the CBS limit, there is no guarantee that a systematic extension of an AO basis will in practice reach the CBS limit, unless the basis can formally be extended to completeness.
Multiwavelets (MWs) 13 have recently emerged as a powerful alternative to the traditional AO bases and are not subject to the same shortcomings as AO bases. MWs are a particular choice of wavelets 14 used to represent functions and operators on a real-space grid. To overcome the hurdles posed by realspace methods, such as large memory footprint and computational cost, MWs exploit adaptive grid refinement 15−17 and Cartesian separated representation of the required operators. 18 Such features are combined with a rigorous formalism with strict error control. 19−21 For molecular energies, it is possible to request a predefined precision with respect to the CBS limit, and for molecular properties such as polarizabilities, a steady progression toward the corresponding limit is observed. 22 MW calculations of polarizabilities performed at high precision can be employed as a true reference because they can be assumed complete to within the given precision. Such capabilities have been recently exploited in our group to perform two extensive benchmark studies on total and atomization energies 20 and on magnetizabilities and NMR shielding constants. 23 The objective of the present paper has been to use MWs to assess whether aug-pc4 indeed is capable of delivering polarizabilities at the CBS limit, by comparing MW-based polarizabilities to the recent aug-pc4 benchmark. 7 We start by describing the mathematical framework for computing molecular properties with MWs. Next, we report the computational details and present and discuss our results. We also touch upon additional benefits of MWs related to the finite differences (FD) approach, and finish by summarizing our findings. where a, b, c, ... are Cartesian directions. Such an expansion implicitly defines the components of the dipole moment (μ), the polarizability (α), and the hyperpolarizabilities (here limited to the first and second hyperpolarizabilities, β and γ, respectively). The dipole moment components μ a can be obtained as a simple expectation value of the corresponding dipole operators μ̂a. Several approaches can be employed to compute (hyper)polarizabilities. Hait and Head-Gordon 7 have employed a second-order FD expression of the energy. For the diagonal components of the polarizability tensor, the expression reads Such an expression is formally equivalent to taking the derivative of eq 1 with respect to the external field and then applying a linear finite difference formula to the dipole moment:

THEORETICAL
Both formulas have a leading error term that is quadratic in the applied field and proportional to the second hyperpolarizability. To minimize the error, it is therefore necessary to employ small fields, especially for molecules with large γ.

Multiwavelets.
Wavelet theory is a relatively recent branch of mathematics, dating back to the 1980s. 14,25 It constructs functions with the following properties: they are localized in both real and Fourier space, they achieve completeness as a limit in the L 2 sense, and they provide rigorous error control. Multiwavelets are a particular kind of wavelets that include several functions in one interval, as the "multi" prefix suggests. For the construction of MW bases and details about their properties, we refer to the literature on the subject. 13,15 Finite Field Polarizability with Multiwavelets. In 2004, Harrison and co-workers 19 for the first time used MWs to solve the Kohn−Sham (KS) equations of DFT, demonstrating that arbitrary precision with respect to the CBS limit can be achieved also for general molecular systems; 19,26,27 previously, this was possible only for very small and highly symmetric molecules. 28 Due to the large number of primitive MW basis functions necessary for the precise represention of the molecular orbitals, it is not practical to solve the KS equations in the traditional way by constructing the primitive Fock matrix and solving the corresponding Roothaan equations. Instead, the equations are rewritten in integral form using the bound-state Helmholtz integral operator, which is given as the inverse of the kinetic energy operator shifted by the orbital energy Ĝi = (T̂− ϵ i ) −1 . There are several benefits with this reformulation: (i) it avoids the explicit construction and diagonalization of the primitive Fock matrix; (ii) it allows for different and adaptive primitive basis sets for each orbital; (iii) it avoids the application of the kinetic operator as a second derivative, which is not numerically stable in the discontinuous MW basis; and (iv) the implicit construction of a huge virtual orbital space is not necessary, and one solves instead only for the occupied orbitals by iterating eq 4 to self-consistency.
In the presence of a uniform electric field F ⃗ , the KS potential operator in eq 4 reads which features the nuclear (V̂n uc ), Hartree (J), and exchangecorrelation (V̂x c ) potentials. By solving the corresponding KS equations to obtain the ground state density ρ = ∑ i |φ i | 2 , we can compute the electric dipole moment as a function of field strength from the expectation value This procedure can be used to approximate the electric polarizability through the finite difference formula in eq 3. Linear Response Polarizability with Multiwavelets. The starting point to obtain linear response properties with multiwavelets is standard perturbation theory. A small perturbation ĥ( 1) is introduced, and all terms in eq 4 are expanded to first order into a set of Sternheimer equations, 22,29−31 which can be written in integral form where Ĝi is the same as that for the ground state problem and ρ̂( 0) is the ground state density projector, while V̂( 0) = V̂n uc (0) + Ĵ( 0) + V̂x c (0) and V̂( 1) = Ĵ( 1) + V̂x c (1) are the unperturbed and firstorder perturbed potential operators, respectively. At this point, all unperturbed quantities are already known from solving the ground state problem, whereas first-order quantities are obtained by iterating eq 7 to self-consistency. The perturbed orbitals are then used to build the corresponding density perturbation Here we have assumed real, time-independent perturbations: only one set of real, perturbed orbitals is obtained, which simplify the expression for the perturbed density.
The polarizability tensor is computed as the expectation value of the dipole operator μ⃗, on a density perturbed by the same operator For details about the general derivation of time-dependent and imaginary (magnetic) perturbations in a MW framework, we refer the reader to the works by Sekino et al. 22 and Jensen et al., 23 respectively.

COMPUTATIONAL DETAILS
Cartesian coordinates of the species studied here were obtained from Hait and Head-Gordon, 7 and a list of the species and their spin multiplicities is given in Table 1. The set of 124 species includes 92 closed-shell and 32 open-shell systems. The set is slightly smaller than the original one provided in the mentioned benchmark, 7 due to convergence issues encountered for the remaining species (missing species: The relative numerical precision was set to ϵ rel = 1 × 10 −7 , the MW polynomial order to 11, and the norms of the orbital residuals between consecutive iterations (∥ϕ i n+1 − ϕ i n ∥) were converged to within ϵ mo = 1 × 10 −6 , both for unperturbed and perturbed orbitals.
In general, it is expected that the converged total energy should be correct at least within ϵ rel with respect to the CBS limit (relative precision). The orbital convergence necessary to reach this precision in total energy is roughly ϵ ϵ = mo rel because of quadratic error propagation. However, since we are also interested in properties with linear error propagation (dipole moment and polarizability), we converge the orbitals well beyond this point in order to get the maximum number of digits out of the chosen numerical precision ϵ rel , 35 and we then expect around ϵ mo absolute precision in dipole moment and polarizability. Static polarizabilities were computed with DFT using the LDA 36 and PBE 37 functionals, provided by the XCFun library. 38 Closed-shell species were treated with the spinrestricted formalism, and open-shell species, with the spinunrestricted formalism. We used the central two-point finite difference formula of eq 3 to compute the diagonal elements α aa of the polarizability tensor. Field strengths of ±0.001 au were used for all species. Calculations without an applied electric field were first performed to generate initial orbitals for the FD calculations. Initial orbitals for zero-field calculations were generated by the superposition of atomic densities (SAD) method. 39 All MW calculations benefited from the Krylov subspace accelerated inexact Newton (KAIN) convergence accelerator. 40 To validate our results, we also used MWs to compute static polarizabilities with linear response (LR) for a subset of the species. PBE response calculations were performed for 17 of the 124 species (closed-shell only), while LDA response  7 They used the energy expression in eq 2 to estimate the polarizability using a field strength of 0.01 au. However, they identified a few cases that were contaminated by hyperpolarizabilities, for which they reduced the field strength to 0.001 au, but this diagnosis was performed only at Hartree−Fock level and simply transferred to the DFT calculations.
In order to assess if further contamination could be present in the DFT results of Hait and Head-Gordon, 7 we performed analytical polarizability calculations using the ORCA program package, version 4.1.2, 41 with the PBE functional and the aug-pc4 basis set. 8−11 All species were treated with the spinunrestricted formalism, and the integrals were computed over an angular Lebedev grid consisting of 770 points and a radial grid consisting of 50, 55, and 60 points for first, second, and third row elements, respectively ("grid7"). Self-consistent field convergence was accelerated by the direct inversion of the iterative subspace (DIIS) method. 42, 43 The total energy change was converged to within 1 × 10 −9 E h , and the one-electron energy change to within 1 × 10 −6 E h (as defined by the "VeryTightSCF" ORCA keyword). The (default) resolution of identity (RI) approximation was turned off for all calculations in order to guarantee benchmark quality of the results (some initial test runs indicated a large dependence on the choice of auxiliary basis set).
Polarizabilities from different calculations were compared using the relative error (RE) metric, which for species n was given by

Linear and Degenerate Open-Shell Systems.
We have given special treatment to seven species in our data analysis (vide infra). In order to motivate this decision, it will be useful with a reminder of the electronic structure of linear and open-shell systems with a degenerate ground state. Such systems are particularly challenging to model for mean-field methods such as DFT. Let us consider NO as a prototypical molecule. It has an unpaired electron in a π orbital. Ideally, π x and π y are degenerate, but mean-field approaches break the symmetry as soon as one of the two orbitals is populated (the density and hence the KS potential become non-totally symmetric). For such systems, Hait and Head-Gordon 7 reported identical values for α xx and α yy , which is not what we observed: our MW-FD polarizabilities show that one component (the larger one) is virtually identical to the GTO-FD value, whereas the other is slightly smaller. According to Hait and Head-Gordon, 44 the smaller component should in this case be discarded as being unphysical, in connection to the symmetry breaking occurring for mean-field approaches. 45 Since the main objective of the present paper is to quantify the BSIE for the GTO basis set aug-pc4, we decided that the fairest analysis could be made by performing the same procedure as that by Hait and Head-Gordon. 7 We therefore explicitly set α xx = α yy in our MW data set by selecting the component closest to the xx/yy component reported by Hait and Head-Gordon. 7 The seven species that received the above treatment in our data analysis are SCl, OCl, OH, OF, SF, BN, and NCO. To qualify for the special treatment, they had to fulfill the following three criteria (to within a tolerance of 1 × 10 −4 ): 1. α xx = α yy in Hait and Head-Gordon's data set 2. α xx ≠ α zz in Hait and Head-Gordon's data set 3. α xx ≠ α yy in our data set Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article 3.5. MW vs GTO: Practical Considerations. Availability. MRC hem is one of two programs currently available that offer an all-electron MW basis (the other is MADNESS 46 ). Detailed instructions on how to obtain and compile the MRC hem code, as well as a user manual, are available on the documentation web page. 34 Ease of Use. From a user standpoint, selecting appropriate GTO basis sets for a particular application is not a simple task: The high parametrization of GTO basis sets means the user has to carefully evaluate several factors, for example, which family of GTOs to use (Pople, Jensen, Karlsruhe, Ahlrichs, Dunning, etc.), how many polarization functions to use, how many diffuse functions to use, whether to treat different atoms differently, and so on. Although the result can be converged to a limit within the given basis, no knowledge about the CBS limit can be inferred from it. Selecting the best basis is not trivial, and suboptimal choices based on "habit" and "popularity" are common (analogous to the "zoo" of DFAs 47 ).
For MW calculations, all the user must do is to specify an overall numerical precision, in terms of convergence thresholds for energy and orbitals. This precision parameter is relative to the exact CBS limit, which is a key distinction from GTO. MWs can therefore provide the user with excellent precision and a quantifiable error without expert knowledge about basis sets.
Cost and Performance. At present, a calculation at moderate precision is cheaper to perform with GTOs because of a smaller prefactor. At very high precision, MW calculations become more competitive due to a linear scaling with respect to the precision. The most severe limitation of MWs is the memory requirements, which is rather demanding. For the molecules used in the present work, the total memory needed for the MW-FD calculations was typically between 50 and 150GB ( Figure 4 in the Supporting Information), although this is rather efficiently distributed across several compute nodes on a cluster. The number of CPU hours needed for the MW-FD data set is presented in Figure 5 in the Supporting Information. Figure 1 presents plots of computation time against increasingly larger GTO and MW basis sets for the calculation of total energy, dipole moment, and polarizability for the SiH 3 Cl molecule: it shows that each additional digit of precision for MWs requires a predictable doubling of CPU time, while moving along the aug-pcn (n = 1, 2, 3, ...) series increases the computational cost by a larger factor, without a guarantee of gaining an additional digit in precision.

RESULTS AND DISCUSSION
A challenge in computational benchmark studies is the precision of the basis set: one has to assume that the computed reference property is at the CBS limit. The large GTO basis set aug-pc4 7,48 has been assumed to be close to the CBS limit for electrical properties. Here, we attempt to evaluate if aug-pc4 indeed is at the CBS limit for static polarizability predictions, by quantifying the BSIE associated with this basis set. We do this by comparing our reference MW polarizabilities to a recent aug-pc4 benchmark on DFT static polarizabilities. 7 All data presented herein are available via the Supporting Information accompanying this Article, or as a separate document at the Dataverse open-data repository. 49 In order to isolate BSIEs, we need a detailed understanding of other potential errors, and in particular the error associated with using a finite field approach. In order to assign errors to the right source, we have considered the following types of calculations: (1) GTO-FD calculations; (2) MW-FD calculations; (3) GTO-LR calculations; (4) MW-LR calculatations.
The comparison of GTO-FD vs MW-FD is the central point of this contribution. The comparison of GTO-FD vs GTO-LR will shed light on potential errors due to finite field effects with GTOs; the comparison of MW-FD vs MW-LR will show how much MW results are affected by the FD approach, and the comparison of MW-FD with GTO-LR will be used to doublecheck that the discrepancies observed have been attributed correctly. The RE distributions listed here are summarized in Figure 2. The maximum absolute RE and MRE of the LDA validation were 0.23 and 0.011%, respectively. The PBE validation yielded similar statistics. The PBE set was limited due to convergence problems. Nevertheless, we see no indication that LDA and PBE behave differently.
Based on the very high numerical precision that has been used throughout, we expect that the remaining discrepancy between MW-FD and MW-LR is due to the field strength of 0.001 au, although this has not been verified numerically. We conclude that our MW-FD polarizabilities have field-related errors at least below 1% but usually much lower than this.
4.2. How Good Are FD Results with the aug-pc4 Basis? To judge the quality of the FD aug-pc4 results, we    Figure 2 and in more detail in Figure 3. Several features are revealed: 1. The error distribution suggests that FD aug-pc4 on average performs quite well, yielding a RE smaller than ±0.5% for most species. 2. GTOs seem to overestimate static polarizabilities, which may be counterintuitive as analytical polarizabilities are variationally approached from below. 12 Figure 3 arose from the FD approach, we compared the GTO-FD polarizabilities to computed GTO-LR values. The RE distribution for this comparison, using the analytical polarizabilities as a reference, is presented in the second plot in Figure 2 and in more detail in Figure 4. At first sight, the distribution is very similar to the one presented in Figure 3, indicating that GTO-FD polarizabilities have been contaminated by external field-related effects (the aug-pc4 benchmark study 7 used a field strength of 0.01 au for most species). To rule out the possibility that the two distributions incidentally show similar shapes, we plotted the two distributions against each other, species for species in Figure 5. Linear regression yielded an r 2 value of 0.97. Here it is clear that the error for one species is more or less the same across the two distributions, further indicating that the GTO-FD polarizabilities have been contaminated. Thus, our results show that the observed deviations between MW and GTO (aug-pc4) polarizabilities in Figure 3 are predominately fieldstrength-related errors in the GTO-FD values.
4.4. What Is the True BSIE? In order to return to our original objective, the estimation of BSIEs in static polarizability predictions with the aug-pc4 basis set, we ultimately chose to compare GTO-LR and MW-FD values. Based on the above discussion, this comparison should yield the fairest estimation of the BSIE of aug-pc4. The RE distribution is presented in the center plot in Figure 2, and in more detail in Figure 6, and it is clear that the REs have been dramatically reduced for almost all species. Two species stand out with REs larger than 0.5%: HO 2 (2.1%) and Na (0.9%). While both have open-shell (doublet) electronic structures, it is not clear what the origin of their relatively large REs may be. Despite the two outliers, the comparison in Figure 6 shows that the BSIE for aug-pc4 is very small. 4.5. Multiwavelets Can Handle Smaller Field Strengths than GTOs. Using FD calculations to estimate molecular response properties is a very simple approach, but it requires a careful consideration of the applied field strength. A weak field is required in order to stay within the linear regime, but this at the same time leads to the amplification of numerical errors due to cancellation of significant digits in the nominator of eq 3; a large field reduces numerical noise but simultaneously increases nonlinear effects from higher-order responses, leading to deviations from the correct result. The optimal compromise is therefore the weakest possible field that induces a sufficiently large first-order response in the dipole to obtain a sufficient number of digits in the polarizability. Examples of nonlinear behavior for a few species are presented and briefly discussed in the Supporting Information.
The MW framework guarantees that the computed dipoles are at the CBS limit with a controlled and systematically improvable precision: 50 the number of correct digits in the calculated polarizabilities can be improved systematically by tightening the precision thresholds. Therefore, MWs can make use of very small fields (10 −3 or less) to eliminate higher-order responses, while still controlling the numerical noise by making use of tighter thresholds. As shown in Figure 1, even aug-pc4 has an error of roughly 10 −3 in the energy as well as in the dipole moment, whereas the best MW calculation (MW7) yields three additional digits (10 −6 ). Making use of such a small field for GTOs will therefore heavily rely on error cancellation.

CONCLUSIONS
We have shown that GTO-FD polarizabilities presented by Hait and Head-Gordon 7 display quite large errors, considering the size of the aug-pc4 basis set used. However, we conclude that these errors mainly originate from field strength-related effects and not from BSIEs. Indeed, GTO-LR polarizabilities computed with aug-pc4 are very close to the CBS limit, which we have confirmed by comparing to very precise MW reference calculations. Specifically, we show that the observed errors exceeding 1% in GTO-FD polarizabilities are attributed to a field strength of 0.01 au, while the MRE of 0.06% in GTO-LR polarizabilities is attributed to the BSIE of aug-pc4. The internal validation of our MW results demonstrates that MW-FD polarizabilities can be made virtually free from field strength-related effects, because numerical issues arising from using very small fields can be countered by tightening the MW thresholds. Our MW-FD polarizabilities using a field strength of 0.001 au show a MRE of 0.02% relative to a MW-LR reference.
For future benchmarks of any property, we recommend to validate that the reference data indeed is at the CBS limit by comparing to MW results.