All-Purpose Measure of Electron Correlation for Multireference Diagnostics

We present an analytical relationship between two natural orbital occupancy-based indices, and INDmax, and two established electron correlation metrics: the leading term of a configuration interaction expansion, c0, and the D2 diagnostic. Numerical validation revealed that and INDmax can effectively substitute for c0 and D2, respectively. These indices offer three distinct advantages: (i) they are universally applicable across all electronic structure methods, (ii) their interpretation is more intuitive, and (iii) they can be readily incorporated into the development of hybrid electronic structure methods. Additionally, we draw a distinction between correlation measures and correlation diagnostics, establishing MP2 and CCSD numerical thresholds for INDmax, which are to be used as a multireference diagnostic. Our findings further demonstrate that establishing thresholds for other electronic structure methods can be easily accomplished using small data sets.


■ INTRODUCTION
−3 To effectively train artificial neural networks for various tasks, such as image recognition, natural language processing, and drug discovery, large-scale screening of extensive data sets is essential.−7 Insights into the MR character of a molecule require knowledge of the exact wave function and an uncorrelated counterpart.The distance between these wave functions is assessed using a specific metric, giving rise to a correlation measure.There is an undeniable arbitrariness in the choice of the metric, giving rise to many possible correlation measures. 8owever, some reasonable restrictions might be imposed to constrain the set of valid measures.Bartlett et al. logically impose size-intensiveness to make the measures comparable among systems of different sizes. 9Furthermore, a pragmatic approach suggests the development of measures that can be easily implemented across various electronic structure methods.−18 Nevertheless, the primary and most common application of correlation measures remains their role as MR diagnostics.
In our perspective, MR diagnostics bring forth an additional criterion for correlation measures: the capacity to capture the most profoundly correlated aspects of the system.This principle holds true regardless of the system's size; even if only a small fraction of it exhibits features necessitating a MR description, it justifies the classification of the molecule as MR.In this context, it is only logical to employ correlation measures founded on the concept of maximal distance between wave functions.While this criterion does introduce additional constraints on correlation measures, it still allows for a considerable degree of flexibility.The T 1 diagnostic 19 relies on the Frobenius norm of t 1 coupled-cluster amplitudes, representing an average value.In contrast, D 1 20 and D 2 21 diagnostics are grounded in the 2-norm of matrices constructed from the t 1 -amplitude and t 2 -amplitude tensors, respectively.These norms are associated with the maximum eigenvalues of these matrices and are better suited to MR diagnostics.−24 In practice, the correlation measure requires an approximate wave function�that introduces electron correlation to some extent but is far from the exact wave function�and an uncorrelated reference such as the Hartree−Fock (HF) wave function.HF is the most natural choice for wave function methods.The HF wave function corresponds to a single Slater determinant including the HF orbitals that are obtained from a variational minimization of the electronic energy subject to the orthonormality constraint of the orbitals.The resulting HF electron density presents a simple diagonal representation in terms of the HF canonical orbitals, where some orbitals are occupied (with an occupation of one electron each), while others are regarded as virtual and do not contribute to the electron density (occupation equal to zero).In general, post-HF wave functions do not admit a diagonal representation of the electron density in terms of HF orbitals, the orbitals providing a diagonal representation of the electron density being the natural orbitals. 25Essentially, HF orbitals can be regarded as the natural orbitals of the HF wave function, the occupation of which is fixed by their energy and the number of electrons in the system.Natural orbital occupancies of post-HF wave functions are generally noninteger, the deviation from the closest integer number, 0 or 1, providing a straightforward way to assess the effect of electron correlation in the density.In this sense, measures of electron correlation based on natural orbital occupancies (NOOs) 9,24−32 are a most convenient way of measuring the effects of electron correlation as they only require the natural orbital representation of the electron density of the target method.
In a recent investigation, 6 Kulik and colleagues leaned on the NOO indicators developed in our group. 31,32These indicators were favored due to their ability to exhibit robust linear correlations, achieving a Pearson coefficient of 65% with energy-based correlation metrics like the contribution of perturbative triples to the CCSD(T) correlation energy, %E corr [(T)].This study suggests that indicators like D 2 (75%) might offer a more accurate reflection of the MR character; however, they are typically computed only for coupled-cluster wave functions.While it is possible to calculate D 2 using MP2, it is not a common practice.
In this article, we propose a NOO correlation measure that meets the aforementioned conditions and can thus be employed as a MR diagnostic.We show that this quantity is connected to D 2 , however, unlike the latter, it can be straightforwardly used for all types of electronic structure methods, provided NOOs are available.In this regard, it can be even employed for density functional approximations as fractional occupancies can be derived from Kohn−Sham orbital energies 7,30,33 or unrestricted Kohn−Sham NOOs. 34−39 ■ METHODOLOGY Some of us have recently defined indices of dynamic and nondynamic correlation based on NOOs, 31,32 rooted on the Lieb partition 40−42 of the pair density 43,44 = [ ] (1 ) where n i σ stands for the ith natural orbital occupation with spin σ.For convenience, we assume natural orbitals are arranged in descending order based on their occupation.I ND coincides with the deviation from idempotency of the first-order reduced density matrix, 25,26 takes values in the interval [0, N/2], and for closed-shell systems can be recasted as where n i now takes values between zero and two.All these quantities are size-extensive, as noted by some authors, 45 and I ND can be easily converted to a size-intensive one where N stands for the number of electrons and the last equality only holds for closed-shell systems.Sometimes it is convenient to split I ND into pseudo-occupied (corresponding to the N α and N β highest occupied) and the pseudo-virtual (the rest) spin natural orbitals i k j j j j j j j y and as other authors have found, 9 we have heuristically identified that which reflects the symmetric nature of particles and holes often encountered in the first-order reduced density matrix of molecular systems (see Figure S1 for a convincing numerical test).Hence From I ND , we can construct the maximum counterpart 4 .However, these maximal values are reached for very different situations.The maximal value of I ND max is achieved for = n 1 2 ; consequently, we only require one natural orbital with an occupation midway between being occupied and empty.On the other hand, the supremum of I ND is approached when we have a very large number of orbitals relative to the number of electrons, all of which attain a very small occupation value (∀i, n i = N/M ∧ M → ∞, M being the number of orbitals).This situation requires an extremely correlated regime only achieved in very peculiar model systems. 14,39In practice, for regular molecular systems, large I ND usually obtained because one or various occupation numbers deviate significantly from 0 and 1 (hence, they are somewhat close to ) contributing to a large value relative to the total number of electrons.Since I D is not bounded above, its sizeintensive version becomes complicated and necessarily uses a different normalization than I ND .However, for practical purposes, we suggest employing = I I N D 2 D , in analogy to I ND .In Figures S14 and S15 in Supporting Information, we demonstrate that I ND has a small sensitivity to the basis set, whereas I D varies much more significantly with the quality of the basis set as expected from dynamic correlation measures.
In the following, we will show how I ND and I ND max can be analytically connected to other existing measures of electron correlation.To this end, we will first consider two approximate wave functions: a configuration interaction singles and doubles (CISD) and a closed-shell MP2 wave function.Finally, the relationships found for these methods will be numerically tested for other wave function methods to assess their general validity.

Connection between the CI Leading Expansion Coefficient and I
ND .Let us assume a CISD wave function expansion ( ,. . ., ) where we have used the short-hand notation r 1 ( , ) 1 1 to indicate the spatial and spin coordinates of electron 1. Φ I corresponds to Slater determinants constructed from HF orbitals, and c I are the expansion coefficients, c 0 being the one associated with the HF Slater determinant and assumed to be the dominant one.Without loss of generality, Ψ is normalized to 1.For convenience, we define the following quantities where c i a and c ij ab represent the coefficient expansions for single and double excitations, respectively.These expansions are associated with the Slater determinants formed by substituting orbital i with orbital a and orbitals (i, j) with orbitals (a, b).i, j and a, b are occupied and virtual orbitals in the HF wave function, respectively.
Hence, we can write Let us consider the electron density constructed from this wave function which can be expanded in terms of the HF molecular orbitals, {ϕ i (1)} with We can decompose the trace of 1 D into two parts, corresponding to the submatrices we can construct from the orbitals that are occupied and empty in the HF picture  (16)   which can be calculated from where we have assumed that the role of single excitations is significantly smaller than twice of all the excitations.For a dominant block-diagonal 1 D matrix, as expected for nonhighly correlated systems, we have From the latter two approximations and eq 7, we find connecting the leading term in the CI expansion with a naturalorbital-based measure of electron correlation.In Figure 1, we can numerically confirm this relationship for various diatomic molecules at equilibrium and larger interatomic distances, as well as a collection of 18-electron systems.Notice that both CISD and a selected CI expansion give excellent agreement.For the same token, it becomes evident that (1-c 0 2 ) should exhibit a proportional relationship with I ND and, considering the latter is not size-intensive, it follows that c 0 2 also lacks this property.Interestingly, the leading term in the size-intensive version of c 0 is consistent with the approximation obtained for I ND in eq 20 as is numerically confirmed in Figure S12.These results strongly suggest that, notwithstanding the well-founded criticisms raised in previous studies, 46−48 where i, j and a, b denote occupied and virtual orbitals, respectively, t ij ab being first-order corrections to double amplitudes. 50For convenience, let us define L vir 1 (28)   n i the NOOs for the closed-shell unrelaxed MP2 density matrix arranged in descending order.
If we disregard the last term in eqs 22 and 23, we construct an approximate 1 D, containing these blocks ab ab vir 1 (30)   that bring approximate counterparts of eqs 27 and 28 (32)   which can be connected with n H and n L 51 Since and are semipositive defined,   Nielsen and Janssen 21 defined the 2 diagnostic for closedshell molecules as 2), ( 2) On the other hand, we can rewrite eq 8 as which we can now connect to D 2 ) max for an unrelaxed MP2 electron density including up to second-order corrections in which we have neglected some terms.In practice, the latter terms contribute to the electron density and thus affect the value of natural occupancies.In Figure 2, we can identify the effect of this approximation.Since n H and n L overestimate electron correlation, we know that systematically I I ND max ND max .Indeed, we find good correspondence between D 2 and I ND max for MP2 calculations (see Figure 2 for a large-set validation).As we can see in Figure 3, the relationship is not limited to MP2, giving a very good correlation for CCSD too.A separate analysis of the HEAVY28 and HEAVYSB11 data sets (included in GMTKN55, 52 which is part of Set D) reveals that this connection holds also for transition-metal complexes (see Figure S13), hence the transferability issues for diagnostics like T 1 are not of concern in the present context. 53he strong correlation between ND max and D 2 starts to diminish at large values of D 2 , well above the recommended MR diagnostic thresholds (D 2 > 0.15−0.18).We can therefore interchangeably utilize either D 2 or I ND max in the design of an MR diagnostic.Since I ND max can be employed for all sorts of wave functions, its use is recommended over D 2 , which can be only computed for MP2 and CC, but in practice is often limited to CC.
Nielsen and Janssen 21 set D 2 = 0.15 as the boundary below which is safe to trust MP2 and CCSD results, hence the molecule can be classified as non-MR.Conversely, D 2 ≥ 0.17 (MP2) and D 2 ≥ 0.18 (CCSD) signal the molecule as MR.In between these values, the molecule should be considered with caution.Based on the fittings of I ND max against D 2 for sets A, C, D, and E (see Figures 2 and 3), we find an equivalence with the D 2 thresholds established by Nielsen and Janssen (see Table 1). 21For the same method, the MR diagnostic thresholds of I ND max barely change regardless of whether we use a modest data set like set A and C or whether we employ much larger ones such as D or E, setting some confidence on the connection between both sets of thresholds.In all cases, the agreement between the diagnostics (the percentage of molecules equally classified as MR or non-MR) is above 94%, validating the replacement of D 2 by I ND max .

■ CONCLUSIONS
In this study, we presented the analytical relationship between two NOO-based indices and two established electron correlation metrics: the leading term of the CI expansion, c 0 , and the D 2 diagnostic.Numerical validation revealed that I ND and I ND max , which are straightforwardly connected to the deviation from idempotency of the first-order reduced density matrix, can effectively substitute for c 0 and D 2 , respectively.
Utilizing I ND and I ND max offers three distinct advantages: 1.They are universally applicable across all electronic structure methods, as long as NOOs are accessible.2. Their interpretation is more intuitive, given that they correspond to the average and the maximal deviation of NOOs from zero and one, respectively.Notably, juxtaposing both indices offers a comprehensive evaluation of electron correlation within a molecule, but only I ND max is recommended as MR diagnostic.3.As they solely depend on NOOs, these indices can be effortlessly incorporated into the development of hybrid electronic structure methods.Furthermore, we have furnished numerical thresholds for I ND max pertinent to MP2 and CCSD wave functions, facilitating its use as MR diagnostic.Owing to the consistent alignment between D 2 and I ND max thresholds across different data sets, deriving thresholds for alternative electronic structure methods using compact test sets, such as A and C, becomes straightforward.

■ COMPUTATIONAL DETAILS
All CISD and CCSD calculations and geometry optimizations were carried out with Psi4, 54−56 whereas all MP2 and Selected Configuration Interaction (SCI) calculations 57,58 were carried out with PySCF. 59,60The SCI values should be fairly close to those of the FCI solution.To effectively establish a connection between correlation measures, it is not essential to focus primarily on the nature of the data sets.Instead, our approach emphasizes the selection of tests that encompass a diverse range of correlation regimes and a wide variety of molecules in our study.In particular, we considered the following test sets: • Set A: Seven small diatomic molecules at the equilibrium distance (R e ) and stretched geometry (1.5 R e ) studied by Handy et al. 61 at the CISD/cc-pVTZ and CCSD/cc-pVTZ levels of theory.• Set B: Crittenden and Gill's 62 18-electron systems extended with other such molecules.The geometries were optimized at the MP2/6-31G* level, and singlepoint calculations were performed at the CISD/6-311G and SCI/6-311G levels of theory.This reduced basis set was employed due to the SCI computational cost.• Set C: 34 small molecules of Nielsen and Janssen 21 from which they defined the D 1 and D 2 diagnostics.Geometries and single-point calculations were performed at the MP2/cc-pVTZ level of theory.Set D: A 5090 closed-shell molecular set obtained from merging part of the GMTKN55 data set 52 and the AD-3165 data set. 7The former was included in order to cover the spectrum of low-correlated molecules, which was undersampled with the AD-3165 data set.From GMTKN55, we considered 1925 closed-shell molecules among the total 2462 molecular structures, excluding molecules that carried a considerable computational cost.Calculations were done at the MP2/def2-qzvp level.From the AD-3165 data set of Kulik and coworkers, 7 we extracted geometries and performed the MP2/cc-pVTZ calculations.
) the latter equality being valid only for closed-shell molecules.One can easily prove that =

Figure 1 .
Figure 1.Relationship between 4(1-c 0 2 )/N and I ND for (left) Set A, including various diatomic molecules at equilibrium and 50% larger interatomic distances at the CISD level, and (right) Set B containing several 18-electron systems studied at the selected CI (SCI) level.

Figure 2 .
Figure 2. I ND max and I ND max against D 2 the molecules in Set C (left) and I ND max against D 2 for the molecules in Set D (right).All calculations were performed at the MP2 level.

Figure 3 .
Figure 3. Relationship between D 2 and I ND max for Set A (left) and Set E (right) at the CCSD level.
with appropriate normalization, c 0 can serve as a reliable correlation measure.In practical applications, the use of I ND is recommended over c 0 because it is not limited to CI expansions.
n H and n L overestimate electron correlation.In Figures S8−S11, it can be checked that this overestimation grows with electron correlation.

Table 1 .
Equivalence between D 2 and I ND max Thresholds for MR Diagnostics • Set E: A 311-molecule diverse set with D 2 ≤ 0.4 computed at the CCSD level.See the Supporting Information for the detailed list of molecules.The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c01073.
■ AUTHOR INFORMATION