Benchmarking Ionization Potentials from pCCD Tailored Coupled Cluster Models

The ionization potential (IP) is an important parameter providing essential insights into the reactivity of chemical systems. IPs are also crucial for designing, optimizing, and understanding the functionality of modern technological devices. We recently showed that limiting the CC ansatz to the seniority-zero sector proves insufficient in predicting reliable and accurate ionization potentials within an IP equation-of-motion coupled-cluster formalism. Specifically, the absence of dynamical correlation in the seniority-zero pair coupled cluster doubles (pCCD) model led to unacceptably significant errors of approximately 1.5 eV. In this work, we aim to explore the impact of dynamical correlation and the choice of the molecular orbital basis (canonical vs localized) in CC-type methods targeting 230 ionized states in 70 molecules, comprising small organic molecules, medium-sized organic acceptors, and nucleobases. We focus on pCCD-based approaches as well as the conventional IP-EOM-CCD and IP-EOM-CCSD. Their performance is compared to the CCSD(T) or CCSDT equivalent and experimental reference data. Our statistical analysis reveals that all investigated frozen-pair coupled cluster methods exhibit similar performance, with differences in errors typically within chemical accuracy (1 kcal/mol or 0.05 eV). Notably, the effect of the molecular orbital basis, such as canonical Hartree–Fock or natural pCCD-optimized orbitals, on the IPs is marginal if dynamical correlation is accounted for. Our study suggests that triple excitations are crucial in achieving chemical accuracy in IPs when modeling electron detachment processes with pCCD-based methods.

The reliable determination of ionization potentials (IP) is crucial for the theoretical modeling of molecular electronic structures and molecular properties.The IP provides information about the system's reactivity as it facilitates measuring the strength of one electron being attached to the molecular bulk and quantifying the molecule's ability to form a more positively charged ion.This information can be further utilized to design, optimize, and comprehensively understand the functionality of modern technological devices such as photovoltaic (PV) cells, light-emitting diodes, or sensors. 1,2For instance, a critical factor in designing novel organic-based donor and acceptor molecules in organic PV devices 1 is the knowledge of the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) and the corresponding HOMO-LUMO gap.From a theoretical perspective, one of the simplest approximations to deduce orbital energies exploits the diagonal elements of the Fock matrix and the electron repulsion energy. 3More reliable orbital energies can be, for instance, obtained from the ionization potential (IP) [4][5][6][7] and electron affinity (EA) 8 variants of equation-of-motion (EOM) [9][10][11] applied on top of a coupled cluster reference wave function.The resulting IPs and EAs are then exploited to predict the so-called charge gap.
Initial numerical studies 42 demonstrated that the accuracy of IP-EOM-pCCD approaches closely matches the accuracy of CCSD(T) or the density matrix renormalization group [70][71][72][73] (DMRG) algorithm in open-shell electronic structures.We recently presented a benchmark study to assess the accuracy of the IP-EOM-pCCD method in predicting ionization ener-gies. 74In Ref. 74, we compared the vertical ionization energies obtained in the space of one-hole (1h) and two-holeone-particle (2h1p) states for three types of molecular orbitals (canonical Hartree-Fock, Pipek-Mezey localized, and natural pCCD orbitals).Our study suggests that the orbitaloptimized IP-EOM-pCCD method, restricted to the 2h1p operator, demonstrated the highest accuracy among the investigated methods.However, due to the absence of dynamic correlation, we observed unacceptably large errors in IPs of approximately 1.5 eV.
As demonstrated previously, 75,76 natural pCCD orbitals present a promising alternative to canonical Hartree-Fock orbitals, serving as a reference wave function for more sophisticated calculations.Consequently, the question arises whether it is possible to achieve an accuracy comparable to more elaborate approaches (like CCSDT or higher) employing natural pCCD-optimized orbitals in combination with simplified CC ansätze, which account for dynamical correlation.Thus, in the current work, we explore, for the first time, both the effect of dynamical correlation and the choice of the molecular orbital basis (canonical vs. localized) in CC-type methods including up to double excitations.Specifically, we focus on various pCCD-tailored CC flavours [76][77][78] and compare their performance to the conventional CCSD method exploiting a canonical and pCCD-optimized localized molecular orbital basis.Specifically, we investigate the influence of dynamical correlation to the IP values determined by six different approaches using the natural pCCD-optimized orbitals, namely frozen-pair (fp)CC 77 methods (IP-EOM-fpCCD and IP-EOM-fpCCSD), their linearized (fpLCC) 78 variants (IP-EOM-fpLCCD and IP-EOM-fpLCCSD), and conventional IP-EOM-CCD and IP-EOM-CCSD [4][5][6][7] , and compare their performance to the CCSDT equivalent and experimental reference data.
This work is structured as follows: In section I, we briefly review the investigated theoretical models.Section II provides an overview of the computational methodology.Section III presents the numerical results, including a statistical analysis.Finally, we conclude in Section IV.

I. THEORY
The pCCD [43][44][45][46] ansatz is a simple reduction of the singlereference CCD approach, where the cluster operator only contains electron-pair excitations TpCCD , and where Φ 0 ⟩ is some reference determinant, âp ( â † p ) are the elementary annihilation (creation) operators for α (p) and β (p) electrons, and c a i are the pCCD cluster amplitudes.The above sum runs over all occupied i and virtual a orbitals.[96][97][98] Although these numerical studies support pCCD to be a promising alternative to capture static/nondynamic electron correlation effects, [99][100][101] a large fraction of the correlation energy cannot be captured by electron-pair states alone.This missing correlation energy is commonly attributed to so-called broken-pair states.These correlation effects are commonly included a posteriori using various state-of-the-art techniques. 467][78] For instance, in the frozen-pair coupled cluster (fpCC) ansatz, 76,77 |fpCC⟩ = e the pCCD wave function is taken as the fixed reference function and the T ext cluster operator contains electron excitations beyond electron-pair excitations.Thus, the cluster operator of fpCCD is defined as T ext = T ′ 2 = T2 − TpCCD , while the cluster operator of fpCCSD includes also single excitations, T ext = T1 + T ′ 2 .3][104][105] The fpCC ansatz can be further simplified by truncating the Baker-Campbell-Hausdorff expansion after the second term (concerning all non-pair excitations) and hence including only linear terms in T ext . 78,106Strictly speaking, the frozen-pair Linearized CC correction (fpLCC) does not fall into the category of tailored CC methods.Nonetheless, we will use the acronym fpLCC due to its simplicity (originally, fpLCC was introduced as pCCD-LCC).The wave function ansatz of fpLCC is approximated as We should note that all disconnected terms containing TpCCD still appear in the fpLCC amplitude equations as the exponential ansatz of pCCD is not linearized.For instance, terms associated with T ext TpCCD and T 2 pCCD have to be considered in fpLCCD-type methods.Thus, in fpLCC, the coupled cluster equations are linear concerning non-pair amplitudes T ext but the coupling between all pair-and non-pair amplitudes is included.
Since we formally work in a single-reference CC picture, we can straightforwardly employ single-reference CC techniques to target, for instance, electronically excited, ionized, and electron-attached states. 8,39,107Another possible extension are spin-flip EOM-CC methods. 108Specifically, for ionized states, we can employ the IP-EOM formalism, where we use a linear ansatz on top of the closed-shell fp(L)CC reference function to parametrize the k-th (ionized) state where the operator R(k) generates the targeted ionized state k from the initial fp(L)CC reference state.In the single IP-EOM formalism, R(k) reads where we introduced the hole (h, encoding âi ) and particle (p, encoding â † a ) labels and dropped the k-dependence for better readability.The ionized states are then obtained by solving the corresponding EOM equations where ω = ∆E − ∆E 0 is the energy corresponding to the ionization process concerning the fpCC ground state, while ĤN = Ĥ − ⟨Φ 0 Ĥ|Φ 0 ⟩ is the normal product form of the Hamiltonian.We can rewrite the above equation in the well-known form where is the similarity transformed Hamiltonian of the used fpCC flavour in its normal-product form The ionization energies are obtained by iteratively diagonalizing of the chosen pCCD-based CC correction.Note that depending on the selected CC model, the similarity transformed Hamiltonian either contains all non-vanishing nonlinear terms (here, H .The diagrammatic representation of the IP-EOM-fpCC equations is shown in Fig. 1 (see also, for instance, Ref. 32 for the diagrammatic form and its algebraic expressions).We should note that the IP-EOM-fpCC(S)D equations are similar in form to the conventional IP-EOM-CCSD formalism.However, in the IP-EOM-fpLCCSD method, the effective Hamiltonian diagrams (a1), (b1), (b2), (b3), (b4), (b5) do not contain the disconnected T1 T1 terms, while, in addition, diagram (b1) lacks the T1 T1 T1 part.Furthermore, the T1 T2 term contained in (b1) is replaced by the simpler T1 Tp counterpart due to the truncation of the BCH expansion of the LCC correction (see also eq. ( 4)).Finally, we should note that we focused on the S z = −0.5 case 8,42 in its spin-dependent and spin-summed versions.2][113] These included IP-EOM-fpCCD, IP-EOM-fpCCSD, IP-EOM-fpLCCD, IP-EOM-fpLCCSD, IP-EOM-CCD, and IP-EOM-CCSD.In all CC calculations (electronic ground states and IP-EOM), the pCCD-optimized orbitals (labeled as "(pCCD)") were used to construct the reference determinant |Φ 0 ⟩, which were obtained through a variational orbital-optimization protocol of the pCCD reference calculation. 44,45,79,80The optimization protocol used in pCCD calculations automatically selects the reference determinant according to the pCCD natural occupation numbers.
The ionization energies were computed in the space of twohole-one-particle (2h1p) states as previous studies indicate that errors in IPs are significantly reduced compared to the use of only one-hole (1h) states. 74No symmetry constraints were applied to allow the algorithm to freely relax the orbitals resulting in a symmetry-broken, localized molecular orbital basis.
A frozen core was used in all CC calculations, keeping the 1s orbitals for C, N, O, and F and 1s, 2s, and 2p orbitals for Si, P, S, and Cl frozen.We should note that preliminary tests indicate minimal impact on the IP values when freezing core orbitals.All calculations employed the cc-pVTZ basis set by Dunning, 114 facilitating a direct comparison with previously published vertical IPs obtained using the IP-EOM-CCSDT model, 109 various forms of the unitary coupledcluster (IP-UCC) approach, and algebraic-diagrammatic construction (IP-ADC) methods. 110rthermore, our test set contains 41 molecules shown in Fig. 2, whose molecular geometries were relaxed using the CCSD(T)/aug-cc-pVTZ method 115 and are available in the supplementary material of Ref. 109.In total, we optimized 201 IP states and compared the performance of our IP-EOM-CC methdos to theoretical and experimental reference data.I defines the error measures used in this work.We should note that we use the labels CCD(pCCD) and CCSD(pCCD) to indicate that the corresponding CCD and CCSD calculations were done employing pCCD-optimized natural orbitals (or equivalently the orbitaloptimized pCCD reference determinant in the CC ansatz).The upper section of Table I presents error values concerning IP-EOM-CCSDT reference data, 109 while the lower section reports the corresponding errors relative to experimental results. 109For a direct comparison, the table includes data from IP-EOM-CCSD calculated with a canonical Hartree-Fock reference determinant (or molecular orbital basis) in-dicated as IP-CCSD(HF), two variations of the unitary coupled cluster ansatz (IP-UCC2 and IP-UCC3), and four variants of non-Dyson algebraic diagrammatic construction schemes (ADC(2), ADC(3(3)), ADC(3(4+)), and ADC(3(DEM)). 110BLE I. Statistical error measures in electronvolts (eV), such as mean error (ME), mean absolute error (MAE), and root-meansquare error (RMSE), were assessed based on the ionization potentials (IP) calculated using various methods: IP-EOM-fpCCD, IP-EOM-fpCCSD, IP-EOM-fpLCCD, IP-EOM-fpLCCSD, IP-EOM-CCD(pCCD), and IP-EOM-CCSD(pCCD), listed in the upper part of each block, are derived in this work.All these CC approaches exploit natural pCCD-optimized orbitals (or the orbital-optimized pCCD reference determinant) and are conducted in the space of two-hole-one-particle (2h1p) states.The corresponding errors in IPs for CCSD done with Hartree-Fock orbitals, unitary coupled cluster (UCC), algebraic-diagrammatic construction (ADC) methods, and pCCD, using natural pCCD-optimized orbitals, are presented in the lower part of each block.The errors in IPs were calculated concerning IP-EOM-CCSDT 109

Table I summarizes our statistical analysis, including mean errors (ME), mean absolute errors (MAE), and root-mean-square errors (RMSE) calculated using IP-EOM-fpCCD, IP-EOM-fpCCSD, IP-EOM-fpLCCD, IP-EOM-fpLCCSD, conventional IP-EOM-CCD(pCCD), and IP-EOM-CCSD(pCCD) of 201 ionized states in 41 molecules shown in Fig 2. The footnote in Table
Furthermore, Fig. 3(a) visualizes, using box and violin plots, the locality, spread, skewness, and distribution of errors of all targeted ionization potentials concerning IP-EOM-CCSDT.Fig. 3(b) displays an equivalent analysis for experimental data.The individual ionization energies obtained by all methods investigated in this work are accessible in the Electronic Supplementary Information (ESI) † .
The effect of adding dynamical correlation, that is including the seniority two and seniority four sectors in the fpCC reference function of the IP-EOM-fpCCD method, provides a considerable improvement of the ionization energies in comparison to the seniority zero orbital-optimized pCCD method.Specifically, the MAE and RMSE errors are reduced by approximately 1.2 eV on average if we go beyond the seniorityzero sector in the CC reference function.This supports the original finding 74 that dynamical correlation is needed to correctly describe the electron detachment process within pCCDbased methods.Using a simplified version of the frozenpair methods, IP-EOM-fpLCCD increases slightly the ME, MAE, and RSME errors compared to IP-EOM-fpCCD by 0.040 (0.040) eV, 0.036 (0.025) eV, 0.028 (0.024) eV, respectively, with respect to IP-EOM-CCSDT reference data (experiment).On the other hand, IP-EOM-CCD(pCCD) (conventional IP-EOM-CCD with an orbital-optimized pCCD reference determinant) decreases the errors of IP-EOM-fpCCD by 0.054 (0.053) eV for ME, 0.049 (0.033) eV for MEA, and 0.043 (0.035) eV for RMSE with respect to IP-EOM-CCSDT (experimental) results.
A similar trend is observed for CCSD-based approaches.Specifically, IP-EOM-fpLCCSD yields the largest errors, while IP-EOM-CCSD(pCCD) (conventional IP-EOM-CCSD with an orbital-optimized pCCD reference determinant) exhibits the smallest errors compared to theoretical (experimental) reference values.Nonetheless, their differences in errors are acceptable, amounting up to around 0.11 eV (or 2.5 kcal/mol).Most importantly, including single excitations slightly reduces the ME, MAE, and RMSE values by 0.041 to 0.061 eV between the frozen-pair CCD and CCSD methods, respectively.Regarding statistical errors, IP-EOM-fpCCSD is the most accurate among all investigated pCCD-based approaches investigated in this work.Specifically, it yields very similar errors to the IP-EOM-UCC3 method identified as the best among recently investigated approximations. 109oncerning IP-EOM-CCSDT (experiment), the relative error measures between IP-EOM-fpCCSD and IP-EOM-UCC3 are ∆∆ME = 0.021 (0.022) eV, ∆∆MAE = 0.013 (0.029) eV, and ∆∆RMSE = −0.022(0.003) eV, where ∆∆ indicates the difference between the IP-EOM-fpCCSD and IP-EOM-UCC3 error measures.
Our box plots (see Fig. 3 top) illustrate that the differences in errors among all fpCC methods are very similar, displaying an almost identical dispersion of 50% of errors (highlighted in yellow and blue boxes).The total range of scope (indicated by the whiskers) diminishes slightly with the addition of single excitations.However, the differences are minimal.The violin plots (see Fig. 3 bottom) highlight interquartile ranges distributed closely around the median.The skewness of errors is left-shifted in all cases.Although the dispersion of results is slightly smaller when using the IP-EOM-CCSDT method as a reference, a similar trend of error dispersion can be seen for both references (theoretical and experimental).All frozen-pair variants exhibited an accuracy range similar to IP-EOM-CCSD conducted with canonical HF molecular orbitals, demonstrating a comparable dispersion and skewness of errors.
The IP-EOM-CCSD results reported by Ranasinghe et al. 109 allow us to directly assess the effect of the choice of the molecular orbital basis on molecular properties, that is if the performance is significantly different between canonical HF and natural pCCD-optimized orbitals.Surprisingly, the errors are almost identical and exhibit resilience to the choice of the reference wave function.Furthermore, the overall appearance of error distribution, skewness, and even the positioning of outliers presented in the box and violin plots in Fig. 3 is almost identical in both cases (differences lie within chemical accuracy).This suggests that including triple excitations in the theoretical model will be crucial for further improving the accuracy of pCCD-based approaches in predicting IPs.

IV. CONCLUSIONS
As recently shown, 74 restricting the CC ansatz to the seniority-zero sector is insufficient in predicting reliable and accurate IPs.Although the seniority-zero pCCD model can capture static correlation reliably, it is inadequate to describe electron detachment with sufficient accuracy.This deficiency was attributed to the missing broken-pair states, that is the exclusion of the seniory-two, seniority-four, etc. sectors.In this work, we investigated the impact of dynamical correlation and the choice of the molecular orbital basis (canonical vs. localized) on vertical ionization potentials using various pCCDbased and conventional CC approaches.Specifically, we studied six CC variants: IP-EOM-fpCCD, IP-EOM-fpLCCD, IP-EOM-CCD(pCCD), IP-EOM-fpCCSD, IP-EOM-fpLCCSD, and IP-EOM-CCSD(pCCD).Throughout this work, we included (up to) 2p1h operators in the IP-EOM formalism as the resulting pCCD-based model turned out to be superior to the corresponding IP-EOM approach restricted to 1h operators. 74ur analysis encompasses a set of 41 molecules, targeting 201 ionized states.These ionization energies are compared to IP-EOM-CCSDT and experimental reference data.Furthermore, our results are juxtaposed with those obtained using various conventional CC methods, UCC flavors, and non-Dyson ADC second and third-order schemes.
Our statistical analysis (mean errors, mean absolute errors, root mean square errors) highlights that all investigated frozen-pair coupled cluster methods feature similar performance.Specifically, the differences in errors are typically within chemical accuracy (1 kcal/mol or 0.05 eV).Adding single excitations slightly reduces error measures with respect to the corresponding CCD model.Yet, these changes approach chemical accuracy, constituting approximately 0.06 eV.Our benchmark data renders IP-EOM-fpCCSD the bestperforming method among all tested frozen-pair variants.We should stress, however, that differences between the inves- tigated frozen-pair methods are nearly invisible on box and violin plots.Furthermore, the scattering of errors and their distribution around the median make them comparable to the conventional IP-EOM-CCSD method.On the other hand, the error measures of IP-EOM-fpCCSD are comparable to the accuracy of IP-EOM-UCC(3), identified as the best among recently investigated approximations. 109Finally, the influence of the molecular orbital basis or CC reference determinant (that is canonical vs. localized) is marginal as the conventional IP-EOM-CCSD and IP-EOM-CCSD(pCCD) result in almost identical errors in ionization potentials.This observation suggests that triple excitations are crucial for further improving IPs and approaching chemical accuracy for modeling electron detachment processes with pCCD-based methods.
non-linear terms associated with the electron-pair excitation operator (here, H fpLCCD N and H fpLCCSD N
FIG.3.Box plots presented at the top and violin plots at the bottom, illustrating errors [eV] derived from selected methods (refer to Table1for numerical values).All errors are reported relative to either (a) IP-EOM-CCSDT or (b) experimental reference data.For brevity, we have omitted the EOM prefix in IP-EOM-CC-type methods.A star in each box plot denotes the mean value, while a white dot in each violin plot represents the median value.
(top) and experimental data 109 (bottom).The definitions for ME, MAE, and RMSE are printed in the table footnote.