Pushing the Limits of EOM-CCSD with Projector-Based Embedding for Excitation Energies

The calculation of accurate excitation energies using ab initio electronic structure methods such as standard equation of motion coupled cluster singles and doubles (EOM-CCSD) has been cost prohibitive for large systems. In this work, we use a simple projector-based embedding scheme to calculate the EOMCCSD excitation energies of acrolein solvated in water molecules modeled using density functional theory (DFT). We demonstrate the accuracy of this approach gives excitation energies within 0.01 eV of full EOM-CCSD, but with significantly reduced computational cost. This approach is also shown to be relatively invariant to the choice of functional used in the environment and allows for the description of systems with large numbers of basis functions (>1000) to be treated using state-of-the-art wave function methods. The flexibility of embedding to select orbitals to add to the excited-state method provides insights into the origins of the excitations and can reduce artifacts that could arise in traditional linear response time-dependent DFT (LR-TDDFT). The extension of density functional theory to the timedependent domain (TDDFT) by Runge and Gross and its efficient linear response implementation (LR-TDDFT) by Casida has led to the practical calculation of the excited-state properties for medium and large molecular systems of hundreds of atoms. LR-TDDFT generally is used when system size becomes a consideration due to its low scaling, but comes with the caveat that practical implementations, which employ an approximate exchange and correlation functional and the adiabatic approximation, do not have reliable accuracy for charge-transfer excitations, transitions with double-excitation character, or S1/S0 conical intersections. 14 From a practical standpoint, the main disadvantage of LR-TDDFT arises from the wide choice of functionals, each of which may have certain failure cases depending on the systems and physical process under study. For high accuracy benchmark calculations of excited-state properties, ab initio correlation methods are needed, although traditionally the size of the systems that can be treated by these methods is limited to 10− 20 heavy atoms in a moderate basis. The gold-standard method is equation of motion coupled cluster singles and doubles (EOM-CCSD), which is more or less black-box and can handle a wide range of chemical problems with predictable accuracy. However, EOM-CCSD has a worse formal scaling compared to LR-TDDFT. Various approaches have been developed to improve the scaling and cost of EOM-CCSD, such as local orbital methods, restricted virtual spaces and highly efficient implementations. For larger systems, however, methods with such high scaling often require a multiscale approach such as the quantum mechanical molecular mechanical (QM/MM) type partitioning or other multilevel techniques. The aim of multiscale modeling is to treat different regions of a system with methods matched to the required accuracy, e.g., important regions are treated with higher accuracy methods, and less important regions with lower accuracy methods. The accuracy/cost trade-off of such an approach is often better than treating the entire system with the same method. It is in this spirit that we present work that extends projector-based embedding to excited-state properties. We present an approach to conduct density-functional/Hartree−Fock calculations on a large system, select a chemical subregion of interest, and embed a more expensive method inside it. This avoids the expense that would be incurred by applying the highlevel method to the whole system. The high-level method becomes polarized by the low-level method, allowing us to obtain very accurate results. For excited-state calculations, the only extra knowledge needed for this approach concerns the atoms most involved in the excitation, or for improved efficiency the exact orbitals that may be needed. We shall show that it is possible to obtain accurate excitedstate excitation energies and oscillator strengths from embedded EOM-CCSD (eEOM-CCSD). We have selected one small test system of a water and formaldehyde to highlight physical effects of embedding, followed by a larger acrolein in water the efficiency and accuracy of this method. We have Received: September 21, 2017 Accepted: October 27, 2017 Published: October 27, 2017 Letter

The extension of density functional theory to the timedependent domain (TDDFT) by Runge and Gross 1,2 and its efficient linear response implementation (LR-TDDFT) by Casida 3 has led to the practical calculation of the excited-state properties for medium and large molecular systems of hundreds of atoms. 4−9 LR-TDDFT generally is used when system size becomes a consideration due to its low scaling, but comes with the caveat that practical implementations, which employ an approximate exchange and correlation functional and the adiabatic approximation, 4,5,7 do not have reliable accuracy for charge-transfer excitations, 10,11 transitions with double-excitation character, 12,13 or S 1 /S 0 conical intersections. 14 From a practical standpoint, the main disadvantage of LR-TDDFT arises from the wide choice of functionals, each of which may have certain failure cases depending on the systems and physical process under study. For high accuracy benchmark calculations of excited-state properties, ab initio correlation methods are needed, although traditionally the size of the systems that can be treated by these methods is limited to 10− 20 heavy atoms in a moderate basis. The gold-standard method is equation of motion coupled cluster singles and doubles (EOM-CCSD), 15,16 which is more or less black-box and can handle a wide range of chemical problems with predictable accuracy. However, EOM-CCSD has a worse formal scaling compared to LR-TDDFT.
Various approaches have been developed to improve the scaling and cost of EOM-CCSD, such as local orbital methods, 17−21 restricted virtual spaces and highly efficient implementations. 22,23 For larger systems, however, methods with such high scaling often require a multiscale approach such as the quantum mechanical molecular mechanical (QM/MM) type partitioning 24−27 or other multilevel techniques. 28−30 The aim of multiscale modeling is to treat different regions of a system with methods matched to the required accuracy, e.g., important regions are treated with higher accuracy methods, and less important regions with lower accuracy methods. The accuracy/cost trade-off of such an approach is often better than treating the entire system with the same method. It is in this spirit that we present work that extends projector-based embedding 31−34 to excited-state properties. We present an approach to conduct density-functional/Hartree−Fock calculations on a large system, select a chemical subregion of interest, and embed a more expensive method inside it. This avoids the expense that would be incurred by applying the highlevel method to the whole system. The high-level method becomes polarized by the low-level method, allowing us to obtain very accurate results. For excited-state calculations, the only extra knowledge needed for this approach concerns the atoms most involved in the excitation, or for improved efficiency the exact orbitals that may be needed.
We shall show that it is possible to obtain accurate excitedstate excitation energies and oscillator strengths from embedded EOM-CCSD (eEOM-CCSD). We have selected one small test system of a water and formaldehyde to highlight physical effects of embedding, followed by a larger acrolein in water the efficiency and accuracy of this method. We have focused on a noncovalent system for embedding to show it is possible to obtain solvent effects cheaply and avoid artifacts that are known to occur in such systems; 35 however, we also show that it is possible to partition covalently bound systems by selecting important orbitals on the water to improve accuracy. It should be stressed that there is no fundamental reason that this approach would not be appropriate for obtaining excitedstate properties for large covalent molecules as long as the excitation can be localized.
Projector-Based Embedding for Excited States. The detailed description of the embedding method has been presented elsewhere. 31−34 Here we briefly reiterate the main equations and features of projector-based embedding to keep this paper self-contained. Projector-based embedding belongs to a class of quantum embedding techniques that constructs a low-level potential from a method such as DFT and uses it to polarize a higher-level method such as CCSD(T). This type of embedding partitions the electronic density matrix into subregions A and B In the case of projector embedding, the partitioning is constructed by first localizing the molecular orbitals, and then automatically categorizing whether they are primarily on atoms in the active region (ρ A ) or the environment (ρ B ) based on a population analysis. In this work we use the standard Mulliken analysis to select our molecular orbitals (MOs) based on if the majority of the density (by default >0.4 electrons) resides on basis functions centered on atoms in the active region. We will highlight in our results that for excited-state properties a better selection criterion may be possible based on the configuration interaction (CI) vector.
The partitioning of the electronic density based on the Kohn−Sham orbitals 36 makes it possible to define the potential for the subregions free of nonadditive errors. The Fock operator for subsystem A can be built from the partitioned reduced one particle density matrix γ for the active region (A) and environment (B) where the first term is the normal core Hamiltonian, the second is the Coulomb potential, the third term is the exchangecorrelation potential, and the final term is the level shift operator. This last term has the effect of enforcing the subsystems to be orthogonal, ensuring that the Pauli principle is obeyed and restricting the wave function of the active region to not over-relax into the environment (subsystem B). The μ parameter is a large numerical value (10 6 −10 10 ) that elevates all of the orbitals that construct γ B to an extremely high energy, effectively making them inaccessible to the active region orbitals that build γ a . P B is constructed out of the subsystem B (environment) density and the overlap matrix S, A key feature of projector embedding is that it exactly reconstructs the environment potential, in the sense that the energy (and other properties) of DFT-in-DFT is the same as DFT on the whole system. The method is simple to implement, relying only on a modified core Hamiltonian. This allows most correlation methods (e.g., MP2, 37 CCSD(T), 38 MRCI) to be performed in the high-level region without bespoke reprogramming. It is also simple to accelerate by using basis-set truncation to further reduce the cost of the correlation method by reducing the scaling with the number of virtual orbitals. 33 We note here that the frozen-density embedding strategy was combined with LR-TDDFT for excitation energies. 39, 40 One potential concern of using projector-based embedding for excited-states is that occupied orbitals in the environment (subsystem B) will not be able to contribute to the excited-state relaxation as they are frozen in the form of the ground state and only exist implicitly. The simplest way to improve the description of subsystem A is to include extra atoms in subsystem A; however, a more efficient and instructive approach is to determine those occupied orbitals from subsystem B (the environment) that capture the important environment relaxation effects upon excitation. A critical distinction should be made that the orbitals chosen to add to the active region are occupied MOs from the environment, not the virtual orbitals. The reason for this is that the embedding method does not restrict the virtuals of the environment, and thus they are free to mix with the active region orbitals to obtain accurate correlation energies as they would in a traditional calculation.
As an example, let us describe the application of projectorbased embedding to a photoactive molecule we define as subsystem A in explicit solvent that we define as subsystem B shown in Figure 1. Even for such a naive partitioning of the chemical system, projector-based embedding naturally includes contributions (i) and (ii), which correspond to all electronic excitations within the subsystem A, shown by a green arrow in Figure 1, as well as excitations from occupied orbitals in subsystem A to virtuals in subsystem B (red arrow in Figure 1). For a more accurate description, subsystem A can be expanded to encompass additional important occupied orbitals from the solvent, this would include the two further types of contribution (iii) and (iv) shown as the blue and gray Figure 1. Left: two-electron-in-two-orbitals schematic model provided to illustrate all possible transitions in eEOM-CCSD. Case (i) is given by a green arrow: occupied orbital (active) to virtual orbital (active). Case (ii) is given by a red arrow: occupied orbital (active) to virtual orbital (environment). Case (iii) is given by a blue arrow: occupied orbital from the environment (included in the active region) to virtual orbital (active). Case (iv) is given by a gray arrow: occupied orbital from the environment (included in the active region) to virtual orbital (environment). Right: schematic representation of the transitions allowed in the projector-embedded excited-state method (same color code is used as in the left panel). The central molecule is in the active region, while the solvent molecules (in light blue) constitute the environment. If an occupied orbital of a solvent molecule plays an important role, it can be included in the active region (highlighted solvent molecule).

The Journal of Physical Chemistry Letters
Letter transitions. The blue arrow indicates direct contributions of those occupied orbitals in subsystem B to the electronic transition itself (solvent-to-solute transitions). The gray arrow in Figure 1 highlights how occupied orbitals of the environment respond to vertical transitions in the active (e.g., solute) region. This strategy allows for a highly flexible scheme where contributions from the environment can be tested and included if necessary and will be further discussed in details in the applications presented below. In contrast to our embedding scheme, a QM/MM formalism would not incorporate the sudden polarization (or relaxation) of the MM part induced by a transition in the QM region. 41 Such polarization due to a change in the chromophore density would be accounted for in a polarizable force field simulation 42−44 or using the effective fragment potential (EFP) method, 45 but a direct involvement of the environment in the transition (inclusion of (ii), (iii) and/ or (iv)) would imply the full inclusion of additional solvent molecules in the QM region. As mentioned above both contributions are fully incorporated in the projector embedding described in this work as well as a DFT/HF quality ground state subsystem B potential that is not present in QM/MM. A summary of the excitations possible in a range of methods can be found in Table 1. We finally note that the response of the environment to an excitation in the active region could also directly be incorporated in a linear-response formalism, as recently demonstrated. 46 For systems where the size of the environment becomes large (>700 basis functions), reducing the number of electrons by embedding is not necessarily sufficient to make calculations feasible due to the forth order scaling of CCSD with respect to the number virtuals. For the larger calculations, we employed basis-set truncation, 32 which can reduce and ultimately eliminate scaling of computational cost with respect to environment size. Standard EOM-CCSD 47  where the scaling with respect to the occupied orbitals is now frozen to the size of subsystem A. The use of basis truncation further reduces the scaling to O (A) 2 V (C) 4 , where V (C) < V (A+B) and the size of C is tunable according to the density of subsystem A. Hence, subsystem B can become large without significantly affecting the computational cost. 32 Projector embedding uses an effective embedding potential determined from the ground-state density of the full system; and the open question is whether this is appropriate for the calculation of excited-state properties, or if a full relaxation of the density in response to the excitation is necessary (for example a Δ-SCF). In this work, we show that this is a reasonable approximation for small solvated organic molecules, provided that some of the important occupied orbitals on the solvent water molecules are included, if necessary. The ability to specify which orbitals are included in subsystem A furnishes insight into their impacts on the excitation process, and we will outline a simple and systematic approach to ensure selection of the most efficient subsystem A has been chosen.
As part of our testing, we confirmed the insensitivity of the excitation energy with the level shift parameter μ as was done for ground state properties in the original embedding paper, indicating that the projected orbitals remain sufficiently orthogonal to the virtual space. It should also be noted that it is possible to run LR-TDDFT-in-DFT and CIS-in-HF embedding for low cost determination of the additional occupied orbitals to be included in subsystem A or merely to further accelerate these techniques.
To explore the effect of embedding potentials on excitedstate properties affected by solvation, we first present results on a small model system made of a formaldehyde molecule with a single water positioned near the carbonyl group, forming an hydrogen bond. The S 1 state of the isolated formaldehyde has 1 (nπ*) character, but might be perturbed by a water molecule in the close vicinity of the carbonyl group. Hence, choosing this system allows us to control the perturbation of the S 0 → S 1 excitation as well as having an easily calculable full EOM-CCSD benchmark number.
Determining the number of orbitals required to reproduce the full EOM-CCSD is critical in gauging whether projectorbased embedding is feasible. If all the MOs are needed, the embedding approach would have the same cost as full EOM-CCSD and the method would have little advantage. A typical example would be an excited-state described by excitations from many orbitals, all contributing with small CI coefficients. Figure 2 shows the transition energy and oscillator strength of the S 0 → S 1 transition as a function of the distance between the nearest hydrogen on the water and the oxygen of the carbonyl group for different numbers of embedded orbitals for EOM-CCSD-in-PBE0. At the closest distance of 2 Å, we expect any error due to embedding to be most pronounced. With the minimal embedding region of 8 occupied orbitals, denoted "eEOM-CCSD(8)", where only the orbitals on the formaldehyde are treated at the EOM-CCSD level (in red in Figure  2), the excitation energy deviates from the benchmark number in black by up to 0.7 eV. The error indicates that although the external potential is polarising the formaldehyde at the meanfield level, the relaxation of the environment necessary to construct the S 1 state is sill missing. The error is expected to gradually decrease by adding more occupied orbitals to subsystem A. For a separation of 2 Å, it is only when 3 additional MOs from the water are included (eEOM-CCSD(11) in Figure 2) that the energy reproduces the full EOM answer. This observation can be rationalized by looking at the HOMO orbital plots (Figure 2) which show a significant contribution of the water orbital at short water-formaldehyde distance. Such contributions disappear at larger distances. As the water distance increases the contribution of the water occupied orbitals to the excitation becomes smaller, and at 3 Å

The Journal of Physical Chemistry Letters
Letter the difference between the number of orbitals in the embedding region becomes negligible, i.e., the occupied orbitals on the formaldehyde only (eEOM-CCSD(8)) contribute to the transition. The oscillator strength (indicated by dot size in Figure 2) is near zero for this formally forbidden 1 (nπ*) excitation and only increases when the water starts to break symmetry at shorter distance. eEOM-CCSD(8) at 2 Å underestimates the oscillator strength, while eEOM-CCSD(11) is within a factor of two of the benchmark value of 6.63 × 10 −6 (inset of Figure 2). This simple test case highlights how embedding can accurately describe the excitation at a range of solute−solvent distances and also provide useful microscopic insight into the role that the solvent plays in modulating the transition energies. Comparing EOM-CCSD to eEOM-CCSD results shows that at short distances, the water solvent actively participates in the transition, and additional orbitals are required to obtain full EOM-CCSD results. At longer distances, solvent orbitals are not required because it plays a more passive role, as a sort of polarizing spectator to the transition. To evaluate the extent to which the choice of density functional impacts accuracy, we repeated the above test of formaldehyde+water for the 2 Å strong interaction limit and varied the functional used in the environment. We present a selection of results from four main exchange and correlation functionals that cover the families of DFT functionals generally used: PBE, PBE0, M06-2X. We also present further comparisons with HF. Additional functionals were examined and we observed similar trends as what is presented in the following. From Figure 3 it can be seen that, even for this strongly interacting limit, the error in excitation energy is 0.07 eV for the 8 and 9 orbital case and for the 10 and 11 orbital case is within 0.02 eV. Hence, for localized transitions similar to the one under study here, there is only a weak dependence on the choice of density functional approximation. Such weak dependence is clearly not observed with LR-TDDFT (and TDHF) when using the same set of functionals, as shown in the right part of Figure 3. These results further show that the most important factor in the embedding is the selection of environment orbitals in the active region to be treated at the high-level. To test the role of the orbital selection, we investigated strategies beyond the default Mulliken scheme. Specifically, we identified orbitals which had the largest energetic impact, and selected those. Using this strategy for example, the error of embedding 9 orbitals was reduced from 0.067 to 0.022 eV for the M06-2X functional. This improvement indicates that orbital choices based on Mulliken population analysis are not optimal for finding excited-state properties and that a better criterion is possible (likely involving analysis of a CIS vector).
We now move to the study of the embedded EOM-CCSD (eEOM-CCSD) treatment of electronic transitions in larger molecular systems. To expand upon the work on 1 (nπ*) transitions in formaldehyde+water to larger systems, we used a single structure from the work of Aidas 43 et al., which comprises an acrolein molecule solvated by a large number of water molecules, offering a realistic model of acrolein in aqueous solution. Substructures based on a radial cutoff around the acrolein center-of-mass were then selected to change the number of water molecules in the system, and an example of the largest cutoff of 5 Å is shown in Figure 4. In Figure 4, we show the results of different numbers of water molecules solvating acrolein for LR-TDDFT/ωPBE, all electron EOM-CCSD, eEOM-CCSD on the acrolein only (eEOM-CCSD(15)), and eEOM-CCSD with an additional 4 molecular orbitals from the 2 water molecules that are closest to the carbonyl group (eEOM-CCSD (19)). An aug-cc-pVDZ basis was used for all calculations. For acrolein in gas phase (0-water case), the EOM-CCSD and eEOM-CCSD(15) are the same calculation as there is no environment to embed. The ωPBE/ LR-TDDFT transition is 0.15 eV smaller than the EOM-CCSD result, a difference that is carried forward throughout the calculations. The first microsolvated system (2 water molecules next to the carbonyl group of acrolein) shows that eEOM-CCSD (15), where only the occupied orbitals on acrolein are

The Journal of Physical Chemistry Letters
Letter considered, is only within 0.09 eV of the full EOM-CCSD result. As a comparison, LR-TDDFT on the full molecular system (all orbitals included) deviates by 0.14 eV from the EOM-CCSD benchmark value. Upon inclusion of the four most strongly contributing water orbitals to the active region, the eEOM error reduces now to less than 0.01 eV (eEOM-CCSD (19) in Figure 4). We note that two of these four orbitals (σ(O−H)) were detected by the Mulliken analysis as required for the description of the embedding potential; the other two (lone pairs on the oxygen atoms) were chosen via a preliminary embedded CIS calculation and, therefore, contribute to the excitation energy directly. We determined these orbitals by incrementally increasing the size of the embedding region according to Mulliken population and calculating the CIS energy of the transition until agreement with a benchmark CIS had been reached to within 0.01 eV. This was only done for the first few model systems, as the same orbitals were found to be important to obtain an accurate excitation energy for all the cluster sizes. The first four embedded results were of clusters small enough to be compared against benchmark numbers of full EOM-CCSD, and showed an error of 0.01 eV. With 13 waters surrounding the acrolein, the calculation has already reached 700 basis functions and obtaining EOM-CCSD benchmarks was computationally intractable using our available computational resources. Thanks to its better scaling, eEOM-CCSD was possible. The eEOM-CCSD (19) trend is well within those observed for eEOM-CCSD (15) and LR-TDDFT. The favorable scaling of eEOM-CCSD (19) with respect to EOM-CCSD is demonstrated in the inset of Figure 4 by using the number of configuration state functions (CSFs) as a platform-independent measure of cost. EOM-CCSD has approximately two orders magnitude more CFSs than the eEOM-CCSD (19) result for a given number of water molecules.
The high-accuracy calculation of excited-state properties usually requires expensive correlation-based methods whose computational cost scales poorly with system size. We have shown in this work how projection-based embedding enables the calculation of accurate excitation energies and oscillator strengths in a more tractable way. By using DFT or HF as an environment potential, it is feasible to include the effect of solute contributions to excitation energies without explicitly including them at the high level. The implementation of this approach is simple: it requires specification of those chromophore atoms which receive high-level excitation treatment. If one desires further accuracy, it is possible to additionally include specific orbitals from the environment. This flexibility allows us to obtain important qualitative insight, for example allowing us to understand how specific environment orbitals participate in electronic transitions. We chose EOM-CCSD as the high-level method in this work due to its relatively black-box nature; however, the simplicity of projectorbased embedding is such that extremely accurate methods like multireference configuration interaction (MRCI) can also straightforwardly be used. In future work, we intend to expand this work to cases where the subsystems are connected by covalent bonds, as well as designing a new orbital selector to automate the optimum selection of orbitals for excited-state embedding. These technologies will enable us to make progress in developing predictive models that furnish microscopic insight across a range of important systems, from photocatalysis to light harvesting. 48,49 ■ COMPUTATIONAL DETAILS The calculations conducted in this work were run with Molpro 2015 50 and Terachem 51−53 for LR-TDDFT (with and without the Tamm-Dancoff approximation (TDA) where indicated). The following mean-field methods were tested; PBE, 54 PBE0, 55  (15)) and on acrolein plus 4 selected water orbitals (eEOM-CCSD (19)). The number of waters surrounding acrolein is shown on the bottom axis, and the corresponding number of basis functions for each system is found on the top axis. The inset shows the number of configuration state functions for EOM-CCSD and the eEOM-CCSD (19) for the various numbers of waters. The molecular representation shows the system composed of an acrolein molecule surrounded by 37 water molecules (the two important water molecules discussed in the text are highlighted).

The Journal of Physical Chemistry Letters
Letter M06-2X 56 and HF 57 with the aug-cc-pVDZ and aug-cc-pVTZ 58 basis set. For orbital localization, we utilize the intrinsic bond orbitals method of Knizia. 59 The basis truncation value of 5 × 10 −6 used for the larger acrolein systems was obtained from benchmarks of smaller systems and tuned to find an error of the order of 0.01 eV. The largest calculations we performed in Figure 4 would correspond to 400 electrons in 1727 functions for a traditional EOM-CCSD.
The approach to determine which orbitals of the environment to be included was as follows: (i) HF on the whole system; (ii) select the atoms of the photoactive molecule (or part of the molecule); (iii) run CIS (or LR-TDDFT, CC2, etc.) for subsystem A; (iv) increase the number of orbitals in the embedding region by 1 and check if the difference in the excitation energy is larger than some threshold value (e.g 0.01 eV); (v) add all those orbitals whose inclusion causes the difference in CIS (or LR-TDDFT, CC2, etc.) excitation energy to change by more than the specified threshold and rerun with eEOM-CCSD.
Molecular and orbital renderings were generated with VMD. 60