Multistate, Polarizable QM/MM Embedding Scheme Based on the Direct Reaction Field Method: Solvatochromic Shifts, Analytical Gradients and Optimizations of Conical Intersections in Solution

We recently introduced a polarizable embedding scheme based on an integral-exact reformulation of the direct reaction field method (IEDRF) that accounts for the differential solvation of ground and excited states in QM/MM simulations. The polarization and dispersion interactions between the quantum-mechanical (QM) and molecular-mechanical (MM) regions are described by the DRF Hamiltonian, while the Pauli repulsion between explicitly treated QM electrons and the implicit electron density around MM atoms is modeled with effective core potentials. A single Hamiltonian is used for all electronic states so that Born–Oppenheimer states belonging to the same geometry are orthogonal and state crossings are well-defined. In this work, we describe the implementation of the method using graphical processing unit acceleration in TeraChem, where it is combined with multiple electronic structure methods, including Hartree–Fock, time-dependent density functional theory, and complete active space self-consistent field. In contrast with older implementations of the DRF method, integrals of the polarization operators are evaluated exactly. Expressions for ingredients needed to construct analytical gradients and nonadiabatic coupling vectors are derived and tested by optimizing a conical intersection between two excited states in the presence of a polarizable solvent shell. The method is applied to estimate the solvent shifts of absorption energies of a series of donor–acceptor dyes having low-lying charge-transfer states. Even for a nonpolar solvent such as n-hexane, the inclusion of its static polarizability leads to non-negligible shifts that improve the agreement to essentially quantitative levels (0.03 eV) with full-system calculations. Good agreement with the positions of the experimental absorption maxima measured in solution is also observed.

which has the dipole polarization tensors T (ij) ∈ R 3×3 for pairs of dipoles p i and p j in the off-diagonal positions.
With these abbreviations the polarization energy becomes and the last one the dipole-dipole interaction energy between different polarized atoms.
Combining Eqns. 2 and 3 and using supervector notation gives an equation for the induced polarization, which is equivalent to (SI-9) This matrix equation has the solution Since T is symmetric and α is diagonal, B is also symmetric.Plugging the polarization induced by the charge distribution (Eqn.SI-10) into the expression SI-7 for the polarization energy, yields the additional term that has to be added to the QM Hamiltonian: (SI-11) The matrix can be understood as an effective dipole polarizability for the whole system.In the case of a single polarizable atom, A reduces to the atomic polarizability and we recover the energy of a single atom in a field f : where the first term is identified as the work required to polarize the atom and the second term is the electrostatic interaction of the resulting induced dipole and the field.

Damping functions
Dipole-dipole interaction.As pointed out by Thole, 2 the molecular polarizability tensor can have negative eigenvalues, so that the polarization energy diverges for certain geometric arrangements.This is an artifact of treating the dipole-dipole interaction classically.To avoid the polarization catastrophe 2 the dipole-dipole interaction has to be damped if two dipoles with polarizabilities α 1 and α 2 come close.For distances shorter than s = 1.662(α 1 α 2 ) 1/6 , the dipole-field tensor in Eqn.SI-1 is modified as 2 where v = r/s.
Monopole-dipole interaction.The interaction between point charges and point dipoles is also problematic.The electric field generated by the nuclei and other point charges at the position of a polarizable atom diverges as the distance to any of the monopoles goes to zero.The field in Eqn.16 of the main text becomes infinite if a point charge Q n coincides with a polarizable atom at R. The solution is to replace the dipole by a smeared out charge distribution. 2The nuclear fields are multiplied by a damping function at short range (in addition to C(r)): The damping function λ(r) ensures that the field vanishes as the two atoms fuse.

Detailed derivation of analytical gradients of integral exact direct reaction field
In the following, the gradient of the restricted Hartree-Fock (RHF) energy in combination with a QM/MM-IEDRF embedding scheme will be derived.According to Eqn. 4.25 in Ref.
3, the gradient of the plain RHF energy is given by where x represents any external parameter, which in our case could be the coordinates of the nuclei, the point charges or the polarizable sites.The gradient depends on the following quantities: is the density matrix and is the "energy-weighted" density matrix, both in the atomic orbitals basis.Here C µ,i are the coefficients of the occupied molecular orbital i with orbital energy ε i .Gradient terms of the form µ,i ∂E ∂C µ,i ∂C µ,i ∂x vanish, since the energy is variational with respect to the MO coefficients, ∂E ∂C µ,i = 0. (µν|λσ) are the two-electron repulsion integrals, H core µν and S µν are the core Hamiltonian and the overlap matrix, respectively, and V nuc contains the nuclear energy.In order to limit the memory footprint of the algorithm, it is convenient to avoid storing large arrays with gradient of integrals.Instead gradients of matrix elements are immediately contracted with a density matrix.These contracted gradients are thus functions of up to two density matrices, labelled D (1) and D (2) for complete generality.After defining the gradient contractions of the one-electron integrals, and the Coulomb and exchange-like contractions ∂J ∂x (D (1) , λσ , (SI-21) the gradient of the RHF energy can be expressed as Although the focus will be on the RHF gradient, the functions in Eqns.SI-19 to SI-22 are in fact sufficient to express the energy gradient of many other wavefunction-based methods such as CIS or CASSCF, all be it using different generalized density matrices.
The polarization Hamiltonian does not depend on any reference state, it simply modifies the matrix elements of the molecular Hamiltonian.Therefore the gradient of the total RHF QM/MM-IEDRF energy follows directly from adding the gradients of the polarization Hamiltonian to the respective zero-, one-and two-electron terms.In the presence of a polarizable environment, the gradient of the J-like contraction, for instance, would have to be replaced by ∂J ∂x + ∂(∆J pol ) ∂x.The superscript pol will be dropped for brevity.

Gradients of Coulomb and Exchange Energy
Let us start with the gradient terms ∂(∆J) ∂x and ∂(∆K) ∂x, as an efficient implementation of those is the most critical part.The polarization Hamiltonian reduces the electron-electron repulsion integrals by the amount (see Eqn. 24) λσ . (SI-24) Given two general (not necessarily symmetric) density matrices D (1) and D (2) the corrections to the Coulomb and exchange energies are ∆J(D (1) , λσ , (SI-25) ∆K(D (1) , µλ F (e) µν AF (e) λσ D (2) νσ . (SI-26) When taking derivatives of the RHF energy, the density matrices in the above expressions are kept constant, since their derivatives with respect to orbital coefficients do not need to be considered, and their derivatives with respect to the basis centers are already included as Pulay force terms.As a result, only gradients with respect to the matrix elements in brackets are needed.The gradient of those matrix elements follows from the product rule of differentiation: To find the gradient of the effective polarizability tensor A one notes that which can be solved for T and the atomic polarizabilities α are constant, we get (SI-30) The full gradient of A is not kept in memory, derivatives with respect to individual coordinates are computed as needed.Since the effective polarizability tensor only depends on the polarizable site, the gradient ∂A ∂x is zero, if x is a QM atom or a point charge.
When computing the additional J-and K-like contractions due to the polarization Hamiltonian, the order of the summation is important.Round brackets are used to show the presumably best order, which allows to contract gradients of matrix elements with the corresponding elements in the density matrix on the fly.Written out with all indices, the polarization Hamiltonian's contribution to the J-like contraction takes the form i,µν D (1) µν j,λσ D (1) λσ D (2) µν . (SI-31) The placing of brackets suggests that the following vectors should be stored as intermediates during the evaluation of the J-like gradient contractions: where D stands for any density matrix.In addition, the contraction of the derivative polarization integrals, ∂F (e) ∂x, with a tensor E of dimensions (3N pol ) × N AO × N AO has to be computed in an integral-direct fashion: The derivative polarization integrals are evaluated inside the loop over i, µ and ν.They are directly multiplied with the corresponding matrix element of the density matrix and are added to the gradient ∂F ∂x.Similarly we define the contraction of a (3N pol ) × (3N pol ) matrix U with the gradient of the effective polarizability, In terms of the intermediate vectors and the functions ∂F ∂x and ∂A ∂x the additional gradient due to the Coulomb interaction of two density matrices via the polarizable medium becomes ∂(∆J) ∂x (D (1) , D (2) ) = − ∂F ∂x (AF D (2) ) ⊗ D (1) + (AF D (1) ) ⊗ D (2) − ∂A ∂x (F D (1) ) ⊗ (F D (2) ) . (SI-36) Here, the argument of the function ∂F ∂x contains Kronecker products between a vector of size N pol and a density matrix of dimensions N AO × N AO , e.g.
(AF D (2) ⊗ D (1) i,µν = AF D (2) i D (1) µν . (SI-37) The contraction of the gradient of the exchange operator with another density matrix, is probably best computed in the following order: (SI-41) and similary for the transposes of the density matrices, (SI-44) Contracting Eqn.SI-44 with the polarization integrals gives i,µν (F D (2)T D (1)T ) j,µν . (SI-46) Using the just defined quantities, the exchange part of the gradient becomes (SI-47)

Gradient of Core Hamiltonian
Now let us turn our attention to the gradients of the core Hamiltonian, (SI-48) The matrix elements of ĥ(1) are given in Eqn. 26 .Each term is treated separately.The gradient belonging to the interaction between the nuclear and the electronic fields is computed (SI-49) After defining the intermediate vectors where again use is made of the function ∂F ∂x (U ) defined in Eqn.SI-34.The second term of Eqn. 26 may be expressed via the function ∆K(•, •) (see Eqn. SI-26) as (SI-53) Since the overlap matrix depends on the position of the QM atoms, the gradient of previous expression becomes where the functions ∆K and ∂(∆K) ∂x are defined in Eqns.SI-26 and SI-47, respectively.
The gradient of the inverse overlap matrix is formally expressed as but it is not convenient to compute it in this way and store it entirely in memory.Instead ∆K(D, ∂S −1 ∂x) should be formulated as a contraction of the overlap gradients with some matrix, which can be evaluated with the help of the function ∂S ∂x defined in Eqn.SI-19.i,µν (S −1 ) να (SI-57) The memory requirements are at least two arrays of size (3N pol ) × N AO × N AO .The tensors in Eqns.SI-58 and SI-59 are summed over the first two indices to give (SI-60) Then Eqn.SI-54 turns into Of course, when derivatives are taken with respect to the polarizable sites or point charges, the gradient of the overlap matrix is zero and the second term can be dropped.
The third and fourth terms in Eqn. 26 are only needed if same site integrals are to be evaluated exactly.In this case the contracted gradient of the third term is evaluated analogously to the second one but replacing A with a the block diagonal matrix diag(A) 3×3 , Note that the gradients of the diagonal blocks of A are in general nonzero.
Since the function ∂F ∂x is linear, Eqns.SI-52 and SI-61 can be combined.In order to incorporate also Eqns.SI-62 and SI-68, we define the flag ξ, which is 0 if all integrals are evaluated with the resolution of identity or 1 if same-site integrals are evaluated exactly.
The polarizability tensor is then replaced by Now all the pieces are put together.To group the terms into meaningful entities, the gradient of the core Hamiltonian is decomposed into the sum of the gradients of its constituents by using the chain rule: The density matrix is held constant.Each term is understood as a contraction of a gradient with respect to x and a tensor.The contraction has to be implemented as a linear function, where gradients of integrals are computed on the fly and are directly combined with the corresponding elements of the tensor.Below the definitions of those linear functions and tensors will be presented for each term.But first the scalar products in Eqn.SI-70 are replaced by linear functions (these have to be implemented as GPU kernels), which take the partial derivatives of the core Hamiltonian as inputs: All the tensors are functions of the density matrix.The meanings and ranges of the tensor indices are summarized below: • atomic orbials: µ, ν, λ, σ, γ, δ = 1, . . ., N AO • coordinates of polarizable sites: i, j = 1, . . ., 3N pol • polarizable sites: k = 1, . . ., N pol • Cartesian axes: α, β = 1, 2, 3 1.The gradient of the effective polarizability tensor, takes a (3N pol ) × (3N pol ) tensor as input, j,λσ (S −1 ) σν D µλ .(SI-73) 2. The gradient of the diagonal blocks of the polarizability tensor, 3. The gradient due to the nuclear fields, takes a (3N pol ) sized vector as input, 4. The gradient due to the electronic fields, (SI-79) 5. The gradient due to the exact same-site integrals, 6.The gradient due to the coordinate dependence of the overlap matrix, j,λδ (S −1 ) δν (SI-83)

Nuclear Gradients
The gradient of the 0-electron part of the polarization Hamiltonian, with (SI-86) (SI-87) In all the above expressions the term ∂A ∂x (U ) can be further simplified by noting that where i, j, k, l = 1, . . ., 3N pol .We have defined a new contraction of the gradient of the dipole field tensor ∂T ∂x with a (3N pol ) × (3N pol ) matrix AU A.
The correctness of the implementation was verified by comparing the analytical gradients of the total (excited state) energy with a 5th order finite-difference approximation.The nonadiabatic coupling vectors were checked similarly.
3 Protocol for Optimal Tuning of the Range-Separation

Parameter
For the TD-DFT excited state calculations, the optimally-tuned version of Rohrdanz' ωPBEh functional 4 with C HF = 0.2 is employed: ω is a fitting parameter that determines where the Coulomb operator is split into the shortrange (SR) and the long-range (LR) parts.In E FR x,HF , the Hartree-Fock exchange is applied to the full-range (FR) potential.
Following Ref. 5, the optimal value of ω was determined by performing ground state DFT calculations (ωPBEh/aug-cc-pVDZ) for each molecule with N − 1 (cation), N (neutral) and N +1 (anion) electrons for a range of ω values.For the exact functional of the density, minus the HOMO energy should equal the ionization potential (IP) (Janak's theorem). 6Enforcing this condition both for the neutral and the anion, gives the following objective function: (SI-90) The range-separation parameters were tuned separately in vacuum and in solution.In the latter case the embedding Hamiltonian (either QM/MM or QM/MM+IEDRF) was included.
For the dye 7c we repeated the tuning for all optimized snapshots, but only found a very weak dependence on the solvent configurations (less than 0.005 Bohr −1 ).Therefore, the optimal range-separation parameter was determined from a single snapshot for each dye.
A coarse grid covering the range 0.0 ≤ ω ≤ 1.0 Bohr −1 in equidistant steps of size ∆ω = 0.05 Bohr −1 was used to determine the approximate minimum of J 2 (ω), before switching to a finer grid with resolution ∆ω = 0.01 Bohr −1 covering the vicinity of the minimum.The optimal value was then obtained by spline interpolation.For most dyes the optimal ω lies around 0.

Locally Excited States
The energies of the locally excited states are compared in Fig. S2.When comparing theoretical and experimental excitation energies, we are faced with the problem that the TD-DFT spectrum contains more excited states than there are peaks in the absorption spectrum in the energy range of interest.We select the lowest bright state closest to the experimental LE peak.With the exception of 7c and 9c, the QM/MM-IEDRF embedding scheme shifts the excitation energies of the LE state to slightly lower energies, which can be attributed to the dispersion interaction with the polarizable environment which is larger in the excited state than in the ground state.As a result, the overall agreement with experiment is improved.

Dyes Excluded from the Analysis
The dyes 3b, 5b and 8c were excluded from the analysis, since we were either not sure about the character of the lowest electronic state, or the experimental assignments and energies were unreliable.For the dyes 3b and 8c, Pasman et al. observed only a single peak, which they assign to a local excitation.However, the TD-DFT calculations predict the lowest states to have charge transfer character.3d can occur in two conformations which differ by the orientation of the methyl group in the 1-methylpiperidine donor group (see Fig. S3).In the gas phase, the conformation with an equatorial methyl group is more stable by 2.3 kcal/mol than the axial one.In the axial conformer the non-bonding nitrogen orbital on the donor is hybridized with the π orbital of the acceptor (Fig. S3b), so that the n → π * excitation acquires a relatively large oscillator strength of 0.18 a.u.In the equatorial conformer, on the other hand, the lowest excitation is dark ((Fig.S3a).Based on these calculations the single peak seen in the experimental spectra could originate from the axial conformer and just be a CT state.However, this assignment conflicts with the energy of the absorption maximum of 5.2 eV, which is much higher than the computed CT energies of 4.4 eV (equatorial -CH 3 ) and 4.2 eV (axial -CH 3 ).In the TD-ωPBEh/aug-cc-pVDZ spectra the local excitations lie at 5.5 eV (equatorial -CH 3 ) and 5.8 eV (axial -CH 3 ), which agrees much better with the observed absorption maximum.
8c is a stereoisomer of 7c, in which the donor group forms an angle of 90 • with the acceptor group.The transition density of the nπ * state is zero and the CT state is dark (see Fig. S4).This state is not observed experimentally, either, which could be due to its low oscillator strength.
Finally, the CT energies of dyes 5b and 6c are both listed as 34500 cm −1 (4.28 eV) in table I of Pasman et al., which could be a coincidence or a typo.This energy agrees reasonably well with the TD-ωPBEh/aug-cc-pVDZ estimate for the CT state of 6c (4.44 eV), but is too low when compared with the TD-DFT estimate for 5b (4.69 eV).

F
1) λµ (SI-39) (D T ) µν = D νµ are the matrix elements of the transpose of D and the symmetry A ij = A ji was used to bring the first and third line into a similar form (except for the transposition of the density matrices).The order of computation has been chosen such that the intermediates that exist in memory at any time fit into one or two arrays of size (3N pol ) × N AO × N AO .The intermediates arise from transforming each of the three indices of F (e) j,µν to another basis.Transforming the third and the second indices by the density matrices and the first index by the A matrix is done in three steps: (F D (2) ) j,λν = σ

F
(S −1 ) σβ (SI-56) The necessary intermediate tensors result from transforming two of the three indices of the polarization integrals F (e) : (F S −1 ) i,µα = ν

Figure S1 :
Figure S1: Tuning curve for dye 7c in vacuum and n-hexane.

Figure S2 :
Figure S2: Correlation between experimental absorption band maxima of the LE state (experiment) and the vertical TD-ω opt PBEh/aug-cc-pVDZ excitation energy of the closest bright state (theory) with different embedding schemes: isolated chromophore in gas phase (vacuum), electrostatic (QM/MM) and electrostatic+polarizable embedding (QM/MM-IEDRF).Violin plots indicate the distribution of energies among the 10 snapshots.The diagonal dashed line indicates a perfect correlation.

Figure S3 :
Figure S3: HOMO and LUMO orbitals (isovalue=0.05) of the S 1 transition of the two conformers of dye 3b.

Table S1 :
2 Bohr −1 in the gas phase and approximately 5% (with QM/MM) to 10% (with QM/MM+IEDRF) higher in n-hexane; the values for each chromophore are listed in tableS1.A typical tuning curve J 2 (ω) is shown in Fig.S1.It is not surprising that the QM/MM tuning curve differs little from the gas phase curve.The QM/MM embedding scheme only adds electrostatic interaction and Pauli repulsion, but the MM point charges on n-hexane are very small.Optimal range-separation parameters for Pasman dyes in the gas phase and in n-hexane (with either QM/MM or QM/MM+IEDRF embedding schemes for a single snapshot).