Combining Graphics Processing Units, Simplified Time-Dependent Density Functional Theory, and Finite-Difference Couplings to Accelerate Nonadiabatic Molecular Dynamics

Starting from our recently published implementation of nonadiabatic molecular dynamics (NAMD) on graphics processing units (GPUs), we explore further approaches to accelerate ab initio NAMD calculations at the time-dependent density functional theory (TDDFT) level of theory. We employ (1) the simplified TDDFT schemes of Grimme et al. and (2) the Hammes-Schiffer–Tully approach to obtain nonadiabatic couplings from finite-difference calculations. The resulting scheme delivers an accurate physical picture while virtually eliminating the two computationally most demanding steps of the algorithm. Combined with our GPU-based integral routines for SCF, TDDFT, and TDDFT derivative calculations, NAMD simulations of systems of a few hundreds of atoms at a reasonable time scale become accessible on a single compute node. To demonstrate this and to present a first, illustrative example, we perform TDDFT/MM-NAMD simulations of the rhodopsin protein.

* sı Supporting Information ABSTRACT: Starting from our recently published implementation of nonadiabatic molecular dynamics (NAMD) on graphics processing units (GPUs), we explore further approaches to accelerate ab initio NAMD calculations at the time-dependent density functional theory (TDDFT) level of theory. We employ (1) the simplified TDDFT schemes of Grimme et al. and (2) the Hammes-Schiffer−Tully approach to obtain nonadiabatic couplings from finite-difference calculations. The resulting scheme delivers an accurate physical picture while virtually eliminating the two computationally most demanding steps of the algorithm. Combined with our GPU-based integral routines for SCF, TDDFT, and TDDFT derivative calculations, NAMD simulations of systems of a few hundreds of atoms at a reasonable time scale become accessible on a single compute node. To demonstrate this and to present a first, illustrative example, we perform TDDFT/MM-NAMD simulations of the rhodopsin protein.
N onadiabatic molecular dynamics (NAMD) simulations using trajectory surface hopping (TSH) 1−4 have become a powerful tool to describe the dynamics of molecular systems involving multiple electronic states. Their field of application ranges from the description of rather small molecular machines 5−9 over medium-sized photoswitches 10,11 to the dynamics of entire photoactive proteins. 12,13 They can be used with a variety of excited-state methods, e.g., the complete active space self-consistent field (CASSCF) method, 14 the algebraic− diagrammatic construction (ADC(2)), 15 several coupled cluster methods (e.g., CC2), 16 as well as time-dependent density functional theory (TDDFT). 17,18 Also, triplet states 19 can be included.
However, the greatest challenge remains the large computational cost of NAMD simulations, which is a result of the expensive excited-state methods mentioned above and the fact that TSH requires not only one but a series of trajectories to determine observables as ensemble averages. This problem can be tackled by using semiempirical methods, 5,6 employing exciton models 20 or using graphics processing units (GPUs). The latter have been applied to ground-state properties, 21−28 excited-state calculations, 9,20,29−31 ab initio multiple spawning (AIMS), 32−35 and NAMD 36,37 simulations.
Based on our recent work, 9 we explore in our present work the use of simplified TDDFT schemes 38,39 and the Hammes-Schiffer−Tully (HST) 2 model in addition to GPU-based integral routines. They tackle the two major bottlenecks of NAMD: The calculation of the state energies and the nonadiabatic couplings between the states. After a brief summary of the corresponding theory and its validation for the investigated problems, we show timings and use our approach to simulate the photoinduced rotation of the retinal chromophore in the rhodopsin protein at the TDDFT/MM level of theory. Details on the methods (thresholds, convergence criteria, etc.) implemented in our FermiONs++ program package 25,26 and the computational setup can be found in the Supporting Information.
In TSH, 1,2 a system is allowed to switch the potential energy surface (PES) within one trajectory. The occurrence of such a surface hop depends on the hopping probability: if it exceeds a randomly drawn number between 0 and 1, the trajectory continues on a different PES with a rescaled nuclear velocity. The average of multiple trajectories with a different series of random numbers describes the behavior of the system. The hopping probability itself is calculated from the change of the state energies and the nonadiabatic couplings (Q), which can also be obtained as the product of the nonadiabatic coupling vectors (τ) and the nuclear velocity (Ṙ) where Φ I is the wave function of the electronic state I. While τ can be calculated using response theory, Q cannot be determined analytically. Energies of all considered states, gradients, and Q are thus the main ingredients of a TSH algorithm. The first can be obtained from TDDFT by solving the TDDFT or random phase approximation (RPA) equations 17,40,41 with ω I being the excitation energy of state I. A and B are the orbital rotation Hessians, and X I and Y I are the transition densities for excitation and de-excitation, respectively. Neglecting B in eq 2 is known as the Tamm−Dancoff approximation (TDA), 42 leading to The calculation of excitation energies with eqs 2 and 3 is time-consuming, mainly because of the evaluation of the twoelectron integrals in A and B. To accelerate these calculations, we apply the simplified RPA and TDA methods by Grimme and co-workers. 38,39,43 Here, Coulomb and exchange kernels are approximated (J′ for Coulomb and K′ for exchange) using the Mataga−Nishimoto−Ohno−Klopman 44−46 damped Coulomb operators together with the transition/charge density monopoles q obtained from a Loẅdin population analysis 47 p, q, ... are arbitrary molecular orbitals. r is the interatomic distance, η is the mean of the chemical hardness of the atoms N and M. α and β are global fit parameters, while c x is the amount of exact exchange. This leads to the following approximate orbital rotation Hessians i, j, ... denote occupied molecular orbitals, and a, b, ... denote virtual molecular orbitals. s k is 2 or 0 for singlet−singlet or singlet−triplet excitations, and ϵ p is the orbital energy of p.
The excitation energies (ω′) are then obtained by diagonalizing A′ in the case of sTDA or (A′ − B′) 1/2 (A′ + B′)(A′ − B′) 1/2 in the case of sRPA. To avoid the diagonalization of the entire matrix, the number of included configuration-state functions (CSFs) is truncated using the thresholds ϑ pCSF , ϑ sCSF , and ϑ CSF . 38 Only primary (with an energy below ϑ pCSF ) and secondary CSFs (with an energy between ϑ pCSF and ϑ CSF and a significant coupling to the primary CSFs > ϑ sCSF ) are considered. The sTDA scheme greatly reduces the cost of excited-state calculations, giving access to absorption spectra of large molecular systems. 38 The sRPA scheme yields better transition densities, 39 leading to better transition dipole moments and even enabling the calculation of higher order dynamical response properties. 48 Excited-state gradients (ω I x ) and τ's at the TDDFT/TDA level of theory can be derived from eqs 2 and 3 using linear response theory. 49−59 These equations cannot easily be transferred to sTDA/sRPA, as the derivatives of J′ and K′ with respect to the nuclear coordinates have, so far, not been implemented. Instead, we are using the calculated ω I ′, X′ I , and Y′ I in the RPA/TDA algorithms for ω I x and τ featuring exact evaluations of the two-electron integrals, i.e., without the Mataga−Nishimoto−Ohno−Klopman damped Coulomb operators. As a consequence of this, we expect a slight disagreement between the calculated energies and the excited-state properties (ω I x and τ's). To validate this approach, we compare optimized structures of biphenyl (I) using the S 1 potential energy surface at the TDDFT and sTDDFT levels of theory in Table 1 and show ω x 's and τ's of optimized groundstate structures of protonated formaldimine (II) and the Schiff base of the retinal chromophore (III) in Figure 1. A validation based on NAMD is discussed below. Additionally, selected natural transition orbitals 60 and a screening of the thresholds (ϑ pCSF , ϑ CSF , and ϑ sCSF ) are shown in the Supporting Information.
All four optimized S 1 structures of I in Table 1 are nearly planar and feature a similar central C−C distance. The ω x 's and τ's based on sTDDFT results in Figure 1 are also in good agreement with the RPA and TDA properties. The only differences are the weaker nonadiabatic couplings of II and III and the fact that ω 1 x of III has a larger contribution in the sixmembered ring and a smaller contribution in the conjugated system. Both are the result of the slightly different excitation energies (II: 9.76 eV (RPA), 9.87 eV (sRPA); III: 2.74 eV (RPA), 3.17 eV (sTDA)) and transition densities. In the case of III, three other observations can be made (see Figures S8− 10 in the Supporting Information): (1) sRPA performs worse than sTDA, (2) PBE0 61−64 /def2-SVP 65,66 is, in contrast to ωB97 67 /def2-SVP, not able to capture the charge transfer character of the excitation, and (3) the natural transition orbitals at the RPA and sTDA levels of theory are nearly identical, indicating that the major source of error is the excitation energy. Table 1 and Figure 1 suggest that simplified TDDFT might also be used for NAMD simulations, which require, however, time-consuming calculations of the Q's between the states. To circumvent the analytical calculation of Q via τ, we apply the numerical HST model 2    where we have calculated numerical and analytical τ's for formaldehyde. Additionally, we have performed NAMD simulations of II, using the HST model and analytically calculated τ's as well as RPA, TDA, sRPA, and sTDA. After excitation to the S 2 state, the molecule shows a fast conversion to the S 1 state, which goes along with the elongation of the C− N bond. Further relaxation to the S 0 state is achieved via a rotation around this bond. In Figure 2, the increase (S 2 → S 1 ) and decrease (S 1 → S 0 ) of the S 1 occupation is shown. The change of occupation for all states as well as energies and Q's of selected trajectories are listed in the Supporting Information.
In all sets of trajectories, we observe a similar behavior of II, indicating that the HST model and the simplified TDDFT are valid approximations for this example. This is reflected in the S 1 occupations, which are very similar for all cases. The only differences are slightly different S 2 −S 1 nonadiabatic couplings when applying the HST model and a slower decay of the S 1 occupation in the case of the simplified TDDFT methods, which is, however, also visible in the case of TDA. The first observation is in good agreement with previous findings 71−73 comparing analytical and numerical Q's in the vicinity of  Here, we want to stress that one of the well-known limitations of TDDFT is its poor description of conical intersections involving the ground state, 74,75 which can be improved by applying, e.g., the TDA. 76,77 Alternatively, the spin-restricted ensemble-referenced KS (REKS) method 78 can be employed. Obviously, these limitations are not overcome by applying the simplified TDDFT approach. However, for this particular system, good agreement between TDDFT and CASSCF simulations has been observed, 70 so that we are confident that our chosen setup allows for a good qualitative description of the investigated processes. Please note that the simulations using sTDA or sRPA are not strictly energy conserving as discussed above. Therefore, the standard deviation of the total energy is approximately 5 times higher than in standard TDDFT simulations. However, rescaling the nuclear velocities to enforce energy conservation has no effect on the average properties of the system, which illustrates the validity of the present approach. Table 2 presents the impact of the approximations on the performance of an NAMD simulation of III. It shows that HST and sRPA virtually eliminate the computational cost of the calculations of ω I x 's and Q's. In combination with the GPUbased integral evaluations, the total speed-up in the case of N roots = 2 is ∼4 with respect to the CPU-based RPA implementation using analytical τ's. An NAMD simulation of III involving 5000 steps (e.g., 1 ps simulation using 0.2 fs time steps) can thus be conducted within ∼5 instead of ∼21 days. The acceleration becomes even larger when more excited states and nonadiabatic couplings are considered (e.g., a factor of more than 20 for III in case of N roots = 7). This and the fact that the performance of GPUs is better for large molecular systems (see, ref 9) makes the presented approach interesting for the investigation of systems involving hundreds of atoms and plenty of electronic states.
As a first application of our proposed scheme, we have calculated 105 NAMD simulations of the rhodopsin protein (IV) at the sTDA/MM (ωB97/def2-SVP) level of theory. The chromophore of IV (Figure 3a) undergoes a cis−trans isomerization when exposed to light (see Figure 3c). For a review of calculations on this system, the reader is referred to ref 79. Similar systems have also been investigated by Martı́nez et al. 34,35 The change of the dihedral γ 1 (defined in Figure 3b) and the state occupations are shown in Figure 4.
Out of 105 calculated trajectories, 9 feature a cis−trans isomerization (see Figure 4. A movie of the isomerization is available at https://www.cup.uni-muenchen.de/pc/ ochsenfeld/download/). Most of them reach γ 1 = −90°at ∼280 fs, which coincides with the crossing point of the state occupations (see Figure 4). This hop time (t) is significantly higher, and the yield (y) of 9% significantly lower than the experimental (t = 147.7 ± 1.0 fs; y = 0.63 ± 0.01) results reported in ref 13. Our analysis of III (see Figure 1) indicates that the weaker gradients and nonadiabatic couplings of sTDA or the discussed problems of TDDFT with conical intersections (see also refs 79−81) may be the reason for this. However, our approach describes the direction and mechanism (see Figure 3c) of the isomerization correctly. In contrast to previous work, 12,13,79 our method requires, besides α, β, and the QM/MM ansatz, no further parametrizations and/or reductions of the system. One trajectory of IV takes ∼5−7 days on two Intel Xeon CPU E5 2640 v4 @ 2.20 GHz (20 threads) CPUs and four AMD FirePro 3D W8100 GPUs using our FermiONs++ program package. 25,26 We have introduced a combination of GPU-based integral routines, simplified TDDFT schemes, and numerical nonadiabatic couplings for efficient NAMD simulations. For all investigated systems ranging from small organic molecules (II)  The Journal of Physical Chemistry Letters pubs.acs.org/JPCL Letter to proteins (IV), excited-state properties and dynamics are described qualitatively correctly with a significantly reduced computational cost. The latter is due to the vanishing computational times for TDDFT energies and nonadiabatic couplings calculations. The present approach may be used to qualitatively explore relaxation pathways and predict trends (e.g., effects of mutations or different isotopes) within these reactions.
■ ASSOCIATED CONTENT

* sı Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpclett.0c00320. Computational details, a validation of the finite-difference nonadiabatic couplings, a benchmark of sTDA/ sRPA, and the used thresholds for the investigated systems (PDF) The Journal of Physical Chemistry Letters pubs.acs.org/JPCL Letter