Intersystem Crossing Pathways in the Noncanonical Nucleobase 2-Thiouracil: A Time-Dependent Picture

The deactivation mechanism after ultraviolet irradiation of 2-thiouracil has been investigated using nonadiabatic dynamics simulations at the MS-CASPT2 level of theory. It is found that after excitation the S2 quickly relaxes to S1, and from there intersystem crossing takes place to both T2 and T1 with a time constant of 400 fs and a triplet yield above 80%, in very good agreement with recent femtosecond experiments in solution. Both indirect S1 → T2 → T1 and direct S1 → T1 pathways contribute to intersystem crossing, with the former being predominant. The results contribute to the understanding of how some noncanonical nucleobases respond to harmful ultraviolet light, which could be relevant for prospective photochemotherapeutic applications.


Methods
According to several authors, S1-S4 2-thiouracil (2TU) exists in both gas phase and solution primarily in the thione-keto tautomeric form. The molecular structure of this tautomer of 2TU and the numbering of the atoms used in the following are shown in Figure 1 in the paper.
In our previous work on 2TU S5 we found that the MS-CASPT2(12,9)/cc-pVDZ (multi-state complete active space perturbation theory second order) level of theory offers an accurate and efficient description of the excited-state PESs of 2TU, compared to reference calculations at the MS-CASPT2(16,12)/ano-rcc-vqzp level of theory. Hence, in the present work all calculations were performed with MS-CASPT2(12,9)/cc-pVDZ, S6-S9 based on SA(4/3)-CASSCF(12,9) (complete active space self-consistent field with state-averaging including either 4 singlet or 3 triplet states) calculations. S10 The active space contained the 8 π and π * orbitals of oxygen, sulfur and the pyrimidine ring, plus the lone-pair orbital on sulfur. The IPEA shift S11 was set to zero, as this was found to improve the results for 2TU in combination with the small double-ζ basis set. S5 However, to avoid intruder states and ensure a stable propagation in the dynamics simulations, an imaginary shift of 0.1 Hartree was added. S12 In order to account for scalar-relativistic effects, the secondorder Douglas-Kroll-Hess (DKH) Hamiltonian S13 was employed while spin-orbit couplings (SOC) were computed with the RASSI S14 and AMFI S15 formalisms. All calculations were performed with Molcas 8.0. S16 The active space orbitals are presented in Figure S1. Except for the lone pair of oxygen (n O ), this CAS represents the full π+lone pair active space often employed for photochemical studies of medium-size molecules. The n O orbital was not included in the computations, since it is only relevant for the n O π * states (which are always S 4 or higher) and would lead to an instable active space at geometries where the n O π * states are not in the state-averaging. No σ orbitals were included in the CAS in order to keep the computations feasible and because they are not essential for the description of the excited-state PESs, according to a comparison of 9) and MS-CASPT2(14,11) in our previous publication. S5 π O π 1 π 3 π 5 n S π S π * 2 π * 6 π * 4 Figure S1: Active space CAS (12,9) used for all calculations presented in this work.
The initial geometries and velocities for the dynamics simulations were sampled from the Wigner distribution S17,S18 of the harmonic ground state potential based on an MP2/cc-pVDZ frequency calculation. The MP2 minimum geometry agreed to within 0.01 Å and 0.5 • with the CASPT2 minimum geometry. S5 A single-point calculation at 9) level of theory was performed at each of the 2000 geometries sampled. The energies and oscillator strengths were used to simulate an absorption spectrum with the line-broadening method (using Gaussians with a full width at half maximum of 0.3 eV) and to stochastically select the initial excited states. S19 Restricting the selection scheme to the excitation energy window between 3.9-4.2 eV, 209 permissible initial conditions (all starting in S 2 ) were obtained, out of which 100 were used for the simulations.
The dynamics simulations were performed with the Sharc method. S20,S21 A new Sharc-Molcas interface was developed, which allows to perform MS-CASPT2 calculations in the scope of Sharc simulations. It implements the calculation of MS-CASPT2 gradients from finite differences (central differences) in a trivially parallel fashion. Furthermore, the interface automatizes the calculation of SOC, dipole moments and wavefunction overlaps through the Molcas RASSI S14 module, employing the perturbatively-modified CASSCF wavefunctions resulting from MS-CASPT2 calculations. Note that the wavefunction phases in the calculations of the SOC can be inconsistent with those from the overlap calculations. While we keep track of the wavefunction phases by means of the overlaps, we cannot control the phases in the spin-orbit calculations in RASSI.
The dynamics simulations included 3 singlet and 3 triplet states, considering each triplet state explicitly (which makes 12 states in total). As a central paradigm in the Sharc methodology, in each time step the electronic Hamiltonian matrix spanned by these 12 states is computed-including SOC-and diagonalized. The surface hopping simulations are then performed using these "diagonal" states. For details on the justification and implementation of this ansatz we refer the reader to the recent review on the Sharc method. S20 We note that for the analyses presented here the trajectory data was transformed back into the original singlet and triplet states to facilitate the interpretation of the results.
We employed the velocity-Verlet algorithm with a time step of 0.5 fs for the nuclear dynamics and a time step of 0.02 fs for the propagation of the electronic wavefunction, using the "local diabatization" formalism. S22,S23 The latter formalism was adopted because it does not require nonadiabatic coupling vectors-which are not available at MS-CASPT2 level-and because it offers very good numerical stability for the electronic wavefunction propagation. S23 Energy conservation during a surface hop was ensured by scaling of the full velocity vectors, since the nonadiabatic coupling vectors are not available for our level of theory. We employed an energy-based decoherence correction with a parameter of 0.1 Hartree. S24 From the 100 trajectories, 8 trajectories were discarded due to problems with total energy conservation, which is probably related to instabilities in the calculation of the numerical gradients. These instabilities could be due to changes in the active space composition or intruder states at some of the displaced geometries in the numerical gradient calculation. Several measures were taken to minimize these problems. For all displaced states, the overlap to the states at the central geometry were calculated with RASSI in order to monitor whether the states at the displaced geometries change character (intruder states, changes of active space composition, state reorderings). Furthermore, the reference weights of the displaced states were monitored. If problems in the overlaps or weights were detected for a particular displaced geometry, the gradient for the corresponding cartesian direction was calculated using one-sided differentiation, neglecting the problematic geometry (central differences were used in all other cases). If both displacements for a cartesian direction were problematic, the trajectory was terminated. The reference weights from the CASPT2 output were also used to identify intruder states for the central geometry. If the reference weight of an excited state was less than 80% of the reference weight of the ground state, then the trajectory was likewise terminated. From the remaining 92 trajectories, 48 had to be excluded from the data analysis due to inconsistencies in the wavefunction signs between the SOC calculations and the overlap calculations (as mentioned above).  In Figure S2 (b) the experimental gas phase spectrum S25 is shown, together with a tentative decomposition into three Gaussians g 1 , g 2 , and g 3 . The Gaussians are centered at 4.0 eV, 4.3 eV and 4.7 eV; the low-energy tail of the spectrum extends to about 3.5 eV. This agrees with other reported absorption spectra of 2TU, S25-S29 which always find a a maximum at around 270 nm (4.6 eV), an intense shoulder at about 290 nm (4.3 eV) and absorption extending until approximately 350 nm (3.5 eV). An nπ * state has been observed in the circular dichroism spectrum of 2-thiouridine at a wavelength of 320 nm (3.9 eV). S30 As shown in Figure S2 (b), and confirmed by circular dichroism spectroscopy, S30 the experimental absorption spectrum can be described as arising from three excited states in the spectral range considered here. At the low-energy edge of the spectrum, an nπ * state is located, which has only a very low intensity. Furthermore, the absorption band can be decomposed into two contributions of similar intensity (g 2 and g 3 ). This is consistent with the spectra of 2TU in several solvents, where two distinct maxima can be seen. S26-S29 The absorption spectra simulated in the present work reproduce very well the absorption contributions g 1 and g 2 , which can thus be assigned to the states S 1 ( 1 n S π * 2 ) and S 2 ( 1 π S π * 6 ). On the contrary, the employed method does not fully describe the upper part of the absorption spectrum correctly, as is evident from the too low intensity of the simulated spectra below 280 nm. The reason is mainly the too small number of states employed in the multi-state procedure, as it was chosen mainly to describe the lower excited states. As shown in our previous publication, S5 state-averaging over 6 singlet states produces a second bright state at 4.7 eV, which matches well with the center of g 3 at 4.6 eV. For more details, see also the discussion of the oscillator strengths in our previous publication. S5 As the present work is focused on the dynamics after excitation to the lowest bright stateparalleling the experimental conditions (excitation at 316 nm) S29 -the chosen number of states is appropriate. Further support that 3 roots in the state-averaging/multi-state treatment are appropriate is given by the fact that for the low-lying states (S 1 , S 2 , T 1 , T 2 , T 3 ) the potential-energy scans of MS(3)-CASPT2(12,9) and MS(6)-MS-CASPT2(14,11) match very well. S5

Solvent Shifts
The permanent dipole moments at single-state CASPT2(12,9) of the excited states at the Franck-Condon geometry according to Ref. S5 are as follows (all in Debye): 4.2 for S 0 , 4.4 for S 1 ( 1 n S π * 2 ), 6.4 for S 2 ( 1 π S π * 6 ), 3.8 for T 1 ( 3 π S π * 2 ), 4.1 for T 2 ( 3 n S π * 2 ), 3.3 for T 3 ( 3 ππ * 6 ). As mentioned in the main text, except for S 2 all dipole moments are very similar and the excited states are expected to shift only to a minor degree in polar solvents, explaining the good agreement of our gas-phase simulations and the experiments in solution. Table S1 shows the maximum wavelengths of the main absorption bands of 2-thiouracil in different solvents. In can be noted that the maxima are relatively independent of the solvent used, with a spread of 8 nm (0.14 eV) for Band 1 and 9 nm (0.13 eV) for Band 2. This implies that the excited states might not be very sensitive to solvation, unlike in many other nucleobases.

Kinetic Modelling and Global Fit
The kinetic model in Figure 2 (c) of the main manuscript can be described by the following system of differential equations: Here, k 1 is the S 2 → S 1 rate, k 2 is the S 2 → T 2 rate, k 3 is the S 1 → S 0 rate, k 4 is the S 1 → T 1 rate. Integration of this system of differential equations leads to the following expressions for the populations: where S 0 2 is the initial population of S 2 (all other initial populations are zero). The functions S 2 (t), S 1 (t), S 0 (t), T 1 (t), and T 2 (t) were then fitted in a global least-squares fashion to the population data from the 44 trajectories, where the population of T 3 was added to the population of T 2 (the average T 3 population was 2%). Fitting was performed with gnuplot 4.6 S32 using the Marquardt-Levenberg algorithm. The fitting parameters were the four rate constants k 1 to k 4 , which are related to the time constants by τ i = 1/k i .
Error estimates for the time constants were calculated using the bootstrapping methodology, S33 where 1000 copies of the ensemble were randomly generated (random resampling with replacement, i.e., trajectories can occur multiple times in the copies) and fitted in the same way. The error estimates given in the paper are the standard deviations of the distributions of the so-obtained time constants. We note here that these errors only describe the uncertainty in the populations due to the finite number of trajectories; the systematic errors due to the level of theory and the dynamics method are not included. In particular, in Figure 2(c) in the main text it can be seen that the relative error in the time constants is smaller for channels which were sampled by more trajectories (e.g., S 2 → S 1 with 32 trajectories and 20% relative error vs. S 1 → S 0 with 1 trajectory and 110% error).
A comment is in order here regarding the fact that we obtain a good population fit despite not considering all pathways revealed by the net transfer analysis in Figure 2(b). The discussion in the main manuscript highlights an important problem when interpreting excited-state population dynamics from time-resolved experiments: Even if perfect time-resolved data were available, it would not be possible to extract all lifetimes when several reaction pathways are competing (i.e., if the stochiometry matrix has a non-zero nullspace). For many experimental studies, an additional problem is that the individual populations are not directly observable and signals (spectra, ionization yields, etc.) are usually the sum of contributions from some or all populated states, which might be difficult to separate. The beauty of molecular dynamics simulations is that every state population is directly accessible for analysis. However, in order to estimate rate constants, even in simulations it might be necessary to explicitly consider the fluxes between the states, and not only the total populations. Figure S3 shows the minimum-energy path from the 1 π S π * 6 / 1 n S π * 2 crossing to the 1 n S π * 2 minimum on the S 1 ( 1 n S π * 2 ) potential. The descent from the crossing to the minimum proceeds in two stages, where the first leads to pyramidalization at carbon C 2 and nitrogens N 1 and N 3 , accompanied by a large decrease in energy (0.35 eV). The second stage involves the removal of the pyramidalization at the two nitrogen atoms, while keeping the pyramidalization at C 2 . This leads to the out-of-plane motion of the sulfur atom. The second stage shows only a small decrease in energy (0.03 eV). 1 π S π * 6/ 1 n S π * 2 1 n S π * 2 min MEP Coordinate (a.u.)

Supporting Calculations
Energy (eV) Figure S3: Minimum-energy path (MS-CASPT2//CASSCF) from the 1 π S π * 6 / 1 n S π * 2 crossing to the 1 n S π * 2 minimum. Figure S4 shows a potential energy scan along the side-ways motion of the sulfur atom around the T 1 minimum geometry. As can be seen, this vibrational mode leads to a crossing of S 1 (n S π * ) and T 2 (π S π * ) at energies only slightly above the S 1 minimum. Since at the crossing the T 2 is of π S π * character, the S 1 → T 2 transition is El-Sayed-allowed, as shown by the spin-orbit couplings (about 120 cm −1 ). The pyramidalization angles were calculated exactly as in our previous publication. S5 The angle p ABCD , given by atoms A, B, C and D, is defined as the angle between the bond A-B and the plane spanned by atoms B, C and D. Figure S5 shows how p can be defined from the positions of the four atoms by constructing the normal vector on the plane spanned by B, C and D and then calculating the angle between the normal vector and the bond A-B. The pyramidalization angle is then 90 • minus the angle between normal vector and A-B.