Disclosing the Role of C4-Oxo Substitution in the Photochemistry of DNA and RNA Pyrimidine Monomers: Formation of Photoproducts from the Vibrationally Excited Ground State

Oxo and amino substituted purines and pyrimidines have been suggested as protonucleobases participating in ancient pre-RNA forms. Considering electromagnetic radiation as a key environmental selection pressure on early Earth, the investigation of the photophysics of modified nucleobases is crucial to determine their viability as nucleobases’ ancestors and to understand the factors that rule the photostability of natural nucleobases. In this Letter, we combine femtosecond transient absorption spectroscopy and quantum mechanical simulations to reveal the photochemistry of 4-pyrimidinone, a close relative of uracil. Irradiation of 4-pyrimidinone with ultraviolet radiation populates the S1(ππ*) state, which decays to the vibrationally excited ground state in a few hundred femtoseconds. Analysis of the postirradiated sample in water reveals the formation of a 6-hydroxy-5H-photohydrate and 3-(N-(iminomethyl)imino)propanoic acid as the primary photoproducts. 3-(N-(Iminomethyl)imino)propanoic acid originates from the hydrolysis of an unstable ketene species generated from the C4–N3 photofragmentation of the pyrimidine core.

Vertical excitation energies. Gas phase absorption spectrum. S-11 Triplet potential energy surface. S-12 Geometries of selected stationary points.

S-33
Materials and experimental methods. 4-pyrimidinone (> 98% purity) and acetonitrile (99.8% purity) were purchased from Sigma Aldrich and Fisher Scientific, respectively, and used as received. Phosphate buffer solutions with a total phosphate concentration of 16 mM from monosodium and disodium phosphate salts dissociated in ultrapure water (Milli-pore) were freshly prepared the day of each laser irradiation experiment. The pH of the solution was adjusted using 0.1 M solutions of NaOH to the desired pH of 7.4 (± 0.1 pH units).

Transient absorption spectroscopy
The experimental setup for the femtosecond broadband transient absorption spectroscopy (TAS) technique and the method of data analysis have been described in great detail previously. 1,2 Briefly, a Ti:sapphire oscillator (Vitesse, Coherent) was used to seed a regenerative amplifier (Libra-HE, Coherent) to produce 100 fs pulses, centered at 800 nm, with a 1 kHz repetition rate. An optical rail kit (FKE series, EKSMA optics) was used to generate the excitation pulse at 267 nm as previously described, 1 whereas an optical parametric amplifier (TOPAS, Quantronix/light conversion) was used to generate the 290 nm pump pulse via second harmonic sum frequency of the idler frequency. A 2 mm translating CaF2 crystal was used to generate the while light continuum from 320 nm to 700 nm. Experimental pump-probe conditions were adjusted to eliminate two photon conditions as previously described in detail. 3 The samples were freshly prepared the day of the laser experiment with absorbances matched at 1.0 in in both solutions at each excitation wavelengths. The homogeneity of the solutions in a 2 mm path length fused silica cell was maintained by continuous stirring with a Teflon-coated magnetic stirbar. The solution inside the cell was refreshed after every scan during data acquisition to minimize the potential contamination of the transient signals by the formation of photoproducts absorbing at the excitation wavelength.
Data collection made use of a home-made LabView program, following which global and target analysis were preformed using the Glotaran graphical user interface to the R-package TIMP software. 4 The complete multidimensional data was globally fit using a two component sequential kinetic model in ACN and a single exponential decay kinetic model in aqueous solution, convoluted with a Gaussian instrument response function of 250 ± 50 fs (FWHM). The evolution associated difference spectra (EADS) were extracted from the global and target analyses. Target analysis showing dependence of the single exponential decay lifetime on probe wavelength was performed using Igor Pro 6.37 (Wavemetrics, Inc.), convoluted with a Gaussian instrument response function of 250 ± 50 fs (FWHM).

Reversed Phase -High Performance Liquid Chromatography (RP-HPLC)
Samples were freshly prepared the day of each experiment in ultrapure water, pH 5.5, to have an absorbance of 1.0 at the excitation wavelength of 267 nm. Following one photon irradiation as described in the transient absorption spectroscopy methods, irradiated samples were analyzed by using HPLC (Shimadzu LC-20AD) associated with a photodiode array (PDA) detector (Shimadzu SPD-M20A) at room temperature. An Ascentis RP-Amide column (Supelco, 5 µm, 25 cm x 4.6 mm) was used to separate the parent chromophore from its photoproducts. A 20 µL aliquot of the irradiated solution was injected into the HPLC through a manual injector. Ultrapure water (pH 5.5) was used to elute the sample with a flow rate of 0.5 mL min -1 .      Figure S5. Transient absorption spectra of 4OPy in ACN (a-c) and phosphate buffer pH 7.4 (d-e) following excitation at 290 nm. Note the break in the abscissa to remove the overtone of the pump.

Computational details
Previous to the theoretical study of the photophysics of 4-pyrimidinone (4OPy), we carried out a tautomer analysis in order to evaluate their relative stability. Altogether we considered four different tautomers, see Figure S9, whose stability was evaluated at the CCSD(T) 5 /aug-cc-pVTZ 6,7 /PCM(water) 8 //MP2/cc-pVTZ 9 /PCM(water) level of theory using ORCA 5.0.1 program. 10 According to our calculations and assuming a Boltzmann distribution at 298.15 K, 4-(3H)-pyrimidinone is the most stable tautomer and thus is expected to be the predominant compound with a relative population of almost 100%. Considering the results from the tautomer analysis, the photophysics of this system has been studied from a theoretical point of view considering the majority 4-(3H)pyrimidinone tautomer. For both the static and the dynamical study of 4OPy, the same protocol has been employed.
All the optimizations and gradient calculations have been performed with the extended multistate complete active space second order perturbation theory method, XMS-CASPT2, 11,12 as implemented in BAGEL program. 13 A second-order Douglas-Kroll-Hess Hamiltonian [14][15][16] has been employed in all the calculations, to account for relativistic effects. The ANO relativistic basis set contracted to triple-zeta quality (ANO-RCC-pVTZ) 6,17,18 and the cc-pVTZ-jkfit auxiliary basis set were used throughout. An imaginary level shift of 0.25 a.u. was employed to avoid the presence of intruder states and the IPEA shift was set to zero. The active space employed in the multiconfigurational calculations is depicted in Figure  S10. The (12,9) active space includes the complete  orbital system, plus two lone pairs respectively sitting at the N1 and the oxo substituent. The number of roots employed in the state average CASSCF calculations varies from calculation to calculation. For the modeling of the vertical absorption spectrum at the equilibrium ground state geometry (Table S4) and the simulated TAS ( Figure S20-S26) we considered 10 singlet roots. The rest of the reported calculations were performed over 4 singlet and 4 triplet states. Figure S10. SA(4)-CASSCF/ANO-RCC-pVTZ orbitals with their occupation numbers included in the (12,9) active space employed in the multireference calculations at the ground-state equilibrium structure.
The main features of the potential energy landscape of 4OPy in the gas phase were explored resorting to the protocol described above. In particular, we have localized the minima, transition states and internal conversion funnels in the ground and in the singlet and triplet excited potential energy surfaces (PES) relevant to the deactivation of this system. The connection between these regions of the PES was ensured with the help of different theoretical tools, as linear interpolation calculations in internal coordinate (LIIC), minimum energy path calculations (MEP), or the nudged elastic band (NEB) method computed at the same level of theory described above. For the MEP calculations, a homemade interface between the ORCA program and BAGEL was used. This interface allows the coupling between the analytical XMS-CASPT2 gradients computed by BAGEL and ORCA's optimizer. Similarly, NEB calculations were performed combining BAGEL's XMS-CASPT2 gradients and the Atomic Simulation Environment (ASE) program 19 optimizer by a homemade interface. The potential role of intersystem crossing in the deactivation mechanism of 4OPy was evaluated by locating intersystem crossing funnels along the main singlet deactivation routes. For the location of singlet-triplet crossings, we prepared a homemade code based on a penalty function. 20,21 The probability of population transfer at the ISC points was evaluated through the computation of the SOC with OpenMolcas. 22 For this purpose, the CI vector of the perturbatively modified CASSCF (determinant based) wave function was extracted from the BAGEL output and translated into OpenMolcas format (CSF based), to calculate the wave function overlap at different times and the SOC.
The time evolution of the excited system was monitored by means of singlet-triplet nonadiabatic mixed quantum-classical dynamics, considering the adiabatic representation. Within the trajectory surface hopping algorithm approach, nuclei are treated classically and follow Newton's equations, whereas electrons are treated quantum mechanically. For the integration of Newton's equations the velocity Verlet algorithm 23,24 was employed with a time step of 0.5 fs. The evolution of the probability amplitudes determining the contribution of the different adiabatic states to the total wave function was integrated using the fourth order Runge-Kutta algorithm with a time step of 10 -2 fs. A decoherence correction was applied, as recommended by Granucci and Persico with the parameters C = 1 and α = 0.1 hartree. 25 As already explained above, the electronic structure method employed for the molecular dynamics simulations was XMS-CASPT2 (12,9)/ANO-RCC-pVTZ. For this, we combined BAGEL (to calculate the energy and the gradient of the spin-free states) and OpenMolcas codes (to evaluate the spin-orbit coupling and the overlap between the wave functions in two consecutive steps) with a homemade version of the molecular dynamics code SHARC 26 with a Surface Hopping based on the density 27 to follow the coupled electronic and nuclei movement.
The initial set of geometries and velocities employed for the modelling of the semiclassical spectrum and the dynamics was generated using a Wigner distribution for every normal mode of the electronic ground state, employing the harmonic MP2 frequencies.
From the original set of 1000 geometries, we simulated the semiclassical absorption spectrum shown in Figure 1 at the XMS-CASPT2(12,9)/ANO-RCC-pVTZ level of theory. To cut down the computational cost, in this case, the oscillator strengths were approximated considering PM-CASSCF wave functions. Gaussian functions with a width of 50 nm were employed to convolute the spectrum. An energy window of 0.1 eV to both sides of the maximum of the calculated absorption band, i.e between 4.32 and 4.34 eV, was considered to select the initial pairs of geometries and velocities that were employed for the molecular dynamics of the system. A total of 100 trajectories were run for a maximum time 500 fs. All the trajectories reaching the ground state were stopped once they had evolved either to the original heterocyclic minimum or to the ketene photoproduct.
Vertical excitation energies. Gas phase absorption spectrum. Table S4 collects the gas phase vertical energies at the equilibrium geometry of 4-(3H)pyrimidinone, characterized by a planar structure. According to our XMS-CASPT2 calculation using a SA10-CASSCF (12,9)/ANO-RCC-pVTZ reference, in the region comprised between 400 and 200 nm, there are two intense absorptions with similar oscillator strengths, the first (H  L) peaking at 287 nm (4.33 eV) and the second (H  L+1) at 224 nm (5.53 eV), both of * character, see orbitals in Figure S10. These two transitions are separated by two almost dark absorptions at 265 and 257 nm, corresponding to n* states, and respectively involving the lone pairs of N1 and the carbonyl group, Figure S10. Table S4 also reports the computed permanent dipole moments of the electronic excited states evaluated at the S0 minimum position, which, in the case of * states, exceed the value computed for the S0. Additional regions of the singlet potential energy surface.
Irradiation at the maximum of the first absorption band results in the partial population of the S2(n*) excited state (11%). Thus, for completeness, we also mapped the potential energy surface along this electronic state. A MEP calculation along the S2 was found to connect by a barrierless path the FC region with the S2/S1 crossing, where the T2 state is also energetically degenerate. This crossing is directly connected with the S1A(*) minimum, as revealed by MEP calculations starting from the crossing position and following the state with * character. A second MEP, this time following the n*state, describes a rather planar region of the PES (see Figure S11 and Figure S39) with n*character which latter evolves in a barrierless manner to the S1A(*) minimum. In fact, XMS-CASPT2 does not locate a n* minimum in the S1 potential, even when significantly reducing the optimization step. This topography of the n* PES is fully consistent with the outcome of the MD simulations which predict an ultrafast decay of the S2 population to the S1.

Triplet potential energy surface.
Along the mapping of the PES of 4OPy, we also explored the topography of the triplet excited states. The energetic proximity of the triplet and singlet excited states at particular regions of the PES shown in Figure 3 suggests the possibility for population transfer. We find moderately coupled triplet states lying close to the position of the singlet minima, S1A and S1B (energy gaps <0.2 eV, SOC ca. 10-20 cm -1 ). Indeed, the located S1/S0 internal conversion funnels actually correspond to S1/S0/T1 three state degeneracy points, with the T1 significantly coupled to the S1 (SOCs 15-20 cm -1 ) and the S0 (SOCs 6-10 cm -1 ). MEP calculations along the T1 state starting both from the minima and the S1/S0/T1 crossings lead to a T1 minimum, T1min, which preserves the * character of the S1 but not its characteristic puckered geometry, since the T1min recovers the planarity of the ground state minimum.
Moreover, two higher lying additional S/T crossings were located, this time involving the second triplet state, T2 (n*), (See Figure S11). The first crossing at 4.19 eV, which in fact coincides with the position of the S2/S1 conical intersection, shows a quite stretched CO bond distance and the oxo group slightly slanted in the direction of the N3 atom. The second S1/T2 crossing, 0.2 eV higher in energy than the former, presents as well a quite strained CO bond and a displacement of the C2HN3H moiety from the plane of the molecule. Minimum energy paths from these two crossings lead to a n* minimum in the T2 potential, T2min, which would internally convert to T1min (*) after accessing a T2/T1 crossing located ca. 0.3 eV above the T2min. The spin-orbit coupling between S1 and T2 at the intersystem crossing regions is quite remarkable (ca. 25-50 cm -1 ) for a system consisting of atoms of the first period. T 2min CI-A S1/S0 /T1 CI-B S1/S0 /T1 CI T2/T1

Non-adiabatic molecular dynamics simulations
The decay dynamics of 4OPy was investigated through the monitoring of 100 trajectories, propagated in the gas phase at XMS-CASPT2 level and employing a quantum classical surface hopping approach (see details above).
Consistent with the theoretical semiclassical spectrum (recall Figure 1), irradiation of the system within an energy window of 0.2 eV, centered around 4.34 eV (286 nm), would excite 89% of the trajectories to the S1 state, identified as the lowest lying * in the vertical absorption spectrum, and 11% to the dark (n*) state S2, also contributing in a lesser extent to the first absorption band of the system. After 500 fs, 73% of the trajectories revert to the S0, whilst a fraction of the population remains trapped in the excited state (27%). From the population remaining in the excited state, 7% was found to undergo intersystem crossing to the triplet manifold. For most of the trajectories, population transfer from the singlet manifold to the T2 is observed through the two ISC presented in Figure S11. Part of the population in T2 decays to the T1 potential within the time window of our simulations. It must be, however, noticed that the deactivation mechanism involving triplet excite states is practically insignificant and emphasized that these simulations were done in the gas phase. The analysis of the trajectories decaying back to the ground state shows that more than 70% of the excited 1 ππ* population returns to the ground state in 500 fs, and ca. 35% does in the first 100 fs.
An approximated excited state lifetime can be extracted from the above results and compared with the experimental results. For this purpose, the total number of trajectories in the ground state as a function of time have been fitted to a Boltzmann's sigmoidal function, see Figure S14.  The fitting of the ground state population to the Boltzmann's sigmoidal function in (1) delivers an average lifetime (t0) for 4OPy system of 126.2 fs. Due to the characteristics of the molecular dynamics approach used, not all the trajectories follow the same deactivation pathway, that is, some of them can get trapped in the minima for more time than others which are able to find a more favorable deactivation mechanism. For this reason, a deviation from this average lifetime of 39.5 fs has been estimated through a Boltzmann sigmoidal fitting, tc. This function width indicates the possibility that the trajectories get retained in the potential energy landscape along the dynamics.

Representative trajectories
The trajectories shown in the following are representative of the two main deactivation routes observed for 4OPy which account for the recovery of the original ground state ( Figure S15) and ketene formation ( Figure S16).

NEB calculation between the CI-BS1/S0/T1 and ketene
The C4-N3 bond breaking observed during the molecular dynamics simulations was also studied resorting to a Nudge Elastic Band (NEB) calculation interpolating the most stable optimized CI-BS1/S0/T1 internal conversion funnel and the ketene open product, see Figure  3b. The first cycle of NEB (red line in Figure 3b) reveals that dissociation from CI-BS1/S0/T1 point could take place directly, i.e. there is no barrier connecting these two points of the potential energy surface. The optimized reaction path (last cycle of the NEB calculation, black line in Figure 3b), however, would first drive the system to the original heterocyclic planar minimum, from where, after surmounting an energy barrier the system would access the global ketene minimum. Interestingly, the NEB results reveal that the topography of the PES alone does not govern the pyrimidine ring-opening process and that the formation of this photoproduct is mainly a consequence of dynamical effects. In fact, the alignment of the momentum in the direction of the C4-N3 bond (see Figure S17) would corroborate that the ketene is rather a dynamically driven photoproduct.

Photoproducts spectra simulation
With the objective of identifying the photoproducts detected in the experimental sample after the 10 min irradiation, a computational spectroscopic study of several plausible photoproducts was carried out.
In order to monitor the evolution of the unstable ketene to the final photoproduct(s) detected in the experimental absorption spectra and due to the considerable flexibility of this intermediate, we selected one of the trajectories leading to the ketene species and reinitiated its propagation. The adiabatic dynamics for this trajectory in the ground state was monitored for 2 ps at the more computationally affordable B3LYP/cc-pVDZ level of theory using the RIJCOSX approximation. 28,29 This dynamics was employed to identify ketene most stable conformers. For that purpose, we recorded altogether 50 snapshots separated by a 40 fs time lapse. Then, these structures were reoptimized at B3LYP/cc-pVDZ level of theory leading to 4 different conformers. Assuming that the most probable fate of the ketene species is the attack of a water molecule and the formation of a carboxylic acid, we reoptimized the structures of the corresponding acids this time at the B3LYP/cc-pVDZ level of theory, obtaining the 3 conformers collected in Figure S18. Figure S18. Three conformers of the carboxylic acid generated from the hydrolysis of the ketene photoproduct calculated at the B3LYP/cc-pVDZ.
For each of these conformers, we subsequently run ground state molecular dynamics simulations at the B3LYP 30 /aug-cc-pVDZ level, using the RIJCOSX approximation 28,29 and a NHC thermostat 31 initialized at 350 K during 500 fs considering a timestep of 0.5 fs. The absorption spectrum for each conformer was simulated considering the XMS-CASPT2(12,9)/ANO-RCC-pVTZ vertical excitation energies on the geometries registered every 5 fs from the DFT molecular dynamics. As for the semiclassical absorption spectra, the oscillator strengths were computed considering PM-CASSCF wave functions. Finally, the computed absorption spectrum for the carboxylic acid in Figure 4c was prepared considering the same population for the three tautomers.
The same procedure was used for the simulation of the absorption spectra of the 4OPy hydrates. In this particular case, however, the DFT molecular dynamics were directly initialized from the optimized hydrated species. In our simulations, we considered both the 5-hydro-6-hydroxy and the 5-hydroxy-6-hydro species although similarly to previous studies on uracil the former species was found to be more stable by 1.4 kcalmol -1 . 32 For the hydrates, a new active space (14,10) in Figure S19, has been considered to include the water orbitals involved in the hydration reaction. Figure S19. SA4-CASSCF/ANO-RCC-pVTZ orbitals with their occupation numbers included in the (14,10) active space employed in the multireference calculations of the 5hydro-6-hydroxy 4OPy-hydrate. An equivalent active space was employed for the 5hydroxy-6-hydro isomer.
The active space in Figure S19 was also employed for the photohydration potential energy profile in Figure 3c.

Theoretical transient absorption spectra
To help in the interpretation of the experimental transient absorption spectra (Figures 2ac, S1 and S5) and evolution associated difference spectra (Figures 2d, and S2), we have computed the transient absorption at key regions of the excited potential energy surface of Figure 3a and S11, where the system is expected to spend some time along the deactivation mechanism. In particular, we have computed the vertical absorption energies and oscillator strengths at the singlet (S1Amin, S1Bmin) and triplet (T1min, T2min) minima, and at the position of the CIS1/S0/T1 conical intersections (CI-AS1/S0/T1 and CI-BS1/S0/T1) as a first approximation to the geometries for the vibrationally hot S0 once internal conversion has occurred. From the vertical excitation energies and the corresponding oscillator strengths at the position of each of the stationary points of the PESs, we have generated the individual theoretical transient absorption spectra reported in Figure S20 and Figures S21-S26. As already mentioned in the Computational details section, the simulated TAS have been computed at XMS-CASPT2 (12,9)/ANO-RCC-pVTZ level of theory, including 10 singlet roots in the calculations. The individual TAS were plotted using Gabedit 2.4.8 program 33 , which assigns a Gaussian function to each vertical excitation. A full width at half maximum of 50 nm was selected for all the spectra. The simulated evolution associated spectra (EADS) replica in Figures 2e-f and S3 were constructed by linearly combining the individual transient absorption spectra to deliver the best possible experimental spectra, using a homemade program. Figure S20 reveals that minimum S1Amin shows a moderate absorption in the region between 600-400 nm and a strong absorption centered at 310 nm. On the contrary, the absorption signal of S1Bmin is moderate and spreads all along the region between 700 and 300 nm. Similar to S1Amin, the absorption signal corresponding to the T1 minimum peaks at 300 and 550 nm, but the singlet signal at 450 nm is replaced by another absorption above 600 nm. The T2 instead shows a very weak absorption in this energy range, and thus it is not possible to monitor its population through the recording of transient absorption spectra. Both S1/S0 conical intersections, CI-AS1/S0/T1 and CI-BS1/S0/T1, show an absorption signal that follows an exponential decay within the considered energy range, presenting maximum around 300-350 nm and almost no absorption beyond 600 nm.                  Figure S38. Minimum energy path from the ISCS2/S1/T2 along the S1(*) excited state at XMS-CASPT2 (12,9)/ANO-RCC-pVTZ level of theory. Figure S39. Minimum energy path from the ISCS2/S1/T2 along the S1(n*) excited state at XMS-CASPT2(12,9)/ANO-RCC-pVTZ level of theory. Figure S40. Minimum energy path from the ISCS2/S1/T2 along the T2 excited state at XMS-CASPT2 (12,9)/ANO-RCC-pVTZ level of theory. Figure S41. Linear interpolation from Franck-Condon region to the ISCS1/T2 along S1(*) excited state at XMS-CASPT2(12,9)/ANO-RCC-pVTZ level of theory. Figure S42. Linear interpolation from Franck-Condon region to the ISCS2/S1/T2 along S1(*) excited state at XMS-CASPT2 (12,9)/ANO-RCC-pVTZ level of theory.

Cartesian coordinates for the stationary points
Equilibrium