Unified Description of Ultrafast Excited State Decay Processes in Epigenetic Deoxycytidine Derivatives

Epigenetic DNA modifications play a fundamental role in modulating gene expression and regulating cellular and developmental biological processes, thereby forming a second layer of information in DNA. The epigenetic 2′-deoxycytidine modification 5-methyl-2′-deoxycytidine, together with its enzymatic oxidation products (5-hydroxymethyl-2′-deoxycytidine, 5-formyl-2′-deoxycytidine, and 5-carboxyl-2′-deoxycytidine), are closely related to deactivation and reactivation of DNA transcription. Here, we combine sub-30-fs transient absorption spectroscopy with high-level correlated multiconfigurational CASPT2/MM computational methods, explicitly including the solvent, to obtain a unified picture of the photophysics of deoxycytidine-derived epigenetic DNA nucleosides. We assign all the observed time constants and identify the excited state relaxation pathways, including the competition of intersystem crossing and internal conversion for 5-formyl-2′-deoxycytidine and ballistic decay to the ground state for 5-carboxy-2′-deoxycytidine. Our work contributes to shed light on the role of epigenetic derivatives in DNA photodamage as well as on their possible therapeutic use.


Classical molecular dynamics
Each compound initial structure ,prior to QM/MM calculations, was obtained through classical molecular dynamics simulation, using the AMBER 12 suite 1 . Since no standard force-fields are available for these epigenetic derivatives, the utility antechamber of AMBER was used to generate semi-empirically AM1-BCC 2 charges and atom types from the AMBER GAFF force field 3 . Each molecule was surrounded by a cubic solvent box of TIP3P water molecules (force field: leaprc.water.tip3p) built up adding solvent molecules along x,y,zdirections within 30 Å from the nucleoside and periodic boundary conditions were applied. Before running final production runs of 100 ns, each system was initially optimized (2000 steps of optimization), then heated at constant volume from 0K to 300K for 20 ps and subsequently equilibrated at 300K for 300 ps at constant pressure (1atm), allowing the density of the system to relax. Hydrogen-containing bonds were restrained by using the SHAKE algorithm, electrostatic and non-bonding interactions were evaluated with a cutoff of 10 Å and the time-step was set to 2 fs. Then, each production run of 100 ns has been analyzed by means of cluster analysis using the clustering algorithm dbscan 4 , setting the RMSD of atomic positions as a distance metric (RMSD = 0.2 Å). For each compound, the molecular structure closest to the centroid of the most populated cluster along the whole dynamics has been chosen for further QM/MM calculations. By doing so It has also been possible to investigate the 'sugar-syn vs sugar-anti' conformational isomerism of these nucleosides around the Nglycosidic in water solution: all four systems show that the 'sugar-anti conformer' is predominant in water solutions at 300K and 1 atm, as suggested by previous works 5-7 .

QM/MM setup
The QM/MM calculations were done by means of the COBRAMM 8 software, developed in our group, interfacing Molcas 8 9 and OpenMolcas 10,11 for the QM part and AMBER for the MM region. A spherical droplet centered on each nucleoside with a radius of 28 Å was cut out from the cubic water box used in the MM dynamics. The partitioning scheme employed was High/Medium/low ( Figure S1), where the pyrimidine ring of each nucleoside was treated at QM-level (High layer part), the sugar ring plus water molecules at most 5 Å far from the high layer part were included in the medium layer (i.e. movable during QM/MM optimizations and coupled to the QM region), whereas the rest of the water droplet was included in the low layer (i.e. frozen during QM/MM optimizations).

QM/MM calculations
Ground state optimizations were done at the Møller-Plesset second-order perturbation theory (MP2) level as implemented in the Molcas8 9 package through its interface with COBRAMM by using the Gaussian16 12 optimizer. Excited states critical points and conical intersections (for all the four epigenetic nucleosides) have been obtained optimizing at QM(SS-CASPT2(14,10)/MM level, basis set ANO-L-VDZP (2s1p contraction on hydrogen atoms and 3s2p1d on carbon/oxygen/nitrogen atoms). The (14,10) active spaces used for all the four epigenetic bases optimizations is documented in Figure S2b, S3b, S4b and S5b, inside the grey squares, subgroups of the RAS-PT2 larger active space. The decay paths documented in Figure 2, 3, 4 and S7 (and in Figure S6, S8, S9 and S10) have been obtained calculating the vertical energies of the ground state minima and refining the vertical energies of each structure documented for all the decay paths at SS-RASPT2/ANO-L-VDZP level, using different active spaces and state-average values, depending on the nucleoside considered (documented in the captions of Figures S2, S3, S4, S5). SE signals have been calculated at the same SS-RASPT2 level, instead for the PA values a state-averaging on 30 states has been employed. The ionization-potentialelectron-affinity (IPEA) shift 13 was set to 0.0 and an imaginary shift 14 of 0.2 a.u. was used. Conical intersection optimizations were performed with the gradient projection algorithm by Bearpark et al. 15 as implemented in COBRAMM.     The CI-pp1*/GS of the 5-methyl-2'-deoxycytidine characterized in the present work is not exactly reproducing the structure of the 'ethene-like' CI reported previously by Martinéz-Fernandéz et al. 16 but it is exactly on the reaction coordinate distortion that leads from the FC region to the Plateau-pp1*, so right along the distortion mode dynamically populated (See on top of Figure 2d). It features a ring puckering (as the ethene-like CI) but characterized by the NH2 group out of plane motion as a consequence of the torsion around the C4-C5 bond, instead of the C5-C6 involved in the ethene-like CI. The difference with the CI structure found by Martinéz-Fernandéz et al. could be assigned to some variance in the calculation level (i.e. different QM/MM setup including smaller solvent sphere) and more probably to the system configuration: they analyzed the sugar-syn instead of the most populated sugar-anti conformer we adopted, which could hinder some particular ring distortion. In fact, we also characterized the 'ethene-like' CI, but we found that it lies at higher energy (≈0.50 eV above our documented CI, Figure S6).

Benchmark calculations
Benchmarking the method employed is mandatory to validate its accuracy and reliability, in particular when investigating different molecular systems. Indeed, before starting extensive computational investigations, we performed systematic tests and benchmarks. In particular, the following was tested and verified: i) reliability and accuracy of the QM-MM partitioning. ii) the flavor of multireference perturbative approach systematically employed in the work; iii) the active space.
the choice of the active space has been carefully analyzed to find a suitable methodology able to provide the best agreement with the experimental linear absorption spectrum, focusing on the lowest S0 → S1 transition, which is exactly the main band that we largely populate adopting a 4.35 eV (285 nm) pump energy, as documented in ref. 17 . 5-methyl-dC (5mdC) is the first nucleoside that we started investigating, and thus we considered it as a test-case for such purpose. Tables S1, S2 and S3 report our tests on 5mdC vertical excitation energies that were calculated employing different active spaces, different levels of theory (SS vs MS, or CASPT2 vs RASPT2) and different QM/MM partitioning (all details are specified in each first row of the tables). Our choice to adopt the SS-RASPT2 method (on top of ground state MP2-optimized geometries) is based on the documented results, trying to reach the best agreement with the lowest experimental absorption bands in the UV.
The final choice was the MP2 GS minimum, to coherently use a second order perturbative methods on both the ground and CASPT2 excited states calculations. Multireference methods have been adopted because they properly describe the internal conversion and inter-system crossing regions.
Ground state DFT/PBE0 optimized structure (basis set for optimization def2tvzp) (Full nucleoside optimized at QM level, but VEEs below were obtained by including the sugar ring in the MM part and the remaining nucleobase ring in the QM one) State n.

5-formyl-2'-deoxycytidine: formyl in syn and anti conformation
The fdC conformer not exhibiting any intramolecular hydrogen bond between the amino and the formyl groups (named anti conformer) was mainly discussed in this study, not showing the one where formyl carbonyl and the amino group are bridged through an intramolecular H-bonding (named syn conformer).
Despite the fact that both conformers may exist in a polar protic solvent (water), as discussed in a recent work 18 , this choice was taken because the anti conformer is the one that, we believe, is more relevant for the ultrafast sub-400 fs photoinduced dynamics observed and investigated here. This choice was based on the following reasons: i) While the syn conformer is expected to prevail in apolar solvents, the stability of the anti conformers smoothly increases while increasing the solvent polarity; ii) In a polar protic solvent like water, that is able to give H-bonds stronger than the amino group itself, the stability of the two conformers should be similar if not event inverted. Moreover, the stronger molecular rigidity induced by an intramolecular H-bond should increase the entropic component of the system, making this conformer even less stable than the anti one where this restriction does not apply. That said, it is apparent that the shortest (sub-400 fs) lifetimes experimentally observed should be properly assigned to the barrierless pp1* decay path documented here for the anti conformer, as this does not suffer for any effect hindering the pp1*® GS internal conversion and the underlying out of plane motions as, instead, it may possibly occur in syn conformers if their intra-molecular N−H···:O bond persists also in the pp1* excited state decay process. However, we have seen that this is unlike to occur (see point iv above). The striking agreement between computations and experiments support this view. carboxyl-deoxyCytidine

Sample preparation for linear absorption/TA
A phosphate-buffered saline (PBS) has been prepared by dissolving 26.7 g of sodium dihydrogen phosphate and 20 g of sodium hydrogen phosphate in 1 L of ultrapure water to obtain a concentration of 150 mM and a pH of 7.1 after adjusting with sodium hydroxide. The sample solutions were prepared to achieve high concentrations for the TA measurement to preserve the high resolution of the experimental setup, for 2 OD in 100 μm: 5.1 mg of mdC in 1 mL, 5.8 mg of hmdC in 1 ml, 4.8 mg of fdC in 1 ml and 5.4 mg of cadC in 1 ml for 1.2 OD in 100 μm. For recording the steady-state absorption spectra the samples have been diluted 10 times to not saturate the spectrophotometer after going through the 1 mm path of the cuvette.

Setup description
Ultrafast TA experiments were performed using a home-built pump-probe setup 21 , based on a Ti:Sapphire laser (Libra, Coherent) generating 100-fs pulses at 1.55 eV photon energy and 1 kHz repetition rate. A fraction of the laser power was used to feed a broadband visible non-collinear optical parametric amplifier (NOPA). The NOPA output pulses, with spectrum spanning 1.77-2.38 eV, were compressed to sub-10-fs duration by chirped dielectric mirrors and successively frequency doubled in a 20-µm-thick Type I b-barium borate crystal, generating broadband UV pump pulses with spectrum spanning 4.43-4.6 eV. The UV pulses were compressed with a MgF2 prism pair to nearly transform-limited 18-fs duration, fully characterized by two-dimensional spectral interferometry 22 . Broadband probe pulses, covering 1.9-3.9 and 3.5-4.6 eV, were obtained through white light continuum generation by focusing either the laser fundamental wavelength or its second harmonic in a slowly moving 2-mm-thick CaF2 plate. The instrumental response function of the system, depending on the probe wavelength, is estimated to be 25-35 fs.
To avoid photodamage of the sample and generation of solvated electrons by two-photon absorption from water, the pump energy was limited to 70 nJ (resulting in a fluence of 300 μJ/cm 2 ) and high concentration sample solutions (~0.1 mm effective path length) were employed. TA spectra of the pure solvent, shown in Fig. S18, demonstrate the absence of solvated electrons. After the sample, the transmitted probe was sent to a spectrometer (SP2150 Acton, Princeton Instruments) and detected using a linear image sensor driven by a custom-built electronic board (Stresing Entwicklungsburo GmbH). For each probe wavelength, the differential absorption (∆A) was measured as a function of the pump-probe delay.

DUV probe measurement of fdC
Figure S11 below shows the TA map of fdC probed in the DUV range to better highlight the strong PA signal at 4 eV discussed in the main text. In addition to this PA signal, we also observe the edge of the ground state bleach visible at the early times.

Parallel polarizations for mdC and hmdC and magic angle polarization for cadC
To complement the results shown and discussed in the main text, here we also report results obtained for different polarization settings. Those with parallel polarizations show relatively stronger SE signals, and the magic angle measurement for cadC shows better the PA signals hidden under the SE in the parallel version (shown in the main text). We did not observe a significant difference between magic angle and parallel measurement for fdC.

Impulsively excited vibrations
Due to the high temporal resolution of our TA setup, we were able to observe fingerprints of coherent vibrations in the measured spectra. Below we show extracted oscillation maps together with their Fourier transforms. In the case of mdC and fdC, these are superimposed with the spectral density plots of the excited-state after vertical excitation from Franck-Condon region. The spectral density plots are computed with numerical CASPT2 frequencies, evaluated on the involved critical geometries of the populated excited-states.   In case of cadC, high noise of the extracted oscillation map makes it difficult to correctly identify the frequencies present in the measurement. Figure S18 shows the dynamics at selected probe energies of the TA measurement on the phosphate buffer solution. No signal from solvated electrons was observed under our experimental conditions. The signal visible at time zero is due to cross-phase modulation artifact.