Serial Femtosecond Crystallography Reveals that Photoactivation in a Fluorescent Protein Proceeds via the Hula Twist Mechanism

Chromophore cis/trans photoisomerization is a fundamental process in chemistry and in the activation of many photosensitive proteins. A major task is understanding the effect of the protein environment on the efficiency and direction of this reaction compared to what is observed in the gas and solution phases. In this study, we set out to visualize the hula twist (HT) mechanism in a fluorescent protein, which is hypothesized to be the preferred mechanism in a spatially constrained binding pocket. We use a chlorine substituent to break the twofold symmetry of the embedded phenolic group of the chromophore and unambiguously identify the HT primary photoproduct. Through serial femtosecond crystallography, we then track the photoreaction from femtoseconds to the microsecond regime. We observe signals for the photoisomerization of the chromophore as early as 300 fs, obtaining the first experimental structural evidence of the HT mechanism in a protein on its femtosecond-to-picosecond timescale. We are then able to follow how chromophore isomerization and twisting lead to secondary structure rearrangements of the protein β-barrel across the time window of our measurements.


Supporting Figures and
Figure S1. OFF-to-ON switching rates for chlorinated and unchlorinated rsEGFP2 constructs measured under the same conditions. An exponential decay of the form y = A e^(−kt) was fitted to obtain the rate k for each construct (rsEGFP2: k = 0.01868 s⁻¹; Cl-rsEGFP2: k = 0.01631 s⁻¹). Related to Figure 2.

Figure S8. The pipeline implemented to denoise Q-weighted maps through PCA is shown. Each map from a collected time point is a row in the data matrix A (6 time points total: 300 fs, 600 fs, 900 fs, 5 ps, 100 ps, 1 µs) and PCA is performed using the scikit-learn package in Python. The last component (C6) explains less than 5% of the total variance in the data and, upon visual inspection, appears as noise. The first five components are shown in the bottom panels at ±3.5σ. Related to Figure 4.

Figure S9. Schematic of the steps involved in the generation of background subtracted maps (W∆F max maps). Panel text: generate Q-weighted maps (W∆F corr) and save them as CCP4 maps; in real space, identify the chromophore region (R loc) and the rest of the protein (R glob). Related to Figure 5.

Figure S11. W∆F max maps generated using the method described in this work and illustrated in Figure S9. The following four characteristics, which we attribute to the presence of a femtosecond intermediate, are marked in the 600 fs and 900 fs maps: (i) the presence of Peak 1; (ii) an elongated and uncentered peak where the cis anti chlorine is positioned, in contrast with the round, centered features visible in the W∆F max maps from the later time points; (iii) electron density that fills the cis anti chromophore phenol ring; (iv) features that suggest a tilt of the imidazolidone ring oxygen towards the phenol ring. Though less pronounced, these features are also identifiable in the 5 ps map.

Figure 3.

OFF State SFX Structure for Cl-rsEGFP2
Our room temperature dark structure for Cl-rsEGFP2 presents predominantly the planar trans anti configuration (trans-PL) and minor populations of the trans syn (trans-TW) and cis anti configurations (Figure S4). A previous cryotrapping crystallographic study 2 identified the trans-PL and trans-TW configurations as photoproducts of the HT and OBF pathways respectively. The trans-PL photoproduct was found in the structure with a contracted unit cell, while trans-TW was the primary configuration in the structure from the larger unit cell. This led to the conclusion that the choice of pathway is dependent on the crystal packing, where tighter packing favors the volume-conserving HT. Interestingly, our dark room temperature SFX structure presents mainly the trans-PL form, despite its large unit cell, when the chlorine substituent is present (Table S3 and Figure S5). This suggests that the cis-to-trans isomerization pathway at room temperature is a hula twist, independent of unit cell dimensions. Table S3 lists the predominant conformations for the Cl-rsEGFP2 structures in this paper and in the cryotrapping study 2 , with the corresponding treatments, as well as other published rsEGFP2 SFX structures. The room temperature SFX data indicate that the difference in chromophore conformation observed by Chang et al. 2 was induced by the dehydration protocol or the freezing process itself, rather than dictated by a lattice-dependent change in pathway. This interpretation is further supported by a separate cryotrapping experiment (row 3 of Table S3 and Figure S6, PDB 8A83) in which the ON state crystal is first irradiated with 488 nm light, dehydrated into the smaller unit cell size, and then cryocooled; the resulting structure shows a trans-PL chromophore.

Frame Data Processing and Crystallographic Analysis
A bad/damaged pixel mask for the detector was generated from dark scans recorded before data collection and was applied across all image data processing. Peak and hit finding were performed using Cheetah 3,4 . Identified hits were then indexed and integrated in CrystFEL 5,6 using the XGANDALF algorithm 7 . The achieved indexing rate was around 80% for all data. Indexed crystals from CrystFEL streamfiles were binned into different time delays and separated as "light" (pump laser on) and "dark" (pump laser off) to generate respective reflection files using a custom python script. Merging statistics for the datasets collected are reported in Table S1. The PHENIX 8 reflection conversion function was used to convert intensities to structure factors and model refinement was done using the phenix.refine function. Minor chromophore populations in the dark structure were manually refined (PDB 8A6G). Low resolution (30 Å) and high resolution (1.63 Å) cutoffs were applied. Q-weighted difference electron density maps and background subtracted maps were then generated as explained in Section 2 of the Supporting Procedures.
For the analysis of protein-wide pump-induced changes, we used the scripts by Wickstrand et al. 9 . In short, a spherical volume of radius 2.0 Å was walked across every atom in the protein, and the positive and negative difference electron densities from each dataset's Q-weighted map were averaged separately within each volume. A cut-off of ±3σ and a grid spacing of 0.5 Å were used in the scripts.
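The sphere-averaging step can be illustrated with a minimal Python sketch. This is not the Wickstrand et al. code: the function name, the flat list of grid samples, and the choice to average only the values beyond the ±3σ cut-off are assumptions made for illustration.

```python
import math

def sphere_averages(atoms, grid_points, sigma, radius=2.0, cutoff=3.0):
    """Average positive and negative difference density around each atom.

    atoms: list of (x, y, z) coordinates in angstrom.
    grid_points: list of (x, y, z, rho) map samples.
    Only samples with |rho| > cutoff * sigma contribute (assumed convention).
    """
    results = []
    for atom in atoms:
        positive, negative = [], []
        for gx, gy, gz, rho in grid_points:
            if math.dist(atom, (gx, gy, gz)) > radius:
                continue  # sample lies outside the sphere around this atom
            if rho > cutoff * sigma:
                positive.append(rho)
            elif rho < -cutoff * sigma:
                negative.append(rho)
        mean_pos = sum(positive) / len(positive) if positive else 0.0
        mean_neg = sum(negative) / len(negative) if negative else 0.0
        results.append((mean_pos, mean_neg))
    return results

# Toy data: two atoms and six hypothetical map samples, sigma = 1.0
atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
grid = [
    (0.5, 0.0, 0.0, 4.0),
    (0.0, 0.5, 0.0, 5.0),
    (0.0, 0.0, 1.0, -3.5),
    (0.0, 0.0, 1.5, 1.0),    # below the cut-off, ignored
    (3.0, 0.0, 0.0, 6.0),    # outside both spheres
    (10.0, 0.0, 0.5, -4.0),
]
averages = sphere_averages(atoms, grid, sigma=1.0)
```

Keeping the positive and negative averages separate preserves the sign information that a single mean would cancel out when an atom moves from one position to another.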

Time Zero Determination by Cross Correlation
The cross-correlation of the optical and X-ray pulses was measured via the standard technique described previously 10 . Briefly, a 50 µm thick semiconducting crystal (Ce:YAG) was placed in the interaction region. The transmitted intensity of the optical pulse through the crystal was measured by a photodiode. Exposure to hard X-rays modulates the crystal's reflective and transmissive properties. The amplitude of the optical transmission was monitored while the temporal delay between the two pulses was scanned. With a sufficient volume of data and averaging, the resulting signal exhibits a step-like function, centred approximately where the two pulses overlap in time. The signal measured on the photodiode (Figure S3(a)) as a function of delay was fitted as previously described 11 . The time zero averaged over a number of runs was used to correct the delay stage, giving an accurate time zero and, as such, accurate binning for each data set (300 fs, 600 fs, 900 fs, 5 ps).
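As a rough illustration of how a time zero can be extracted from a step-like cross-correlation trace, the sketch below fits an error-function step. The functional form, the grid search, and all numerical values are assumptions for illustration; the fitting procedure actually used is the one described in reference 11.

```python
import math

def step(t, t0, width):
    """Smooth unit step: an error function centred at t0 (assumed model)."""
    return 0.5 * (1.0 + math.erf((t - t0) / (math.sqrt(2.0) * width)))

def fit_time_zero(delays, signal, t0_grid, width_grid):
    """Grid-search t0 and width; offset and amplitude follow in closed form."""
    n = len(delays)
    mean_y = sum(signal) / n
    best = None
    for t0 in t0_grid:
        for width in width_grid:
            s = [step(t, t0, width) for t in delays]
            mean_s = sum(s) / n
            var_s = sum((si - mean_s) ** 2 for si in s)
            if var_s == 0.0:
                continue  # flat model, cannot determine an amplitude
            amp = sum((si - mean_s) * (yi - mean_y)
                      for si, yi in zip(s, signal)) / var_s
            offset = mean_y - amp * mean_s
            sse = sum((offset + amp * si - yi) ** 2
                      for si, yi in zip(s, signal))
            if best is None or sse < best[0]:
                best = (sse, t0, width, amp, offset)
    return best[1:]

# Synthetic, noise-free trace (hypothetical values, delays in fs)
delays = list(range(-500, 501, 25))
signal = [1.0 - 0.2 * step(t, 40, 80.0) for t in delays]
t0, width, amp, offset = fit_time_zero(
    delays, signal,
    t0_grid=range(-100, 101, 10),
    width_grid=[40.0, 60.0, 80.0, 100.0, 120.0],
)
```

Because the model is linear in offset and amplitude once t0 and the width are fixed, only two parameters need to be searched, which keeps the sketch free of external fitting libraries.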
SACLA has recently implemented a feedback system between the optical and X-ray pulses, using a balanced optical-microwave phase detector (BOMPD) 12 . This corrects for long-term temporal drift over several hours or days, which previously caused movement on the order of 60–100 fs 11 . The BOMPD feedback loop additionally reduces the X-ray jitter stemming from an intrinsic instability in the Self Amplified Spontaneous Emission (SASE) process to below 50 fs. Since it was difficult to suppress the actual timing drift using only a BOMPD, the drift here was further compensated using a phase shifter and the timing tool data. The jitter measured from the cross-correlation fittings is shown in Figure S3(b). As an additional precaution to mitigate drift, the cross-correlation was measured before the start of each day. Furthermore, the reflection intensity (transmission⁻¹) was measured and used as a comparative cross-correlation.
With the steps described above, as well as the new feedback system implemented at SACLA, we were able to accurately bin the data into sub-picosecond bins (300, 600 and 900 fs) and confidently ascribe the ultrafast dynamics.

Generation of Q-weighted Maps
Through the reciprocalspaceship library 13 , dark and light structure factors (F obs D and F obs L respectively) are in turn scaled to the structure factors calculated from the refined dark structure (F calc ). A simple scale factor m is applied to an entire dataset (Figure S7(a-b)). Q-weighted difference electron density maps are then calculated using weighted structure factor amplitudes, where the Bayesian weight applied is based on the work of Ursby et al. 14,15 and implemented in Python using reciprocalspaceship (Figure S7(c-d)). A value of α = 0.2 is used here. Maps where the scaling is done using the SCALEIT program within the CCP4 suite 16 and the Q-weighting is implemented as in 17–19 present the same key features as those generated by the method described here (not shown). The Python implementation, however, allows for easy manipulation and visualization of the structure factor distributions and for screening of map generation parameters (such as scale factor and α values).
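A minimal sketch of the two operations described above is given here in plain Python. The least-squares definition of the scale factor m and the specific weight expression (a commonly used form in which each weight is reduced by the relative variance and, through α, by the relative size of the difference) are assumptions; the authors' exact implementation is the one illustrated in Figure S7. Function names are hypothetical.

```python
def scale_factor(f_obs, f_calc):
    """Single scalar m minimising sum((m * f_obs - f_calc) ** 2) (assumed definition)."""
    return sum(o * c for o, c in zip(f_obs, f_calc)) / sum(o * o for o in f_obs)

def q_weights(delta_f, sig_delta_f, alpha=0.2):
    """Weights w = 1 / (1 + sig^2 / <sig^2> + alpha * dF^2 / <dF^2>)."""
    n = len(delta_f)
    mean_sig2 = sum(s * s for s in sig_delta_f) / n
    mean_df2 = sum(d * d for d in delta_f) / n
    return [
        1.0 / (1.0 + s * s / mean_sig2 + alpha * d * d / mean_df2)
        for d, s in zip(delta_f, sig_delta_f)
    ]

# Toy amplitudes (hypothetical numbers)
f_calc = [200.0, 100.0, 40.0]
f_dark = [100.0, 50.0, 20.0]
m_dark = scale_factor(f_dark, f_calc)            # scale dark data onto F calc
weights = q_weights([2.0, -2.0], [1.0, 1.0])     # alpha = 0.2 as in the text
unweighted = q_weights([2.0, -2.0], [1.0, 1.0], alpha=0.0)
```

In this form, larger α values suppress large difference amplitudes more strongly, which is why α is a natural parameter to screen when tuning the maps.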

Principal Component Analysis (PCA) of Electron Density Maps
Principal component analysis (PCA) of the Q-weighted maps described above is performed in Python by loading each map as a NumPy array through the GEMMI library for structural biology (GEMMI version 0.4.5). A solvent mask and a mask that only retains grid points within 7 Å of the chromophore are applied. PCA is then performed with the sklearn.decomposition.PCA function (scikit-learn version 1.0.1). Figure S8 presents the pipeline and results for PCA on the six DED maps. The first 5 components (C1-C5) are shown (noting that the component sign is arbitrary in the analysis). PCA is used here primarily as a method to improve the signal-to-noise ratio of the data: the 6th component (C6) is identified as noise, and the final difference maps shown in Figure 4 are reconstructed without it.
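The denoising idea, reconstructing the maps without the lowest-variance component, can be sketched as follows. This sketch uses a NumPy singular value decomposition as a stand-in for sklearn.decomposition.PCA, random numbers in place of masked map values, and treats grid points as observations so that six components exist after mean-centering. All three choices are assumptions, as is the function name.

```python
import numpy as np

def pca_denoise(maps, n_drop=1):
    """Reconstruct maps without the n_drop lowest-variance components.

    maps: array of shape (n_timepoints, n_gridpoints), one masked map per row.
    Returns the reconstructed maps and the explained-variance fractions.
    """
    x = np.asarray(maps, dtype=float).T      # grid points as observations
    mean = x.mean(axis=0)                    # mean density of each map
    u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # variance fraction per component
    keep = len(s) - n_drop
    x_hat = (u[:, :keep] * s[:keep]) @ vt[:keep] + mean
    return x_hat.T, explained

# Random stand-in for six masked difference maps with 200 grid points each
rng = np.random.default_rng(0)
maps = rng.normal(size=(6, 200))
denoised, explained = pca_denoise(maps)      # drops the sixth component
unchanged, _ = pca_denoise(maps, n_drop=0)   # keeps everything
```

Inspecting `explained` before choosing `n_drop` mirrors the check described in the text, where the discarded component accounts for only a small fraction of the total variance.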

Generation of Background Subtracted Maps (W∆F max maps)
In order to refine light-induced coordinates, one ideally wants to separate the signal that belongs to the photoproduct population from that of the dark state. Because only a small percentage of protein molecules actually undergo photoactivation, the problem of extracting the electron density of the photoproduct is very similar to the one Pearce and colleagues describe for discerning minor states in macromolecular crystallography 1 . Pearce et al. generate background corrected maps by subtracting the "ground state" 2mF o -DF c map from the "dataset" 2mF o -DF c map as follows:

$$\rho_{\mathrm{corrected}} = \rho_{\mathrm{dataset}} - N_{bg}\,\rho_{\mathrm{ground\ state}} \qquad (1)$$

where the background correction factor (we refer to it here as N bg ; in 1 it is called BDC, the Background Density Correction factor) is the value that maximizes the difference in correlation (calculated from the ground state and the corrected map) between the entire protein and a specific area of change.
We have extended this here to Q-weighted maps. Figure S9 describes the steps involved in generating what we have called W∆F max maps. Background-corrected Q-weighted maps (W∆F corr ) are generated for a range of N bg between 0 and 1 and saved as electron density maps in CCP4 format. In real space, a local region, R loc , is defined as a sphere of 5 Å centered around the chromophore double bond. The entire protein is defined as R glob after a solvent mask is applied. For each value of N bg , we compute the Pearson correlation coefficient between the respective W∆F corr map and the map obtained from the dark model calculated structure factors (F calc ). This is done for both R loc and R glob . We then choose the N bg value that maximizes the difference between these two correlation coefficients and save the corresponding map (W∆F max ) for that timepoint. The determination of the appropriate background subtraction value for each timepoint is shown in Figure S10 and the corresponding W∆F max maps are shown in Figure S11. The cis anti photoproduct is already discernible at 300 fs and is very clear in the 100 ps and 1 µs maps. The maps from the 600 fs and 900 fs datasets contain additional features (numbered (i-iv)) that support the presence of the femtosecond intermediate trans-FS (see main text and Figure S13). The 5 ps W∆F max map, just like the respective Q-weighted map, is noisier than the other datasets, though it still contains weak trans-FS features and some cis anti population. Figure S12 displays background corrected maps generated by subtraction of light and dark 2mF o -DF c maps as done in 1 . These are very similar to the ones calculated directly from reciprocal space and support the conclusions drawn above.
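The N bg selection can be sketched in a few lines of Python. Several simplifications are assumed here: the subtraction is applied directly to lists of map values rather than to weighted structure factors, the quantity maximized is taken as the global minus the local correlation, and the toy data are random numbers. The function names are hypothetical.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # undefined correlation, treated as zero in this sketch
    return cov / (var_a * var_b) ** 0.5

def scan_n_bg(light, dark, calc, local_idx, n_steps=21):
    """Scan N_bg over [0, 1]; return (best N_bg, CC_glob - CC_loc there)."""
    best = None
    for i in range(n_steps):
        n_bg = i / (n_steps - 1)
        corrected = [l - n_bg * d for l, d in zip(light, dark)]
        cc_glob = pearson(corrected, calc)                    # R glob
        cc_loc = pearson([corrected[j] for j in local_idx],   # R loc
                         [calc[j] for j in local_idx])
        diff = cc_glob - cc_loc
        if best is None or diff > best[1]:
            best = (n_bg, diff)
    return best

# Toy maps: 400 grid points, the first 20 form the local region
rng = random.Random(0)
calc = [rng.gauss(0.0, 1.0) for _ in range(400)]
dark = [c + rng.gauss(0.0, 0.1) for c in calc]
local_idx = list(range(20))
light = [0.9 * d + rng.gauss(0.0, 0.05) for d in dark]
for j in local_idx:
    light[j] += 0.1 * rng.gauss(0.0, 1.0)   # hypothetical photoproduct signal
best_n_bg, best_diff = scan_n_bg(light, dark, calc, local_idx)
```

The rationale is that the dark-state contribution disappears from the changing region at a lower subtraction level than from the rest of the protein, so the gap between the two correlations peaks near the appropriate N bg.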
Refinement of Cl-rsEGFP2 coordinates to W∆F max maps is done, starting from the dark coordinates, with phenix.refine by allowing only atom positions from the chromophore and the residues immediately next to it in the sequence (residues 63-65 and 69-71) to vary. This is because the background subtracted structure factors are much lower signal-to-noise than the ones used to refine the dark structure, and we therefore limit their use to the chromophore region where light-induced changes are strong enough to clearly reveal the presence of minor populations. Occupancies for the cis anti and trans-FS species in light datasets were refined so as to minimize R work ( Figure S14). Despite the weak presence of trans-FS features in the 5 ps maps, trans-FS occupancy refinement (not shown) for this timepoint did not yield a significant population, so this conformation was not included in the final coordinates. Final structures for each time point are deposited (in order from 300 fs to 1 µs) as PDBs 8A6N, 8A6O, 8A6P, 8A6Q, 8A6R, 8A6S.

2019 SFX for OFF-state rsEGFP2 structure
Dark data was collected at SACLA at the BL3 EH2 beamline with a setup equivalent to the one described above (MPCCD-phase III detector, 10.5 keV), but with a refined detector distance of 50.6 mm. Data was then processed as described for the 2021 SFX experiment, with the difference that structure refinement was done in REFMAC 20 (PDB 8A7V).

Extra-Cryotrapping Experiment
Crystals were grown as described in 2 . For the irradiation/dehydration protocol, the crystal was fished onto a loop and transferred to a separate droplet of mother liquor on a cover slip. The entire cover slip was picked up with tweezers and placed in the beam of a 488 nm laser (specifics as in 2 ). Illumination lasted between 20 and 30 seconds. Following this, the crystal was transferred to a drop of dehydrating cryoprotectant 2 and left there long enough to contract into the smaller unit cell size (approximately 5 seconds). It was then fished out onto a loop again and plunged into liquid nitrogen. Data was collected at the SSRL 7-1 beamline. Data reduction and refinement were then done using XDS 21 and Phenix 8 (PDB 8A83). Refinement statistics for this cryotrapped structure and for the dark structures collected at SACLA are reported in Table S2.

Quantum Chemical Modeling Details
QM-MM simulations in the protein were performed using our own version of GROMACS 4.6.5 22,23 coupled to the TeraChem 24,25 (for the ground state) and GAMESS(US) 26 (for the excited state) quantum chemistry packages. We searched for minimum-energy geometries in all systems using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton optimization algorithm, without PBC and with infinite cut-offs for the Coulomb and Lennard-Jones interactions. All optimizations in this work were performed until the maximum component of the force was lower than 10 kJ/mol/nm. Vertical absorption and emission spectra calculations were performed with the xMCQDPT2 method implemented in the Firefly package 27 .

Planar Ground and Excited State trans-PL Structures
Initially, we optimized Cl-rsEGFP2 in the ground and excited states, starting from the dark trans-PL crystal structure. In the ground state, we performed the optimization with density functional theory (DFT) at PBE0/cc-pVDZ//Amber03 28–30 with empirical corrections to dispersion energies and interactions introduced with Grimme's DFT-D3 model 31 . In the excited state, a double optimization scheme was employed. At first, SA2-CASSCF(2,2)/3-21G//Amber03 32 was used to minimize the S1 state. We used a small active space in this optimization to prevent the interchange between the S1 and S2 states, which typically happens for neutral GFP chromophores when using larger active spaces without electron correlation. The structure optimised with the small active space was then subjected to a second optimisation at the SA2-CASSCF(12,11)/3-21G//Amber03 level of theory.
Both S0 and S1 optimized structures are planar. Next, we computed vertical excitation and emission energies at the xMCQDPT2/SA6-CASSCF(12,11)/cc-pVDZ//Amber03 level of theory. All 6 states were included in the averaging as well as in the effective Hamiltonian. The results suggest absorption at 405 nm (3.06 eV) and emission at 518 nm (2.39 eV). In addition, for the S1 minimum energy geometry, we computed an excited-state absorption (ESA) from S1 into S5 at 425 nm (2.92 eV). These excitation energies are in line with the experimental data (Figure 6), and hence suggest that the model provides an adequate qualitative description of our system.

Identification of the Photoisomerization Pathway
At first, we performed a search for the isomerization pathway connected to the TR-SFX-resolved trans-FS structure at 900 fs. Optimization on the ground (PBE0/cc-pVDZ//Amber03) and excited (SA2-CASSCF(12,11)/3-21G//Amber03) states resulted in the planar trans-PL structure, suggesting that this twisted structure is not connected to any minima on the excited state. To identify minimum energy conical intersection (MECI) points, we implemented a penalty function MECI search algorithm 33 with α=0.02 and σ=16 as the parameters for the penalty function. When starting the optimization from the trans-FS structure, no MECI could be located, in line with previous computational studies 34,35 that suggest that in neutral GFP chromophores there is no conical intersection associated with the rotation of the phenol ring. We next searched for a hula-twist MECI by manually increasing the τ torsion angle by 45° in the trans-FS structure and optimising its geometry at the SA2-CASSCF(12,11)/3-21G//Amber03 level of theory. This optimisation resulted in a twisted structure with an S1/S0 energy gap of ≈0.035 a.u. Starting a MECI optimization from this geometry leads to the MECI structure shown in Figure S19 with an S1/S0 energy gap of ≈0.0005 a.u. After the optimizations, we also recomputed energies for both twisted minima and MECI points with xMCQDPT2/SA6-CASSCF(12,11)/cc-pVDZ//Amber03. In addition, to determine whether the MECI point connects the anti-trans conformer to the anti-cis conformer, we performed a ground state optimisation at the PBE0/cc-pVDZ//Amber03 level starting from the MECI. While a direct optimization leads to the original planar anti-trans-PL chromophore conformer, a slight perturbation of the position of the methylene hydrogen atom by ≈0.3 Å towards a more cis-like position leads to the anti-cis conformer.
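For orientation, the sketch below evaluates one commonly used penalty function for MECI searches, the mean of the two state energies plus σ·ΔE²/(ΔE + α), at the two energy gaps quoted above. That reference 33 uses exactly this functional form is an assumption, as are the function name and the choice of zero for the lower state energy.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree (a.u. of energy) in eV

def meci_penalty(e_upper, e_lower, sigma=16.0, alpha=0.02):
    """Mean state energy plus a smoothed penalty on the gap (all in hartree).

    Assumed form: F = (E_upper + E_lower) / 2 + sigma * gap**2 / (gap + alpha).
    """
    gap = e_upper - e_lower
    return 0.5 * (e_upper + e_lower) + sigma * gap**2 / (gap + alpha)

# Gaps quoted in the text: twisted S1 minimum and optimized MECI
for gap in (0.035, 0.0005):
    penalty_term = meci_penalty(gap, 0.0) - 0.5 * gap
    print(f"gap {gap} a.u. = {gap * HARTREE_TO_EV:.3f} eV, "
          f"penalty term {penalty_term:.2e} a.u.")
```

With this form, the penalty term vanishes smoothly as the gap closes, so minimizing F drives the geometry towards degeneracy while keeping the average energy low.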

Confirming the Presence of the Chlorine Substituent
To confirm incorporation of the chlorine substituent into the chromophore, we analyzed anomalous difference Fourier maps from anomalous data taken at different wavelengths. Single crystals were obtained by the hanging-drop vapor diffusion method at 20 °C described previously 36 . Briefly, the protein solution (12 mg/mL, 50 mM Hepes pH 7.5, 20 mM NaCl) was mixed 1:1 with the precipitant solution (100 mM Hepes pH 8.0, 1.80 M ammonium sulphate, 20 mM NaCl) to yield 2-4 µL drops that were placed over a well containing 1 mL of the precipitant solution. Mature crystals with dimensions up to 500 µm × 200 µm × 200 µm appeared after three weeks. Prior to flash-freezing in liquid nitrogen, crystals were cryoprotected by passage through three solutions of incrementally increasing amounts of sucrose to a final concentration of 1.2 M in 75 mM Hepes pH 8.1, 20 mM NaCl, 0.9 M ammonium sulphate.
Data were collected on beamline I23, Diamond Light Source, using the Pilatus 12M semi-cylindrical detector at a temperature of 50 K 37 . Data were collected above and below the chlorine K absorption edge at 4.0 keV (λ = 3.1 Å) and 2.8 keV (λ = 4.4 Å) respectively. Each dataset consisted of 3600 images at 0.1° oscillation with 0.1 s exposure. Three 360° datasets at 4.0 keV were used to phase the structure using CRANK2. Further structure refinement was performed in Phenix and Coot (PDB 8AM4). Anomalous difference Fourier maps were generated using ANODE 38 . Figure S20 shows the resulting anomalous difference Fourier maps from 4.0 keV and 2.8 keV. The presence of density in the 4.0 keV dataset (green mesh, 5σ) in conjunction with the absence of density in the 2.8 keV dataset (red mesh, 4σ) is conclusive evidence for Cl, confirming the identity of the chromophore heavy atom substitution.
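The quoted wavelengths and rotation range can be checked with a short calculation. The value used for the chlorine K edge (about 2.822 keV) is a tabulated figure added here as background, not a number from this work.

```python
HC_KEV_ANGSTROM = 12.39842   # h*c expressed in keV·Å
CL_K_EDGE_KEV = 2.822        # tabulated chlorine K edge (background value)

def wavelength_angstrom(energy_kev):
    """Photon wavelength in angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev

for energy in (4.0, 2.8):
    side = "above" if energy > CL_K_EDGE_KEV else "below"
    print(f"{energy} keV -> {wavelength_angstrom(energy):.1f} Å "
          f"({side} the Cl K edge)")

# 3600 images at 0.1 degree per image cover one full rotation
total_rotation_deg = 3600 * 0.1
```

Only the 4.0 keV data lie above the edge, which is why an anomalous peak present at 4.0 keV but absent at 2.8 keV points to chlorine.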

OFF-to-ON Quantum Yield for the Cl-rsEGFP2 construct
We carried out a comparative measurement to estimate the reaction quantum yield (QY) for our chlorinated rsEGFP2 construct. The OFF-to-ON QY for unchlorinated rsEGFP2 was most recently estimated at 0.23 39 . rsEGFP2 and Cl-rsEGFP2 solutions were fully converted to the OFF state through illumination with a 488 nm LED. The final OD at 405 nm for both was ≈ 0.25. The OFF-to-ON conversion was then driven with a 405 nm LED at 4.4 mW power, and the absorption at 488 nm and 405 nm was monitored with an Agilent 8453 spectrophotometer.
Triplicate measurements were averaged and normalized. Figure S1 shows the 405 nm absorption decay for both constructs with the respective fitted decay rates (k). From the ratio of the two fitted rates we estimate the OFF-to-ON QY for Cl-rsEGFP2:
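As a hedged illustration of the ratio estimate, the sketch below assumes that both samples absorb the same 405 nm photon flux (matched OD ≈ 0.25, same LED power), so that the switching rate is proportional to the quantum yield. Under that assumption, the Figure S1 rates scale the reference QY of 0.23 to roughly 0.20 for Cl-rsEGFP2. The log-linear fit and the synthetic decay trace are for illustration only; the function name is hypothetical.

```python
import math

def fit_decay_rate(times, values):
    """Fit y = A * exp(-k * t) by linear regression of ln(y) on t."""
    n = len(times)
    logs = [math.log(v) for v in values]
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    k = -slope
    amplitude = math.exp(mean_l + k * mean_t)
    return amplitude, k

# Rates from Figure S1 and the reference QY for unchlorinated rsEGFP2
K_RSEGFP2 = 0.01868      # s^-1
K_CL_RSEGFP2 = 0.01631   # s^-1
QY_RSEGFP2 = 0.23

# Assumes equal absorbed photon flux for both constructs
qy_cl = QY_RSEGFP2 * K_CL_RSEGFP2 / K_RSEGFP2
print(f"Estimated OFF-to-ON QY for Cl-rsEGFP2: {qy_cl:.2f}")

# Synthetic decay trace to exercise the fit (hypothetical data)
times = [20.0 * i for i in range(11)]
trace = [0.25 * math.exp(-K_CL_RSEGFP2 * t) for t in times]
amplitude, k_fit = fit_decay_rate(times, trace)
```

A log-linear fit is adequate for clean, normalized traces; for noisy data approaching the baseline, a direct nonlinear fit of the exponential would be the safer choice.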