From Microstates to Macrostates in the Conformational Dynamics of GroEL: A Single-Molecule Förster Resonance Energy Transfer Study

The chaperonin GroEL is a multisubunit molecular machine that assists in protein folding in the Escherichia coli cytosol. Past studies have shown that GroEL undergoes large allosteric conformational changes during its reaction cycle. Here, we report single-molecule Förster resonance energy transfer measurements that directly probe the conformational transitions of one subunit within GroEL and its single-ring variant under equilibrium conditions. We find that four microstates span the conformational manifold of the protein and interconvert on the submillisecond time scale. A unique set of relative populations of these microstates, termed a macrostate, is obtained by varying solution conditions, e.g., adding different nucleotides or the cochaperone GroES. Strikingly, ATP titration studies demonstrate that the partition between the apo and ATP-ligated conformational macrostates traces a sigmoidal response with a Hill coefficient similar to that obtained in bulk experiments of ATP hydrolysis. These coinciding results from bulk measurements for an entire ring and single-molecule measurements for a single subunit provide new evidence for the concerted allosteric transition of all seven subunits.


Searching for Labeling Sites Using risFRET
Several factors are to be considered when searching for labeling positions for FRET studies: i) altering the amino acid residues to an orthogonal chemical group for the labeling reaction (in this work, to a cysteine residue) must not disrupt the structural integrity and function of the protein; ii) the labeling positions must be exposed enough to the solution to facilitate the chemical conjugation between the labeling site and functional anchoring group of the dye (in this work, maleimide); and iii) the labeling sites should be selected so that the distance between the donor and acceptor dyes (in this work, AlexaFluor 488 and AlexaFluor 594) is similar to their Förster distance, and changes significantly upon a conformational transition.
To search for labeling-site pairs on a single subunit in the GroEL complex that comply with these considerations, we performed simulations using rotational-isomer model representations of fluorescent labels and their linkers, using our program risFRET, as detailed below. […] Any dye-linker conformations with steric clashes (i.e., with two atoms whose van der Waals radii overlap by more than 0.04 nm 7) were discarded. After generating the dye positions for each site, the expected FRET efficiency value for each pair of residues was calculated as

$$\langle E \rangle = \frac{\sum_{i,j} w_i^{D} w_j^{A} \left[ 1 + \left( r_{ij}/R_0 \right)^{6} \right]^{-1}}{\sum_{i,j} w_i^{D} w_j^{A}}$$

where $r_{ij}$ is the inter-dye distance of each dye-linker realization pair, and $w_i^{D}$ and $w_j^{A}$ are the Boltzmann statistical weights used in the analysis. A Förster distance, $R_0$, of 5 […]
$\Delta E$ is the magnitude of the FRET efficiency change between two GroEL PDB models. The volume score is a measure of the available space of the linker attached to a labeling site in a specific PDB structure, calculated as the number of non-clashing simulated linker realizations divided by the total number of simulated linkers (10,000 realizations in this analysis). The conservation score is the evolutionary conservation score for each residue labeling site obtained from the ConSurf webserver, 8 which ranges from 1 to 9 from most variable to most conserved. Geometric means of individual volume and conservation scores are calculated to reduce the contribution of extreme values. The higher the final score, the more favorable the labeling site.
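The weighted average over dye-linker realizations described above can be sketched in a few lines of Python. This is only an illustration, not the risFRET implementation: the function names, the coordinate format (tuples in nm), and the example Förster distance of 5.0 nm are assumptions made for the sketch.

```python
import math


def expected_fret_efficiency(donor_positions, acceptor_positions,
                             donor_weights, acceptor_weights, r0_nm):
    """Boltzmann-weighted mean FRET efficiency over all donor/acceptor
    realization pairs (hypothetical helper, not part of risFRET)."""
    numerator = 0.0
    normalization = 0.0
    for pos_d, w_d in zip(donor_positions, donor_weights):
        for pos_a, w_a in zip(acceptor_positions, acceptor_weights):
            r = math.dist(pos_d, pos_a)          # inter-dye distance in nm
            pair_weight = w_d * w_a              # product of statistical weights
            numerator += pair_weight / (1.0 + (r / r0_nm) ** 6)
            normalization += pair_weight
    return numerator / normalization


def linker_volume_score(n_non_clashing, n_total):
    """Fraction of simulated linker realizations without steric clashes."""
    return n_non_clashing / n_total


# Illustrative values only: one donor realization, two acceptor realizations.
R0_NM = 5.0  # placeholder Förster distance, assumed for this example
mean_e = expected_fret_efficiency(
    [(0.0, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [1.0],
    [1.0, 3.0],
    R0_NM,
)
```

The change in this quantity between two PDB models gives the efficiency change used in the score; the combination of the individual terms into the final score is not reproduced here.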
Sorting the labeling pairs according to their scores significantly eased the search for appropriate FRET pairs. We initially selected eight labeling-site pair candidates. Preliminary smFRET experiments, conducted as discussed below, demonstrated that the pair E255/D428 showed distinct FRET efficiency changes with respect to ATP and GroES, while the other pairs showed little or no change. We, therefore, adopted this pair for our studies.

[…] Technologies), assuming an extinction coefficient of 10,430 M⁻¹ cm⁻¹ for the GroEL monomer. 11 All samples were aliquoted, flash-frozen in liquid nitrogen, and stored at -80 °C until further use.

Reassembly Procedure
To detect FRET signals from only one GroEL subunit in a ring, we prepared GroEL constructs with a single labeled monomer in each complex, using the reassembly procedure shown schematically in Figure S1. Our reassembly protocol was based on several published studies, [12-19] which utilize the ability of GroEL to disassemble into monomers in the presence of 2-4 M urea and to reassemble into a functional complex upon removal of urea under reassembly conditions. The idea, first implemented in our lab with ClpB, [20-22] is to reassemble GroEL […] and 100-400 nM dye. Samples were then aliquoted, flash-frozen in liquid nitrogen, and stored at -80 °C until further use.

Steady-State Kinetic Assays
Steady-state ATPase activity measurements on GroEL constructs were performed using a coupled reaction assay 20,23 containing pyruvate kinase (PK) and lactate dehydrogenase (LDH), which oxidizes reduced nicotinamide adenine dinucleotide (NADH). Samples were prepared with GroEL […]

$$v = v_0 + \frac{V_{\max} [\mathrm{ATP}]^{n}}{K_{0.5}^{n} + [\mathrm{ATP}]^{n}}$$

where $v_0$ is a baseline parameter, $V_{\max}$ is the maximum reaction rate, $K_{0.5}$ is the apparent dissociation constant, and $n$ is the Hill coefficient. Data analysis and fitting were performed with a Matlab script utilizing a weighted non-linear least-squares minimization algorithm (trust-region) and constraining three of the fit parameters, among them $K_{0.5}$, to positive values.
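As a minimal sketch of the fitting model, the Hill function with a baseline and the weighted sum of squared residuals that a least-squares routine would minimize can be written as follows. The function and parameter names are assumptions for illustration; the minimizer itself (trust-region in the original Matlab script) is not reproduced.

```python
def hill_rate(atp, baseline, v_max, k_half, n_hill):
    """Hill function with a baseline offset.

    atp      -- ligand (ATP) concentration, same units as k_half
    baseline -- rate at zero ATP
    v_max    -- maximum rate above the baseline
    k_half   -- apparent dissociation constant (half-saturation point)
    n_hill   -- Hill coefficient
    """
    if atp <= 0.0:
        return baseline
    ratio = (atp / k_half) ** n_hill
    return baseline + v_max * ratio / (1.0 + ratio)


def weighted_sse(params, concentrations, rates, sigmas):
    """Weighted sum of squared residuals for parameters
    (baseline, v_max, k_half, n_hill); this is the objective a weighted
    non-linear least-squares fit would minimize."""
    baseline, v_max, k_half, n_hill = params
    total = 0.0
    for atp, rate, sigma in zip(concentrations, rates, sigmas):
        residual = (rate - hill_rate(atp, baseline, v_max, k_half, n_hill)) / sigma
        total += residual ** 2
    return total


# Illustrative check with synthetic, noise-free data (not measured values).
true_params = (0.1, 1.0, 2.0, 3.0)
conc = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
synthetic = [hill_rate(c, *true_params) for c in conc]
objective_at_truth = weighted_sse(true_params, conc, synthetic, [1.0] * len(conc))
```

At a concentration equal to the half-saturation constant, the function returns the baseline plus half of the maximum rate regardless of the Hill coefficient, which is a convenient sanity check on any fit.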

Native Mass Spectrometry
GroEL samples were treated three times using a Bio-spin 6 column loaded with a buffer containing 150 mM NH4OAc (238074, Sigma). The samples were then loaded on a gold-coated capillary and introduced into the Q-Exactive-UHMR mass spectrometer for intact mass measurements. The instrument was operated under the following conditions: the inlet capillary temperature was set to 250 °C at a voltage of 1.3 kV; vacuum settings were fore pump 1.61 mbar, HV 3.23e-9 mbar, and UHV 2.35e-10 mbar. Inject flatapole offset and bent flatapole DC were set to 5 V and 2 V, respectively. No HCD voltage was applied. Measurements were performed at a resolution of 10,000. Raw spectra were converted to MassLynx-compatible files by the software Databridge (Waters) and analyzed by the MassLynx program.

Fluorescence Anisotropy Measurements
To assess the freedom of motion of protein-attached fluorescent dyes, steady-state and time-resolved fluorescence anisotropy measurements were conducted on single-cysteine SR1 complexes labeled with either the donor or the acceptor. To this end, single-cysteine SR1 variants, 255C and 428C, labeled with Alexa 488 C5-maleimide (donor) and Alexa 594 C5-maleimide (acceptor), were processed according to the urea reassembly procedure to form heptamers with one labeled subunit per heptamer, as described in previous sections. […] (Table S1).

Time-resolved measurements:
The single-labeled SR1 samples were diluted in a clean buffer (G10K + 1 mM DTT) to ~1-2 nM concentrations, loaded into a flow cell (see "Single-molecule experiments" for details) and measured on the MicroTime200 microscope as described below, with the following modifications. Molecules were excited with a polarized laser at either 495 nm or 594 nm, pulsed at a repetition rate of 20 MHz (50 ns period) with a power of 20 μW. The emitted photons passed through a polarizing beam splitter cube (Ealing), which split the photons according to their polarization into parallel and perpendicular channels. The split photons passed through emission filters for the donor and acceptor labeled samples, respectively, and their arrival times relative to the laser pulse were recorded. The fluorescence anisotropy decay curve for each sample was calculated using the following equation:

$$r(t) = \frac{I_{\parallel}(t) - I_{\perp}(t)}{I_{\parallel}(t) + 2 I_{\perp}(t)}$$

where $I_{\parallel}(t)$ and $I_{\perp}(t)$ are the photon counts at arrival time $t$ relative to the laser pulse in the parallel and perpendicular channels, respectively. The decay curves are shown in Figure S3.

[…] For smFRET experiments, all prepared samples were diluted to picomolar concentrations in the same prepared buffer and measured as described in the following sections.
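The anisotropy decay can be computed bin by bin from the two polarization-resolved arrival-time histograms. The sketch below assumes the histograms are already aligned and background-corrected; the optional `g_factor` argument (detection-efficiency correction between the two channels) is an assumption of this example and defaults to 1, which gives the standard form without correction.

```python
def anisotropy_decay(parallel_counts, perpendicular_counts, g_factor=1.0):
    """Time-resolved anisotropy r(t) from parallel and perpendicular
    arrival-time histograms with identical binning.

    Bins with no photons in either channel are returned as NaN.
    """
    decay = []
    for i_par, i_perp in zip(parallel_counts, perpendicular_counts):
        total = i_par + 2.0 * g_factor * i_perp      # total intensity
        if total > 0:
            decay.append((i_par - g_factor * i_perp) / total)
        else:
            decay.append(float("nan"))
    return decay


# Illustrative histograms (photon counts per time bin), not measured data.
r_t = anisotropy_decay([300, 150, 100, 0], [100, 100, 100, 0])
```

A freely rotating dye should decay toward zero anisotropy (equal counts in both channels), whereas a residual plateau indicates restricted motion.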

Data Analysis
All data analysis procedures were conducted using home-built MATLAB scripts, which can be provided upon request.

Burst identification and filtration:
[…] In this expression, the parameters are the leak factor, the direct excitation factor, and the mean stoichiometry of the measured bursts, as obtained from the PIE analysis described in the previous section. Second, to calculate the state-to-state kinetic transition rates, $k_{ij}$, we use the approximation $k_{ij} \cong p_{ij}/\Delta t$, where $p_{ij}$ is the transition probability obtained in the H²MM analysis. This approximation is valid because the time step $\Delta t$ fulfills the condition $k_{ij} \Delta t \ll 1$. Finally, the occupancy of each microstate at equilibrium is obtained from the eigenvector of the transition probability matrix that corresponds to an eigenvalue of 1. 26
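The two conversions described in this paragraph, from transition probabilities to rates and from the transition matrix to equilibrium occupancies, can be sketched as follows. The example assumes a row-stochastic matrix (each row sums to 1) and finds the eigenvector with eigenvalue 1 by simple power iteration; both the convention and the numerical method are assumptions of this sketch, and the numbers are illustrative only.

```python
def transition_rates(prob_matrix, dt):
    """Approximate rates k_ij = p_ij / dt for i != j (valid when k_ij * dt << 1)."""
    n = len(prob_matrix)
    return [[prob_matrix[i][j] / dt if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def equilibrium_populations(prob_matrix, max_iter=10000, tol=1e-14):
    """Left eigenvector of a row-stochastic matrix with eigenvalue 1,
    obtained by repeatedly propagating a population vector."""
    n = len(prob_matrix)
    populations = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [sum(populations[i] * prob_matrix[i][j] for i in range(n))
                   for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(updated, populations)) < tol
        populations = updated
        if converged:
            break
    total = sum(populations)
    return [p / total for p in populations]


# Illustrative two-state example.
rates = transition_rates([[0.999, 0.001], [0.002, 0.998]], dt=1e-6)  # per second
occupancy = equilibrium_populations([[0.9, 0.1], [0.2, 0.8]])
```

For very small transition probabilities, power iteration converges slowly, so a direct eigen-decomposition would be the more practical choice on real H²MM output.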

Histogram "recoloring":
To check if the model parameters obtained from the H²MM analysis successfully describe the photon data, we perform a photon "recoloring" procedure as introduced in ref. 27. In this procedure, the photons within each photon trajectory (burst) measured in the experiment are "stripped" of their color, i.e., the donor or acceptor channel assignment, and only the photon arrival times are kept. Then, a Markov chain state sequence is simulated on the photon arrival times using the parameters obtained in the H²MM analysis, effectively assigning new colors to each photon. FRET efficiency histograms of the recolored simulated photon trajectories are generated and compared to the real data histogram. A good overlap indicates that the converged H²MM parameters describe the data well.
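A minimal sketch of recoloring a single burst is given below. It is not the authors' code: the state sequence is simulated here as a continuous-time jump process built from a rate matrix (exponential dwell times), which is an assumption standing in for stepping the discrete transition matrix, and all names and numbers are illustrative.

```python
import random


def recolor_burst(arrival_times, rates, efficiencies, initial_probs, rng):
    """Assign a new color ('A' acceptor, 'D' donor) to each photon.

    arrival_times -- sorted photon arrival times of one burst (seconds)
    rates         -- rates[i][j] is the transition rate from state i to j (1/s)
    efficiencies  -- FRET efficiency of each state, used as P(acceptor photon)
    initial_probs -- probability of each state at the first photon
    rng           -- random.Random instance
    """
    if not arrival_times:
        return []
    n = len(rates)

    def next_jump(now, current):
        escape = sum(rates[current][j] for j in range(n) if j != current)
        return now + rng.expovariate(escape) if escape > 0 else float("inf")

    state = rng.choices(range(n), weights=initial_probs)[0]
    t_jump = next_jump(arrival_times[0], state)
    colors = []
    for t in arrival_times:
        while t_jump <= t:  # apply all state changes that occur before this photon
            weights = [rates[state][j] if j != state else 0.0 for j in range(n)]
            state = rng.choices(range(n), weights=weights)[0]
            t_jump = next_jump(t_jump, state)
        colors.append("A" if rng.random() < efficiencies[state] else "D")
    return colors


def burst_efficiency(colors):
    """Uncorrected FRET efficiency of a burst: acceptor photons / all photons."""
    return colors.count("A") / len(colors)


times = [0.0, 1e-5, 5e-5, 2e-4, 1e-3]
demo_rates = [[0.0, 5000.0], [3000.0, 0.0]]
demo_colors = recolor_burst(times, demo_rates, [0.2, 0.8], [0.5, 0.5], random.Random(0))
```

Repeating this for every burst and collecting `burst_efficiency` values yields the recolored histogram that is compared with the measured one.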

Model selection:
In the framework of classical HMM analysis, several statistical criteria can be used to determine the minimal number of states to fit a data set, such as the Bayesian information criterion (BIC) 28 and the Akaike information criterion (AIC). 29 However, when using such measures on photon data analyzed in the H²MM analysis, these statistical scores do not converge to a definitive answer. 30,31 We therefore utilized empirical tests for model selection. Preliminary H²MM analysis was performed on datasets using models with 2 to 6 states with either full inter-state connectivity or a chain architecture. We estimated the suitability of the tested models by comparing the root mean square difference (RMSD) between the FRET efficiency histogram values of the real data and of recolored realizations (Figure S8 A). These scores improved in general as the number of states increased, but models with 4 states and above resulted in the appearance of states with the same FRET efficiency value, a sign of redundancy of the model (Figure S8 B). Using a fully connected model resulted in low transition rates between non-neighboring states, pointing towards a chain architecture. These tests indicated that a four-state chain model is a suitable Markov model that best describes the collected data.
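The RMSD comparison between measured and recolored histograms can be sketched as below. The binning (equal-width bins on the interval from 0 to 1) and the normalization of each histogram to unit sum are assumptions of this example, not details taken from the analysis scripts.

```python
import math


def efficiency_histogram(efficiencies, n_bins=50):
    """Normalized histogram of FRET efficiencies on [0, 1] with equal-width bins."""
    counts = [0] * n_bins
    for e in efficiencies:
        if 0.0 <= e <= 1.0:
            index = min(int(e * n_bins), n_bins - 1)  # e == 1.0 goes to the last bin
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def histogram_rmsd(hist_a, hist_b):
    """Root mean square difference between two histograms with the same binning."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    squared = sum((a - b) ** 2 for a, b in zip(hist_a, hist_b))
    return math.sqrt(squared / len(hist_a))


# Illustrative values only.
measured = efficiency_histogram([0.1, 0.1, 0.9], n_bins=2)
recolored = efficiency_histogram([0.2, 0.8, 0.7], n_bins=2)
score = histogram_rmsd(measured, recolored)
```

A lower value indicates a closer match, so the score would be evaluated once per candidate model and compared across state numbers and architectures.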

Global analysis:
In the case of global H²MM analysis, the FRET efficiency values of the microstates were shared between several analyzed datasets. This was achieved by calculating the re-estimated parameters for the emission probabilities by sharing the auxiliary γ-parameter (35) across all datasets, while the other parameters, the prior vector and the transition probability matrix, were updated for each dataset individually. In the global H²MM analysis, we used 2-3 datasets independently measured for each condition (apo, ATP, ADP: 3 repeats; ATP+ES: 2 repeats).
The parameters for transition rates and equilibrium populations were not shared between datasets, and we report mean and standard error values calculated from these. In order to estimate the errors in the FRET efficiency values, which were globally shared between datasets, we performed additional analysis runs on two groups, each containing apo, ADP, ATP and ATP+ES datasets. […] and extract the mean dwell-time values, which can be compared to those calculated directly from the H²MM parameters (see Figure S9 and Table S2).

Burst Segmentation
One way of validating that the H²MM model correctly traces the dynamics in photon bursts was to perform a photon trajectory segmentation analysis using the HMM Viterbi algorithm. 32 This algorithm generates the most likely state trajectory for each photon trajectory, based on the model parameters obtained from the analysis. Based on the microstate assignments generated by the Viterbi algorithm, segmented FRET efficiency histograms of photon trajectories with no state-to-state transitions were generated and plotted (Figure S10). The segmented histograms of each state were normalized to match the raw data histogram.
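For orientation, a textbook Viterbi decoder for a discrete HMM is sketched below, with photons encoded as 0 (donor) or 1 (acceptor). It deliberately ignores the variable inter-photon times that the photon-by-photon H²MM treatment accounts for, so it should be read as a simplified illustration under that assumption; all names and numbers are hypothetical.

```python
import math


def _log(x):
    return math.log(x) if x > 0 else float("-inf")


def viterbi(observations, prior, transition, emission):
    """Most likely state sequence for a discrete HMM (log-space Viterbi).

    observations -- sequence of symbol indices (0 = donor, 1 = acceptor)
    prior        -- prior[s]: probability of starting in state s
    transition   -- transition[p][s]: probability of moving from p to s per step
    emission     -- emission[s][o]: probability of symbol o in state s
    """
    if not observations:
        return []
    n = len(prior)
    scores = [_log(prior[s]) + _log(emission[s][observations[0]]) for s in range(n)]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in range(n):
            candidates = [scores[p] + _log(transition[p][s]) for p in range(n)]
            best_prev = max(range(n), key=lambda p: candidates[p])
            new_scores.append(candidates[best_prev] + _log(emission[s][obs]))
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def has_no_transitions(path):
    """True if the decoded burst stays in a single state."""
    return len(set(path)) <= 1


demo_path = viterbi(
    [0, 0, 0, 0, 1, 1, 1, 1],
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    emission=[[0.9, 0.1], [0.1, 0.9]],
)
```

Bursts for which `has_no_transitions` is true would be the ones pooled into the per-state segmented histograms.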

Burst-Wise Fluorescence Correlation Analysis
We performed a burst-wise photon correlation analysis to identify the presence of fast GroEL subunit conformational dynamics on photon datasets of SR1 under the four measured conditions.
To obtain information on longer timescales, we performed the burst identification procedure as described above (see Burst identification and filtration section) by setting a Δt threshold of 50 μs but without demanding a Δt threshold of 10 μs. A correlation analysis was performed with a home-built code utilizing the xcorr MATLAB function, based on ideas presented in ref. 34. For a given smFRET dataset containing bursts of molecules labeled with both donor and acceptor dyes, the correlation functions between photons of channels $i$ and $j$ for a time shift $\tau$, $G_{ij}(\tau)$, were calculated with the following equation: […] In this equation, a subscript denotes a burst in the dataset, and the burst length is given in arbitrary bin-time units. The ratios between the correlation functions were smoothed using a logarithmic mean moving window (each window lag time is increased by a factor of 2 compared to the previous window) and are shown in Figure S11.
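The ratio and smoothing steps can be illustrated as follows. The sketch assumes the correlation functions have already been computed on a uniform lag grid, and it interprets the logarithmic moving window as consecutive windows whose width doubles each time; both points are assumptions of this example rather than a description of the original code.

```python
def correlation_ratio(cross_corr, auto_corr):
    """Element-wise ratio of a cross-correlation to an autocorrelation function.
    Lags where the autocorrelation is zero are returned as NaN."""
    return [c / a if a != 0 else float("nan")
            for c, a in zip(cross_corr, auto_corr)]


def log_window_means(lags, values):
    """Average lags and values in consecutive windows of 1, 2, 4, 8, ... points."""
    mean_lags, mean_values = [], []
    start, width = 0, 1
    while start < len(values):
        stop = min(start + width, len(values))
        count = stop - start
        mean_lags.append(sum(lags[start:stop]) / count)
        mean_values.append(sum(values[start:stop]) / count)
        start, width = stop, width * 2
    return mean_lags, mean_values


# Illustrative numbers only.
ratio = correlation_ratio([1.0, 2.0, 3.0], [2.0, 4.0, 0.0])
smooth_lags, smooth_values = log_window_means(
    [1, 2, 3, 4, 5, 6, 7], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
)
```

Widening the window with lag time keeps the point density roughly constant on a logarithmic time axis while averaging down the noise at long lags.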

Burst-wise Likelihood Analysis to Assign the Macrostates of Bursts
We performed smFRET experiments with increasing ATP concentrations in a buffer containing an ATP regeneration system (see previous Sample preparation section). For each ATP concentration, 2-3 measurements were conducted. In the burst-wise likelihood analysis, two likelihood values were calculated for each recorded photon burst, based on the parameters of the HMM models corresponding to the apo and ATP macrostates. This calculation allowed us to quantify to what extent a detected burst resembled each of these macrostates. The two likelihood values were calculated using the H²MM-modified forward algorithm 25 and were normalized by their sum, yielding a final score with a value between 0 and 1. The mean likelihood scores based on all bursts in each dataset were then calculated. The likelihood score for the ATP macrostate as a function of ATP concentration is plotted in Figure 3 and fitted to the Hill function (eq. 3).
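The scoring step can be sketched as below. For simplicity the likelihood is computed with the standard scaled forward algorithm for a discrete HMM (photons encoded as 0 for donor and 1 for acceptor), which is an assumption standing in for the H²MM-modified forward algorithm that also uses inter-photon times. The normalization works on log-likelihoods to avoid numerical underflow for long bursts; model parameters shown are illustrative.

```python
import math


def forward_log_likelihood(observations, prior, transition, emission):
    """Log-likelihood of a photon color sequence under a discrete HMM,
    using the forward algorithm with per-step rescaling."""
    n = len(prior)
    alpha = [prior[s] * emission[s][observations[0]] for s in range(n)]
    scale = sum(alpha)
    log_likelihood = math.log(scale)
    alpha = [a / scale for a in alpha]
    for obs in observations[1:]:
        alpha = [sum(alpha[p] * transition[p][s] for p in range(n)) * emission[s][obs]
                 for s in range(n)]
        scale = sum(alpha)
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_likelihood


def macrostate_score(log_l_apo, log_l_atp):
    """Normalized likelihood L_ATP / (L_apo + L_ATP), between 0 and 1."""
    shift = max(log_l_apo, log_l_atp)
    w_apo = math.exp(log_l_apo - shift)
    w_atp = math.exp(log_l_atp - shift)
    return w_atp / (w_apo + w_atp)


def mean_score(scores):
    """Mean score over all bursts of one dataset."""
    return sum(scores) / len(scores)


# Two hypothetical one-state models: low efficiency versus high efficiency.
burst = [1, 1, 1, 0, 1, 1]
log_l_low = forward_log_likelihood(burst, [1.0], [[1.0]], [[0.8, 0.2]])
log_l_high = forward_log_likelihood(burst, [1.0], [[1.0]], [[0.2, 0.8]])
demo_score = macrostate_score(log_l_low, log_l_high)
```

Plotting the mean score of each dataset against ATP concentration then gives the titration curve that is fitted with the Hill function.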

Labeling Controls on the Native Cysteine Residues of GroEL
To validate that the native cysteines of GroEL have a low labeling rate, as previously reported, [35-37] we conducted control smFRET experiments on three SR1 constructs. SR1 WT and the single-cysteine SR1 variants, 255C and 428C, were labeled with Alexa 488 C5-maleimide (donor) and Alexa 594 C5-maleimide (acceptor) as described in the previous sections. […]

[…] The purified SR1 samples contain, in addition to assembled heptamers (~400 kDa), traces of monomers (~57 kDa) and octamers (~460 kDa). In the reassembly procedure, SR1 monomers assemble into heptamers, and unwanted specimens are removed. F) Labeling controls on GroEL native cysteine residues. Normalized corrected stoichiometry histograms of labeled SR1 variants, which were labeled under the same conditions and subjected to the reassembly procedure. Bursts with stoichiometry between 0.18 and 0.9 are assigned to SR1 complexes labeled with both a donor and an acceptor dye. The labeled WT variant (blue) shows a population (~50%) of double-labeled molecules due to non-specific native cysteine labeling. However, when a single mutant cysteine, either 255C (red) or 428C (yellow), is introduced to SR1, the double-labeled population decreases to ~27% and ~12%, respectively. This result indicates that the inserted cysteine residues are more reactive than the native cysteines, thereby increasing the population of single-labeled molecules. For comparison, the histogram of the double-labeled variant 255C/428C under apo conditions is shown (purple). Thus, we can conclude that labeling the double-cysteine variant under the same conditions as above yields a negligible fraction of SR1 molecules with labeled native cysteines and that the collected FRET signals in our experiments originate from GroEL subunits labeled at positions 255C/428C.

[…] The FRET efficiency histograms are corrected for background, leakage, and direct acceptor excitation.
All nucleotides and GroES (oligomer 100-500 nM) are in excess compared to GroEL (non-labeled + labeled oligomers 0.4-5 nM). The similar FRET efficiency manifolds suggest that the SR1 subunit adopts similar conformations as the double-ring subunit. Variations in the apo, ATP, and ADP conditions may originate because, during the smFRET diffusion experiment of the double-ring variant, the observed labeled subunit can be in the cis or the trans ring, giving rise to two types of conformational distributions in one experiment that cannot be resolved. This effect is best observed in the ATP-ES histogram (D), where both low (ES-bound) and higher (nucleotide-bound) FRET efficiency distributions appear simultaneously. Conversely, the SR1 variant, having only one ring, adopts mainly a low FRET efficiency GroES-bound state. E) Comparison between FRET efficiency histograms of the double-ring complex in 1 mM ATP+ES (filled histogram) and the symmetric GroEL football complex (solid line histogram). The football complex has two GroES oligomers bound to both GroEL rings, leading to an increase of the low FRET efficiency population in comparison to the asymmetric bullet complex.

[…] Table S2, match the values from the H²MM parameters.

Figure S10: Raw FRET efficiency histograms of all GroEL datasets (gray) with individual microstate histograms based on Viterbi assignment, specified with the same color code used elsewhere in this work. The shown histograms were generated from photon trajectories that have no state-to-state transitions. The separation between the FRET efficiency distributions of the four microstates validates the ability of H²MM to correctly identify the microstates in the dataset.

Figure S11: Burst-wise fluorescence correlation analysis demonstrates the existence of submillisecond conformational dynamics of the GroEL subunit under all measured conditions (see Burst-Wise Fluorescence Correlation Analysis in Methods). A) Shown are the donor-donor autocorrelation functions (blue) and donor-acceptor cross-correlation functions (red) versus time shift as obtained from the burst-wise correlation analysis. B) Shown are the correlation ratios between the donor-acceptor cross-correlation function and the donor-donor autocorrelation function versus time shift. This representation removes contributions due to diffusion. The correlation ratio plots show a biphasic increase: in the range of ~1-10 μs, presumably due to triplet-state dynamics, and in the range of ~500-1000 μs, due to conformational dynamics of the GroEL subunit. Note the late increase in the case of ATP-ES, where the subunit is indeed expected to show slower conformational dynamics.

[…] Figure S9). The values are in milliseconds and presented as means with standard errors from 2-3 repeats.