Connecting Longitudinal and Transverse Relaxation Rates in Live-Cell NMR

In the cytosolic environment, protein crowding and Brownian motions result in numerous transient encounters. Each such encounter event increases the apparent size of the interacting molecules, leading to slower rotational tumbling. The extent of transient protein complexes formed in live cells can conveniently be quantified by an apparent viscosity, based on NMR-detected spin-relaxation measurements, that is, the longitudinal (T1) and transverse (T2) relaxation. From combined analysis of three different proteins and surface mutations thereof, we find that T2 implies significantly higher apparent viscosity than T1. At first sight, the effect on T1 and T2 seems thus nonunifiable, consistent with previous reports on other proteins. We show here that the T1 and T2 deviation is actually not a inconsistency but an expected feature of a system with fast exchange between free monomers and transient complexes. In this case, the deviation is basically reconciled by a model with fast exchange between the free-tumbling reporter protein and a transient complex with a uniform 143 kDa partner. The analysis is then taken one step further by accounting for the fact that the cytosolic content is by no means uniform but comprises a wide range of molecular sizes. Integrating over the complete size distribution of the cytosolic interaction ensemble enables us to predict both T1 and T2 from a single binding model. The result yields a bound population for each protein variant and provides a quantification of the transient interactions. We finally extend the approach to obtain a correction term for the shape of a database-derived mass distribution of the interactome in the mammalian cytosol, in good accord with the existing data of the cellular composition.

where bNH and cN are constants of dipole-dipole interactions (DD) and chemical shielding anisotropy (CSA) respectively and J(w) is the spectral density at frequency w (in the case above the Larmor frequencies of 1 H and 15 N nuclei). The spectral density is a quantity obtained by Fourier transforming the auto-correlated time function of a particle's motion defined by the overall motional correlation time tc. The correlation time dependence of DD and CSA relaxation contributions is described by an auto-correlated spherical harmonic of rank 2 (l=2) 1 and the corresponding spectral density function becomes (Eq. S3) The DD interaction strength constant bNH is defined by the following expression: bNH = - where µ0 is the magnetic permeability in vacuum, γH and γN are the gyromagnetic ratios of 1 H and 15 N, ħ is the Planck constant divided by 2π and rNH is the average bond length between an S3 amide nitrogen and its amide proton, which was set to 104 pm 1 . The CSA interaction strength constant cN is given by: with wN being the Larmor frequency of nitrogen and Δσ is the chemical shielding anisotropy, which here is set to -160 ppm 1 .
In addition, the transverse relaxation rate R2 is crucially sensitive to chemical exchange processes in the µs to ms time range 1-2 , and relaxation rate contributions -if present -are typically represented by an additive term Rex.
Provided that rotational correlation time (tr) dominates relaxation processes, a closed-form expression that estimates tr from R1 and R2 exists for large molecules (e.g. proteins) that rotate slowly and are thus within the NMR slow motion regime ((w0tc) 2 >> 1) 3 : with nN being the Larmor frequency of a the 15 N nucleus given in Hz. The rotational correlation time tr, was calculated from the R1 and R2 values of both in-vitro glycerol references as well as in-cell data. The relaxation rates were then plotted against tr and while the predicted rates fitted well with the observed ones in the case of the in-vitro glycerol references, the in-cell rates showed substantial deviations from the theory (Fig. 2).
To assign relaxation rates to distributions of differently sized proteins, we estimated tr from the molecular weight. For this, we first calculated the radius of gyration (rG) from the number of residues N in a protein 4 : If the number of residues was not known, the Mw was divided by 110 Da, the average molecular weight of an amino acid. To further convert rG to the hydrodynamic radius (rH) we use the following empirical correlation 4b, 5 : Based on the assumption that folded proteins behave roughly like hard spheres, we then used rH to gettc through the Stokes-Einstein-Debye relationship 6 where kB is the Boltzmann constant, T the temperature and η the viscosity of the solution.

SI 2. Conversion of Ri to apparent viscosity η app
To translate in-cell relaxation rates to apparent viscosity, both R1 and R2 was measured for each of the three types of protein (TTHA pwt , HAH1 pwt and SOD1 barrel ) in solutions made increasingly viscous through the addition of deuterated glycerol (Table S1). The viscosity of the solutions was estimated by calculating tr from the obtained relaxation rates and using the Stokes-Einstein-Debye relationship (Eq. S9), while keeping rH fixed for each protein. The relaxation rates were then plotted as a function of viscosity and the theoretically derived relaxation rate functions for each probe protein (Eq. S1-S2) overlaid (Fig. S4). Since relaxation parameters are so sensitive to the rotational tumbling rate and thus size, each of the three types of protein shows a slightly different dependence on viscosity (Fig. S4).
Using these reference curves, each in-cell relaxation parameter could be assigned a particular apparent viscosity (Fig. S4, Table 2) and in doing so, the retardation effects of cytosolic crowding on differently sized proteins were made comparable. Since protein size is not changing substantially by a point mutation, the same reference curves were used for the charge variants TTHA E32K , HAH1 K57E and SOD1 R100E .

SI 3. The binding models
Model S1. In the first interaction model we assume that the probe proteins are in fast exchange with an encounter complex with a molecular species of a particular averaged mass , is a function of the combined mass of probe protein and interaction partner (Mj + M av ), and ! "#$$ is a function solely of the probe protein's mass, given by Eq. S1 and S2 (SI1). The population weighted observed relaxation rate is then given by: This results in a set of six equations, one for each probe protein. Since their mass Mj is known to us, the optimisation problem has in total two fitting parameters: the population of bound protein pB, which is optimised locally for each probe protein and M av , which is S5 optimised globally for the entire array of probe proteins. With M av = 143 kDa, in-cell R1 and R2 can be closely reproduced with a rmsd of 0.26 (Fig. 3). The populations of bound species for each probe protein are listed in Table S2.
Model S2. The second binding model is conceptually similar to model S1, with the sole difference that M av is not optimised, but fixed to the average molecular mass of the UniProtderived size distribution for human cytosolic proteins. The average molecular mass 7<=<3<>? <@ (~73 kDa) is used as a proxy for a generic cytosolic binding partner, leaving an optimisation problem with only a single fitting parameter pB for each probe protein.
The optimised populations of bound species as well as the correspondingly calculated relaxation rates are summarized in Table S2 and result in a much poorer agreement with the in-cell measurements (rmsd = 0.54).
Model S3. To expand the binding models to a more realistic scenario, in which the probe proteins are encountering a variety of interaction partners, we make use of a distribution of sizes instead of a single average mass. Two different types of distributions can be used for approximating this range of sizes: the lognormal and the Γ-distribution (see model S4) 7 . Both describe the cytosolic protein molecular weight distribution fairly well, however the lognormal resulted in a somewhat better fit, and henceforth we use the lognormal distribution.
The fitted distributions ρdatabase(Mw) were then used to determine an expression for 2 34567 for each probe protein: And similar as in model S2, only one parameter -pB -is optimised for each probe protein: The obtained populations of bound protein and relaxation rates are summarized in Table S2.
The rmsd of 0.60 in the case of a fitted lognormal distribution is similar to model S2, however still far from the fairly accurate predictions of the much simpler model S1.
Model S4. In the fourth and last binding model, we use the same set-up of equations as in (Eq. S15) the shape parameter k and the scale parameter θ had to be optimized. With these expressions for ρ(Mw) and thus 2 34567 (Eq. S12-S13) the population of bound species pB was simultaneously and locally optimised for each probe protein. The results are summarised in Table S2. Figure 3 shows the calculated reduced relaxation rates derived from the optimised lognormal distribution. The Γ-distribution gave almost identical results and both have a rmsd = 0.26. In summary, model S4 is far superior than model S3 in predicting in-cell relaxation rates and just as good as the simpler model S1, however it provides more information and describes a physiologically more relevant scenario.

SI 4. Validation of interaction model with lysozyme.
In order to validate the interaction model, both R1 and R2 of TTHA pwt were measured in solutions with 50 mg/ml and 150 mg/ml lysozyme (Fig. S5). Despite lacking physiological relevance with regard to being a highly positively charged, secretory protein, we know from previous studies that it is forming charge-mediated transient complexes with net negatively Finally, we used the two pairs of relaxation data to determine the mass distribution of lysozyme, according to Eq. 3. At these concentrations, lysozyme occurs mainly as a monomer, and in accordance with this we find that a narrow distribution centered at 15.1 kDa fulfils the observed relaxation rates (Fig. S5).

SI 5. Surface net charge density and size distribution
To test the assumption, that the size composition of interaction partners is independent of net charge, we split the set of proteins, given in the database by Geiger et al. 9 , into three size categories: proteins smaller than 70 kDa, those larger than 70 kDa but smaller than 140 kDa and those larger than 140 kDa. For these three intervals the net charge density was calculated and plotted in a histogram (Fig. S7). A Lorentzian fitted on top revealed the mean charge densities for each category to be at -0.04, -0.08 and -0.11 e/nm 2 respectively. Although not identical, we conclude that the overall pattern with net negatively charged densities is valid for proteins along the extension of the size distribution.

SI 6. The maximum entropy approach to derive a minimum information distribution
Solving for r (Mw) in Eq. S12-S13 results in a multitude of solutions (Fig. S6). In all these solutions the calculated R1 and R2 values are identical for both distribution types (lognormal and G, model S4) even though the population of bound protein could vary in some cases up to 1 % (Table S2).
Due of the multitude of equivalent solutions, additional criteria to select possible representatives for the molecular weight distribution of the cytosolic proteome were introduced. One criterion is based on choosing the distribution that deviated the least from the data base derived fit to the cytosolic proteome. Another criterion is based on choosing the distribution whose information content is minimal by maximizing its information entropy S ( Fig. 4, Fig S9). The information entropy of any distribution r is defined as 10 : The resulting distributions are quite similar to the database-derived ones, however once again, they show a distinct tail in the high-mass region corresponding to e.g. larger intracellular molecular assemblies (Fig. S9).
SI 7. Validation of the self-consistency between in-cell R1 and R2 data.
Due to the short inter-scan delay of 1 s, a full recovery to equilibrium magnetisation is not achieved between scans of varying relaxation delay lengths. This may underlie the systematic deviations seen in the R1 intensity attenuation curves (Fig. S1). Consequently, we wanted to see if re-fitting the in-cell R1 data to only two of the three data points could still give reasonable results. Omitting the first data point at 10 ms relaxation delay led to overall prolonged longitudinal relaxation times and thus lower R1 values. However, re-running the calculations according to interaction model S4 with a lognormal distribution resulted in rmsd = 0.86 and r 2 = 0.27 (Fig. S10). This is an even worse match between prediction and measurement than that obtained from interaction models S2 and S3 with the original R1 data set (Fig. 4). This example illustrates well that not just any set of relaxation rates can be made compatible with Eq. S12 and S13 by adjusting distribution shape parameters and pB and S9 confirms that our in-cell R1 values are globally -over the set of six reporter proteinsconsistent with our in-cell R2 measurements.

SI Methods and Materials
Protein mutagenesis, expression and purification for subsequent data acquisition.

In-cell relaxation measurements
All NMR data was acquired using a Bruker Avance III 700 MHz spectrometer with a cryogenically cooled triple-resonance probe, at 37 °C using an 'interleaved' acquisition method. This form of data acquisition averages any time-dependent processes i.e. cell packing events that would temporarily change the protein concentration in the detection volume and therefore distort the relaxation decay curve 13 . Both R2 and R1 measurements were carried out using one-dimensional 15  %. In addition, there is evidence that most of the leakage is introduced during sample preparation of the supernatant 12 . The supernatant sample of TTHA E32K , for instance, sticks out as having considerably more leakage than the other samples (Fig. S2), yet the in-cell longitudinal relaxation rate obtained from that sample falls well in line with our expectations.
Due to technical issues with our equipment, this particular sample was prepared at 2 times the g-force normally applied, putatively inducing post-experimental leakage. This underlines that gentle handling during sample preparation is paramount for keeping the cells' structural integrity intact.

Determination of relaxation rates
The NMR raw data was processed and phase corrected with The integrals of the spectral portions were normalized and fitted to a single exponential decay. The relaxation rate errors were determined by using the standard deviation of the distribution of decay rates obtained through repeatedly fitting decay curves (40 000 -60 000 times) to data points that were randomly varied within their error region (Fig. S1). The error region of each data point, in turn, was defined by the signal-to-noise ratio of the corresponding spectrum. Naturally, the individual data points' errors increase with relaxation delay times as the signal becomes more attenuated.

S13
Curating and analysing the cytosolic proteome database To estimate the proteomic composition of the mammalian cytosol, we retrieved a list of 5217 human cytosolic proteins from the UniProt database 14 . The charge density of each protein was estimated by counting and assigning the acidic residues glutamate and aspartate with a negative charge and the basic residues lysine and arginine with a positive charge. Histidine residues were assumed to be neutral at physiological conditions, an assumption in line with net charge calculations performed with the PROPKA3 software 15 . The determined net charge for each protein was then normalized with its surface area, A, to obtain the charge density. The surface area was estimated from the radius of gyration, rg determined by Eq. S7 (SI1): Aprotein surface = 4π S % (Eq. S16) Furthermore, to get an impression of how much this size distribution changes (Fig. S6) S17 Figure S3. Correlation between leakage and deviation from single exponentiality. The magnitude of deviation from mono-exponentiality was quantified by the residual square sum (RSS) and the leakage was determined as described in material and methods, shown in Figure S2. We find no systematic correlation between leakage and RSS in this data set.   (Table S2).
The solutions had an rmsd = 0.26 when calculated reduced relaxation rates were compared with measured reduced in-cell relaxation rates (Fig. 3). Interestingly, the peak maxima are wandering slowly towards the average molecular weight of the interaction partner in model S1 (~143 kDa), while the distributions are becoming more and more symmetric around this peak maximum. This is well in agreement with the results of model S1, where a single, average molecular weight was able to unify the two in-cell relaxation parameters as long as the amount of bound protein was kept free to vary for each protein. Figure S9: Selected representatives for the cytosolic interactome. Lognormal (left) and Γ-distributions (right)

S23
were selected based on two criteria: (i) those that showed minimal deviation from the fit to the database-derived set (black) and (ii) those whose information entropy was maximal (dashed grey) (SI6). The fitted curves to the database-derived cytosolic protein sizes are shown in orange and blue for the respective type of distribution.  Fig. 4, Fig. S7), it is also not plausible to assume that abundant proteins between 100 and 200 kDa (Fig. S4) would not be part of the interactome. This shows reassuringly, that despite only using three different relaxation delay times, estimations for in-cell R1 and R2 are consistent for all 6 proteins and lead to reasonable solutions regarding interactome size-distributions.

S25
Supporting Tables   Figure 2B (see model S3) # errors estimated by re-optimising pB 10 000 times for normally distributed R1 and R2 using their respective errors as standard deviation. Upper and lower limits of pB errors correspond to the quantile at 16% and 84% of the obtained skewed distribution.