Quantitative Interpretation of Protein Diffusion Coefficients in Mixed Protiated–Deuteriated Aqueous Solvents

Diffusion-ordered nuclear magnetic resonance (NMR) spectroscopy is widely used for the analysis of mixtures, dispersing the signals of different species in a two-dimensional spectrum according to their diffusion coefficients. However, interpretation of these diffusion coefficients is typically purely qualitative, for example, to deduce which species are bigger or smaller. In studies of proteins in solution, important questions concern the molecular weight of the proteins, the presence or absence of aggregation, and the degree of folding. The Stokes–Einstein Gierer–Wirtz estimation (SEGWE) method has been previously developed to simplify the complex relationship between diffusion coefficient and molecular mass, allowing the prediction of a species’ diffusion coefficient in a pure solvent based on its molecular weight. Here, we show that SEGWE can be extended to successfully predict both peptide and protein diffusion coefficients in mixed protiated–deuteriated water samples and, hence, distinguish effectively between globular and disordered proteins.


Equations for predicting viscosities of mixed solvents
Three mixing rules for viscosity are summarised below. Here, each has been combined with Andrade's equation, $\eta = a\,e^{b/T}$, to create expressions for the viscosity of a mixed solvent on the basis of its composition and the relevant Arrhenius-like parameters used in Andrade's equation. The symbols are defined as follows:
η = viscosity (kg m⁻¹ s⁻¹)
η1,2 = combined viscosity of component 1 (η1) and component 2 (η2) (kg m⁻¹ s⁻¹)
T = temperature (K)
a and b = Arrhenius-like parameters (kg m⁻¹ s⁻¹ and K, respectively)
ρ = density (kg m⁻³), where ρ1 and ρ2 are the densities of components 1 and 2, respectively
x = mole fraction, where x1 and x2 are the mole fractions of components 1 and 2, respectively
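As a minimal sketch of how Andrade's equation is used for a single solvent component, the Python below evaluates η = a·exp(b/T) and recovers a and b from two (T, η) points. The function names and the numerical inputs are illustrative assumptions (roughly water-like placeholders), not parameters taken from this work, and no mixing rule is applied.

```python
import math


def andrade_viscosity(a, b, temperature_k):
    """Andrade's equation: eta = a * exp(b / T), eta in kg m^-1 s^-1, T in K."""
    return a * math.exp(b / temperature_k)


def fit_andrade(t1, eta1, t2, eta2):
    """Solve ln(eta) = ln(a) + b / T exactly for (a, b) from two data points."""
    b = math.log(eta1 / eta2) / (1.0 / t1 - 1.0 / t2)
    a = eta1 / math.exp(b / t1)
    return a, b


# Placeholder inputs (assumed, roughly water-like; not values from the source).
a, b = fit_andrade(283.15, 1.31e-3, 298.15, 0.89e-3)

# Interpolate the viscosity at an intermediate temperature.
eta_288 = andrade_viscosity(a, b, 288.15)
print(f"a = {a:.3e} kg m^-1 s^-1, b = {b:.1f} K, eta(288.15 K) = {eta_288:.3e}")
```

In practice a and b would be fitted to many temperature points by least squares on ln η against 1/T; the two-point solution is used here only to keep the sketch short.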

SI.2 Experimental
All data were collected at the Department of Chemistry Instrumentation Facility (DCIF) at the Massachusetts Institute of Technology. All DOSY measurements were carried out on a 600 MHz Bruker AVANCE NEO spectrometer, using a 5 mm helium-cooled QCI-F cryoprobe equipped with a z-gradient coil producing a calibrated maximum gradient of 55.37 G cm⁻¹. The gradients were calibrated using the standards and method of Holz and Weingärtner. The temperature was calibrated with both methanol-d4 and ethylene glycol.
All DOSY data were acquired using a stimulated echo NMR pulse sequence with bipolar pulsed field gradients and a longitudinal eddy current delay, with additional excitation sculpting used to suppress the solvent signals. Data were acquired using 16 gradient strengths, incremented in equal steps of gradient squared. These arrays ranged from 5% to 95% of the maximum for aprotinin, ubiquitin, myoglobin and BSA at all temperatures, and for lysozyme at both 298.15 K and 310.15 K. For the lysozyme data sets at 278.15 K, 283.15 K and 288.15 K, the gradients ranged from 2% to 98% of the maximum. All diffusion-encoding gradients used smoothed-square shaped pulses, with a gradient shape factor of 0.9. The experiment timing parameters, Δ and δ, are summarised in Table S3.
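The gradient list described above (16 values, equally spaced in gradient squared) can be sketched in Python as follows. The maximum gradient and the 5% to 95% range are taken from the text; the function name is hypothetical, and the sketch does not reproduce the spectrometer's own ramp-generation routine.

```python
import math

G_MAX = 55.37  # calibrated maximum gradient in G cm^-1 (from the text)


def gradient_ramp(g_max, start_frac, stop_frac, n_steps):
    """Return n_steps gradient strengths equally spaced in g squared."""
    g2_start = (start_frac * g_max) ** 2
    g2_stop = (stop_frac * g_max) ** 2
    step = (g2_stop - g2_start) / (n_steps - 1)
    return [math.sqrt(g2_start + i * step) for i in range(n_steps)]


# 5% to 95% of the maximum in 16 steps, as used for most data sets.
ramp = gradient_ramp(G_MAX, 0.05, 0.95, 16)
for g in ramp:
    print(f"{g:7.3f} G cm^-1  ({100 * g / G_MAX:5.1f}% of maximum)")
```

Spacing the points evenly in gradient squared spaces them evenly along the exponent of the Stejskal–Tanner decay, so the attenuation curve is sampled uniformly rather than crowded at low attenuation.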
Data were processed using GNAT, with a Lorentzian line broadening of 10 Hz. Peaks from 0.5 to 1.5 ppm (alkyl region) and from 6.5 to 7.5 ppm (aromatic region) were used to obtain the diffusion coefficients, and the largest error from each region was reported. The error, which determines the peak width in the diffusion dimension, is calculated from the fit of the data to the Stejskal–Tanner equation. The diffusion coefficients of the alkyl and aromatic regions do not differ significantly, as can be seen in the DOSY spectra.
Figure S76 shows the positively charged proteins and IDPs in red and the negatively charged proteins and IDPs in blue. The net charges of the proteins and IDPs are summarised in Tables S9 and S10, respectively. The net charge of these proteins and IDPs ranged from −24 to +7. Overall, the net charge does not appear to affect the diffusion coefficients of the proteins and IDPs.

Figure S76. Experimentally acquired diffusion coefficients of globular proteins and intrinsically disordered proteins (IDPs) plotted against diffusion coefficients predicted using the extended SEGWE equation at 287 K in H2O.
Positively charged proteins and IDPs are shown in red and negatively charged proteins and IDPs are shown in blue.

SI.7 Software
To simplify the D/MW estimation calculation, minimise the possibility of errors and make the methodology more accessible, the extended model has been implemented as an Excel spreadsheet, available for free download from doi: http://dx.doi.org/10.17632/fn64x6vpn4.1. Figure S77 is an annotated screenshot of the Excel sheet. The spreadsheet can estimate both the expected diffusion coefficient from a molecular weight and the molecular weight from an experimental diffusion coefficient, at different D2O:H2O compositions.
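The two directions of the calculation (D from MW, and MW from D) can be illustrated with a short Python sketch. It assumes the original small-molecule SEGWE form, D = k_B T (3α/2 + 1/(1+α)) / (6πη r), with α = (MW_solvent/MW)^(1/3) and r = (3 MW / (4π ρ_eff N_A))^(1/3). It is not the extended protein model implemented in the spreadsheet, whose parameters are not reproduced here. The function names, the effective density of 627 kg m⁻³ (the small-molecule value) and the sample inputs are all assumptions for illustration.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J K^-1
N_A = 6.02214076e23  # Avogadro constant, mol^-1


def segwe_d(mw, temperature_k, viscosity, solvent_mw, rho_eff):
    """Predict D (m^2 s^-1) from MW (g mol^-1) with the original SEGWE form.

    viscosity is in kg m^-1 s^-1 and rho_eff in kg m^-3.
    """
    alpha = (solvent_mw / mw) ** (1.0 / 3.0)
    gierer_wirtz = 1.5 * alpha + 1.0 / (1.0 + alpha)
    # Convert MW from g mol^-1 to kg mol^-1 before computing the radius in m.
    radius = (3.0 * mw * 1e-3 / (4.0 * math.pi * rho_eff * N_A)) ** (1.0 / 3.0)
    return K_B * temperature_k * gierer_wirtz / (6.0 * math.pi * viscosity * radius)


def segwe_mw(d, temperature_k, viscosity, solvent_mw, rho_eff, lo=1.0, hi=1e8):
    """Invert segwe_d by bisection in log(MW); D falls monotonically with MW."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(200):
        log_mid = 0.5 * (log_lo + log_hi)
        d_mid = segwe_d(math.exp(log_mid), temperature_k, viscosity, solvent_mw, rho_eff)
        if d_mid > d:
            log_lo = log_mid  # predicted D too large, so MW must be larger
        else:
            log_hi = log_mid
    return math.exp(0.5 * (log_lo + log_hi))


# Placeholder inputs (assumed, not taken from the source).
params = dict(temperature_k=298.15, viscosity=0.89e-3, solvent_mw=18.015, rho_eff=627.0)
d_pred = segwe_d(14300.0, **params)
mw_back = segwe_mw(d_pred, **params)
print(f"D = {d_pred:.3e} m^2 s^-1, recovered MW = {mw_back:.1f} g mol^-1")
```

Because the equation cannot be rearranged for MW in closed form (MW appears in both α and r), the inverse direction is solved numerically; bisection is used here for robustness, and any bracketing root finder would serve.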