Self-Diffusive Properties of the Intrinsically Disordered Protein Histatin 5 and the Impact of Crowding Thereon: A Combined Neutron Spectroscopy and Molecular Dynamics Simulation Study

Intrinsically disordered proteins (IDPs) are proteins that, in comparison with globular/structured proteins, lack a distinct tertiary structure. Here, we use the model IDP Histatin 5 to study dynamical properties under self-crowding conditions with quasi-elastic neutron scattering in combination with fully atomistic molecular dynamics (MD) simulations. The aim is to determine the effects of crowding on the center-of-mass diffusion as well as on the internal diffusive behavior. The diffusion was found to decrease significantly, which we hypothesize can be attributed to some degree of aggregation at higher protein concentrations (≥100 mg/mL), as indicated by recent small-angle X-ray scattering studies. Temperature effects are also considered and found to largely follow Stokes–Einstein behavior. Simple geometric considerations fail to accurately predict the rates of diffusion, while simulations show semiquantitative agreement with experiments, dependent on assumptions about the ratio between translational and rotational diffusion. A scaling law that was previously found to successfully describe the behavior of globular proteins was found to be inadequate for the IDP Histatin 5. Analysis of the MD simulations shows that the width of the distribution with respect to diffusion is not a simplistic mirroring of the distribution of the radius of gyration, hence displaying the particular features of IDPs that need to be accounted for.


QENS-spectra examples
Figure S1: All the QENS-spectra at different q-values for Hst5 at 50 mg/ml protein concentration, 150 mM NaCl concentration and 298 K temperature.
Figure S2: QENS-spectra at different protein concentrations at 280 K temperature, 150 mM NaCl concentration, q-value of 0.29 Å⁻¹, including Paalman-Pings corrections. From top left to bottom right: 50 mg/ml, 100 mg/ml, 150 mg/ml, and 200 mg/ml protein concentration, respectively. Fitting is done with the "jump-diffusion" model, with one Lorentzian for the solvent data and two Lorentzians for the protein data ('Lor1' and 'Lor2').
Figure S3: The elastic incoherent structure factor (EISF/A₀) for different protein concentrations at a temperature of 298 K and salt concentration of 150 mM NaCl. EISF obtained through fitting of the "per-q" model. Paalman-Pings corrections applied.

Fitting considerations
First, it was evaluated whether Paalman-Pings corrections and/or a restricted q-range could improve fitting. In this case, a jump-diffusion model was used for internal diffusivity, with a dependence on momentum transfer, but it is assumed that the results would be transferable to other models. Looking at goodness-of-fit (Figure S1), there is evidence that Paalman-Pings corrections have a positive impact on goodness-of-fit, with some exceptions among the low-concentration samples. However, this may be due to an increase in the error estimates, which would cause the goodness-of-fit value to decrease and, depending on the difference between fitted and experimental values, may cause a bad fit to be categorized as good. Therefore, L1/L2 loss functions are also considered, see Figures S2 and S3.
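The concern raised above can be illustrated with a small sketch. It assumes that the goodness-of-fit metric behaves like a reduced chi-squared (an assumption, the text does not name the statistic), and the data values are made up for illustration only. Inflating the error bars lowers the reduced chi-squared of an unchanged fit, while the L1 and L2 losses, which do not use the errors, stay the same.

```python
def reduced_chi2(obs, fit, err, n_params):
    """Error-weighted squared residuals per degree of freedom."""
    dof = len(obs) - n_params
    return sum(((o - f) / e) ** 2 for o, f, e in zip(obs, fit, err)) / dof


def l1_loss(obs, fit):
    """Mean absolute residual; independent of the error bars."""
    return sum(abs(o - f) for o, f in zip(obs, fit)) / len(obs)


def l2_loss(obs, fit):
    """Mean squared residual; independent of the error bars."""
    return sum((o - f) ** 2 for o, f in zip(obs, fit)) / len(obs)


# Hypothetical spectrum points and one fixed fit (illustrative numbers only).
obs = [1.00, 0.80, 0.55, 0.30, 0.12]
fit = [0.90, 0.85, 0.50, 0.36, 0.10]

# Same residuals, but the second data set carries doubled error bars.
err_small = [0.05] * len(obs)
err_large = [0.10] * len(obs)

chi2_small = reduced_chi2(obs, fit, err_small, n_params=2)
chi2_large = reduced_chi2(obs, fit, err_large, n_params=2)
l1 = l1_loss(obs, fit)
l2 = l2_loss(obs, fit)

print(chi2_small, chi2_large, l1, l2)
```

Doubling the errors divides the reduced chi-squared by four here, although the fit itself has not changed, which is why an error-independent metric is a useful cross-check.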
It is seen that for samples with lower protein concentration and that have been dialyzed, L1 and L2 loss functions indicate Paalman-Pings corrections to indeed improve fitting. For these samples, there is a very small, negative impact of restricting the q-range.
However, for higher protein concentrations and non-dialyzed samples, L1/L2 metrics indicate that fits are worsened by Paalman-Pings corrections. The non-dialyzed samples will be considered separately; therefore, the continued discussion concerns fits with Paalman-Pings corrections and a full q-range. To illustrate the impact of the fits, the computed apparent diffusion using Paalman-Pings corrected data and empty-can subtracted data is 0.674 Å²/ns and 0.518 Å²/ns, respectively, for the sample with the largest difference in fitting metrics (#25).
Two models were compared: one with the Singwi-Sjölander jump-diffusion model for internal diffusivity, which imposes a momentum transfer dependence (fitted across all q simultaneously), denoted "Jump-diffusion", and a Fickian model, γ = Dq², fitting a Lorentzian for each q individually, denoted "per-q". As can be seen from Figures S4 and S5, the resulting fits are very much comparable, with a minor advantage for per-q fitting. However, this overview excludes the fact that for some individual q-values, the per-q fitting failed. Given the high similarity between the models in terms of adequate fitting, the failure of the per-q model in a few cases (in total five cases across all samples) determines the choice of mainly analysing results with the jump-diffusion model, which gives more stable fits in terms of center-of-mass diffusion. Extracting a diffusion constant from the per-q model is also not always tractable; using the γ = Dq² relation will sometimes yield a poor correlation coefficient.
Figure S4: Goodness-of-fit for all samples measured.
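As a rough illustration of the two width models compared above, the sketch below uses the standard Singwi-Sjölander form γ(q) = Dq²/(1 + Dq²τ) and the Fickian form γ = Dq². The parameter values, the q-grid and the function names are hypothetical and not taken from the measurements; the fit is a plain least-squares line through the origin, which is only one possible way of extracting D from per-q widths.

```python
import math


def fickian_width(q, D):
    """Fickian Lorentzian half-width, gamma = D * q^2."""
    return D * q ** 2


def jump_diffusion_width(q, D, tau):
    """Singwi-Sjolander width, gamma = D q^2 / (1 + D q^2 tau)."""
    x = D * q ** 2
    return x / (1.0 + x * tau)


def fit_fickian(qs, gammas):
    """Least-squares D for gamma = D q^2, plus Pearson r of gamma vs q^2."""
    q2 = [q * q for q in qs]
    D = sum(x * g for x, g in zip(q2, gammas)) / sum(x * x for x in q2)
    n = len(q2)
    mx, mg = sum(q2) / n, sum(gammas) / n
    sxy = sum((x - mx) * (g - mg) for x, g in zip(q2, gammas))
    sxx = sum((x - mx) ** 2 for x in q2)
    sgg = sum((g - mg) ** 2 for g in gammas)
    return D, sxy / math.sqrt(sxx * sgg)


# Hypothetical q-grid (1/Angstrom) and parameters (Angstrom^2/ns, ns).
qs = [0.3 + 0.2 * i for i in range(9)]
D_true, tau = 5.0, 0.1

fick = [fickian_width(q, D_true) for q in qs]
jump = [jump_diffusion_width(q, D_true, tau) for q in qs]

D_fick, r_fick = fit_fickian(qs, fick)
D_jump, r_jump = fit_fickian(qs, jump)
print(D_fick, r_fick, D_jump, r_jump)
```

When the widths saturate at high q, as in the jump-diffusion case, forcing the γ = Dq² relation underestimates D and lowers the correlation coefficient, which is consistent with the remark that the per-q route is not always tractable.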

Samples with varying salt content
The proteins used in this study are synthesised, and the manufacturer of the samples has communicated that there is non-protein content in the samples, mainly TFA (CF3COONa) and other monovalent salts (e.g. NaCl, KCl), which on average is 35 % of the content. Two where X is the mole fraction of cations, E is a free energy and V is a volume (see reference for details). η1 is taken as the viscosity of D2O at the corresponding temperature, as computed from the relation by Cho et al. 2

Assuming all salt is NaCl
Parameters E, V for NaCl for different temperatures are found in Goldsack and Franchetto. 3
Figure S9: Comparison of effective radius of hydration computed from the effective diffusion, comparing different sample preparation methods, which give different amounts of salt content in the sample.
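For orientation, a radius can be obtained from a diffusion coefficient and a solvent viscosity through the Stokes-Einstein relation, R = k_B T / (6πηD). The sketch below is a minimal example under stated assumptions: it treats the input as a purely translational diffusion coefficient, it uses an approximate D2O viscosity of 1.1 mPa·s at 298 K (an assumed round value, not the salt-corrected viscosity discussed above), and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein_radius(d_A2_per_ns, temperature_K, viscosity_Pa_s):
    """Hydrodynamic radius in Angstrom from D in Angstrom^2/ns."""
    d_si = d_A2_per_ns * 1e-11  # 1 Angstrom^2/ns = 1e-11 m^2/s
    radius_m = K_B * temperature_K / (6.0 * math.pi * viscosity_Pa_s * d_si)
    return radius_m * 1e10  # metres to Angstrom


# Example input: a diffusion coefficient of 13.37 Angstrom^2/ns,
# with ASSUMED temperature (298 K) and viscosity (1.1 mPa*s).
radius = stokes_einstein_radius(13.37, 298.0, 1.1e-3)
print(round(radius, 2))
```

A higher viscosity, as expected for samples with more salt, gives a smaller radius for the same measured diffusion coefficient, so the choice of viscosity model directly affects a comparison between preparation methods.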

MDANSE calculation
The software MDANSE 4 was used to calculate the EISF from the trajectories. According to the internal documentation of the program, this computation is achieved by using a grid of equidistantly spaced points along the q-axis, according to Eq. 2, where n_I is the number of atoms of species I, ω_I is the weight for species I, N_q is a user-defined number of shells and q_m = q_min + m·Δq. EISF_I(q_m) is defined in Eq. 3, where the overbar-q denotes that it is an average over the q-values having the same modulus q_m and R_α is the position of particle α.
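The description above can be sketched for a single species as follows. This is not MDANSE code: it is a minimal reading of the text, assuming that the per-species EISF is the atom average of the shell average of |⟨exp(i q·R_α)⟩_t|², with ⟨·⟩_t a time average over frames. The six axis-aligned q-directions stand in for a proper sampling of each shell, and all names are hypothetical.

```python
import cmath
import math

# Crude stand-in for the set of q-vectors sharing one modulus (assumption).
AXES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def q_grid(q_min, dq, n_q):
    """Equidistant moduli q_m = q_min + m * dq for m = 0 .. n_q - 1."""
    return [q_min + m * dq for m in range(n_q)]


def eisf_species(traj, q_mod, directions=AXES):
    """EISF for one species; traj[t][a] is the (x, y, z) of atom a at frame t."""
    n_frames, n_atoms = len(traj), len(traj[0])
    total = 0.0
    for a in range(n_atoms):
        shell_sum = 0.0
        for d in directions:
            acc = 0j
            for frame in traj:
                x, y, z = frame[a]
                acc += cmath.exp(1j * q_mod * (d[0] * x + d[1] * y + d[2] * z))
            shell_sum += abs(acc / n_frames) ** 2  # |time average|^2
        total += shell_sum / len(directions)       # average over the shell
    return total / n_atoms                         # average over atoms


# Toy trajectories: one immobile atom, and one atom hopping between two sites.
static = [[(1.0, 2.0, 3.0)] for _ in range(10)]
d_site = 2.0
two_site = [[(d_site / 2 if t % 2 == 0 else -d_site / 2, 0.0, 0.0)]
            for t in range(10)]

qs = q_grid(0.2, 0.4, 4)
eisf_static = [eisf_species(static, q) for q in qs]
eisf_two_site = [eisf_species(two_site, q) for q in qs]
print(eisf_static, eisf_two_site)
```

An immobile atom gives an EISF of one at every q, while the two-site hop gives (cos²(q d/2) + 2)/3 with this choice of directions, which makes the toy cases easy to check by hand.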

Single-chain simulation: C36IDPS
Intermediary numbers found in the calculation of the effective diffusion from the molecular dynamics simulation performed by Jephthah et al. 6 Note that information about convergence of these trajectories is found in the original article.
Translational diffusion after correcting for finite-size effects: 73.857 Å²/ns.
After adjusting for the fact that the simulation was done in H2O, while the experiment was performed in D2O, correcting for the viscosity difference between the two: 59.767 Å²/ns, standard deviation 0.88595.

Using HYDROPRO
As input to HYDROPRO, the viscosity and density of D2O are given, rather than the values from the simulation, which used H2O (when not using HYDROPRO, this is adjusted for in post-processing). Snapshots from the simulation were extracted every 100 ps. The following computed numbers (averaged across all replicates) were obtained:

Single-chain simulation: C36m
Intermediary numbers found in the calculation of the effective diffusion from the molecular dynamics simulation performed by Jephthah et al. 6 Note that information about convergence of these trajectories is found in the original article.

Using HYDROPRO
As input to HYDROPRO, the viscosity and density of D2O are given, rather than the values from the simulation, which used H2O (when not using HYDROPRO, this is adjusted for in post-processing). Snapshots from the simulation were extracted every 100 ps. The following computed numbers (averaged across all replicates) were obtained:

Convergence information
Figure S11: Evolution of the radius of gyration in the single-chain simulation using the A99SB-disp force field.

Computed diffusion
Translational diffusion after correcting for finite-size effects: 19.805 Å²/ns.
After adjusting for the fact that the simulation was done in H2O, while the experiment was performed in D2O, correcting for the viscosity difference between the two: 16.027 Å²/ns, standard deviation 0.15073.

Using HYDROPRO
As input to HYDROPRO, the viscosity and density of D2O are given, rather than the values from the simulation, which used H2O (when not using HYDROPRO, this is adjusted for in post-processing). Snapshots from the simulation were extracted every 100 ps. The following computed numbers (averaged across all replicates) were obtained:
Translational diffusion: 13.37 Å²/ns, standard deviation 0.83 Å²/ns
Radius of gyration: 13.8 Å (note that HYDROPRO assumes an atomic hydration layer of 1.1 Å in this computation).
No finite-size adjustment for diffusion is done in this case, as HYDROPRO is parametrized to directly compute properties from crystal structures.
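The H2O-to-D2O adjustment quoted in this document can be read as a rescaling of the diffusion coefficient by the viscosity ratio η(H2O)/η(D2O), since Stokes-Einstein diffusion is inversely proportional to viscosity. The sketch below is an illustration under that assumption: the ratio is inferred from the C36IDPS numbers quoted earlier (73.857 and 59.767 Å²/ns) rather than taken from a stated viscosity model, and the function name is hypothetical.

```python
def rescale_to_d2o(d_h2o, eta_ratio):
    """Scale a diffusion coefficient from H2O to D2O.

    eta_ratio is eta(H2O) / eta(D2O); D is inversely proportional to viscosity.
    """
    return d_h2o * eta_ratio


# Ratio inferred from the C36IDPS values quoted in the text (assumption:
# the same ratio was used for every simulation).
eta_ratio = 59.767 / 73.857

# Apply it to the finite-size-corrected value of 19.805 Angstrom^2/ns.
d_d2o = rescale_to_d2o(19.805, eta_ratio)
print(round(eta_ratio, 4), round(d_d2o, 3))
```

The inferred ratio corresponds to D2O being roughly 1.24 times more viscous than H2O, and applying it to 19.805 Å²/ns gives a value that agrees with the quoted 16.027 Å²/ns to within rounding.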

Convergence information
Figure S14: Evolution of the radius of gyration for the chains in the 10 mg/ml molecular dynamics simulation. Each replicate contains two chains. Running averages used for visual clarity, using a window of 1000, and with thinned-out data, using every 5th data point.

Avg. total # proteins in clusters (6 Å)    5.9  5.5  5.5  5.7  6.6  5.8
Avg. total # proteins in clusters (7 Å)    6.5  6.0  6.2  6.3  6.9  6.4

Numbers for the computation of diffusion parameters
Table S10: Translational diffusion for each chain in the 50 mg/ml protein concentration simulation. Computed using mean-square displacement and the Einstein relation. In this table, data is NOT corrected for finite-size effects or the H2O/D2O difference. The number after the ± sign refers to the error of the linear fit to the mean-square displacement.

Metric                                     R²
Avg. largest cluster (6 Å)                 0.42
Avg. largest cluster (7 Å)                 0.20
Avg. # clusters (6 Å)                      0.45
Avg. # clusters (7 Å)                      0.33
Avg. total # proteins in clusters (6 Å)    0.73
Avg. total # proteins in clusters (7 Å)    0.79

Ramachandran plot over all replicates, 50 mg/ml, A99SB-disp
Figure S23: Ramachandran plots, using all replicates for each simulation case. Left: Single-chain simulation (A99SB-disp force field), Middle: 10 mg/ml protein concentration simulation, Right: 50 mg/ml protein concentration simulation.
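The Table S10 caption refers to diffusion obtained from the mean-square displacement (MSD) and the Einstein relation, with the error taken from the linear fit. A minimal sketch of that procedure is given below, assuming three-dimensional diffusion so that MSD(t) ≈ 6Dt + c in the linear regime. The function names and the synthetic MSD values are hypothetical, and the choice of fitting window, which matters in practice, is not addressed.

```python
import math


def linear_fit(t, y):
    """Ordinary least squares y = a + b t; returns (b, a, std. error of b)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = my - slope * mt
    rss = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    slope_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_err


def diffusion_from_msd(times_ns, msd_A2):
    """Einstein relation in 3D: D = slope / 6, with the fit error propagated."""
    slope, _, slope_err = linear_fit(times_ns, msd_A2)
    return slope / 6.0, slope_err / 6.0


# Synthetic, noise-free MSD for D = 12 Angstrom^2/ns (illustrative only).
times = [float(i) for i in range(1, 11)]
msd = [6.0 * 12.0 * ti + 3.0 for ti in times]

D, D_err = diffusion_from_msd(times, msd)
print(D, D_err)
```

Values obtained this way are the uncorrected ones; finite-size and viscosity adjustments of the kind described for the single-chain simulations would be applied afterwards.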