Aggregation Behavior of Structurally Similar Therapeutic Peptides Investigated by 1H NMR and All-Atom Molecular Dynamics Simulations

Understanding of peptide aggregation propensity is an important aspect in pharmaceutical development of peptide drugs. In this work, methodologies based on all-atom molecular dynamics (AA-MD) simulations and 1H NMR (in neat H2O) were evaluated as tools for identification and investigation of peptide aggregation. A series of structurally similar, pharmaceutically relevant peptides with known differences in aggregation behavior (D-Phe6-GnRH, ozarelix, cetrorelix, and degarelix) were investigated. The 1H NMR methodology was used to systematically investigate variations in aggregation with peptide concentration and time. Results show that 1H NMR can be used to detect the presence of coexisting classes of aggregates and the inclusion or exclusion of counterions in peptide aggregates. Interestingly, results suggest that the acetate counterions are included in aggregates of ozarelix and cetrorelix but not in aggregates of degarelix. The peptides investigated in AA-MD simulations (D-Phe6-GnRH, ozarelix, and cetrorelix) showed the same rank order of aggregation propensity as in the NMR experiments. The AA-MD simulations also provided molecular-level insights into aggregation dynamics, aggregation pathways, and the influence of different structural elements on peptide aggregation propensity and intermolecular interactions within the aggregates. Taken together, the findings from this study illustrate that 1H NMR and AA-MD simulations can be useful, complementary tools in early evaluation of aggregation propensity and formulation development for peptide drugs.

. Photographs of selected samples. To the left of each sample is a pure water sample. All phots show some degree of light scattering. The top row shows cetrorelix samples, and the bottom row ozarelix samples. To the far left is the lowest concentration that shows scattering, at the earliest time point. For cetrorelix, the selected samples are: 1 mM at 2 h, 1 mM at 48 h, 2 mM initial, 2 mM 48 h, 5 mM initial, 5 mM 48 h. For ozarelix, the selected samples are: 2 mM at 2 h, 2 mM at 48 h, 5 mM initial, 5 mM 48 h, 10 mM initial, 10 mM 48 h.

pH and charge
The samples of D-Phe 6 Table S1 and the plots of net charge vs. pH in Fig. S2, expected to carry an (average) positive charge of +1 or somewhat higher. The actual pKa values of the peptides, and thereby the net charges at sample pH values, can be influenced by self-assembly and variations in the solution conditions (e.g., differences in the ionic strength). Table S1. pK a values of the ionizable amino acid sidechains in the investigated peptides. 3 Figure S2. Plots of charge vs. pH for the investigated peptides, based on the pK a values presented in Table S1 3 1

Comments on peak broadening in NMR spectra
Peak broadening and loss of signal can be attributed to formation of larger aggregates. The linewidth of an NMR signal depends on the spin-spin relaxation time, T2, which, in turn, depends on the rotational correlation time, effectively the tumbling rate, of the molecule or molecular segment giving rise to an NMR signal. Generally, slower tumbling gives shorter T2, which in turn corresponds to broader signals. 2 However, depending on the character of the selfassembled structure formed, the T2 of a signal may be controlled largely by local mobility rather than by the tumbling rate of the aggregate as a whole. Thus, there is no general, simple relationship between aggregate size and the extent of line broadening.
The effective disappearance of signal, i.e., the reduction in absolute integral relative to the nominal concentration, that is observed with increasing concentration can be attributed to very short T2. This can cause reduction in signal intensity by broadening so significant that the signal practically merges with the baseline, by actual loss of signal during delays in the NMR pulse sequence, or a combination. The excitation sculpting pulse sequence applied in this work for solvent suppression includes significant delay times, which will effectively function as a T2 relaxation filter. Signal from peptide residing in larger aggregates, with short T2, will be attenuated to a larger extent than signal from peptide present as monomers or residing in small oligomers or aggregates, which has longer T2. Thus, features associated with monomers or small aggregates will have a higher weight in the observed NMR spectra.

1 H NMR spectra
Near-complete spectra for all peptides are shown in Fig. S3. The range 4.5-6.5 ppm was excluded to allow better magnification of the more crowded aromatic region (6.5-8.5 ppm).
The excluded region contained the suppressed water signal, two singlets for ozarelix and cetrorelix (at 5.4 and 6.0 ppm), and one singlet for degarelix (at 5.8 ppm). Degarelix and D-Phe 6 -GnRH also showed a singlet at 9.9 and 10.0 ppm, respectively. All these peaks followed the same general development as the peaks shown in Fig. S3. There was a variability in the efficiency of water signal suppression, especially in samples with extensive aggregation and low peptide signal, resulting in relatively high intensities at 4-4.5 ppm in the spectra of 5 mM ozarelix and 10 mM cetrorelix. The excitation sculpting program might influence the intensity of signals nearby the suppressed water signal, which is one of the reasons why the leucine (or nor-leucine) signal at 0.7 ppm was selected for comparison of integral values. All spectra in Fig. S3 were individually scaled to similar intensity to show changes in line shape.
In addition to the substantial changes with concentration that are discussed in the main text a few minor variations in spectral appearance were observed among the spectra in the range above 6.5 ppm, most clearly for D-Phe 6 -GnRH and degarelix at 8.2-8.5 ppm. This range show signals arising from amide protons and aromatic side chains, which can be very sensitive to variations in the local chemical environment, and these differences could be attributed to variations in the general solution conditions among samples (ionic strength, pH) just as well as to differences in direct inter-peptide interactions.
Samples of ozarelix showed an unexpected sharp singlet at 3.6 ppm at all concentrations. The origin of this signal is not fully understood but is believed to be caused by a trace contamination of 1,4 dioxan introduced in the freeze-drying process during isolation of the peptide.

Comments on the acetyl and acetate peaks
The acetyl peak, which is expected for ozarelix, cetrorelix and degarelix (but not D-Phe 6 -GnRH) is most clearly seen in Fig. 3 (main text). The acetyl peak is expected to be close to the acetate counterion peak. For cetrorelix, there is a complete overlap (the two singlets are visible in DMSO-d6) at all concentrations where these peaks are distinguishable, whereas for ozarelix, the acetate counterion peak seems to move from the right to the left side of the acetyl peak, indicating a change in pH. Furthermore, the acetate peak seems to increase in intensity from 5 to 10 mM ozarelix, but when accounting for the concentration increase, the signals at 5 and 10 mM are most likely the result of a similar fraction of acetate in solution, while the peptide peaks disappear to a larger extent at 10 mM. Rapid (4 scan) experiments were performed at about 15 and 25 minutes after sample preparation, as well as after 24 hours and after a week, for all peptides at 1-10 mM concentration (spectra not included). In these experiments, the 5-and 10 mM samples of ozarelix showed substantial loss in signal intensity, without notable changes in the general spectral appearance. For 5 mM, the spectrum recorded 10 minutes after the initial spectra was about halfway to the intensity seen at the next measurement at 24 h (Fig. S4), whereas for the 10 mM sample, the spectra after 10 min and 24 h showed similar intensities These were the only samples for which a rapid, substantial reduction in signal intensity shortly after preparation was captured. The reduction in signal can likely be attributed to an initial larger fraction of relatively small aggregates, which rapidly grow into larger structures. 8 9 Figure S4. 1 H NMR spectra of D-Phe 6 -GnRH, ozarelix, cetrorelix and degarelix at 1, 5 and 10 mM, at three different times after sample preparation: initial (blue), 24 hours (yellow) and after a week (red). The same arbitrary intensity scale is used in all spectra. In all spectra line broadening is set to 1.2.

Experimental details
The NMR diffusion experiments were performed at 25ºC on a Bruker AVII-200 spectrometer equipped with a Bruker DIFF-25 gradient probe and a Bruker GREAT 1/40 gradient amplifier. The temperature control was calibrated using a thermocouple immersed in an NMR tube to measure the actual temperature at the position of the sample. Samples were inserted into the probe at least 15 minutes prior to the experiments.
Self-diffusion coefficients were determined using the pulsed gradient stimulated echo (PFG-STE) method. 3 The PFG-STE sequence is preferable (compared to the simpler spin-echo experiment) in investigations of large molecules or molecules residing in large aggregates, which results in a short T2. For a solute with an isotropic self-diffusion coefficient D, the intensity of the stimulated echo can be described by: (eq. S1) where I is the intensity of the stimulated echo, I 0 is the intensity of the stimulated echo in the absence of pulsed field gradients,  the gyromagnetic ratio of protons, G and  the strength and duration of the gradient pulses, and Δ the diffusion time.
Experiments were run with the parameters described in Table S2 with a linear ramping of G.
G was calibrated in a measurement on a trace amount of HDO in D 2 O, which has a D of 1.90210 -9 m 2 /s at 25°C. 4 For each experiment, the receiver gain was optimized using the rga (receiver gain adjustment) function.  For a single D, which will be the outcome either if only monomers are present or if there is fast exchange between different sites (where Dobs is described by equation S3), the echo decay described by equations S1 and S2 will be single exponential. For multiple distinct Di, which may result in a case of multiple sites and slow exchange, the echo decay will be multiexponential.
The D of a molecule or an aggregate in solution can be related to the hydrodynamic radius, RH, via the Stokes-Einstein equation: (eq. S4) where kB is the Boltzmann constant, T is the absolute temperature, and η the dynamic viscosity of the bulk medium.

NMR diffusion results
Samples of 3 and 10 mM of the four peptides in D2O were analysed. Figs      As was mentioned in the main text, the presence of a minor fraction of small transient oligomers cannot be excluded based on the NMR diffusometry results. When differences in D for monomers and aggregates are small, which is likely for loosely structured oligomers, a substantial fraction of the peptide needs to reside in the oligomers for the latter to be detected.

Comments on the NMR diffusion results for degarelix
In the diffusometry experiments on the degarelix samples, signal from the peptide itself was only detected in the range between 1.2 and 1.6 ppm, which is consistent with signals in this range being associated with a higher rate of reorientation (longer T2). The fact that the echo decay of the signals was multiexponential means that it results from aggregates of different sizes with slow exchange. A rough evaluation of the complex echo decay suggest that it corresponds to aggregates with RH in the range between 1 and 10 nm, which means that these may range from single molecules and/or small oligomer to aggregates containing at least tens of molecules.

Temperature coupling
In all MD-simulations there are tendencies for the system energy to change by a systematic drift over time. This is caused by force truncation and integration errors, with these issues being more pronounced in single compared to double precision floating point arithmetic as well as larger time steps. To alleviate this, temperature can be kept constant by applying an algorithm that couples atoms in the system to an external heat bath of a given temperature which scales atomic velocities up or down dependent on system state.

Pressure coupling
Pressure is maintained in molecular dynamics simulations in a similar way to temperature, by coupling the system to a pressure bath, with atomic coordinates and box dimensions scaled to adjust the pressure of the system. This can be done in a few different ways, either by scaling box dimensions evenly in all directions (isotropic pressure scaling) or by allowing box vectors to vary individually (either semi-isotropic or anisotropic pressure coupling). Simulations involving lipid bilayers often use semi-isotropic pressure coupling, to differentiate pressures in the plane of the lipid bilayer (x-y) from that normal (z) to the bilayer.
6 Simulation analysis

Transition networks
The transition network was calculated using all pairwise transitions between the aggregate states using the approach described in, 9 with the exception that the different aggregate states were defined based on only the number of peptides in an assembly. After identifying all the aggregate states and the number of transitions between the states, the corresponding transition matrix was developed. For each peptide, data from all three simulations were used to build the matrix. The transition network plots were then created using the software Cytoscape3.6. 10

Binding and unbinding
Peptide unbinding and binding events were calculated by tracking the state of each peptideaggregated or free -at each time step (employing again the distance cut-off of 0.5 nm to separate aggregated vs free peptides). Unbinding events were defined as a change in peptide state between aggregated to free in two consecutive time steps. Similarly, for a binding event the peptide state was changed from free to aggregated. Peptide-peptide interactions events were then characterized and quantified further by calculating the collision acceptance probability, CAP, using the following equation: 11,12 where nbind is the number of bound peptides and nunbind is the number of unbound peptides in the simulation.

Contact analysis
Hydrogen bond analysis was performed using the gmx hbond utility in Gromacs. Snapshot images were produced using VMD. 13 To plot the peptide residue-residue contact map, two residues were considered to be in contact when the distance between any pair of atoms from the respective residues was under 0.5 nm. 12 Hydrophobic solvent accessible surface areas (hSASA) were computed using the Gromacs gmx sasa module. Atoms with a partial charge in the range from -0.2 to 0.2 were defined as hydrophobic and all other atoms were defined as hydrophilic. 14 7 Simulation results   The exposure of the peptide amino acids to the solvent was characterized by assessing the hydrophobic solvent accessible surface area hSASA (Table S4).