Importance of the Force Field Choice in Capturing Functionally Relevant Dynamics in the von Willebrand Factor

Whether recent updates and new releases of atomistic force fields can model the structural and dynamical properties of proteins containing both folded and partially disordered domains is still unclear. To address this fundamental question, we tested eight recently released force fields against our set of nuclear magnetic resonance (NMR) observables for a complex and medically relevant system, the major factor VIII binding region on the von Willebrand factor. This biomedically important region comprises both a folded and a partially structured domain. By using an enhanced sampling technique (temperature replica-exchange molecular dynamics simulations), we find that some force fields indeed rise to the challenge and capture the structural and dynamical features of the NMR ensemble and, therefore, are the appropriate choice for simulations of proteins with partially structured domains. What is more, we show that only such force fields can qualitatively capture the effects of a pathogenic mutation on the structural ensemble.

R ecent advancements in computer architectures 1 and the parallelization of computer codes have increased the usefulness of atomistic molecular dynamics (MD) simulations in modeling complex biological systems and enabled them to reach unprecedented time scales and system sizes. However, one of the main questions that keeps arising is whether the commonly used Lifson-type force fields can describe the structural and dynamical properties of macromolecules with sufficient accuracy. 2,3 In an extensive force field comparison study, Lindorff-Larsen et al. showed that many recent force fields performed reasonably well when it came to folded, globular proteins. However, their performance faltered with intrinsically disordered peptides (IDPs). 4 To tackle this issue, atomistic protein and water force fields have been subjected to multiple rounds of refinement.
In the AMBER family of force fields, the torsional parameters of the a99 force field were progressively modified to improve side-chain rotamer and backbone secondary structure preferences of folded proteins. 5−9 These efforts were followed by a recent rewrite of a99SB by , which tried to improve the modeling of IDPs while retaining an accurate description of folded protein properties. 10 The a99SB-disp force field came with its own four-point water modela modified version of TIP4P-D. 11 Another notable adaptation of a99SB is the residue-specific force field (RSFF2), 12 which was optimized through additional nonbonded potential functions to reproduce the conformational distributions of amino acids from a protein coil library. Its improved version, RSFF2+, 13 further stabilized α-helical hydrogen bonds and was combined with the TIP4P-D water model to address the previously observed overstabilization of IDPs.
The CHARMM family of force fields also went through several updates since C27 14−16 to improve its representation of protein folding and IDPs. 4 First, C36 was released to address incorrect equilibrium sampling of helical and extended conformations in folding simulations. 17 However, Rauscher et al. showed that C36 tends to produce a high population of left-handed helices, inconsistent with experimental data. 18 C36m was released to rectify this issue and further improve modeling of IDPs. 19 Despite the corrections, C36m still keeps some IDPs overly compact, unless the Lennard-Jones parameters of the water hydrogens are also adjusted. 10 To understand how these force fields perform when it comes to a more demanding target than a typical folded protein and whether there are fundamental qualitative differences in their predictions, we used our recently solved solution NMR structure of the TIL′E′ (D′) region of the von Willebrand factor (VWF). 20 Previously, we solved the NMR structure of TIL′E′ using experimentally derived distance and backbone dihedral restraints, as well as vector orientation restraints and backbone chemical shifts. The NMR structure in combination with dynamics data allowed us to characterize VWF in unprecedented details. VWF plays a prominent role in the arrest of bleeding, and its malfunction leads to the common human inherited bleeding disorder von Willebrand disease (VWD). 21 Type 2N VWD arises due to destabilization of the VWF complex with coagulation factor VIII (FVIII) mainly as a result of mutations in the TIL′E′ region that forms the major binding interface. 21 The region itself comprises two domains: the E′ domain that is predominantly composed of β-sheets and the trypsin-inhibitor-like (TIL′) domain that is folded but has almost no secondary structure ( Figure 1A,B). Eight disulfide bridges (3 in the E′ domain, 5 in the TIL′ domain) give additional stability to the region. Relaxation experiments have shown that the loop region of the TIL′ domain is dynamic. 20 The dual structural nature of VWF provides an excellent opportunity to test how well force fields handle a protein with a partially structured region and whether they can reproduce the experimental observables and probe the effects of pathogenic mutations associated with VWD ( Figure 1C). 20 We simulated the TIL′E′ region of VWF with eight different force fields using replica-exchange molecular dynamics (REMD), 22 an enhanced sampling technique where independent simulations (replicas) span a range of temperatures and regularly exchange conformations among themselves to increase the conformational sampling of the system. We used 30 replicas spanning a 298−345 K temperature range and compared the ensemble obtained at 300 K with the experimental observables. Each replica was sampled for 500 ns with a total of 15 μs of sampling per force field. For the modeling of our system, we settled on three force fields commonly used for folded proteins, a99SB*-ILDN, 7 C27 14−16 and a14SB, 9 as well as five recently released force fields that are expected to handle modeling of the partially structured region better, a99SBdisp, 10 C36m 19 and its slightly modified version for IDPs (C36m IDP), 19 RSFF2, 12 and RSFF2+ 13 (see the Methods section in the SI for details). We checked the convergence of the observables by performing a block analysis ( Figure S1, Table S1).
In general, we observe that the E′ domain ( Figure 1) is more rigid across all of the simulations with the different force fields. The average structures are also similar and reproduce fairly well the structure obtained from NMR. All of the force fields keep the β-sheets of the rigid E′ domain in place (with only small instabilities in the last β-sheet in the case of RSFF2, RSFF2+, and a14SB). The main differences in the structural ensembles are observed for the partially structured TIL′ domain. To better understand the differences, we compared the secondary structure propensity arising from the various force fields and the NMR ensemble ( Figure S2; see the Methods section in the SI for details). According to the experimental data, the TIL′ domain is structured but mostly devoid of specific secondary structural motifs, except for a short β-sheet motif occasionally identified between residues 772−775 and 807−810, a longer, stable β-sheet within residues 814−823, and a 3 10 -helical turn formed by residues 792−796. C36m, a99SB-disp, and RSFF2+ are the only force fields that were able to maintain the shorter TIL′ β-sheet motif similarly to the deposited NMR ensemble, while a99SB*-ILDN, C27, a14SB, and a99SB-disp seem to overstabilize the very short 3 10 -helical turn. Furthermore, almost all of the force fields (except for C36m) also explore a short α-helical turn formed by residues 776−782.
In the comparison of average contact maps (see the Methods section for details), the folded E′ domain predominantly shows a slight gain of contact, and overall, all force fields agree well with the NMR ensemble. With respect to the flexible TIL′ domain, the analyses indicate that all force fields show a loss of contact compared to the NMR ensemble ( Figure S3). RSFF2+ and a99SB-disp ensembles came the closest to the NMR ensemble ( Figure S1, Table S1), with a14SB and C36m following right behind. Considering the secondary structure comparison, it is not surprising that the regions with loss of contacts primarily comprise the shorter β-sheet motif, as well as the 3 10 -and α-helical turns described above.
The performance of the selected force fields was assessed further by comparing how well they agreed with the experimental NMR data across several categories: (1) nuclear Overhauser effect (NOE) data, (2) chemical shifts, and (3) backbone dihedral angles sampling. We first tested the force

The Journal of Physical Chemistry Letters
Letter fields based on how much their generated ensembles violated the experimentally determined NOE distance restraints (see the Methods section). a99SB-disp, both variants of C36m, and RSFF2+ show the best agreement with the experimental values (Table 1). We also looked at NOE violations calculated for each residue pair to see the agreement for each domain, with 42.5% of NOEs belonging to the TIL′ domain and 55.2% to the E′ domain ( Figure 2). The worst NOE violations typically occurred between the residues that form the shorter β-sheet motif and the 3 10 -helical turn in the TIL′ domain. Curiously, residues 776−782, forming the short α-helical turn observed in the majority of force field simulations but not in the NMR ensemble, are not involved in any major NOE violation. Force fields that showed small instabilities in maintaining the C-terminal β-sheet motif of the E′ domain also show minor NOE violations in the same region (RSFF2, RSFF2+, and a14SB).
We also compared the chemical shifts computed from the simulation trajectories (see the Methods section in the SI) with the experimentally obtained values (Tables 1 and S1). When we considered the chemical shifts for C α , C β , H α , H, and N atoms, then the force field whose shifts were the closest to the experimental ones was C27, followed closely by RSFF2+ and then a14SB and a99SB-disp. However, when we considered only C α , C β , and H α atoms (essentially comparing secondary structure), the best agreement again came for C27, followed closely by RSFF2+ and then a14SB, RSFF2, a99SB-disp, and a99SB*-ILDN. Unlike for NOEs, there were no major differences observed for the TIL′ and E′ domains ( Figure S4), although large disagreements were present for residues composing the short β-sheet motif in the TIL′ domain, reflecting the secondary structure observations. Lastly, we compared the force fields using the agreement of backbone dihedral angle sampling in the simulations with that predicted by TALOS-N from backbone chemical shifts 23 ( Table 1). The best-performing force field in this case was a99SB-disp, followed closely by C27 and then a14SB and RSFF2+. A closer look at individual energy scores per residue ( Figure 3) reveals an already familiar situation in which the substantial differences between simulated and predicted angles are generally contained in the TIL′ domain with several angles that seem to be problematic for most of the force fields. Namely, residues 778, 806, 818, and 838 show the largest disagreement. While residues 778 and 806 are not part of the shorter β-sheet motif, they are in its vicinity, which might explain their high score. On the other hand, residues 818 and 838 are located in turns (connecting two β-strands or β-sheets) whose backbone dihedral angles are typically more difficult to predict. 23 Ef fects of the pathogenic mutation on the structural ensemble. Comparison of the 1 H− 15 N heteronuclear single-quantum coherence (HSQC) spectra of the wild-type (WT) and the pathogenic E787K mutant of the TIL′E′ region of VWF shows that the mutation causes significant changes in the underlying structural ensemble (Figure 4). We used REMD simulations to probe these changes on the molecular level. We chose two force fields from those tested on the WT protein to describe the mutant: a99SB-disp that performed consistently well across

The Journal of Physical Chemistry Letters
Letter all comparison categories for the WT and a99SB*-ILDN that belongs to the same force field family but has failed to accurately capture features of the partially structured domain. In both cases, multiple replica 500 ns long simulations were produced and evaluated against experimental data in an identical manner to the WT variants ( Figure 5). The convergence of the simulations was again confirmed using block analysis ( Figure S5, Table S3).

Letter
The differences in the energy scores for the mutant and WT simulations showed that the E787K mutant explored different conformations (Tables 2 and S3, Figure S5). This difference is particularly pronounced when comparing NOE distances, and the mutant simulation performed using a99SB-disp correctly produced an ensemble that agrees significantly worse (10 times) with the experimental WT data than the WT simulation itself. While a99SB*-ILDN force field also produced different ensembles for the mutant and the WT, these differences are not as pronounced, and both WT and mutant systems deviate

The Journal of Physical Chemistry Letters
Letter notably from the experimental WT data. Most of the changes introduced by the E787K mutation were contained in the region of the short β-sheet motif in the TIL′ domain loop ( Figure 5), which was no longer formed in the mutant ( Figure S6), as evident from the analysis of the a99SB-disp force field. These changes were also reflected in the contacts comparison, which showed a re-formation of interactions in the shorter β-sheet motif and its surrounding residues ( Figure S7). Generally, there was very little change in sampling of the backbone dihedral angles ( Figure S8). Considering that the mutation introduced a change in charge, we also looked at the differences in the formation of salt bridges for the WT and E787K mutant constructs (Table 3). We found a number of changes involving residues whose mutations are implicated in type 2N VWD ( Figure 1C), such as R854, R782, R816, and H817, and that likely also affect the formation of the shorter β-sheet motif.
Here, we assessed the performance of eight MD force fields, two which are often the first choice when it comes to investigating folded proteins and six that have been recently developed to improve the modeling of intrinsically disordered proteins. To the best of our knowledge, this is the first time that these force fields were tested on a complex target exhibiting a very diverse structural behaviorthe TIL′E′ region of VWF comprising the folded E′ domain and the partially structured TIL′ domain. The chosen protein is not only of pharmaceutical interest but poses a real challenge for force fields due to its structural heterogeneity, which is typically difficult to model accurately. Accurate models of heterogeneous and dynamic systems are of great importance as recent studies have estimated that over 40% of any eukaryotic proteome contains disordered regions. 24−26 In our enhanced sampling simulations of VWF, the force fields that overall agree best with the NMR data are a99SBdisp 10 and RSFF2+, followed by C36m. Robustelli et al. developed a99SB-disp to perform universally well across systems ranging from folded proteins to IDPs. However, their extensive benchmark only included calmodulin as an example of a protein with two folded domains connected through a flexible linker; it contained no test systems akin to the TIL′E′ where one of the domains is partially structured. Therefore, our study confirms that the a99SB-disp force field performs well in such systems. Out of the three top-performing force fields, it is worth noting that C36m uses the TIP3P-Charmm water model, which is computationally less expensive than TIP4P-D used otherwise. However, C36m does not necessarily perform well when it comes to modeling IDPs. 10,19 Because RSFF2+ has not been benchmarked as extensively as other force fields, it is less clear how well it would perform with IDPs and flexible systems outside of the original parametrization study. 13 On the basis of our results, it appears to describe proteins containing partially structured domains very well.
Once we identified force fields that can reproduce the experimental NMR data well, we were able to extend our study to the type 2N VWD E787K mutant. NMR shows that the mutant causes considerable structural changes, and the simulations predict that they are due to the rearrangement of salt bridge interactions in the TIL′ domain involving some of the other residues whose mutation also results in the type 2N VWD, pointing to a similar mechanism of action in such cases. These results also provide a possible molecular explanation to the observations that high ionic strength and low pH result in a much lower affinity of the VWF·FVIII complex. 27,28 We would like to stress that from all of the analyses performed above we find the comparison of NOE data to be the most relevant as it most directly compares simulated data to experimental observables. For folded regions, chemical shifts and TALOS-N provide a similarly robust experimental metric, but they are targeted toward well-folded proteins. The other analyses (contact maps, secondary structure; see the SI) make use of derived data and should therefore be used with caution. 29 In conclusion, our study provides a much-needed benchmark for MD simulations of proteins with both folded and partially structured domains, which will undoubtedly aid the community in simulation setup as well as force field development. It also provides insight into the effects of a mutation linked to a severe form of VWD, which is bound to guide future drug design studies. Finally, it shows the crucial importance of force field choice to correctly capture the effect of subtle structural changes; the effect of a severe pathogenic mutation in VWF was only correctly captured by using a force field that reproduces both the properties of folded and partially disordered protein domains. Author Contributions § A.K. and R.B.P. contributed equally.

Notes
The authors declare no competing financial interest. The Journal of Physical Chemistry Letters Letter