Exploring Ligand Stability in Protein Crystal Structures Using Binding Pose Metadynamics

Identification of correct protein–ligand binding poses is important in structure-based drug design and crucial for the evaluation of protein–ligand binding affinity. Protein–ligand coordinates are commonly obtained from crystallography experiments that provide a static model of an ensemble of conformations. Binding pose metadynamics (BPMD) is an enhanced sampling method that allows for an efficient assessment of ligand stability in solution. Ligand poses that are unstable under the bias of the metadynamics simulation are expected to be infrequently occupied in the energy landscape, thus making minimal contributions to the binding affinity. Here, the robustness of the method is studied using crystal structures with ligands known to be incorrectly modeled, as well as 63 structurally diverse crystal structures with ligand fit to electron density from the Twilight database. Results show that BPMD can successfully differentiate compounds whose binding pose is not supported by the electron density from those with well-defined electron density.


■ INTRODUCTION
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) play a key role in structure-based drug design (SBDD). The presence of ligands in crystal structures must be supported by convincing experimental evidence, which is represented by the electron density (ED). The interpretation of the observed ED in a binding site as the ligand of interest or water or buffer molecules is far from trivial; it is a subjective work that requires good judgment and expertise, but sometimes mistakes can lead to erroneous modeling decisions in crystal structures. The visualization of the ligand−protein model together with the ED maps can be a useful way for assessing ligand placement, but a numerical measurement to quantify ligand reliability is also required to allow a more consistent classification. For example, the most commonly used comparison metric is the real space correlation coefficient 1,2 (RSCC). It is a local measure of how well the calculated ED density of a ligand matches the observed ED ranging between 1 (perfect correlation) and −1 (perfect anticorrelation) with values below 0.8, indicating a poor fit where the ligand might have been incorrectly modeled. The "consumers" of the structural information are often not expert crystallographers; therefore, the usage of misinterpreted crystal structures as the starting point of computational experiments such as ligand docking, active site identification, and lead optimization could lead to unreliable results from which erroneous conclusions are consequently drawn. As reported in several papers, 3−7 there are still cases in the Protein Data Bank (PDB, http://www.pdb.org) in which the presence and/or location of the ligand is not fully supported by the underlying ED. Given the fundamental importance of the crystal structures in the progression of SBDD projects, several tools that rank and assess their quality based on the fit of the ED have been developed. 8−10 In the present paper, the possibility to identify and separate ligands that are correctly placed and supported by ED from those that are misplaced and/or not supported by ED is studied with computational methods.
If the structural flexibility of a biological system needs to be studied, molecular dynamics (MD) simulation is the technique of choice. However, when an efficient sampling of structural dynamics is required in a limited timescale, enhanced sampling methods such as metadynamics are often more useful. Metadynamics 11,12 enables sampling of a complex free-energy landscape by adding a history-dependent bias into the system as a function of a carefully chosen collective variable (CV). In this way, the system is discouraged from revisiting previously sampled regions and at the same time is forced to escape stable free-energy minima, where it would normally be trapped, facilitating the exploration of the entire free-energy landscape. Such a methodology has been used to reconstruct the full freeenergy landscape of protein−ligand binding 13−15 and predict the ligand binding pose. Binding pose metadynamics 16 (BPMD), a variation of metadynamics, is an automated, enhanced sampling, metadynamics-based protocol, in which the ligand is forced to move in and around its binding pose, whose higher mobility under the biasing potential is considered indicative of binding mode instability. Clark et al., 16 the developers of the tool, showed the ability of BPMD to reliably discriminate between the correct ligand binding pose and plausible alternatives generated with Induced Fit Docking (IFD).
The aim of this paper is primarily to validate the BPMD tool using reliable and questionable protein−ligand binding poses obtained from crystal structures, identified using both RSCC value and analysis of the ED maps. It is hypothesized that ligands supported by underlying ED should display stability under the BPMD bias, whereas rapid ligand fluctuations, indicating instability, are expected for ligands that are misplaced in the ED and/or are not in a well-defined energy minimum. These studies will help to understand how the stability of ligand binding poses could inform on crystal structures in which the ligand has been incorrectly or questionably modeled and potentially expand the BPMD tool usage to assess the quality of crystal structures before undertaking SBDD campaigns.
■ METHODS Data Sets. The first three test cases were identified by searching the primary literature: epothilone 17−20 bound to tubulin alpha-1β chain protein, ampicillin 21 bound to penicillin-binding protein, and a ligand bound to IκB kinase β. 22 Extra cases were identified from the RCSB PDB in a semiautomated fashion. The RSCC 1,2 is a local measure of how well the calculated density of a ligand matches the observed ED and is defined as where ρ's are the ED values at grid points that cover the residue in question, obs and calc refer to the experimental and model ED, respectively, and cov and var are the sample covariance and variance, respectively. RSCC is a statistical measurement which is publicly available for deposited PDB structures through the Protein Data Bank in Europe (PDBe; http://pdbe.org/). In the worldwide Protein Data Bank (wwPDB) X-ray validation report, it is stated that a RSCC value above 0.95 normally indicates a good fit, whereas a poor fit results in a RSCC value around or below 0.8. All the ligands present in the PDB with RSCC annotation were taken from the Twilight 8 database and were merged with their Uniprot (https://uniprot.org/) entry name as found in the primary structure section of the PDB file (DBREF section). In this way, all the proteins in which at least one structure was present with RSCC < 0.8 (ligand not fully supported by the underlying ED) and one with RSCC > 0.9 (ligand with good ED fit) were selected; the set of two structures with different ligands but same Uniprot entry will be referred to as a "pair". Furthermore, crystal structures with resolution worse than 2.5 Å were discarded as well as all the proteins in which the biological assembly was not monomeric. A total of about 11,000 structures were retrieved, and ligands that are part of the crystallization buffer or solvent were removed (7538 total structures). The number of structures was further reduced by grouping them with their Uniprot entry name and RSCC value. A representative member of each protein was visually inspected; all the structures of that protein were discarded if the protein was challenging to model; for example, it contained unresolved portions or was difficult to parameterize. This procedure provided a set of 63 structures (61 unique ligands) as a reference set to validate the BPMD tool. Because it was difficult to find sufficient pairs for the data set, the 2.5 Å resolution criterion was relaxed to allow five extra structures (PDB: 2ITY, 3W16, 3QCQ, 4QE8, and 5HIB) to be selected. The (2mFo−DFc) maps and the (mFo−DFc) maps for the reference data set were downloaded from the PDBe database (http://www.ebi.ac.uk/pdbe/) contoured at +1σ and ±3σ, respectively, and visually inspected with Coot. 23 Fo and Fc are the experimentally measured and model-based amplitudes, respectively, m is the figure of merit, and D is the σ A weighting factor. 24 In general, protein−ligand models with RSCC < 0.8 have poor ligand ED fit. This can range from complete absence of ED for significant portions of the ligand to broken ED throughout the entirety of the ligand. 6 A low RSCC score can also be the result of a combination of good ED fit for the portions of the ligand interacting with the protein target and the remaining portion(s) having very poor ED fit because of high ligand moiety mobility. It is clear that the interpretation of the final ED fit is a subjective process, and there is likely to be a wider range of individual fitting interpretations when the ED is poorly defined.
In this work, we want to address the capability of BPMD to discriminate between high-and low-probability ligand binding modes; therefore, it is important to separate cases in which the ED is almost absent, indicating that the ligand presence is not supported by experimental evidence, from cases where the ED quality does not allow a complete determination of the ligand pose because of partial disorder, indicating that the ligand presence is partially supported by ED. In order to better define the data set, omit maps were calculated with σ A style maximum likelihood-weighted mFo−DFc and 2mFo−DFc map coefficients for all the structures by removing the ligand in question followed by maximum likelihood refinement in REFMAC. 25 The omit map here refers to the fact that the ligand is omitted from the model refinement to reduce model bias in the ED map. If the ligand molecule is present, the shape of the resulting difference ED will provide corresponding evidence. After the omit map was created and visually inspected with the original coordinates overlaid, the data set was more accurately divided in three distinct categories (Table 1): (1) green: ligands with RSCC > 0.9 that show very good fit with the ED; (2) amber: ligands with RSCC < 0.8 that are only partially supported by the ED, that is, the ligand presence showing partially occupied and/or disordered portions and a fractional positive difference density is observed in the (mFo−DFc) omit  26 in Maestro v.2018.04. Hydrogen atoms and missing residues were added to the initial coordinates; bond orders were assigned to the ligand in the crystal structures. The protein termini were capped with ACE and NMA residues. Epik was used to find the most likely protonation state of the ligand and the energy penalties associated with alternate protonation states. The protein's hydrogen bond network was optimized using the ProtAssign algorithm in the H-Bond Refine Tab of the Protein Preparation Wizard (Maestro v.2018.04) by correcting both potentially transposed heavy atoms in asparagine, glutamine, and histidine side chains and also the protonation states of histidine residues. Finally, a restrained minimization using the Impref module of Impact with the OPLS3e 27 force field was used such that hydrogen atoms were freely minimized while allowing for sufficient heavy atom movement to relax strained bonds, angles, and clashes. The Force Field Builder (FFBuilder) tool was used to automatically generate accurate force field torsional parameters derived from quantum mechanics for all ligands containing substructures not fully covered by the standard OPLS3e parameters.
Binding Pose Metadynamics. BPMD as implemented in Maestro v.2018.4 is a variation of metadynamics simulation in which 10 independent metadynamics simulations of 10 ns are performed using CV as the measure of the root-mean-square deviation (rmsd) of the ligand heavy atoms relative to their starting position. The alignment prior to the rmsd calculation was done by selecting protein residues within 3 Å of the ligand. The CAs of these binding site residues were then aligned to those of the first frame of the metadynamics trajectory before calculating the heavy atom rmsd to the ligand conformation in the first frame. The hill height and width were set to 0.05 kcal/ mol (about 1/10 of the characteristic thermal energy of the system, k B T) and 0.02 Å, respectively. Before the actual metadynamics run, the system was solvated in a box of SPC/E water molecules followed by several minimization and restrained MD steps that allow the system to slowly reach the desired temperature of 300 K as well as releasing any bad contacts and/or strain in the initial starting structure. The final snapshot of the short unbiased MD simulation of 0.5 ns is then used as the reference for the following metadynamics production phase.
The key concept in BPMD is that under the same biasing force, ligands that are not stably bound to the receptor will experience a higher fluctuation in their rmsd as compared to the stably bound ones. Three scores are provided by BPMD that are related to the stability of the ligand during the course of the metadynamics simulations: (1) PoseScore indicative of the average rmsd from the starting pose. Rapid increase in the PoseScore is indicative of ligands that are not in a well-defined energy minimum and hence might not have been accurately modeled. (2) PersistenceScore (PersScore) is a measure of the hydrogen bond persistence calculated as the fraction of the frames in the last 2 ns of the simulation that have the same hydrogen bonds as the input structure, averaged over all the 10 repeat simulations. Low PersScore is found in structures in which their contact network is weakened by the BPMD bias. It ranges between 0, indicating that either the ligand at the start did not have any interaction with the protein or that the interactions were lost in due course, and 1, indicating that the interactions between the starting ligand binding mode and the last 2 ns of the simulations are fully kept. (3) CompositeScore (CompScore) is a linear combination of PoseScore and PersScore obtained from fitting the results on 42 different systems from the primary paper 16 and is calculated as follows: CompScore = PoseScore − 5 × PersScore.

■ RESULTS
Each complex was run using Desmond on a single node with 4 GPU cards, taking for a typical system (1 complex = 1 × 10 metadynamics run), 2−3 h. The results were assessed for pose stability based on the PoseScore, that is, the rmsd of the ligand with respect to the initial ligand heavy atoms coordinates. A PoseScore ≤ 2 Å was considered stable (this value of rmsd is often used as a threshold defining success in prospective docking simulations). In addition, the results were also analyzed looking at the PersScore, which is an indication of the strength of the hydrogen bonds formed between the ligand Journal of Chemical Information and Modeling pubs.acs.org/jcim Article and the protein residues. If 60% of the total hydrogen bonds were kept during the simulations (e.g., PersScore ≥ 0.6), it was considered as a sign of good persistence. Finally, the linear combination of the two scores, CompScore, was also investigated but not used as a primary metric to assess pose stability as reported in the Results from Twilight Database section.
Results from Initial Literature Search. The evaluation of BPMD started from three well-characterized literature cases which are representative of problematic situations: (1) ligand with incorrect ligand binding mode [epothilone A (EpoA) in tubulin alpha-1β chain]; (2) ligand placed into ED belonging to another entity [ampicillin in penicillin-binding protein 4]; and (3) ligand modeled with incorrect geometry [inhibitor in IκB kinase β]. Each case will be discussed in detail in the following sections.
Tubulin Alpha-1β Chain. Epothilones are natural compounds belonging to the microtubule stabilizing antimycotic agent class that bind to the common binding site in β-tubulin. The first atomic model of EpoA ( Figure 1B) bound to α,βtubulin was solved by Nettles et al. 19 with a combined approach of electron crystallography, NMR, and molecular modeling in 2004 (PDB: 1TVK). Serious doubts have been raised for the proposed bound conformation of EpoA in this model because of the inconsistencies with NMR information. In fact, in 2013, Prota et al. 20 deposited a 2.3 Å X-ray crystal structure of EpoA in complex with α,β-tubulin, the stathminlike protein RB3, and tubulin tyrosine ligase (PDB: 4I50), which showed a different binding mode for the ligand as compared to the 1TVK model ( Figure 1A). For these reasons, the structure of tubulin alpha-1β chain in complex with EpoA was employed to examine the ability of BPMD to differentiate between the correct and incorrect ligand binding mode.
During the metadynamics calculation on the initial structure (1TVK), EpoA shows an average rmsd over the 10 runs that increased from the beginning to the end of the simulation time, and the hydrogen bonds were present for a limited time of the simulation run ( Figure 1C). The overall PoseScore and PersScore are 4.0 and 0.1, respectively. Both these observations are indicative of the instability of the binding mode and are consistent with reports questioning the original structure. On the other hand, EpoA in 4I50 shows a different behavior, where the averaged rmsd reaches a steady PoseScore of 0.9 that is kept constant until the end of the simulations ( Figure  1C). At the same time, the PersScore indicates that the hydrogen bonds identified at the start of the metadynamics run are kept for 70% of the averaged time. In this case, the results suggest a stable binding mode. There is a clear difference between the original structure (1TVK) and the more recent one (4I50) both in the PoseScore and the overall rmsd profile during the metadynamics run, showing, in this case, that BPMD can differentiate between a stable and unstable binding geometry. Another potential sign of instability for the ligand in 1TVK could also be seen in the high rmsd of up to 3.6 Å between the 10 structures used as reference for the metadynamics run; in 4I50, the ligand RMS deviation reaches 0.75 Å at maximum. Finally, in the case of 4I50, EpoA under the BPMD bias experiences no drastic rearrangement as compared to the initial pose; hence, the ligand conformation in 4I50 is in a stable conformation as opposed to the ligand modeled in 1TVK.
Penicillin-Binding Protein 4. Penicillin-binding proteins are membrane-associated proteins that catalyze the final step of murein biosynthesis in bacteria. Penicillins bind irreversibly to the active site of those enzymes disrupting the cell wall synthesis. The crystal structure of the penicillin binding protein 4 from staphylococcus aureus in complex with ampicillin (PDB: 3HUN, 21 resolution 2 Å) was deposited in 2010, showing two separate chains in the asymmetric unit, although the preferred biological assembly of the complex is monomeric. In each of the deposited chains, A and B, ampicillin is modeled with a different binding mode and the RSCC is low, suggesting that there is poor fit between observed and modeled ED: RSCC chain A = 0.52 and RSCC chain B = 0.73 (Supporting Information). Moreover, as stated by Weichenberger et al., 28 the phenyl moiety of ampicillin (ZZ7-501, chain B) is placed in a density that could better fit a sulfate ion. Therefore, each chain containing the ligand ampicillin was submitted to the BPMD protocol. The PoseScore of ampicillin in chains A and B are 4.6 and 5.1, respectively ( Figure 2). The high rmsd from the reference conformation of the ligand and the weakened hydrogen bond network during the simulations are consistent with the fact that the ED does not support the ligand presence in either of the chains.
IκB Kinase β. The structure of IκB kinase β bound to the ligand, as depicted in Figure 3B, was solved at a resolution of 3.6 Å and deposited as PDB code 3QAD. 22 This has been the focus of several papers concerning protein−ligand crystal structures with poorly refined ligand geometries. 5,6 In this crystal structure, the aminopyrimidine ring of the bound inhibitor had a pyramidal carbon in the pyrimidine ring instead of the expected planar one, and the piperazine moiety is in an unfavorable boat conformation ( Figure 3A). The authors released a second structure (PDB: 3RZF) after re-refinement of the erroneous one. However, even in the newly released structure, the ligand is highly strained. 29 Moreover, by analyzing the deposited ED, as explained in the Methods section, the (2mFo−DFc) map does not support the presence of the ligand. The BPMD protocol was applied to the ligand as modeled in both crystal structures. The averaged PoseScore and PersScore are 6.7 and 0 for the ligand in the obsolete structure (PDB: 3QAD) and 4.9 and 0.009 for the ligand in the refined structure (PDB: 3RZF), respectively. In both cases, the ligand rmsd increases significantly from the starting conformation, suggesting that the ligand binding pose is highly Transferases are the most well-represented structures (about 30% of the whole data set); the rest of the data set is distributed across 11 protein families including hydrolase, isomerase, DNA-binding protein, signaling protein, and ligase. The ligands appear to be widely distributed in drug-like physicochemical space, implying the generality of the data set used ( Figure 5).
Results from Twilight Database. The results of the complexes from categories green (ligand supported by ED) and red (ligand not supported by ED) are first discussed. A total of 51 crystal structures, 30 with RSCC > 0.9 and 21 with RSCC < 0.8 that have been confirmed by inspection of the ED maps, were subjected to the BPMD protocol to assess ligand stability. Initially, the Twilight database was searched for pairs of compounds crystallized in the same protein, in which one was well supported by the ED and the other was not. By analyzing the results for pairs of structures only, where each pair contains one structure with RSCC > 0.9 and one with RSCC < 0.8 as defined in the Methods section, it is observed that in 7/11 pairs, a cutoff of PoseScore = 2 clearly distinguishes between structures supported by ED and those that are not ( Figure 6). In 2/11 pairs (EPHA3 and PPARG), the structures cannot be distinguished by PoseScore, while in 2/11 pairs, the structures can be distinguished by the PoseScore, but both fall below (BACE1) or above (ANDR) the threshold of 2. All the test cases with RSCC > 0.9 have PoseScore < 2, except for the androgen receptor ( Figure 6).
To get a more meaningful picture of the performance of the method, a larger data set was needed. Because it proved difficult to find pairs of compounds from the same protein where one of the ligands was clearly not supported by the density, this requirement was removed, and the protocol was tested on an expanded set as shown in Figure 7. Overall, if the results of category green and red are analyzed ignoring the pair definition, 28/30 crystal structures supported by the ED have a PoseScore below the cutoff threshold of 2. The only two outliers are the surface-exposed ligand of the androgen receptor, PDB: 2PIT, and the fragment bound to the PHIP   In general, a PoseScore of 2 (which is indicative of the average rmsd deviation from the starting pose) has been identified as a practical threshold for distinguishing between stable and unstable ligands as proposed previously by Clark and coworkers. 16 From these results, by using the number of structures with RSCC > 0.9 and PoseScore < 2 as true positives (TP = 28) or PoseScore > 2 as false negative (FN = 2) and with RSCC < 0.8 and PoseScore > 2 as true negatives (TN = 17) or PoseScore < 2 as false positive (FP = 4), combining structures found both in the literature with manual search and in the Twilight database (Table 2), the confusion matrix of the PoseScore (Table 3) gave a sensitivity of 0.94, a specificity of 0.84, and a κ value of 0.78, which confirmed the ability of BPMD to correctly separate the crystal structures  Test cases are color-coded by ED (red and cross shape = the underlying density is too poor to model the ligand, green and circle shape = density is present). On the y-axis, a cutoff at a PoseScore equal to 2 is reported to identify ligands that are stable during the course of the BPMD runs.  where the ligand has a satisfactory ED from crystal structures where the ED does not explain the ligand placement. The general trend previously observed for the initial test cases identified by manual search of the literature has proven to be generalizable. This separation of the two classes of ligands also gives confidence that the force field is performing well. The stability of the green ligands under BPMD bias suggests that the instability of the red ligands is not a consequence of issues with the force field. Indeed, the OPLS3e 27 force field is well validated for drug-like ligands and has been show to perform well in comparison to other ligand force fields such as those in CHARMM, 30 AMBER, 31 and older versions of OPLS. 27 The overall results were also analyzed by PersScore ( Figure  8). The correct threshold to separate green and red categories is more difficult to identify unambiguously for PersScore as compared to PoseScore. However, if a threshold of 0.6 is adopted, which corresponds to 60% of the interactions being maintained on average across all 10 simulations, 23 out of 30 of the ligands supported by ED are correctly identified, while in the remaining 7 cases, the interaction networks were kept between 40 and 57% of the total averaged simulation time. The fragment in PDB: 3MB3, scored as unstable by the PoseScore metric, is the only example with RSCC > 0.9 that had no hydrogen bonds at the start of the simulation. The absolute numbers of ligands in the red category that fell below and above the threshold of 0.6 were equal to the number of complexes identified by the PoseScore threshold: 17 red protein−ligand complexes have an interaction network that is significantly altered by the BPMD bias and in 4 red cases the network is preserved. The sensitivity, specificity, and κ value if the PersScore is used to evaluate ligand stability are 0.81, 0.84, and 0.58, respectively. The PoseScore appears to be a better metric to separate cases where the ligand is correctly modeled in the ED from those which are not. Finally, the CompScore, which is a combination of the PersScore and PoseScore, gives results that are comparable to PoseScore. As a further comparison of the scores, a bootstrapping study was carried out to estimate the standard errors associated with each metric, using three crystal structures selected from the red and green categories (Supporting Information). The results of this study combined with the analysis above showed that PoseScore is the preferred metric because it can be estimated with greater precision than CompScore and is easier to interpret than the PersScore. We note in passing that CompScore is normally preferred when BPMD is used to rank docking poses. 32 Analysis of Structures in the "Twilight" Data Set. Inspection of the ED maps of the ligands belonging to the amber category revealed that the regions of the ligand that were not supported by ED were often outside the binding site and solvent-exposed while the regions inside the protein binding site were mainly supported by the ED (Figure 9). For some of these cases, the ligands may be stable even though they have a low RSCC. Hence, a BPMD PoseScore below the threshold of 2 might be expected.
The percentage of ligand atoms that are uncovered by the ED in the amber category varies from about 6−38%, signifying that most of the ligand is supported by the density with some portion that is not supported by experimental evidence. It is observed that the ligand PoseScore increases with increasing disorder in the ligand structure with a r 2 of 0.54 ( Figure 10). The ligand BJI in PDB: 2JKO is about 26% uncovered by the ED, and given the trend of the other structures in the amber category, it could be thought to be unstable. Its predicted BPMD PoseScore is 1.1, suggesting high ligand stability. After inspecting the original publication, 36 this is not surprising because the reasoning behind its design was not to add new protein ligand interactions but to improve the ligand solubility without affecting potency by including a piperazine ring 36 that is solvent-exposed (Figure 9, panel 4). Therefore, the high ligand stability obtained from BPMD confirms the hypothesis that the piperazine ring would not be detrimental to the Statistics were generated by using the green scored with PoseScore < 2 as true positive (28) or PoseScore > 2 as false negative (2), the red scored with PoseScore < 2 as false positive (4), or PoseScore > 2 as true negative (17).

■ DISCUSSION
Overall, BPMD showed good ability to identify ligands whose modeled pose is not supported by their ED (Figures 6 and 7), even for the cases in which the ligand is only partially supported by the ED (Figure 10). Investigating the results in more detail reveals some insights into situations that can be challenging for the method and caveats users should be aware of, which are worthy of further discussion. Before being submitted to the final metadynamics simulation, every complex underwent an equilibration procedure as explained in the Methods section. The last step of the last short unbiased MD simulation (0.5 ns) is used as the reference structure for the PoseScore calculation. Consequently, the reference structure for BPMD is not the input structure but the equilibrated one which can, in some cases, differ significantly. Given this observation, the PoseScore was compared to the rmsd between the initial crystal structure and the reference equilibrated structure, which is here called MD-rmsd, to understand if unstable poses could be flagged in advance from the MD equilibration procedure. The PoseScore correlates with the MD-rmsd, that is, the average displacement among the reference structures with a r 2 of 0.75 ( Figure 11): the higher the MD-rmsd from the originally deposited structure, the more likely the ligand will be unstable. It should be noted that the rmsd between the 10 reference structures can be quite significant, especially in the cases of the red category ( Figure 11) and in the region of MD-rmsd > 3 Å. Here, the conformation of the ligand reference structures is not a very good representation of the initial crystal structures. The high correlation between PoseScore and MD-rmsd ( Figure 11) suggests that MD-rmsd could be used as a preliminary indication that the initial structure is not correctly modeled. When a shorter equilibration step of 10 ps instead of 500 ps is used for all the cases in which at least one reference structure shows a rmsd ≥ 3 Å, a systematic decrease of MD-rmsd is observed, while the PoseScore does not change significantly, confirming that the ligand in those structures are highly unstable ( Figure 12).
In the region of MD-rmsd up to 2 Å (Figure 13), the average reference structure can be considered close enough to the initial starting structure as opposed to the ones in the region with MD-rmsd > 2.5 Å. The ligands in the green category have lower structural variability (low MD-rmsd), in agreement with the PoseScore below the threshold of 2. The presence of outliers in the region of MD-rmsd between 1.3 and 2 Å ( Figure  13) shows the advantage of using the metadynamics production phase. In the case of 5XHK, 5MYM, and 4WZ1, the benefit of BPMD is clear; in fact, despite the MD-rmsd being below 1.5 Å, which could be indicative of ligand stability, the BPMD bias was needed to correctly classify them as unstable (PoseScore > 2) in agreement with the missing ED from the experiment. A different situation is observed for 2HWQ and 5Q17 in which PoseScore is found below the threshold of 2 and also that the MD-rmsd is lower than 2 Å. The results obtained for this data set suggest that BPMD can successfully discriminate between crystal structures that are correctly modeled and those that are not, but also that the MD-rmsd obtained from the short equilibration step might be informative of dubious structures. A thorough investigation of MD as compared to BPMD in identifying questionable crystal structures is required to identify the optimum protocol to balance accuracy against computational efficiency.
Several of the structures that were considered to be misclassified by BPMD, that is, there was disagreement between the ED classification and the PoseScore, deserve further discussion. The ligand 4HY found in PDB: 2PIT 37 binds to the surfaced-exposed allosteric pocket called Binding Function 3 of the androgen receptor. It has a RSCC = 0.93 but resulted in a high PoseScore of 4.1. In the original paper, 37 it is claimed that it binds weakly (IC 50 ≈ 50 μM) to the surface weakening the contacts between the androgen receptor and coactivator proteins. In this case of low binding affinity in an open solvent-exposed site, the ligand can be readily displaced  Journal of Chemical Information and Modeling pubs.acs.org/jcim Article under the BPMD bias, despite being supported by the ED. A similar exposed pocket was observed in the case of three ligands PDB: 5MRD, 2XPA, and 2XP6, where there is agreement between ED and PoseScore, so this is not a consistently observed problem. Another interesting case is that of the MB3 ligand crystallized in the second bromodomain of PHIP (PDB: 3MB3 38 ), which has a high RSCC value of 0.93 but showed a PoseScore higher than 2. Interestingly, the ligand is a very small fragment possessing only seven heavy atoms, therefore, very different from others investigated here and in previous BPMD studies. It is possible that the metadynamics parameters may not be well optimized for such a small fragment.
Among the incorrectly predicted crystal structures in the red category, it is worth mentioning PDB: 2HWQ. 39 Despite a poor ligand RSCC score of 0.23 and ambiguous fragmented density in the omit maps, it has a PoseScore of 1.21. The binding pocket is narrow, and the ligand binds deep into it. The same authors have solved structures of related ligands bound to the same protein (PPARG) such as PDB: 2F4B 40 (RSCC = 0.73) and 2HWR 39 (RSCC = 0.71). Therefore, the ligand may have been modeled by using prior knowledge from these studies in addition to the ED. A crystal structure with comparable ligand binding mode and good RSCC value (PDB: 5GTN 41 ) was also solved a few years later, corroborating the deposited ligand binding mode. Therefore, in this instance, although the ED does not support the binding mode, the ligand may have been correctly modeled using other information, explaining why it is miscategorized in this work.
The cases discussed above suggest that particularly small ligands or open binding sites may be challenging for BPMD. To investigate whether these factors are a key influence on the BPMD score, heavy atom count, protein−ligand H-bond count, and binding pocket volume have been plotted against PoseScore ( Figure 14). No correlations were seen, suggesting that these factors are not driving the prediction from PoseScore.
Out of the 69 total structures, 34 had binding affinity data as retrieved from the primary literature citation where they were first discussed in the form of IC 50 . The binding data were converted into pIC 50 , and a poor correlation was observed with PoseScore (r 2 = 0.33, see Figure 15) as opposed to what was found by Clark et al. 16 in which they use BPMD as a tool to prioritize IFD ideas. In this work, despite the binding affinity of the ligand under investigation, if the starting position is correctly modeled, it will result in a PoseScore below 2; therefore, BPMD should not be regarded in this way as a tool to prioritize ideas based on the PoseScore.

■ CONCLUSIONS
We have investigated the capabilities of the BPMD tool to correctly identify stable ligands using a diverse data set of crystal structures from published literature and collected from the Twilight database. Primarily, this study has been used as a validation that BPMD is a useful predictor of binding mode The MD-rmsd is below 2.5 Å for all cases. (C) PoseScore obtained with shorter equilibration procedure (NEW_PoseScore) and with default protocol. The best fit line is displayed as a black line and y = x as a black dotted line. In general, the overall results did not change. Each structure is colored identically in A, B, and C. Figure 13. PoseScore in the range of MD-rmsd up to 2 Å where the reference structures can be considered similar to starting structure. The structures identified with their PDB name represent the advantages (5XHK, 5MYM, and 4ZW1) and potential limitation (5Q17 and 2HWQ) of using the BPMD protocol. All the structures in the green category show low MD-rmsd.
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article stability, in agreement with previous studies looking at docking and scoring. From the conducted validation on 69 complexes with different physical chemical properties, BPMD has shown good performance categorizing ligand binding modes supported by ED from those not supported. In particular, it was identified that ligands supported by ED show a PoseScore below 2 as opposed to the unsupported ones, which have a PoseScore above 2. BPMD scores do not seem to correlate with binding affinity, suggesting that the method is useful for assessing the stability of individual poses or the relative stability of different poses of the same ligand, but not for scoring different ligands. From the more challenging cases in which the ligand is partially supported by the underlying ED, it was observed that if the disorder comes from parts of the ligand that do not participate in any protein interaction, then the overall stability of the ligand, given that the rest of it is correctly modeled, will not be affected. Thus, information from BPMD simulations could help medicinal and computational chemists in designing more potent compounds maximizing the ligand−protein interactions where the most stable portion of the ligand lies and incorporating flexibility and solubilizing groups in the noninteracting portion without compromising the overall binding stability.
Aside from the validation of the methodology, the data presented here suggest that BPMD may be a useful tool for the crystallographer. Solving crystal structures is an expert job incorporating information beyond that contained in the ED, but this makes it open to a degree of subjectivity. These results suggest that BPMD could be useful in assessing preliminary poses generated by crystallography and highlighting those that would need further investigation or confirmation before undertaking more time-consuming and expensive strategies. Specific cases have been identified, which can be challenging for BPMD, including very small fragments, surface-exposed binding pockets, and interactions with no hydrogen bonds. There are a number of parameters within BPMD that could be investigated to see whether they give an improvement in these cases, such as Gaussian width and height, choice of CV, and run time. However, they are not investigated further here because, overall, the default parameters offer good discrimination.
Finally, the average rmsd of the equilibrated structures prior to the metadynamics simulations has appeared to be informative of inherently unstable ligands. Interestingly, it is observed that the reference structure used in the BPMD protocol might have substantially changed from the input structure especially for the cases in which the ligand is not supported by ED. A brief investigation of using a shorter equilibration procedure on the cases with MD-rmsd > 3 Å showed overall unchanged results with a PoseScore that does not change significantly from the one obtained with the default protocol. It would be interesting to know in a more systematic way which protocol of unbiased MD is needed to obtain comparable BPMD results, and this will be the subject of further studies.
Additional statistics on PoseScore, PersScore, and CompScore (PDF) Figure 14. Relationship between PoseScore and number of heavy atoms (A), volume of binding site (B), and hydrogen bonds (C) of the ligands in the data set. Structures belonging to category green, ligand supported by ED are colored in green with circle shape; from category amber, ligand partially supported by ED are in yellow with diamond shape; and from category red, ligand not supported by underlying ED are in red and with cross shape. Figure 15. PoseScore of all the crystal structures with pIC50 data color-coded by category (green and circle, complexes supported by ED; amber and diamond, complexes partially supported by ED; and red and cross, complexes not supported by ED).
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article