Understanding Cryptic Pocket Formation in Protein Targets by Enhanced Sampling Simulations

: Cryptic pockets, that is, sites on protein targets that only become apparent when drugs bind, provide a promising alternative to classical binding sites for drug development. Here, we investigate the nature and dynamical properties of cryptic sites in four pharmacologically relevant targets, while comparing the e ﬃ cacy of various simulation-based approaches in discovering them. We ﬁ nd that the studied cryptic sites do not correspond to local minima in the computed conformational free energy landscape of the unliganded proteins. They thus promptly close in all of the molecular dynamics simulations performed, irrespective of the force- ﬁ eld used. Temperature-based enhanced sampling approaches, such as Parallel Tempering, do not improve the situation, as the entropic term does not help in the opening of the sites. The use of fragment probes helps, as in long simulations occasionally it leads to the opening and binding to the cryptic sites. Our observed mechanism of cryptic site formation is suggestive of an interplay between two classical mechanisms: induced- ﬁ t and conformational selection. Employing this insight, we developed a novel Hamiltonian Replica Exchange-based method “ SWISH ” (Sampling Water Interfaces through Scaled Hamiltonians), which combined with probes resulted in a promising general approach for cryptic site discovery. We also addressed the issue of “ false-positives ” and propose a simple approach to distinguish them from druggable cryptic pockets. Our simulations, whose cumulative sampling time was more than 200 μ s, help in clarifying the molecular mechanism of pocket formation, providing a solid basis for the choice of an e ﬃ cient computational method. temperature range of 305 − 315 K using 9 replicas. Unbiased Parallel Tempering simulations (310 − 400 K; 32 replicas) and SWISH simulations ( λ ∈ [1.0 ; 1.35], 8 replicas, HREX 73 ) were each performed for 500 ns. Ligands were parametrized using GAFF 74 with additional repulsive interligand potential. To monitor pocket exposure, we analyzed snapshots using fpocket. 75 For further details, see the Supporting Information.


■ INTRODUCTION
Drug discovery is an increasingly expensive endeavor as many therapeutic discovery projects fail in clinical trials. 1−5 Late-stage failures are especially costly, and lack of efficacy of lead compounds is often cited as the main reason for them. This problem might be addressed by using validated targets. In the postgenomic era, an increasing number of validated targets with a well-defined role in disease are arising from large-scale genomic sequencing complemented by proteome analysis. 6 However, not all validated targets are therapeutically tractable (or "druggable"). 7−9 Many attractive targets involve protein− protein interactions (PPI) or other allosteric regions lacking apparent binding sites and for a long time were considered undruggable. This notion is now changing as an increasing number of proteins previously considered intractable have been successfully targeted with small molecules, greatly extending the range of usable targets. 10,11 What is more, many validated targets have ligandable pockets, which are not obvious from structural information on the unbound protein and only become clear when a drug is bound. These "cryptic" or "hidden" pockets offer an attractive opportunity for therapy, as it was recently shown in the case of K-RAS, the most commonly mutated oncogene in human cancers. For 30 years, researchers had tried and failed to develop K-RAS inhibitors, to the point it was thought to be undruggable. Yet very recently, a new cryptic pocket was found by tethered compounds and successfully targeted. 12 The molecular mechanism by which cryptic sites are formed is not clear. Although ligands seem to be necessary in their opening, it is still debated whether they do it by an induced-fit, conformational selection, or a "mixed" mechanism. 13−15 Because of these uncertainties and their hidden nature, cryptic pockets are difficult to identify by both experimental and computational methods.
Many available cryptic pocket structures, such as those studied here, were discovered serendipitously. They helped to understand the nature of these sites and their potential use as therapeutic targets. Available experimental approaches rely on expensive procedures such as large-scale fragment screening, 16 site-directed tethering, 17 or the use of antibodies. 18 As cryptic pockets are mostly not evolved to bind small molecules and likely have fewer possible binders as compared to catalytic sites, the discovery effort has to be more focused. 19 Simulations and modeling can play a significant role in the systematic search for cryptic sites. Ad hoc machine learning approaches are starting to emerge, 20 but they rely on experimentally known cryptic pockets, whose number is still limited. Many early attempts to identify cryptic pockets by atomistic simulations were limited by the short time-scale accessible, 21,22 sampling effectively sidechain rearrangements, but less successful in predicting more significant conformational changes. 23 Simulations using small organic fragments as probes 24−27 and mixed-solvent 28 have been quite successful in identifying binding hot-spots. For instance, in the case of the eukaryotic translation initiation factor 4E protein, a cryptic pocket that opens only transiently during MD simulations with water is stabilized by a water− benzene mixture. 28 Yet they are also limited by the time-scale problem and cannot usually find cryptic sites that take longer to open. 29 The ever-increasing computational resources allowed cryptic pocket predictions based on much longer simulations complemented by Markov state models. 30 Still, even long simulations, lasting hundreds of microseconds, fall short when dealing with transient, high-in-energy pockets.
Enhanced sampling techniques have been successfully employed to accelerate the exploration of the relevant conformational space in complex biological systems 31−33 and were successful in localizing cryptic pockets. 34,35 Unfortunately, many of these methods, as Umbrella Sampling, 36 Metadynamics, 37,38 or Transition Path Sampling, 39 rely on the use of Collective Variables (CVs) or a path approximating the reaction coordinate, and require prior knowledge on the location and features of the opening pockets. As this knowledge is mostly unavailable for cryptic binding sites, significant efforts have been devoted to design general CVs, such as JEDI, 40 which sample the opening of hydrophobic cavities. These CVs, however, still require prior knowledge on the location of the binding sites. Methods based on multiple replica of the system such as Parallel Tempering (PT) 41 or Hamiltonian Replica Exchange (HREX) 42 offer an attractive alternative to CV-based enhanced sampling approaches, but thus far have not been systematically employed for this purpose.
The aim of this work is both to devise a robust and generally applicable simulation-based approach and to better understand the driving forces for cryptic pocket formation. 15,43 We compare long MD simulations with both Parallel Tempering and a novel HREX-based approach "SWISH" (Sampling Water Interfaces through Scaled Hamiltonians) in combination with fragment-based simulations, on four systems. Three are of pharmaceutical interest and harbor experimentally validated cryptic pockets: TEM1 β-lactamase, 44 interleukin-2 (IL2), 45 and Polo-like kinase-1 (PLK1) 46 (see Figure 1). β-Lactamase is responsible for widespread antibiotic resistance due to its ability to hydrolyze β-lactam antibiotics. Its cryptic binding site, which is not observed in the apo crystal structure, 47 has been discovered serendipitously 44 when crystals revealed two small organic molecules inserted at a predominantly hydrophobic interface between α-helices H11 and H12, 1.6 nm away from the active site. IL2 plays a key role in the activation of T-cells and in the rejection of tissue grafts. 10 Its cryptic binding site coincides with its partner protein binding surface. The discovery of this cryptic site led to one of the earliest examples of successful small-molecule inhibitor against PPIs. The site elongates across the natural PPI interface, of which a significant part is apolar, and has been explored with increasingly longer ligands, revealing a surprisingly large pocket. 45 PLK1 is a validated anticancer target in which phosphorylation-dependent PPIs play a crucial role in regulating cell mitosis. Its cryptic site was discovered in a complex with a synthetic phosphopeptide, 46 revealing a binding cavity that is not exposed when bound to natural peptides. 48 The fourth system, Ubiquitin, is not known to harbor cryptic pockets. It has been chosen to act as a control for "false positives" as its structure has been characterized by crystallography and its flexibility and weak superficial binding sites by NMR. 49,50 ■ RESULTS AND DISCUSSION We begin with testing whether or not the cryptic pockets are populated in the absence of the ligands allowing their sampling by long unbiased MD simulations. 22,30 We run several long MD simulations of TEM1, IL2, and PLK1 with different force-fields for 1−5 μs each. However, they fail to reveal an appreciable opening of the cryptic sites when starting from the apo crystal structures (see Supporting Information and Figure S2).
Cryptic Pockets Are Unstable without Ligands and Higher Temperature Does Not Improve Their Sampling. The Parallel Tempering 41 approach does not rely on the choice of system-dependent CVs and significantly enhances the sampling through excursions at higher temperatures; it is thus a seemingly suitable method for our purpose. However, to our initial surprise, increasing the temperature does not help exploring the cryptic sites for any of our chosen systems (see Protein surfaces are colored by element: cyan, red, blue, and yellow for carbon, oxygen, nitrogen, and sulfur, respectively. Cryptic binding sites in holo structures are circled in green with corresponding surface patch of apo structure displayed in red ovals. Ligands are shown in sticks with fragments that bind to cryptic sites highlighted in orange. Alternative comparison between the apo and holo structures is shown in Figure S1.

Journal of the American Chemical Society
Article Figure 2A and Figure S3). Throughout the 500 ns-long multiple replica simulations, the known cryptic sites remained mostly unidentifiable with an average exposure (see Supporting Information) of 23%, 8%, and 13% for TEM1, IL2, and PLK1, respectively. To better understand this, we investigated the TEM1 cryptic site stability and its thermodynamic balance.
Starting equilibrium simulations from the open (holo-like) conformation with ligands removed, we repeatedly observe a prompt pocket closure irrespective of the force-field used. Indeed, the results obtained with amber03 51 and amber14SB 52 are in qualitative agreement with those obtained with amber99SB*ildn; 53−55 while with charmm22* 56,57 we observe local unfolding (see Supporting Information and Figure S4).
The thermodynamic balance of the binding site formation was further studied with Parallel Tempering Metadynamics (PTMetaD) simulations. 58 We estimated the free energy surface (FES) describing the interconversion between apoand holo-like structures of TEM1 in the absence of the ligands. As CVs we used root mean squared deviation (RMSD) from the open and closed crystal structures and coupled the simulations to a range of temperatures (see Methods). The reconstructed FES shows that, as expected, the dominant state of β-lactamase is akin to the crystallographic apo structure in which the cryptic site is closed ( Figure 2B). However, it also confirmed that the holo structure does not correspond to a stable minimum, and pocket opening is thus rarely observed. After reweighting the FES 59 as a function of the pocket exposure (see Figure 2C), it is clear that the open conformation is ∼3 kcal/mol higher in energy, disfavoring the formation of the cryptic site. Figure 2C also reiterates the lack of a minimum and, as a result, a real transition state for cryptic site opening. In turn, this explains the lack of any significant pocket opening in the long equilibrium simulations. While fully open states are much higher in energy, the calculated free energy profile is in agreement with previous suggestions that equilibrium fluctuations can sample partially open states. 30 As the sampling of the open states might be force-field dependent, we repeated the PTMetaD simulation using the same amber03 force-field of ref 30. Although the cryptic pocket is still unstable, the protein spends significantly more time in a partially open conformation (see Figure S5). Not only the main minimum is shifted from ∼0 to 20% partial exposure, but also the energy penalty to achieve an exposure above 80% is 1 kcal/mol less than that of the amber99SB*-ILDN force-field.
As expected from the PT simulations, the free energy difference between apo and holo is hardly affected by the change in temperature (see Figure 2B,C). The estimated entropic contribution to the conformational change indeed was found to be very small and negative for both force-fields (see Figure S6), explaining why pocket exploration is not helped by higher temperatures. However, we cannot exclude alternative roles of entropy in other systems.
SWISH Approach Efficiently Explores the Cryptic Pockets. Our above-discussed results are suggestive of a major role of the ligands in opening and stabilizing the cryptic sites. As the druggable pockets are known to be apolar/ hydrophobic, 8,40,60,61 we sought an enhanced sampling approach that would replicate the effect of a small ligand by changing the protein−water interactions. Inspired by simulations with probes 27 and studies about the water−protein interaction effect on protein folding and stability, 62,63 we developed the HREX-based technique SWISH (Sampling Water Interfaces through Scaled Hamiltonians). By progressively scaling the nonbonded interactions of solvent molecules with protein atoms, we shift the water properties toward more ligand-like behavior to increase cryptic site opening. After a few trials, we found that the best results are obtained by applying the scaling to apolar protein atoms only: carbon and sulfur (see Supporting Information). The higher is the value of scaling factor λ, the stronger is water affinity to apolar protein surface patches. Our preliminary tests have also shown that the range of λ needs to be chosen sensibly to avoid the unfolding of the proteins. We settled for a range of λ that keeps the fluctuations

Journal of the American Chemical Society
Article of the α-carbon RMSD similar to those observed in the Parallel Tempering simulations.
We performed 500 ns long SWISH simulations for each target system. In all three cases, we found that as λ increases, the known druggable sites become more exposed (Figure 3).
This positive result suggests that the approach can be effective in cryptic binding site exploration even without ligands. As expected, the pocket population in replicas with λ = 1.0, corresponding to the standard Hamiltonian, is low, in agreement with the previous results of the other enhanced sampling approaches.
Fragments Bind to the Cryptic Sites with a Mixed Mechanism. With the further support for the hypothesis of ligand-induced cryptic pocket formation, we chose to perform a series of equilibrium simulations with probe molecules. We selected a set of six hydrophobic ring molecules: benzene, pyridine, imidazole, indole, pyrimidine, and tetrahydropyran, each of which are within the top 15 most common ring fragments in the FDA approved drugs, 64 benzene being the most common. We solvated the TEM1 apo structure with preequilibrated water boxes with ∼1 M probe concentration. To prevent phase separation, we used interligand repulsive potentials similar to those previously used in the literature. 24 For each probe, we ran 32 independent replicas of 100 ns in length. We observe several trajectories in which the fragments bind to the TEM1 cryptic pocket (see Figure S7), shifting the equilibrium to significant exposure. Closer inspection of the resulting structures from successful trajectories shows a wide open cryptic site bound by a few molecules intercalated between the helices 11 and 12 akin to the holo crystal structure.
In the presence of ligands, the open conformations clearly correspond to an energy minimum. However, there is a great variability between the individual trajectories. In many cases, the ligands were unable to discover the site. Moreover, they tend to interact with unspecific sites on the surface. Extending the simulations up to a microsecond still proved to be insufficient to successfully explore the cryptic pocket (see Figure S67H,I). It is possible that, due to the energy penalty of  In the presence of probes, we observe a significant cryptic site population even at the "neutral" (λ = 1.0) replicas. Structural trajectory snapshots of exposed cryptic sites bound by benzene probes, corresponding to maximum pocket exposure at λ = 1.0, are shown on the left-hand side for each system. Benzene is visualized as orange sticks and receptor surface in the same way as in Figure 1.

Journal of the American Chemical Society
Article site opening, the fragments are not always able to access the hydrophobic site and hence induce the full site opening, hindering the convergence. In addition, a slow exchange rate of the bound fragments together with unspecific interactions also contribute to poor reproducibility.
From a mechanistic point of view, we observe first a slight opening of a channel (e.g., from the active site or through the helices) that allows the permeation of a fragment to the hydrophobic area underneath the two helices. Once a hydrophobic fragment resides there, one of the hydrophobic residues might come loose and expose a larger hydrophobic patch to which an increasing number of fragments bind. Finally, the cavity fully opens with multiple fragments bound to it (see Figure S8). This is suggestive of a mixed binding mechanism for the fragments whereby both conformational selection and induced-fit play a role (see analysis and discussion in Supporting Information). To verify the nature of the binding mechanism with more drug-like ligands, we repeated the simulations with CBT [N,N-bis(4-chlorobenzyl)-1H-1,2,3,4tetraazol-5-amine], which was shown experimentally to bind to the cryptic pocket. Albeit we were not able to observe the binding of CBT during a standard MD simulations of average length, in agreement with the observations of ref 65, CBT successfully found the cryptic pocket when SWISH was used (see below).
Finally, we note that the chemical nature of the probe is important. We found that the site opening decreased going from benzene and pyrimidine to pyridine, imidazole, and indole, and was almost negligible in the case of tetrahydropyran (see Figure S7).
Combining SWISH with Fragments Provides the Most Efficient Approach to Cryptic Pocket Detection. We repeated the SWISH simulations in the presence of benzene and pyrimidine fragments for all systems (see Figure 4 and Figure S9). In all of the cases, we see a significant and consistent opening of the cryptic site. The increased effectiveness of the SWISH+fragment approach might be explained by the fact that, while SWISH sampling helps exposing the cryptic sites, those are further opened and stabilized by the fragments. The combined method is thus very general as it helps sampling systems with induced-fit, conformational selection and "mixed" binding mechanisms. 15 What is more, due to the scaling of protein−water interactions, at high λ values the ligands are out-competed by water. This behavior accelerates the sampling by allowing the fragments to quickly leave unspecific interaction sites and more efficiently explore the protein surface ( Figure S9). We note that this also reduces the average pocket exposure at higher λ values as water molecules are less efficient at inducing the exposure than larger fragments. However, the fractional opening appears to be sufficient for the fragments to enter and further induce the sites once lower λ allows displacing the scaled waters. What typically happens is that at higher lambda the opening and closing of the pockets is faster, helping the binding of the fragments to cryptic pockets, but also the unbinding is faster, leading to a lower residence time. When a ligand bound to a replica at higher lambda exchanges with a replica at lower lambda values, the residence time of the fragments is higher, and thus the open states experience a longer lifetime. Thus, by combining the pocket opening power of the scaled replicas and the binding stability of the unbiased replica, the fragments access the pocket and stay bound. However, higher lambdas help the sampling up to a point; when lambda is too high, the protein starts to unfold resulting in the exploration of mostly irrelevant unfolded states (see the Supporting Information).
The greatly increased effectiveness of the SWISH+fragment approach might be explained by the fact that, while SWISH sampling helps expose the cryptic sites, those are further opened and stabilized by the fragments. What is more, due to the scaling of protein−water interactions, at high λ values the ligands are out-competed by water. This behavior accelerates the sampling by allowing the fragments to quickly leave unspecific interaction sites and more efficiently explore the protein surface ( Figure S8). We note that this also reduces the average pocket exposure at higher λ values as water molecules are less efficient at inducing the exposure than larger fragments. However, the fractional opening appears to be sufficient for the fragments to enter and further induce the sites once lower λ allows displacing the scaled waters. What typically happens is that at higher λ the opening and closing of the pockets is faster, helping the binding of the fragments to cryptic pockets, but also the unbinding is faster, leading to a lower residence time. When a ligand bound to a replica at higher λ exchanges with a replica at lower λ values, the residence time of the fragments is higher, and thus the open states experience a longer lifetime. Thus, by combining the pocket opening power of the scaled replicas and the binding stability of the unbiased replica, the fragments can access the pocket and stay bound. The combined method is very general as it helps sampling systems with induced-fit, conformational selection and "mixed" binding mechanisms. 15 As compared to plain MD with fragments or mixed solvent, it is much more efficient in exploring conformational changes with high activation barriers. Unlike CV-based approaches, 34,35 it does not need previous knowledge of the binding site.
We also performed SWISH simulations of TEM1 with the validated allosteric ligand CBT. At higher λ's, one CBT molecule was able to bind to the cryptic pocket in ≃100 ns. After the first CBT was bound, as second molecule quickly entered the site, and the two explored orientations similar to those revealed by X-ray crystallography (see Figure S10). This result is also suggestive of a mixed binding mechanism for CBT, as the presence of one molecule in the cavity clearly helps the binding of additional ones.

FALSE POSITIVES
An important question that needs to be addressed is how do we distinguish "real" cryptic sites from false positives? In general, one cannot be certain whether a novel cavity found by simulations and previously unreported by experiments is a false positive or a yet undiscovered cryptic site. However, this does not mean that the question cannot be addressed, especially if the aim is to distinguish between weakly binding superficial sites and buried cryptic sites. Take, for instance, the case of TEM1. Apart from the main cryptic site to which CBT binds, the other two minor cryptic sites have been proposed to be druggable. 30 We want to distinguish these sites (the main and possibly the two minor ones) from other more or less superficial binding "hotspots". Both long repeated MD simulations (32 × 1.1 μs) and SWISH simulations with fragments were able to identify the superficial sites (see Figure  S11). When we plot the isosurfaces of the volumetric maps showing the time-averaged occupation density, the hotspots corresponding to the two superficial sites are evident. This reflects the fact that the ligands spend more time in contact with those sites. At difference with the superficial sites,

Journal of the American Chemical Society
Article however, only the SWISH simulations were able to systematically explore the main cryptic site between the helices 11 and 12, irrespective of the use of benzene or pyrimidine. Thus, the occupation of the main site in the long MD is much lower than that obtained by SWISH. Subtracting the ligand density obtained from the long MD simulations from that obtained with SWISH+fragments clearly reveals a hotspot corresponding to the buried cryptic site ( Figure S11). We repeated the analysis on IL2 and PLK1 with similar results. On the basis of these results, we propose that buried cryptic sites can be detected by looking at the fragment occupation density differences between SWISH and standard MD simulations. This approach also provides a quick visual way to assess different hotspots.
Still, an important question is whether or not we can distinguish true pockets from false positive. To test whether or not the approach we propose also works in a case where no experimental cryptic pockets are known, we have tested it on Ubiquitin, whose structure and dynamics as well as a number of weak superficial binding sites have been experimentally characterized. 49,50 First, by computing the time-averaged fragment occupation, we observe with both long MD simulations and SWISH two weak superficial hotspots, which are interestingly located close to experimentally reported weak binding sites ( Figure S11). 50 Yet when we compute the difference in the binding density, both hotspots disappear, consistent with the absence of observed buried cryptic binding pockets.

■ CONCLUSIONS
We investigated the molecular mechanisms of cryptic pocket opening and compared the efficiency of different simulationbased approaches in finding them. Long standard MD simulations, free-energy calculations, and extensive Parallel Tempering simulations revealed the energy penalty of cryptic site opening resulting in a spontaneous closure in the absence of a ligand. Although long equilibrium simulations can sample rare fluctuations leading to the opening of the sites, the incomplete opening and short lifetime make their detection difficult. Also, the partially open states are quite different from the states observed in ligand-bound crystal structures and might prove impractical in virtual screening. The use of different force-fields confirms the qualitative picture; while some of them might underestimate the energy penalty to form the pockets, all agree on their lack of stability.
Our results indicate a role of induced-fit effects in cryptic site opening, and show a mixed mechanism in some cases, as the binding of benzene and the more drug-like ligand CBT to TEM1. We have demonstrated that our proposed SWISH method is able to induce the opening of the known cryptic binding sites in TEM1, IL2, and PLK1 systems by scaling protein−water interactions. Combining the SWISH method with suitable probe molecules proved to be the most efficient and reliable approach. At difference with CV-based enhanced sampling algorithms, it does not require prior knowledge of the site or the definition of optimal CVs to discover the cryptic sites. What is more, the difference in time-averaged fragment densities from SWISH and long MD simulations is able to distinguish them from nonspecific superficial binding hot-spots. To the best of our knowledge, this is the first successful example in which enhanced sampling techniques are combined with fragment molecule simulations to explore cryptic pockets. We also note that the SWISH approach might be applied to a wide range of protein sampling problems including protein folding.

■ METHODS
The protein structures were retrieved from the Protein Data Bank 66 (PDB entries 1JWP, 1PZO, 1M47, 1PY2 (chain D), 3FVH, 3P37 (chain A), and 2LJ5). Missing residues were added using the software Modeller, 67 according to the respective Uniprot sequences. The structural figures were made using VMD. 68 Molecular Dynamics was performed with Gromacs 4.6.7. 69 Unless otherwise specified, all simulations were run at 300 K NVT ensemble with temperature coupled using V-rescale algorithm, 70 using the amber99SB*-ILDN 53−55 and explicit solvent TIP3P 71 force-fields. PTMetaD 58 was run using the PLUMED 72 for 500 ns using binding site RMSD relative to apo and holo structures over the temperature range of 305−315 K using 9 replicas. Unbiased Parallel Tempering simulations (310−400 K; 32 replicas) and SWISH simulations (λ ∈ [1.0 ; 1.35], 8 replicas, HREX 73 ) were each performed for 500 ns. Ligands were parametrized using GAFF 74 with additional repulsive interligand potential. To monitor pocket exposure, we analyzed snapshots using fpocket. 75 For further details, see the Supporting Information.

* S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/jacs.6b05425.
Supplementary results and discussion with accompanied figures and tables referred to in the main text, along with a more explicit account of methodology (PDF)