SWISH-X, an Expanded Approach to Detect Cryptic Pockets in Proteins and at Protein–Protein Interfaces

Protein–protein interactions mediate most molecular processes in the cell, offering a significant opportunity to expand the set of known druggable targets. Unfortunately, targeting these interactions can be challenging due to their typically flat and featureless interaction surfaces, which often change as the complex forms. Such surface changes may reveal hidden (cryptic) druggable pockets. Here, we analyze a set of well-characterized protein–protein interactions harboring cryptic pockets and investigate the predictive power of current computational methods. Based on our observations, we developed a new computational strategy, SWISH-X (SWISH Expanded), which combines the established cryptic pocket identification capabilities of SWISH with the rapid temperature range exploration of OPES MultiThermal. SWISH-X is able to reliably identify cryptic pockets at protein–protein interfaces while retaining its predictive power for revealing cryptic pockets in isolated proteins, such as TEM-1 β-lactamase.


Selection of PPIs
A brief description of the four target proteins harbouring cryptic pockets at their protein–protein interfaces follows.
Bcl-XL. B-cell lymphoma-extra-large (Bcl-XL), a member of the Bcl-2 protein family, is a pro-survival mitochondrial transmembrane protein. However, its interaction with BCL2 Antagonist/Killer 1 (Bak) protein activates the mitochondrial apoptotic pathway and mitophagy.¹ Consequently, Bcl-XL has become a target for anticancer agents such as venetoclax and navitoclax. These agents mimic the BH3 domain of Bak protein and promote the apoptotic cascade. Bcl-XL (UniProt: Q07817, sequence: 1-MSQ...LYG-196) is assumed to form a homodimer in its apo form with an N-swapped domain (PDB ID: 1R2D).² In its holo form, each of the two protomers binds to either BH3 or a BH3-mimicking inhibitor (PDB ID: 4C52),³ resulting in a heterotetramer assembly.
IL-2. Interleukin-2 (IL-2) is a single-chain soluble cytokine hormone released by T-cells, which binds to IL-2 receptors (IL-2R) to modulate the normal immune response.⁴ The IL-2R family includes three monomeric transmembrane proteins: IL-2Rα, IL-2Rβ and IL-2Rγ. IL-2Rα first binds and concentrates IL-2 on the T-cell surface; the subsequent binding to IL-2Rβ and IL-2Rγ activates the intracellular response, leading to T-cell growth and differentiation. While agonists of the IL-2/IL-2Rα PPI (IL-2-like peptides) have been developed for anticancer immunotherapy, antagonists have been pursued as a promising strategy to mitigate inflammatory and autoimmune diseases. One of the main challenges in developing small-molecule antagonists of the IL-2/IL-2Rα PPI is the need to target a large and discontinuous epitope. Nonetheless, a first hit compound, Ro26-4550, with low micromolar activity was discovered⁵ and later optimised to a nanomolar inhibitor by fragment-based approaches.
MDM2. Murine double minute 2 (MDM2) protein is an E3 ligase that inhibits the function of the transcription factor p53 by binding to p53's transactivation domain. This interaction promotes the export of p53 from the nucleus and eventually leads to p53 ubiquitylation and degradation by the proteasome. Notably, MDM2 is overexpressed in 7% of various tumour types. Targeting the MDM2/p53 interaction has been actively pursued as a strategy to reactivate p53 and inhibit tumour growth. There are currently nine inhibitors being evaluated in clinical trials.⁷ These inhibitors mimic the key interactions between MDM2 and p53, primarily involving three hydrophobic residues on p53: Phe19, Trp23 and Leu26. MDM2's surface accommodates these residues in three distinct sub-sites, created by the displacement of residues Phe86, His96, Val93 and Tyr100. For our simulations, we selected an NMR structure of the apo form of MDM2 (PDB ID: 1Z1M) and the X-ray structure of MDM2 bound to the 6b inhibitor (IC₅₀ MDM2/p53 = 0.819 µM, PDB ID: 5LAV), as reported by Gollner and colleagues.⁸
E2. The E2 protein from Human Papilloma Virus type 11 (HPV-11) is a replication initiation factor that recruits the E1 helicase to improve the DNA binding activity. The biological assembly of E2 is dimeric and is composed of a C-terminal DNA binding/dimerization domain connected to the N-terminal transactivation domain (TAD) by a hinge region. The TAD is responsible for binding to the E1 helicase and comprises a helix bundle and an antiparallel beta sheet. To date, only two structures of the HPV-11 E2 TAD have been resolved (PDB IDs: 1R6K, 1R6N), the first corresponding to the apo conformation and the second bound to two spirocyclic indandione inhibitors.⁹ The ligand pocket is not visible in the apo state of HPV-11 E2 (PDB ID: 1R6K) but becomes apparent upon binding of inhibitors (PDB ID: 1R6N).

Protein preparation
All systems were prepared following the same protocol. The structures of the selected proteins were downloaded from the RCSB PDB. For each protein, we selected a holo state (inhibitor-bound) and an apo state (Table 1). PDB structures were chosen taking into account the crystal resolution, the absence of mutations and the completeness of the sequence. Titratable residues were protonated at pH = 7.4 using the H++ webserver (http://newbiophysics.cs.vt.edu/H++/). All systems are functional as monomers, except for Bcl-XL, which is functional as a homodimer. While the holo form of Bcl-XL in PDB ID: 4C52 reports the correct dimeric assembly, the apo form of Bcl-XL in PDB ID: 1R2D shows only one protomer. To recreate the correct dimeric assembly, after alignment of 1R2D to 4C52, the protomer in 1R2D was copied and aligned to chain B in PDB ID: 4C52. Finally, all truncated systems were capped at the N- and C-termini with acetyl and N-methylamine groups, respectively.

Pocket detection
To assess the crypticity of the binding pockets within the studied systems and to quantify their physicochemical properties, we performed an analysis of the energy-minimised and protonated holo-crystal structures, as detailed in Table 1 in the main text, for each system. The ligands were removed from all holo structures prior to pocket detection.
The analysis was performed using Fpocket 4.0, a geometry-based cavity detection algorithm.¹⁰ Fpocket uses Voronoi tessellation and α-spheres to identify distinct pockets within the protein structure. In this context, an α-sphere is defined as a sphere that touches four atoms on its boundary and contains no internal atoms. For consistency across all systems, we used the default Fpocket parameters, although for smaller pockets we generated grid maps with lower isovalues.
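The α-sphere definition above can be illustrated with a minimal sketch. It is not Fpocket's implementation: atoms are treated as points at their centres, the sphere through four atoms is obtained by solving a small linear system, and the function names `circumsphere` and `is_alpha_sphere` are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(p1, p2, p3, p4):
    """Return (centre, radius) of the sphere passing through four 3D points."""
    # Subtracting the sphere equation at p1 from those at p2..p4 gives
    # the linear system 2 (p_i - p1) . c = |p_i|^2 - |p1|^2 for the centre c.
    rows, rhs = [], []
    for p in (p2, p3, p4):
        rows.append([2.0 * (p[k] - p1[k]) for k in range(3)])
        rhs.append(sum(p[k] ** 2 - p1[k] ** 2 for k in range(3)))
    det = det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("the four points are coplanar; no unique sphere")
    centre = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        centre.append(det3(m) / det)
    return tuple(centre), math.dist(centre, p1)


def is_alpha_sphere(four_atoms, all_atoms, tol=1e-9):
    """True if no atom centre lies strictly inside the sphere through four_atoms."""
    centre, radius = circumsphere(*four_atoms)
    return all(math.dist(centre, atom) >= radius - tol for atom in all_atoms)
```

With four atoms placed on the unit sphere, an additional atom at (0, 0, 0.5) lies inside the sphere and disqualifies it, whereas an atom at (3, 0, 0) does not.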
From these generated grid maps, we selected and saved a PDB file containing dummy atoms positioned at the grid points defining the regions of interest. Specifically, for each system we selected a single pocket located within the cryptic site of the protein. These PDB files were later used as input to track the volume of the different pockets. A link to a PyMOL¹¹ session of each selected cryptic pocket is available in the Data Availability section. The volume values of the selected cryptic pockets are given in a table at the end of this section.
We analysed the behaviour of the selected pockets along the different biased and unbiased MD simulations using MDpocket, an open-source tool designed to detect binding pockets during MD simulations and part of the Fpocket 4.0 package.¹⁰ MDpocket was applied to down-sampled and reference-aligned trajectories, with frames spaced at intervals of 100 ps. A suitable reference structure was carefully selected for each simulated system. The output generated by MDpocket allowed us to measure the volume of the selected pockets along both biased and unbiased MD simulations. Visualisation of the resulting volume profiles was performed using different Python packages.
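As a rough illustration of how such volume profiles can be turned into the exposure profiles described in the figure captions (volume as a percentage of the holo-crystal pocket volume, smoothed over a 5 ns window), the sketch below uses plain Python lists. The function names are hypothetical, the per-frame volumes are assumed to have already been read from the MDpocket output, and a trailing window is an assumption, since the exact windowing scheme is not specified.

```python
def pocket_exposure(volumes, holo_volume):
    """Per-frame exposure in percent: 100 * V(frame) / V(holo crystal)."""
    if holo_volume <= 0:
        raise ValueError("holo_volume must be positive")
    return [100.0 * v / holo_volume for v in volumes]


def running_average(values, window):
    """Trailing mean over at most `window` frames (shorter at the start)."""
    averaged = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the frame leaving the window
        averaged.append(total / min(i + 1, window))
    return averaged


# Frames spaced by 100 ps and a 5 ns window correspond to 50 frames.
FRAME_SPACING_PS = 100
WINDOW_FRAMES = round(5000 / FRAME_SPACING_PS)
```

For example, `running_average(pocket_exposure(volumes, holo_volume), WINDOW_FRAMES)` would give a smoothed exposure curve for one replica.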
Volumes of the presented cryptic pockets.

Energy minimisation and equilibration protocol
Energy minimisation was performed over 50,000 steps using the steepest descent algorithm with a tolerance set at 100 kJ mol⁻¹ nm⁻¹. The systems were equilibrated in three steps, with harmonic restraints applied to all heavy atoms only during the first two equilibration steps (harmonic constant: 1000 kJ mol⁻¹ nm⁻²). First, a 1 ns heating phase was performed in the NVT ensemble, using the V-rescale thermostat (τ = 0.1 ps).¹² Two different temperature coupling groups were used, one for the protein and the other including water molecules and ions, with a reference temperature of 300 K. A second equilibration phase of 5 ns was then performed in the NPT ensemble using V-rescale (τ = 0.5 ps) and Berendsen (P = 1 atm, τ = 0.5 ps) as the thermostat and barostat, respectively.¹²,¹³ The same temperature coupling groups were maintained during the NPT equilibration step. An additional 5 ns equilibration step was performed in the NPT ensemble employing V-rescale (T = 300 K, τ = 0.5 ps) and C-rescale (P = 1 atm, τ = 1 ps) as thermostat and barostat, respectively.
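The protocol above can be summarised as a small parameter table. The sketch below collects the stated values into GROMACS-style `.mdp` option names; the use of GROMACS, the 2 fs time step, the coupling-group names and the `-DPOSRES` macro for switching on restraints are all assumptions, as the text does not specify them. Options not mentioned in the text (for example the compressibility) are deliberately left out.

```python
ATM_IN_BAR = 1.01325  # 1 atm expressed in bar


def n_steps(duration_ns, dt_ps):
    """Number of integration steps covering duration_ns at time step dt_ps."""
    return round(duration_ns * 1000.0 / dt_ps)


def equilibration_stages(dt_ps=0.002):
    """Map each stage to mdp-style options (dt_ps = 2 fs is an assumed value)."""
    groups = "Protein Non-Protein"  # hypothetical coupling-group names
    thermostat = {"integrator": "md", "dt": dt_ps, "tcoupl": "V-rescale",
                  "tc-grps": groups, "ref-t": "300 300"}
    return {
        "minimisation": {"integrator": "steep", "nsteps": 50000, "emtol": 100.0},
        # Stages 1 and 2 keep heavy-atom restraints (assumed -DPOSRES macro).
        "nvt_heating": {**thermostat, "nsteps": n_steps(1, dt_ps),
                        "tau-t": "0.1 0.1", "pcoupl": "no",
                        "define": "-DPOSRES"},
        "npt_berendsen": {**thermostat, "nsteps": n_steps(5, dt_ps),
                          "tau-t": "0.5 0.5", "pcoupl": "Berendsen",
                          "tau-p": 0.5, "ref-p": ATM_IN_BAR,
                          "define": "-DPOSRES"},
        # Stage 3 is unrestrained and switches to the C-rescale barostat.
        "npt_crescale": {**thermostat, "nsteps": n_steps(5, dt_ps),
                         "tau-t": "0.5 0.5", "pcoupl": "C-rescale",
                         "tau-p": 1.0, "ref-p": ATM_IN_BAR},
    }


def to_mdp(options):
    """Render one stage as 'key = value' lines."""
    return "\n".join(f"{key:<12} = {value}" for key, value in options.items())
```

Calling `to_mdp(equilibration_stages()["nvt_heating"])` prints the options for the first equilibration step; the restraint force constant itself would be set in the topology rather than in these options.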

Effect of different initial data on the t-SNE embeddings
We tested the sensitivity of the t-SNE embeddings to the initial data points used to generate them. We performed different control simulations of the holo state of Bcl-XL. We accumulated 750 ns of sampling for Bcl-XL (initial structure PDB ID: 4C52) keeping the ligand in the pocket. Additionally, we accumulated 1.5 µs (3 replicas of 500 ns) of sampling of the holo-like structure of Bcl-XL, obtained by removing the ligand from the pocket. We therefore generated a holo-like and a holo t-SNE embedding. For the holo-like embedding, we incorporated structures from our SWISH-X simulation of Bcl-XL together with structures from the holo-like control simulations. For the holo embedding, we included structures from the SWISH-X simulation and structures from the holo control simulation.
Clustering of these embeddings using HDBSCAN yielded a similar number of clusters (Figure S2a-b).
We then calculated the Jaccard index to compare the similarity of the clusters obtained from the holo-like and the holo embedding (Figure S2c). The Jaccard index, denoted as J(A, B), measures the similarity between two sets A and B. It is defined as the size of the intersection of the sets divided by the size of their union:
\[ J(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
where:
• |A ∩ B| represents the cardinality (number of elements) of the intersection of sets A and B.
• |A ∪ B| represents the cardinality of the union of sets A and B.
The Jaccard index ranges between 0 and 1, where:
• J(A, B) = 0 indicates that sets A and B have no elements in common.
• J(A, B) = 1 indicates that sets A and B have the same elements.
In most cases, any given cluster A from the holo-like embedding has a corresponding cluster B in the holo embedding such that J(A, B) ∼ 1 (Figure S2c). For the cases where J ≪ 1, it is still possible to identify which clusters these structures end up in. For example, clusters 4 and 12 in Figure S2a are grouped into cluster 7 in Figure S2b, with J(4, 7) = 0.26 and J(12, 7) = 0.71, respectively (Figure S2c). Similarly, cluster 18 from Figure S2a is assigned to the much larger cluster 19 in Figure S2b, hence J(18, 19) ∼ 0. We […] not shown).
Taken together, these observations indicate that the majority of structures within a cluster in the holo-like embedding are consistently grouped into a single cluster in the holo embedding, and vice versa. This implies that similar structures sampled during the SWISH-X simulation tend to be consistently assigned to the same cluster, regardless of the original holo or holo-like control data used to create the embedding.
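The cluster comparison described in this section can be sketched in a few lines. The example assumes that each cluster is represented by the set of identifiers of the SWISH-X structures it contains, so that clusters from the two embeddings are compared on shared structures; the function names and the convention J = 0 for two empty sets are choices made here for illustration only.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def best_matches(clusters_a, clusters_b):
    """For each cluster label in clusters_a, return the label in clusters_b
    with the highest Jaccard index, together with that index."""
    matches = {}
    for label_a, members_a in clusters_a.items():
        scores = {label_b: jaccard(members_a, members_b)
                  for label_b, members_b in clusters_b.items()}
        best = max(scores, key=scores.get)
        matches[label_a] = (best, scores[best])
    return matches
```

A cluster that is absorbed into a much larger cluster in the other embedding yields a small J even though all of its members stay together, which is why low values need to be inspected rather than read as disagreement.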

Effect of different tempering schemes on the opening of the cryptic pocket of TEM-1 β-lactamase
To further investigate the effect of temperature variations on the opening of TEM-1's cryptic pocket, we tested two different SWISH-X tempering schemes. The first, which we call SWISH-6X (yes, we ran out of imagination), gradually increases the temperature range explored by each replica (Table S3). The second scheme is identical to that used in SWISH-X, with each replica covering the same temperature range, but with the range broadened from 300-350 K to 280-350 K. In both cases, pocket sampling is less efficient than in SWISH-X at 300-350 K: the median site opening is 33% for SWISH-6X, 38% for SWISH-X (280-350 K) and 53% for SWISH-X (300-350 K) (Figure S4a).
In addition, comparison of the volume profiles of each replica clearly shows how the higher temperature range greatly improves the opening of the pocket (Figure S4b-c).
Table S3: Different tempering schemes tested for TEM-1 β-lactamase.
We calculated the opening times of the different cryptic cavities based on the different sampling schemes we tested (Table S4). We defined two opening thresholds, corresponding to 60% and 80% exposure of the cryptic pocket with respect to the corresponding holo-crystal. For any given simulation, we selected only structures that met these opening thresholds. We then filtered out high RMSD pocket configurations (RMSD > 2 Å) with respect to their corresponding holo-crystal configuration. The RMSD of a given pocket structure was calculated with respect to all heavy atoms of the residues within 7 Å from the centre of the pocket in the holo-crystal. Subsequently, we checked how long it took for each simulation strategy to sample at least one structure that met these criteria. This provided us with a quantitative indication of how long it takes each simulation to sample an open-crystal-like configuration of the cryptic site.
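The selection procedure above amounts to finding the first frame that satisfies both criteria. A minimal sketch follows, assuming that per-frame times, exposure percentages and pocket RMSD values are already available as equally long lists; the function name is hypothetical, and treating the exposure threshold as inclusive (≥) is an assumption.

```python
def first_opening_time(times_ns, exposure_pct, rmsd_angstrom,
                       threshold_pct, max_rmsd=2.0):
    """Return the time of the first frame whose exposure reaches threshold_pct
    and whose pocket RMSD does not exceed max_rmsd, or None if there is none."""
    for time, exposure, rmsd in zip(times_ns, exposure_pct, rmsd_angstrom):
        if exposure >= threshold_pct and rmsd <= max_rmsd:
            return time
    return None


# Toy data: the frame at 0.1 ns is open enough but too far from the holo crystal.
times = [0.0, 0.1, 0.2, 0.3]
exposure = [10.0, 65.0, 85.0, 90.0]
rmsd = [1.0, 2.5, 1.5, 1.0]
opening_60 = first_opening_time(times, exposure, rmsd, threshold_pct=60.0)
opening_80 = first_opening_time(times, exposure, rmsd, threshold_pct=80.0)
```

In this toy example both thresholds are first met at 0.2 ns, because the earlier frame with 65% exposure is rejected by the RMSD filter.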

Figure S1: Example of the potential energy fluctuations during a SWISH-X simulation. Potential energy values of an example system along the first 50 ns of a SWISH-X replica. At the start of the simulation, the potential energy of the system is governed solely by the thermostat (300 K). In this example, after 1 ns, OPES MultiThermal introduces a bias on the potential energy that drives the system to match the temperature range selected (300-350 K). At convergence, the potential energy should fluctuate evenly across its range, as shown here from approximately 25 ns.
These control simulations allowed us to explore different pocket configurations: the holo control sampled only open configurations of the cryptic cavity, while the holo-like control explored not only the initial open conformations, but also semi-open and closed conformations. This is because, in the absence of the ligand, the cryptic site is free to partially (or completely) close.

Figure S2: t-SNE embeddings of Bcl-XL. t-SNE embeddings were generated from pocket configurations sampled during SWISH-X and holo-like (a) or holo simulations (b). Each t-SNE space displays coloured points corresponding to pocket configurations sampled during the SWISH-X simulation, with colours indicating the percentage of pocket exposure. Different numbers identify distinct clusters within each t-SNE cluster map. Isocontour lines delineate the regions explored by holo-like and holo simulations in a and b, respectively. c) Jaccard index (J) for each holo-like and holo cluster pair from the t-SNE embeddings in a and b.

Figure S3: Exposure of TEM-1's cryptic pocket in different simulations. Comparison of the exposure profiles of TEM-1's cryptic site in unbiased MD (holo-like or apo), mixed-solvent, SWISH-X and SWISH simulations. Solid lines represent the running average of the pocket exposure over a 5 ns time window. Scatter points show the pocket exposure value for each structure in the simulation. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 1PZO).

Figure S4: Opening of the cryptic pocket of TEM-1 under different tempering schemes. a) Violin plots of the cryptic site exposure of the selected cryptic pocket along different simulations. The simulations are presented as follows (from left to right): SWISH-6X, SWISH-X (280-350 K) and SWISH-X (300-350 K). Pocket exposure values from individual replicas of a given simulation are combined into a single violin plot. b) Comparison of pocket exposure profiles along different replicas of TEM-1's SWISH-X simulations sampling different temperature ranges. c) Comparison of the pocket exposure profiles along different replicas of the SWISH-X and SWISH-6X simulations of TEM-1. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 1PZO).

Figure S5: t-SNE volume maps of the different cryptic pockets. Each point in a given t-SNE space corresponds to a pocket configuration sampled during either an apo unbiased, SWISH-X, mixed-solvent or SWISH simulation, with colours indicating the percentage of pocket exposure. Different numbers identify distinct clusters within each t-SNE cluster map. Isocontour lines highlight the region of the t-SNE space explored by holo-like simulations.

Figure S6: t-SNE RMSD maps of the different cryptic pockets. Each point in a given t-SNE space corresponds to a pocket configuration sampled during either an apo unbiased, SWISH-X, mixed-solvent or SWISH simulation, with colours indicating RMSD values (Å). The RMSD was calculated as the heavy atom RMSD with respect to the residues located within 7 Å of the centre of the cryptic pocket in the corresponding holo crystal. Different numbers identify distinct clusters within each t-SNE cluster map. Isocontour lines highlight the region of the t-SNE space explored by holo-like simulations.

Figure S7: Exposure of Bcl-XL's cryptic pocket in different simulations. Comparison of the exposure profiles of Bcl-XL's cryptic site in unbiased MD (holo-like or apo), mixed-solvent, SWISH-X and SWISH simulations. Solid lines represent the running average of the pocket exposure over a 5 ns time window. Scatter points show the pocket exposure value for each structure in the simulation. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 4C52).

Figure S8: Cryptic pockets in three different PPI systems. a) Structural alignment of IL-2's apo (white, PDB ID: 1M47) and holo (blue, PDB ID: 1PY2) structures. b-c) Close-up views of the cryptic cavity in apo and holo structures, respectively. d-e) Surface representation of apo (white) and holo IL-2 (blue). f) Structural alignment of MDM2's apo (white, PDB ID: 1Z1M) and holo (green, PDB ID: 5LAV) structures. g-h) Close-up views of the cryptic cavity in apo and holo structures, respectively. i-j) Surface representation of apo (white) and holo MDM2 (green). k) Structural alignment of HPV-11 E2's apo (white, PDB ID: 1R6K) and holo (orange, PDB ID: 1R6N) structures. l-m) Close-up views of the cryptic cavity in apo and holo structures. n-o) Surface representation of apo (white) and holo HPV-11 E2 (orange). Relevant residues are depicted as sticks and labelled, with the inhibitor omitted for clarity (panels a, f and k). Inhibitors are shown as green sticks (panels c, h and m). The cryptic pockets are detectable in the holo structures (dotted white circles in panels e, j and o) but not in the corresponding apo structures (solid red circles in panels d, i and n).

Figure S9: Exposure of IL-2's cryptic pocket in different simulations. Comparison of the exposure profiles of IL-2's cryptic site in unbiased MD (holo-like or apo), mixed-solvent, SWISH-X and SWISH simulations. Solid lines represent the running average of the pocket exposure over a 5 ns time window. Scatter points show the pocket exposure value for each structure in the simulation. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 1PY2).

Figure S10: Exposure of MDM2's cryptic pocket in different simulations. Comparison of the exposure profiles of MDM2's cryptic site in unbiased MD (holo-like or apo), mixed-solvent, SWISH-X and SWISH simulations. Solid lines represent the running average of the pocket exposure over a 5 ns time window. Scatter points show the pocket exposure value for each structure in the simulation. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 5LAV).

Figure S11: Example of a novel open conformation in MDM2's SWISH-X simulation. a) Inhibitor-bound X-ray structure (PDB ID: 5LAV); the protein is represented as a surface and the ligand is shown as green sticks. b) Representative structure from the SWISH-X simulation (the ligand from PDB ID: 5LAV is included for reference). c) Secondary structure alignment between the X-ray inhibitor-bound MDM2 (green) and a representative conformation from the SWISH-X simulation (grey). The arrow illustrates the loop movement required to expose the deeper cavity.

Figure S12: Exposure of HPV-11 E2's cryptic pocket in different simulations. Comparison of the exposure profiles of HPV-11 E2's cryptic site in unbiased MD (holo-like or apo), mixed-solvent, SWISH-X and SWISH simulations. Solid lines represent the running average of the pocket exposure over a 5 ns time window. Scatter points show the pocket exposure value for each structure in the simulation. Pocket exposure is expressed as a percentage of the volume of the cryptic site (Å³) with respect to the volume of the corresponding pocket in the holo-crystal (PDB ID: 1R6N).

Table S2: SWISH-X and SWISH simulation parameters.

Table S4: Cryptic pocket opening times under different sampling schemes.