Exploring PROTAC Cooperativity with Coarse-Grained Alchemical Methods

Proteolysis targeting chimera (PROTAC) is a novel drug modality that facilitates the degradation of a target protein by inducing proximity with an E3 ligase. In this work, we present a new computational framework to model the cooperativity between PROTAC–E3 binding and PROTAC–target binding principally through protein–protein interactions (PPIs) induced by the PROTAC. Due to the scarcity and low resolution of experimental measurements, the physical and chemical drivers of these non-native PPIs remain to be elucidated. We develop a coarse-grained (CG) approach to model interactions in the target–PROTAC–E3 complexes, which enables converged thermodynamic estimations using alchemical free energy calculation methods despite an unconventional scale of perturbations. With minimal parametrization, we successfully capture fundamental principles of cooperativity, including the optimality of intermediate PROTAC linker lengths that originates from configurational entropy. We qualitatively characterize the dependency of cooperativity on PROTAC linker lengths and protein charges and shapes. Minimal inclusion of sequence- and conformation-specific features in our current force field, however, limits quantitative modeling to reproduce experimental measurements, but further development of the CG model may allow for efficient computational screening to optimize PROTAC cooperativity.

The complete potential energy function for a ternary complex is where x E , x T , and x P indicate the coordinates of the E3 ligase, the target protein, and the PROTAC respectively, q represent the charges of protein beads, and b are indicators of whether protein beads are at the binding pocket or not. All PROTAC beads are modeled with 0 charge and no attraction to the proteins. All parameters and variables are defined using a length scale of the large bead (σ = 0.8 nm) and an energy scale of ϵ = kT where k is the Boltzmann constant and T = 310 K.

Internal energy terms
Interactions within a protein are modeled by an elastic network model (ENM) such that every pair of beads within distance R c is connected by a harmonic spring: where k spring is the spring constant, d ij is the optimal distance between x i and x j , and The optimal distance between a pair of beads is its initial distance in the experimental structure. Experimental structures used in this work include VHL (PDB: S-2 PROTAC is modeled as a linear molecule, where adjacent beads are connected by springs (U spring (x P )) and non-adjacent beads are subjected to steric repulsions (U WCA (x P )).

Interaction energy terms
PROTAC-protein interactions consist of binding interactions modeled by springs between a binding moiety bead in the PROTAC and all beads in the corresponding binding pocket (1)) and steric repulsions (U WCA (x P , x T ) and Protein-protein interactions are captured by the steric repulsions (U WCA (x E , x T )), and depending on the modeling purpose, electrostatics (U elec (x E , x T ; q)) or nonspecific attrac- The electrostatic interaction is modeled by a Debye-Hückel (DH) potential. The functional forms and parameterization of both potentials can be found in. 5 When reducing the screening of electrostatics between BRD4 BD2 and VHL, the Debye length κ is multiplied by 10. The solvent in our system is treated implicitly. Nonspecific attractions aimed at broadly including Van der Waals forces and hydrophobic interactions are modeled by LJ potentials. The strength of the attraction is kept under that of electrostatic interactions (Fig. S1). The well depth of LJ, ϵ LJ , is currently set to be the same for all pairs of beads for nonspecific attraction. For future efforts, minor modifications to the formula 6 and parameterization of ϵ LJ to depend on the Wimley-White hydrophobicity scale, for example, can capture more sequence-specific interactions such as the hydrophobic effects.

Parameterization of ENM
ENM is a model that represents the tertiary structure of a protein by connecting every pair of protein beads within a certain distance cutoff R c by a Hookean spring of spring constant S-3 Interaction potential ("#) distance (%) Figure S1: The strengths of various interaction potentials are plotted over the distance between protein beads. The two vertical dashed grey lines bound the distance between 1 and 2 σ. The electrostatic potentials (DH) are plotted for beads with +1 and +1 charges or +1 and -1 charges.
k spring . Despite the simplicity of its parameterization, slow modes in ENM can capture biologically significant conformational changes. 7,8 This structure-based model can also be used in combination with other physics-driven forcefields to model macromolecular complexes.
Protein-protein associations and viral capsid assembly have both been successfully modeled by using Elnedyn, an ENM at the resolution of 1 residue per bead, 9 on top of the MAR-TINI CG forcefield. By fitting to atomistic simulations, Elnedyn preserves both structural properties and dynamics within each protein subunit for the CG simulations.
We follow a similar protocol and fit our CG ENM parameters in eq.(2) to Elnedyn simulations results. Three proteins -IKZF1 ZF2 (PDB: 6H0F 10 chain C), BRD4 BD1 (PDB:  Figure S2: Fitting results of ENM parameters arranged by proteins (rows) and evaluation metrics (columns). Numbers in parenthesis next to protein names are the number of CG beads. For each plot, blue regions indicate k spring and R c values that result in good fitting, and red regions indicate significant differences between our simulations and Elnedyn simulations. Each column shares the same colorbar range. In general, the boxed regions around k spring = 100ϵ/σ 2 and R c = 2.0σ has good fitting.

S-6
Substantial overlap is achieved when the neighboring states are similar, which requires a fine spacing of the coupling parameter values. In practice, distributions of quantities such as ∆U and ∂U/∂λ that are directly involved in free energy estimations are often treated as proxies for the high-dimensional phase space. 11 The similarity between distributions is quantified by KL divergence, where 0 indicates identical distributions and ≫ 1 suggests concerning differences. Based on this metric, all neighboring states have substantial overlap, as the Kullback-Leibler (KL) divergence values of ∆U and of ∂U/∂λ distributions both stay below 1 (Fig. S3a).
Bennett's overlapping histogram 12 provides another qualitative test for the overlap of ∆U distributions. The difference between C is an arbitrary constant between 0 and 1 and P λ i (∆U λ i ,λ i+1 ) and P λ i (∆U λ i ,λ i+1 ) are the distributions of ∆U λ i ,λ i+1 obtained by sampling from neighboring alchemical states λ i and  (Fig. S4b). Both equilibration time and decorrelation time are longer for simulations in lower value of λ LJ states that retain more memory of previously sampled configurations due to lower energetic costs. Currently, the equilibration and autocorrelation cutoffs depend on each system. For convenience, we used the same cutoffs for all λ states. In the future, this can be customized for each state to maximize the number of samples, especially from states of high λ values that requires less equilibration and decorrelation time (Fig. S4b). Linker beads (! ! ) Figure S5: ∆∆Gs calculated by TI and BAR are superimposed onto the MBAR results shown in Figure 3 to show that all three alchemical free energy calculation methods agree within noise.  Figure S6: ∆∆Gs calculated by TI and BAR agree with MBAR results shown in Figure 4 for the BRD4 BD2 -VHL system modeled with protein charges included. (a) ∆∆Gs at each PROTAC linker length calculated by TI and BAR are broken down using waterfall plots similar to Figure 4b. In each triplet, columns from left to right correspond to ∆G binary , −∆G ternary(other) , and −∆G ternary(charges) . Columns are arranged cumulatively such that the end point of a triplet of columns represent the final ∆∆G value calculated by the corresponding method. MBAR ∆∆G values with ±1 standard deviation are shown as horizontal yellow bands for reference. (b) TI and MBAR calculations of the electrostatic contribution to ∆∆G under different forcefield setups at the linker length of 3 beads agree with each other. Note that ∆G ternary(charges) is shown here rather than −∆G ternary(charges) in panel (a).

S-10
JQ1 I-BET726 linker E3 ligand Figure S7: The structure of MZ1, which is a PROTAC with linker length of 3 beads using a JQ1 warhead, extracted from the ternary crystal structure (PDB: 5T35 1 ) and the structure of I-BET726 warhead extracted from the crystal structure of a binary complex (PDB: 4BJX 14 ) are superimposed to highlight the difference in exit vectors (black arrows).