Pair your accounts.

Export articles to Mendeley

Get article recommendations from ACS based on references in your Mendeley library.

Pair your accounts.

Export articles to Mendeley

Get article recommendations from ACS based on references in your Mendeley library.

You’ve supercharged your research process with ACS and Mendeley!

STEP 1:
Click to create an ACS ID

Please note: If you switch to a different device, you may be asked to login again with only your ACS ID.

Please note: If you switch to a different device, you may be asked to login again with only your ACS ID.

Please note: If you switch to a different device, you may be asked to login again with only your ACS ID.

MENDELEY PAIRING EXPIRED
Your Mendeley pairing has expired. Please reconnect
ACS Publications. Most Trusted. Most Cited. Most Read
Ensemble Simulations and Experimental Free Energy Distributions: Evaluation and Characterization of Isoxazole Amides as SMYD3 Inhibitors
My Activity

Figure 1Loading Img
  • Open Access
Computational Biochemistry

Ensemble Simulations and Experimental Free Energy Distributions: Evaluation and Characterization of Isoxazole Amides as SMYD3 Inhibitors
Click to copy article linkArticle link copied!

  • Shunzhou Wan
    Shunzhou Wan
    Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
    More by Shunzhou Wan
  • Agastya P. Bhati
    Agastya P. Bhati
    Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  • David W. Wright
    David W. Wright
    Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  • Ian D. Wall
    Ian D. Wall
    GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
    More by Ian D. Wall
  • Alan P. Graves
    Alan P. Graves
    GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
  • Darren Green
    Darren Green
    GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
    More by Darren Green
  • Peter V. Coveney*
    Peter V. Coveney
    Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
    Advanced Research Computing Centre, University College London, London WC1H 0AJ U.K.
    Institute for Informatics, Faculty of Science, University of Amsterdam, 1098XH Amsterdam, The Netherlands
    *Email: [email protected]
Open PDF

Journal of Chemical Information and Modeling

Cite this: J. Chem. Inf. Model. 2022, 62, 10, 2561–2570
Click to copy citationCitation copied!
https://doi.org/10.1021/acs.jcim.2c00255
Published May 4, 2022

Copyright © 2022 The Authors. Published by American Chemical Society. This publication is licensed under

CC-BY 4.0 .

Abstract

Click to copy section linkSection link copied!

Optimization of binding affinities for ligands to their target protein is a primary objective in rational drug discovery. Herein, we report on a collaborative study that evaluates various compounds designed to bind to the SET and MYND domain-containing protein 3 (SMYD3). SMYD3 is a histone methyltransferase and plays an important role in transcriptional regulation in cell proliferation, cell cycle, and human carcinogenesis. Experimental measurements using the scintillation proximity assay show that the distributions of binding free energies from a large number of independent measurements exhibit non-normal properties. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to predict the binding free energies and to provide a detailed chemical insight into the nature of ligand–protein binding. Our results show that the 1-trajectory ESMACS protocol works well for the set of ligands studied here. Although one unexplained outlier exists, we obtain excellent statistical ranking across the set of compounds from the ESMACS protocol and good agreement between calculations and experiments for the relative binding free energies from the TIES protocol. ESMACS and TIES are again found to be powerful protocols for the accurate comparison of the binding free energies.

This publication is licensed under

CC-BY 4.0 .
  • cc licence
  • by licence
Copyright © 2022 The Authors. Published by American Chemical Society

1. Introduction

Click to copy section linkSection link copied!

SMYD3 has been characterized as a versatile lysine methyltransferase and is associated with multiple types of cancer, including colorectal, liver, and breast cancer. A range of histone and non-histone protein substrates are lysine N-methylated by methyl transfer from the SAM (S-adenosyl-l-methionine) cofactor of SMYD3. Notably, loss of SMYD3 catalytic activity inhibited tumorigenesis in the presence of oncogenic Ras, (1) suggesting that inhibition of SMYD3 in cancers with elevated RAS pathway signaling may be a useful therapeutic strategy. However, few SMYD3 inhibitors have been described. (2) GSK has previously reported a crystal structure of the ternary complex of SMYD3, SAH [S-adenosylhomocysteine, a reaction product of the methyl group donor SAM] and MEKK2 [MAPK/ERK (mitogen-activated protein kinase/extracellular signal-regulated kinase) kinase 2] and a second crystal structure showing GSK2807 binding in the SAM pocket. (3) More recently, the identification and optimization of a series of isoxazole amides as SMYD3 inhibitors was reported. (4) The ability of computational binding free energy calculations to predict the affinity of that series of ligands is presented here.
The last 10 years have seen substantial progress in the use of computational chemistry methods within both academia and the pharmaceutical industry for quantitative structure-based drug discovery, thanks to burgeoning computational power, the increasing number of crystal structures, the accuracy of force fields, the improvement of the sampling methods and control of errors, alongside the automation and general usability of the approaches. Many pharmaceutical companies have adopted free energy predictions as a routine tool to support their drug discovery efforts. (5,6) This progress has been prompted by Schrödinger’s drug discovery platform, especially the FEP+ implementation. (7) There are other packages and workflows used in academia and/or industry, which integrate and automate the process of free energy calculation, including the steps of planning, set up, simulation, and analyses. (8,9)
In the last few years, our team at UCL has developed two ensemble-based protocols for free energy calculations, termed “enhanced sampling of molecular dynamics with approximation of continuum solvent” (ESMACS) (8,10) and “thermodynamic integration with enhanced sampling” (TIES). (8,11) ESMACS is based on the molecular mechanics Poisson–Boltzmann surface area (MMPBSA) method, (12) while TIES centers on thermodynamic integration (TI). Although the protocols are built around the standard MMPBSA and TI methodologies, the names and abbreviations of these protocols are used to emphasize the central importance of the ensemble-based nature of the protocols employed. (8,10,13−16) The term “ensemble” here refers to a set of individual (often called “replica”) simulations conducted for the same physical system, starting from different initial conformations (13) (and possibly also with varying model parameters (14)). Advances in high-end computing capabilities offer the opportunity to run all of the replicas concurrently, ensuring that the results can be delivered rapidly, exactly as has been done in climate and weather forecasting for the past 20 years. Ensemble approaches lead to increased reliability and reproducibility, with tighter control of standard uncertainty for nonlinear systems which are chaotic in nature. (8,17,18) Ensemble approaches provide probabilistic statements about quantities of interest and the likelihood of various outcomes. Only by large scale ensemble sampling can the nature of statistical distributions of free energies be assessed; we have previously shown that they exhibit non-normal behavior. (8,16,18,19) Binding free energies obtained from independent replica simulations of the same molecular system can vary by a wide range, while their distributions deviates from the Gaussian behavior usually assumed as we have reported. (16) Such properties are important for the proper use of many statistical methods (discussed further in Section 3.5). ESMACS and TIES are performed using a binding affinity calculator (BAC), (20) which is a computational pipeline to automate the processes of building, running, and marshaling the molecular dynamics simulations, as well as collecting and analyzing data. It is notable that until now virtually no other groups have been using or reporting results based on ensemble calculations which hitherto remain anything but routine methods within the entire field of classical molecular dynamics. (8,16,18)
Depending on the usability, reliability, rapidity, and throughput, these automated packages could find application at various stages of the drug discovery process across the wider pharmaceutical industry. In practice, however, the application of computational approaches is still dependent on the experience and knowledge of the practitioner. It remains a challenge for non-expert users to apply these existing tools to make robust predictions on a timescale that can substantially impact drug discovery programs. For a given approach, the success of predictions also varies significantly across different protein targets with different sets of compounds. Studies have shown that the initial crystal structures and the existence of multiple conformations can have a significant effect on the quality of free energy predictions. (21,22) Based on the experience, knowledge, and intuition we have accumulated, we propose the following criteria to predict the quality of the calculations: (1) how well the binding site is defined and structured; (2) how well a compound fits into the binding pocket; and (3) how many rotamers and/or binding poses a compound may manifest.
The purpose of the present study is to evaluate the ability of ESMACS and TIES to estimate binding affinities of a set of 22 ligands (Table 1) to the protein target. For the SMYD3 systems studied here, the binding site of the protein is well structured (Figure 1), and the binding mode for the scaffold of the congeneric compounds (Table 1) is well-defined in the crystal structure. It is thus likely, based on our foregoing criteria, that a reasonable prediction can be achieved, although the relatively large size of the binding site, the presence of multiple components in the structure including several crystallographic water molecules involved in binding, and the rotatable bonds at the R2 site of the compounds (Table 1) still pose a challenge for the conformational sampling and hence the accuracy and precision of the predictions.

Figure 1

Figure 1. Structure of SMYD3, complexed with one of the compounds in this study. The N-terminal SET domain, the MYND domain, the post-SET domain, and the C-terminal region are colored in orange, yellow, pink, and white, respectively. The cofactor SAH is shown with a bond representation and the ligand (S01) in a ball-and-stick model. The Zn2+ ions are shown with sphere models and colored green.

Table 1. Compounds Investigated in This Study
a

The binding free energy is estimated by ΔGexp = RT × ln (IC50) = – RT × ln (10) × pIC50, where R is the gas constant and T is temperature, which is set to 300 K in this study.

b

There was no activity at the highest concentration (250 μM) tested.

2. Material and Methods

Click to copy section linkSection link copied!

The X-ray structure of SMYD3 consists of a co-factor SAH and three zinc ions which are important for the folding of the protein (Figure 1).

2.1. Experiments

A series of isoxazole amides was chosen to cover a range of binding affinity and chemical structure. These include a diversity of lipophilicity, aromaticity, and formal charge; the variation of formal charge has historically been challenging for free energy predictions. The IC50s acquired for the compounds were measured using the scintillation proximity assay using MEKK2-based peptide as a substrate previously reported. (4)

2.2. Computational Approach

2.2.1. Compounds

A set of compounds named SXX were provided by GSK, where XX was a two-digit number where integers lower than 10 were preceded by a 0 (Table 1), with mean values of pIC50 from experimental assay. Two of the compounds, S21 and S22, were reported as C01 and C28 in a previous publication. (4) All compounds shared the same scaffold (Table 1). The compounds were docked into the binding pocket of SMYD3 using Glide. (23) Modeling was carried out on a GSK internal structural precursor to 6P7Z. The rmsd between the structure used and 6P7Z is approximately 0.3 Å. The compounds were docked into the structure with a substructural constraint on the isoxazole-amide-piperidine-sulfone substructure (as shown in Table 1) using Glide XP in Maestro 2015-2.

2.2.2. Model Preparation

The preparation and setup of the simulations were implemented using BAC, (20) including parameterization of the compounds, solvation of the complexes, addition of counterions to electrostatically neutralize the systems, and generation of configurations files for the simulations. The Amber package (24) was used for the parameterization of the compounds, the setup of the systems, and the analyses of the results. The Amber ff14SB force field was used for the protein and TIP3P for water molecules. The protonation states were assigned using the “reduce” module of AmberTools. Parameters of the ligands were produced using the general Amber force field 2 (GAFF2) with Gaussian 16 to optimize compound geometries and to determine electrostatic potentials at the Hartree–Fock level with 6-31G** basis functions. The restrained electrostatic potential module in the AmberTools was used to calculate the partial atomic charges for the compounds. All systems were solvated in orthorhombic water boxes with a minimum extension from the protein of 14 Å.

2.2.3. ESMACS

We used the ESMACS (10) protocol for the simulations and analyses. The protocol uses replica simulations to obtain reproducible binding affinity predictions with robust uncertainty estimates. It is based on the molecular mechanics Poisson–Boltzmann surface area (MMPBSA), which is an approximate method for calculating absolute binding affinities from molecular dynamics trajectories. It is an endpoint free energy calculation, in which the difference in binding free energy, ΔG, is calculated using
(1)
where Gi is the free energy of component i which corresponds to either complex (com), protein (pro), or ligand (lig), and is calculated from a set of structures from MD simulations. The free energies can be broken down into a number of components, including the molecular mechanics energy in the gas phase and the solvation free energy. While the former is derived from the molecular modeling force field used, the latter is estimated as the sum of the electrostatic solvation free energy calculated using the Poisson–Boltzmann equation and the nonpolar solvation free energy calculated from the solvent accessible surface area. The binding free energy is calculated from the difference between calculations performed for the complex, ligand, and receptor conformations obtained from simulation. The 1-trajectory approach was used here, in which conformations of the protein and the ligands were extracted from the ligand–protein complex simulations. The ESMACS protocol has options to include configurational entropy within the free energy calculations, obtained from normal model analyses or other approximations. In most situations, however, the inclusion of entropic contributions fails to improve the rankings of predicted binding affinities. (25−28) In the current ESMACS study, the entropic term is neglected because of its minor contribution to the predicted rankings.

2.2.4. Thermodynamic Integration with Enhanced Sampling

We used TIES (11) to calculate the relative binding free energies for pairs of compounds. In TIES, an alchemical transformation for the mutated entity is used in both aqueous solution and within the ligand–protein complex. The free energy changes of the alchemical mutation processes, ΔGaqalch and ΔGcomplexalch, are calculated by
(2)
where λ (0 ≤ λ ≤ 1) is an alchemical coupling parameter such that λ = 0 and λ = 1 correspond to the initial and final thermodynamic states, and ∂V(λ)/∂λ is the partial derivative of the hybrid potential energy V(λ) at an intermediate state λ. ⟨···⟩λ denotes an ensemble average over configurations representative of the state λ. For each transformation, 13 λ windows are used with fixed values 0.0, 0.05, 0.1, 0.2, ..., 0.9, 0.95, 1.0.
The binding free energy difference is then calculated from
(3)

2.2.5. Simulations

The BAC, (20) an automated workflow tool for free energy calculations, was used to prepare and execute ESMACS and TIES simulations. The standard protocol (10,11,29) was used, in which NAMD (30) simulations with 25 and 5 replicas were performed for ESMACS and TIES, respectively. Each replica in the ensemble started with identical atomic coordinates, with different initial velocities generated independently from a Maxwell–Boltzmann distribution.
To avoid the well-known “end-point catastrophe,” (31) a soft-core potential was used for van der Waals (vdW) interactions involving the perturbed atoms. The electrostatic interactions were linearly scaled but at a faster rate than the vdW interactions, so that the partial charges were removed for the disappearing atoms before they were fully annihilated and were introduced on the appearing atoms after they already partially appeared.
The MD package NAMD2.12 (30) was used throughout the equilibration and production runs of all simulations. For each replica in an ensemble, energy minimizations were first performed with heavy protein atoms restrained at their initial positions. The initial velocities were then generated independently from a Maxwell–Boltzmann distribution at 50 K, and the systems were heated up to and kept at 300 K within 60 ps. A series of equilibration runs, totaling 2 ns, were conducted, while the restraints on heavy atoms were gradually reduced. Finally, 4 ns production simulations were run for each replica for all ESMACS and TIES simulations.
The ESMACS simulations for the compounds S01–S20 were initially conducted using 10-replica ensembles on the DNAnexus platform (https://www.dnanexus.com/) which provides strong cybersecurity. Previous studies (10,19,29,32,33) have established a standard ESMACS protocol with 25 replicas and have shown that the combination of the simulation length and the size of the ensemble provides a trade-off between computational cost and precision. The choice of a smaller number here was designed to reduce computational cost on the cloud environment. The study was later extended to include two more compounds, C01 and C28 from Su et al., (4) renamed as S21 and S22, to extend ESMACS to 25 replicas and, more importantly, to perform TIES studies on the selected compound pairs. The Blue Waters supercomputer at the National Center for Supercomputing Applications (NCSA) in the US was used for the extended ESMACS simulations. The SuperMUC supercomputer at Leibniz Supercomputing Centre in Germany was used for the TIES simulations. For 1 ns simulation, it took ∼5.4 h on a single GRID K520 card using DNAnexus, ∼2.1 h on 128 CPUs on Blue Waters, and ∼0.9 h on 220 CPUs on SuperMUC. It should be noted that Nvidia GRID K520 GPUs are now discontinued and that both the supercomputers referenced here are now retired. As a result, any state-of-the-art supercomputers and cloud services are much faster than the ones used for this study.

3. Results

Click to copy section linkSection link copied!

To assess the accuracy and precision of the method, we evaluated the binding affinities of the ligands (Table 1) to SMYD3 and compared the computed results with experimental data. ESMACS was used for the full set of the ligands, while TIES was applied to some selected pairs of the ligands with the same net charge. When making a direct comparison of specific correlation coefficients, we also quote the 95% confidence intervals (CI).

3.1. Reproducibility of the ESMACS Simulations

It is well known that the differences of the initial conditions among individual simulations lead to rapid divergence of trajectories. (13) Many complex systems hence exhibit sensitive dependence on initial conditions. The calculated thermodynamic properties from individual simulations will therefore inevitably differ. Two sets of ESMACS simulations were performed for the complexes SXX (Table 1) independently on Blue Waters and DNAnexus (see the Material and Methods section above). Figure 2 shows the variances and correlation of the calculated binding free energies from the two sets of simulations. Excellent agreement was observed between calculations using two different computational platforms: the HPC machine Blue Waters and the cloud environment DNAnexus, with a highly significant Spearman correlation of 0.98 [CI: 0.94–0.99]. No statistical differences were seen between the two sets of calculated binding free energies: 16 out of 20 compounds having identical results, within error bars, and the remaining 4 within two error bars (Figure 2). The two simulations produce consistent results, with a mean signed difference of 0.13 kcal/mol and a mean unsigned difference of 0.63 kcal/mol. Both of the calculations have good correlations with the experimental measurement, with correlation coefficients of 0.80 [CI: 0.56–0.92] and 0.78 [CI: 0.52–0.91] for the simulations on Blue Waters and DNAnexus, respectively. Because of the smaller number of replicas used in the DNAnexus simulations, the error bars in these simulations are ∼1.5 times larger than those from Blue Waters simulations. As the two simulations produce similar accuracies, only the results from Blue Waters are reported in the following analyses.

Figure 2

Figure 2. Comparison of calculated binding free energies from two independent studies of the ligand-SMYD3 models performed on Blue Waters (bw, horizontal axis) and DNAnexus (DNA, vertical axis). The dashed line shows an ideal y = x regression. The standard errors are calculated using a bootstrapping method.

3.2. Correlations between ESMACS Calculations and Experimental Measurements

The predicted binding free energies from the 1-trajectory approach exhibit a high correlation with the experimental data (Figure 3), with a Pearson correlation coefficient of 0.84 [CI: 0.65–0.93]. In a pharmaceutical drug development project, compounds are designed or selected for the same protein target. The ranking of the binding affinities is not affected by the energy of the protein Gpro (eq 1) when the conformational space is sufficiently sampled. Free energies of a protein differ in its bound and unbound states. The difference, called the adaptation free energy, (10) provides an indication of the conformational changes of the protein and the energetic costs upon binding. Inclusion of adaptation free energies improves the correlations between simulations and experimental measurements in some cases (10,34,35) and does not have obvious effects in other cases. (26) For the current data set of compounds, the binding site is relatively large. No significant strain is induced within protein upon compound binding. The inclusion of adaptation free energies of the protein degrades the correlations, with a correlation coefficient of 0.70 [CI: 0.40–0.87].

Figure 3

Figure 3. Comparison of calculated binding free energies and experimental binding affinity data from 1-traj ESMACS approach. The dotted line shows a linear regression. A correlation coefficient of 0.84 [CI: 0.65–0.93] is obtained for the entire set of compounds. The +1e charged compounds are shown in blue and electrostatically neutral ones in black.

The calculations correctly distinguish the charged compounds from the neutral ones (Figure 3). The R2 group (Table 1) locates in a hydrophilic pocket in which negatively charged residues GLU192, ASP241, and GLU294 form favorable electrostatic interactions with the positively charged R2 group (Figure 4). This makes the binding of charged compounds more favorable in general than the electrostatically neutral ones. The two variants at R1 studied here may not affect the binding affinities significantly because the ethyl group and the cyclopropyl ring are similar in their hydrophobic properties and their sizes. The two compounds, S01 and S21, differing only at the R1 group, have similar binding affinities from both experimental measurements and the ESMACS calculations (Figure 3). It should be noted that no activity was detected for S08 at the highest concentration (250 μM) in the experiments (Table 1), indicating that its binding affinity is likely to be less negative than that presented in Figure 3. This makes the point deviate even farther from the regression line. Our TIES calculations also show that S08 is an outlier (see details below).

Figure 4

Figure 4. Electrostatic interactions between the R2 group and the protein. The negatively charged residues GLU192, ASP241, and GLU294, and positively charged R2 group, along with positively charged LYS 329, form favorable electrostatic interactions.

3.3. TIES Results

Relative binding free energies (ΔΔG) are calculated using TIES for selected pairs of compounds. Each compound is paired at least once with other compounds. No compounds are paired if they have different net charges, as alchemical methods encounter specific difficulties when changes in the net charge arise and charge corrections are required. The results of these relative binding free energy calculations are compared with the data derived from experimental measurements (Figure 5).

Figure 5

Figure 5. Comparison between TIES-predicted relative binding affinities and experimental data for a total of 19 compound pairs. The long dashed line represents y = x, whereas dashed and dotted lines represent 1 kcal/mol and 2 kcal/mol ranges, respectively. The pairs involving S08 are highlighted in blue.

As the compound S08 may be denoted as an outlier (see details below), the analyses are performed separately for the data set with and without the compound. The overall mean unsigned error is 1.21 kcal/mol for the entire data set and 0.68 kcal/mol when pairs involving S08 are excluded. The mean signed errors are 0.62 kcal/mol and –0.06 kcal/mol for the data set with and without S08, respectively. Except the pairs with S08, only one compound pair, S22–S20, has a predicted ΔΔG value which differs from the experimental data by >2 kcal/mol. The main difference between the two compounds are the three rotatable bonds at R2 (Table 1). The rotation of these rotatable bonds leads to large conformational flexibilities in S22, which may need longer simulation time to get reliable prediction.

3.4. S08 Remains an Outlier

Our initial TIES calculations only contained one compound pair involving S08, of which the deviation between the calculation and the experimental data is large. As the ESMACS simulation also shows that it is an outlier, we have paired S08 with three other compounds for TIES simulations. The results only confirm that there is a systematic deviation in the binding free energy for S08 between calculations and experimental measurements. Based on the four TIES calculations for S08, the average difference between calculations and experimental data is 3.21 kcal/mol. In other words, the compound S08 needs to have a binding affinity 3.21 kcal/mol more negative in the experiments or 3.21 kcal/mol less negative in the calculations to make them agree with each other. This value is also in a good agreement with the ESMACS calculation, with which the data point for S08 can be shifted much closer to the regression line (Figure 3).
The compound S08 consists of a nitrile group of which the nitrogen is highly electronegative. Although the negatively charged residues at the R2 pocket are unfavorable for the presentence of the nitrile group, the positively charged residue LYS329 and the relatively spacious pocket appear to be able to tolerate the group. The detailed analyses of the simulation trajectories do not provide more insights. Further searching in the experimental data set identified another compound, which is very similar to S08 and shares the same nitrile group at the R2. The compound also does not show any activity at the highest concentration tested in the assay (data not shown). Although it could be an experimental issue, it is more likely to be a force field or possibly a sampling issue. As there are no satisfactory explanations for the disagreement between the experiments and the calculations, S08 remains as an unexplained outlier. Such unexplained outliers are not unusual in drug discovery and development projects. Machine learning approaches have been proposed to identify the differences between the calculations and the experimental data and to provide empirical correction terms to the predictions from the alchemical approaches, but these too depend on assumptions which are rarely articulated concerning the way in which the data are distributed. (36)

3.5. Non-normal Distributions of Free Energy Calculations and Measurement

Normal distributions have been typically assumed in experimental measurements and calculations of binding free energies. The assumptions are commonly made for the true ΔGbinding for a large number of compounds, for the experimentally determined and computationally predicted ΔGbinding for a given compound, as well as for the relative binding free energies ΔΔGbinding. The normal distributions are characterized by an average μ and a standard deviation σ. Although the presence of uncertainties is known to the scientific community broadly, they are still “known unknowns”: in many cases, we do not know the order of magnitude of the various uncertainties, the sources and the consequences of them, not to mention how to reduce them. It is important to describe the free energy distribution carefully as many statistical analyses are based on it. The most important assumption in regression dilution, (37) for example, is that all the variables under consideration are normally distributed. If this is not the case, regression dilution may not be applied.
The assertion that the calculated binding free energies ΔGcal or binding free energy differences ΔΔGcal follow a normal distribution conflicts with our observation that such data are not in general normally distributed. (8,16,18,19) Newtonian molecular dynamics is inherently nonlinear, and this is the underlying reason why the dynamics is chaotic in the technical sense. Not only are individual trajectories extremely sensitive to initial conditions, they become increasingly inaccurate as the duration of such a simulation unfolds. They manifest long range correlations which are not present in Gaussian statistics.
In experimental measurement of binding free energies, uncertainties on the order of 0.3–0.5 kcal/mol for ΔGexp and 0.4–0.7 kcal/mol for ΔΔGexp have been claimed from high-quality experimental measurements. (7) It is, however, very often the case that experimental data are reported as single numbers, without quantification of the uncertainties. We have no knowledge about the statistics of ΔGexp or ΔΔGexp reported, let alone the distribution of these quantities. This means that the unknown and unstated error bars may be varying in all manner of ways, so claiming that they are normally distributed is not credible.
There are four compounds in the current project, which have been tested >100 times for their activities to SMYD3. One of them is S21 (Figure 6a), which has been computationally investigated here. The other three are for more potent compounds that are from a related but slightly different series. The relatively large number of tests makes it possible to verify the distributions of the experimental data. It should be noted that while compounds a and b do not show any drift in the assay over time, compounds c and d (Table 2 and Figure 6) show a small amount of time dependency.

Figure 6

Figure 6. Distributions of experimental pIC50 values shown as histogram and kernel density curve. Four compounds from the current project (a–d) are used, which have been tested more than 100 times. A bin size of 0.05 is used. The dashed lines indicate the means of the experimental measurements.

Table 2. Statistics of the Experimental Measurements pIC50 for the Compounds with >100 Tests
compoundno. of testaverageSDskewnesskurtosis
a2645.290.100.881.47
b1147.310.12–0.350.23
c1247.470.192.047.56
d1167.090.212.115.30
All of the four distributions are skewed from a normal distribution, with skewness deviating from 0. Three of them are highly skewed with positive skewness (Table 2), indicating that the distributions have longer tails on the right side than those on the left (Figure 6). The other one is moderately skewed with a negative skewness and a longer tail on the left. The excess kurtoses are all positive, meaning that compared to a normal distribution, the tails are longer and heavier. It should be noted that the experimental data shown in Figure 6 are representative of the behavior of such ligand–protein binding affinities more widely and that other data remain confidential to GSK. Overall, these results imply the presence of non-normal distributions in the experimental measurements.
The fat-tailed distribution of binding affinities indicates that the expectation value of this quantity lies far from the most probable value of the distribution (Figure 6). The probability of observing a high value of the binding affinity is far higher than for a normal distribution (Table 3). The skewness (Table 2) characterizes the deviation of the distribution from symmetry, evidenced by the unequal distributions in the corresponding bins centered at their means and by the presence of longer tails on one side than the other (Table 3). The lack of symmetry is highlighted in Figure 6 for compounds c and d, which exhibit large skewness (Table 2). The high values of kurtosis indicate the existence of outliers in the distribution. The measured binding free energies for compound d, for example, have a probability of 5.2% to be found more than two standard deviations from the mean on the left side, compared with a probability of 2.3% were the distribution normal. Asymmetric distributions with a long tail on one side have been commonly observed in climate studies. (38) One such study, (39) which highlights the impact of fat-tailed distributions, found that, because of the probability distribution being fat-tailed at higher temperatures, there was a predicted 3% chance of an increase in global surface temperature by 8 °C or more following a doubling of carbon dioxide concentrations, whereas if the distribution were normal, this probability would be only 0.3%.
Table 3. Percentages of Samples in Different Binsa
compound(−∞, −2σ)[−2σ, −σ)[−σ, 0)[0, σ)[σ, 2σ)[2σ, ∞)
a4.99.829.543.911.00.8
b1.815.832.536.89.63.5
c3.27.328.254.86.50
d5.24.328.455.26.90
normal2.313.634.134.113.62.3
a

The experimental binding affinities pIC50 have been converted into binding free energies using ΔG = RT ln (IC50). σ values are the standard deviations of the normal distributions which best fit the binding free energies for each of the compounds. The percentages from a normal distribution are listed for reference.

4. Conclusions

Click to copy section linkSection link copied!

Using the TIES and ESMACS protocols, we have computed the binding free energies of a series of ligands to zinc finger protein SMYD3. Although an unexplained outlier exists, we obtain excellent statistical rankings across the set of compounds from the two protocols. ESMACS and TIES are again found to be powerful protocols for the accurate comparison of the binding free energies.
We have previously reported the non-normal properties of calculated binding free energies. In the current study, we investigate the distributions of experimentally measured free energies and find that the distributions are highly skewed. The practical implications of this discovery are important to apprehend. Non-normal distributions imply the occurrence of more “outliers”, making it essential to perform multiple measurements to pin down average behavior. It is also a call to exercise caution in the use of statistical methods for the comparison of experimental data and computational predictions, as the assumption of normal distributions is not generally valid.

Author Information

Click to copy section linkSection link copied!

  • Corresponding Author
    • Peter V. Coveney - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.Advanced Research Computing Centre, University College London, London WC1H 0AJ U.K.Institute for Informatics, Faculty of Science, University of Amsterdam, 1098XH Amsterdam, The NetherlandsOrcidhttps://orcid.org/0000-0002-8787-7256 Email: [email protected]
  • Authors
    • Shunzhou Wan - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.Orcidhttps://orcid.org/0000-0001-7192-1999
    • Agastya P. Bhati - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
    • David W. Wright - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.Orcidhttps://orcid.org/0000-0002-5124-8044
    • Ian D. Wall - GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
    • Alan P. Graves - GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
    • Darren Green - GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
  • Notes
    The authors declare no competing financial interest.

    The structures for the SMYD3 protein and the compounds are freely available at https://doi.org/10.23728/b2share.8e650caf35cb437dbf9007afc84c5257.

Acknowledgments

Click to copy section linkSection link copied!

We are grateful for funding for the UK MRC Medical Bioinformatics project (grant no. MR/L016311/1), the EPSRC funded UK Consortium on Mesoscale Engineering Sciences (UKCOMES grant no. EP/L00030X/1), the European Commission for EU H2020 CompBioMed2 Centre of Excellence (grant no. 823712) and for EU H2020 EXDCI-2 project (grant no. 800957), and NSF award (https://www.nsf.gov/pubs/2017/nsf17542/nsf17542.htm, Award No. NSF 1713749). We made use of the Blue Waters supercomputer at the National Center for Supercomputing Applications of the University of Illinois at Urbana-Champaign, access to which was made available through the aforementioned NSF award. Additional calculations were conducted using an award of computer time on the Theta machine at Argonne Leadership Computing Facility provided by the US Department of Energy’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program (through the INSPIRE project and a 2021 award “COMPBIO”). We are grateful to DNAnexus for both computational resources and technical support from Fiona Ford, Brett Hannigan and Chai Fungtammasan. We acknowledge the Leibniz Supercomputing Centre for providing access to SuperMUC (https://www.lrz.de/services/compute/) and the very able assistance of its scientific support staff.

References

Click to copy section linkSection link copied!

This article references 39 other publications.

  1. 1
    Mazur, P. K.; Reynoird, N.; Khatri, P.; Jansen, P. W.; Wilkinson, A. W.; Liu, S.; Barbash, O.; Van Aller, G. S.; Huddleston, M.; Dhanak, D.; Tummino, P. J.; Kruger, R. G.; Garcia, B. A.; Butte, A. J.; Vermeulen, M.; Sage, J.; Gozani, O. SMYD3 Links Lysine Methylation of Map3k2 to Ras-Driven Cancer. Nature 2014, 510, 283287,  DOI: 10.1038/nature13320
  2. 2
    Bottino, C.; Peserico, A.; Simone, C.; Caretti, G. SMYD3: An Oncogenic Driver Targeting Epigenetic Regulation and Signaling Pathways. Cancers 2020, 12, 142,  DOI: 10.3390/cancers12010142
  3. 3
    Van Aller, G. S.; Graves, A. P.; Elkins, P. A.; Bonnette, W. G.; McDevitt, P. J.; Zappacosta, F.; Annan, R. S.; Dean, T. W.; Su, D. S.; Carpenter, C. L.; Mohammad, H. P.; Kruger, R. G. Structure-Based Design of a Novel SMYD3 Inhibitor That Bridges the Sam-and Mekk2-Binding Pockets. Structure 2016, 24, 774781,  DOI: 10.1016/j.str.2016.03.010
  4. 4
    Su, D. S.; Qu, J.; Schulz, M.; Blackledge, C. W.; Yu, H.; Zeng, J.; Burgess, J.; Reif, A.; Stern, M.; Nagarajan, R.; Pappalardi, M. B.; Wong, K.; Graves, A. P.; Bonnette, W.; Wang, L.; Elkins, P.; Knapp-Reed, B.; Carson, J. D.; McHugh, C.; Mohammad, H.; Kruger, R.; Luengo, J.; Heerding, D. A.; Creasy, C. L. Discovery of Isoxazole Amides as Potent and Selective SMYD3 Inhibitors. ACS Med. Chem. Lett. 2020, 11, 133140,  DOI: 10.1021/acsmedchemlett.9b00493
  5. 5
    Schindler, C. E. M.; Baumann, H.; Blum, A.; Bose, D.; Buchstaller, H. P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuchss, T.; Gradler, U.; Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.; Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.; Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann, K.; Schulz, R.; Steinbrecher, T.; Tanzer, E. M.; Unzue Lopez, A.; Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. J. Chem. Inf. Model. 2020, 60, 54575474,  DOI: 10.1021/acs.jcim.0c00900
  6. 6
    Ciordia, M.; Pérez-Benito, L.; Delgado, F.; Trabanco, A. A.; Tresadern, G. Application of Free Energy Perturbation for the Design of Bace1 Inhibitors. J. Chem. Inf. Model. 2016, 56, 18561871,  DOI: 10.1021/acs.jcim.6b00220
  7. 7
    Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015, 137, 26952703,  DOI: 10.1021/ja512751q
  8. 8
    Wan, S.; Bhati, A. P.; Zasada, S. J.; Coveney, P. V. Rapid, Accurate, Precise and Reproducible Ligand-Protein Binding Free Energy Prediction. Interface Focus 2020, 10, 20200007  DOI: 10.1098/rsfs.2020.0007
  9. 9
    Wan, S.; Tresadern, G.; Pérez-Benito, L.; Vlijmen, H.; Coveney, P. V. Accuracy and Precision of Alchemical Relative Free Energy Predictions with and without Replica-Exchange. Adv. Theory Simul. 2019, 3, 1900195  DOI: 10.1002/adts.201900195
  10. 10
    Wan, S.; Knapp, B.; Wright, D. W.; Deane, C. M.; Coveney, P. V. Rapid, Precise, and Reproducible Prediction of Peptide-MHC Binding Affinities from Molecular Dynamics That Correlate Well with Experiment. J. Chem. Theory Comput. 2015, 11, 33463356,  DOI: 10.1021/acs.jctc.5b00179
  11. 11
    Bhati, A. P.; Wan, S.; Wright, D. W.; Coveney, P. V. Rapid, Accurate, Precise, and Reliable Relative Free Energy Prediction Using Ensemble Based Thermodynamic Integration. J. Chem. Theory Comput. 2017, 13, 210222,  DOI: 10.1021/acs.jctc.6b00979
  12. 12
    Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., 3rd Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Acc. Chem. Res. 2000, 33, 889897,  DOI: 10.1021/ar000033j
  13. 13
    Coveney, P. V.; Wan, S. On the Calculation of Equilibrium Thermodynamic Properties from Molecular Dynamics. Phys. Chem. Chem. Phys. 2016, 18, 3023630240,  DOI: 10.1039/C6CP02349E
  14. 14
    Vassaux, M.; Wan, S.; Edeling, W.; Coveney, P. V. Ensembles Are Required to Handle Aleatoric and Parametric Uncertainty in Molecular Dynamics Simulation. J. Chem. Theory Comput. 2021, 17, 51875197,  DOI: 10.1021/acs.jctc.1c00526
  15. 15
    Wade, A.; Bhati, A. P.; Wan, S.; Coveney, P. V. Alchemical Free Energy Estimators and Molecular Dynamics Engines: Accuracy Precision and Reproducibility. ChemRxiv 2021,  DOI: 10.26434/chemrxiv-2021-nqp8r .
  16. 16
    Bhati, A. P.; Coveney, P. V. Large Scale Study of Ligand-Protein Relative Binding Free Energy Calculations: Actionable Predictions from Statistically Robust Protocols. J. Chem. Theory Comput. 2022, 18, 26872702,  DOI: 10.1021/acs.jctc.1c01288
  17. 17
    Knapp, B.; Ospina, L.; Deane, C. M. Avoiding False Positive Conclusions in Molecular Simulation: The Importance of Replicas. J. Chem. Theory Comput. 2018, 14, 61276138,  DOI: 10.1021/acs.jctc.8b00391
  18. 18
    Wan, S.; Sinclair, R. C.; Coveney, P. V. Uncertainty Quantification in Classical Molecular Dynamics. Philos. Trans. R. Soc., A 2021, 379, 20200082  DOI: 10.1098/rsta.2020.0082
  19. 19
    Bieniek, M. K.; Bhati, A. P.; Wan, S.; Coveney, P. V. TIES 20: Relative Binding Free Energy with a Flexible Superimposition Algorithm and Partial Ring Morphing. J. Chem. Theory Comput. 2021, 17, 12501265,  DOI: 10.1021/acs.jctc.0c01179
  20. 20
    Sadiq, S. K.; Wright, D.; Watson, S. J.; Zasada, S. J.; Stoica, I.; Coveney, P. V. Automated Molecular Simulation Based Binding Affinity Calculator for Ligand-Bound HIV-1 Proteases. J. Chem. Inf. Model. 2008, 48, 19091919,  DOI: 10.1021/ci8000937
  21. 21
    Suruzhon, M.; Bodnarchuk, M. S.; Ciancetta, A.; Viner, R.; Wall, I. D.; Essex, J. W. Sensitivity of Binding Free Energy Calculations to Initial Protein Crystal Structure. J. Chem. Theory Comput. 2021, 17, 18061821,  DOI: 10.1021/acs.jctc.0c00972
  22. 22
    Pérez-Benito, L.; Keränen, H.; van Vlijmen, H.; Tresadern, G. Predicting Binding Free Energies of PDE2 Inhibitors. The Difficulties of Protein Conformation. Sci. Rep. 2018, 8, 4883,  DOI: 10.1038/s41598-018-23039-5
  23. 23
    Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes. J. Med. Chem. 2006, 49, 61776196,  DOI: 10.1021/jm051256o
  24. 24
    Case, D. A.; Cheatham, T. E., 3rd; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 16681688,  DOI: 10.1002/jcc.20290
  25. 25
    Zhang, X.; Perez-Sanchez, H.; Lightstone, F. C. A Comprehensive Docking and MM/GBSA Rescoring Study of Ligand Recognition Upon Binding Antithrombin. Curr. Top. Med. Chem. 2017, 17, 16311639,  DOI: 10.2174/1568026616666161117112604
  26. 26
    Wright, D. W.; Husseini, F.; Wan, S.; Meyer, C.; van Vlijmen, H.; Tresadern, G.; Coveney, P. V. Application of the ESMACS Binding Free Energy Protocol to a Multi-Binding Site Lactate Dehydogenase a Ligand Dataset. Adv. Theory Simul. 2019, 3, 1900194  DOI: 10.1002/adts.201900194
  27. 27
    Wright, D. W.; Wan, S.; Meyer, C.; van Vlijmen, H.; Tresadern, G.; Coveney, P. V. Application of ESMACS Binding Free Energy Protocols to Diverse Datasets: Bromodomain-Containing Protein 4. Sci. Rep. 2019, 9, 6017,  DOI: 10.1038/s41598-019-41758-1
  28. 28
    Wan, S.; Potterton, A.; Husseini, F. S.; Wright, D. W.; Heifetz, A.; Malawski, M.; Townsend-Nicholson, A.; Coveney, P. V. Hit-to-Lead and Lead Optimization Binding Free Energy Calculations for G Protein-Coupled Receptors. Interface Focus 2020, 10, 20190128  DOI: 10.1098/rsfs.2019.0128
  29. 29
    Wright, D. W.; Hall, B. A.; Kenway, O. A.; Jha, S.; Coveney, P. V. Computing Clinically Relevant Binding Free Energies of HIV-1 Protease Inhibitors. J. Chem. Theory Comput. 2014, 10, 12281241,  DOI: 10.1021/ct4007037
  30. 30
    Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 17811802,  DOI: 10.1002/jcc.20289
  31. 31
    Beveridge, D. L.; Dicapua, F. M. Free-Energy Via Molecular Simulation - Applications to Chemical and Biomolecular Systems. Annu. Rev. Biophys. Bioeng. 1989, 18, 431492,  DOI: 10.1146/annurev.bb.18.060189.002243
  32. 32
    Genheden, S.; Ryde, U. How to Obtain Statistically Converged MM/GBSA Results. J. Comput. Chem. 2010, 31, 837846,  DOI: 10.1002/jcc.21366
  33. 33
    Sadiq, S. K.; Wright, D. W.; Kenway, O. A.; Coveney, P. V. Accurate Ensemble Molecular Dynamics Binding Free Energy Ranking of Multidrug-Resistant HIV-1 Proteases. J. Chem. Inf. Model. 2010, 50, 890905,  DOI: 10.1021/ci100007w
  34. 34
    Wan, S.; Bhati, A. P.; Skerratt, S.; Omoto, K.; Shanmugasundaram, V.; Bagal, S. K.; Coveney, P. V. Evaluation and Characterization of Trk Kinase Inhibitors for the Treatment of Pain: Reliable Binding Affinity Predictions from Theory and Computation. J. Chem. Inf. Model. 2017, 57, 897909,  DOI: 10.1021/acs.jcim.6b00780
  35. 35
    Wan, S.; Bhati, A. P.; Zasada, S. J.; Wall, I.; Green, D.; Bamborough, P.; Coveney, P. V. Rapid and Reliable Binding Affinity Prediction of Bromodomain Inhibitors: A Computational Study. J. Chem. Theory Comput. 2017, 13, 784795,  DOI: 10.1021/acs.jctc.6b00794
  36. 36
    Scheen, J.; Wu, W.; Mey, A.; Tosco, P.; Mackey, M.; Michel, J. Hybrid Alchemical Free Energy/Machine-Learning Methodology for the Computation of Hydration Free Energies. J. Chem. Inf. Model. 2020, 60, 53315339,  DOI: 10.1021/acs.jcim.0c00600
  37. 37
    Frost, C.; Thompson, S. G. Correcting for Regression Dilution Bias: Comparison of Methods for a Single Predictor Variable. J. R. Stat. Soc. 2000, 163, 173189,  DOI: 10.1111/1467-985X.00164
  38. 38
    Ruff, T. W.; Neelin, J. D. Long Tails in Regional Surface Temperature Probability Distributions with Implications for Extremes under Global Warming. Geophys. Res. Lett. 2012, 39, L04704,  DOI: 10.1029/2011GL050610
  39. 39
    Weitzman, M. L. Fat-Tailed Uncertainty in the Economics of Catastrophic Climate Change. Rev. Environ. Econ. Policy 2011, 5, 275292,  DOI: 10.1093/reep/rer006

Cited By

Click to copy section linkSection link copied!

This article is cited by 5 publications.

  1. Hannes H. Loeffler, Shunzhou Wan, Marco Klähn, Agastya P. Bhati, Peter V. Coveney. Optimal Molecular Design: Generative Active Learning Combining REINVENT with Precise Binding Free Energy Ranking Simulations. Journal of Chemical Theory and Computation 2024, 20 (18) , 8308-8328. https://doi.org/10.1021/acs.jctc.4c00576
  2. Shunzhou Wan, Agastya P. Bhati, Alexander D. Wade, Peter V. Coveney. Ensemble-Based Approaches Ensure Reliability and Reproducibility. Journal of Chemical Information and Modeling 2023, 63 (22) , 6959-6963. https://doi.org/10.1021/acs.jcim.3c01654
  3. Shunzhou Wan, Agastya P. Bhati, Peter V. Coveney. Comparison of Equilibrium and Nonequilibrium Approaches for Relative Binding Free Energy Predictions. Journal of Chemical Theory and Computation 2023, 19 (21) , 7846-7860. https://doi.org/10.1021/acs.jctc.3c00842
  4. Alexander D. Wade, Agastya P. Bhati, Shunzhou Wan, Peter V. Coveney. Alchemical Free Energy Estimators and Molecular Dynamics Engines: Accuracy, Precision, and Reproducibility. Journal of Chemical Theory and Computation 2022, 18 (6) , 3972-3987. https://doi.org/10.1021/acs.jctc.2c00114
  5. Shunzhou Wan, Agastya P. Bhati, David W. Wright, Alexander D. Wade, Gary Tresadern, Herman van Vlijmen, Peter V. Coveney. The performance of ensemble-based free energy protocols in computing binding affinities to ROS1 kinase. Scientific Reports 2022, 12 (1) https://doi.org/10.1038/s41598-022-13319-6
Open PDF

Journal of Chemical Information and Modeling

Cite this: J. Chem. Inf. Model. 2022, 62, 10, 2561–2570
Click to copy citationCitation copied!
https://doi.org/10.1021/acs.jcim.2c00255
Published May 4, 2022

Copyright © 2022 The Authors. Published by American Chemical Society. This publication is licensed under

CC-BY 4.0 .

Article Views

1343

Altmetric

-

Citations

Learn about these metrics

Article Views are the COUNTER-compliant sum of full text article downloads since November 2008 (both PDF and HTML) across all institutions and individuals. These metrics are regularly updated to reflect usage leading up to the last few days.

Citations are the number of other articles citing this article, calculated by Crossref and updated daily. Find more information about Crossref citation counts.

The Altmetric Attention Score is a quantitative measure of the attention that a research article has received online. Clicking on the donut icon will load a page at altmetric.com with additional details about the score and the social media presence for the given article. Find more information on the Altmetric Attention Score and how the score is calculated.

  • Abstract

    Figure 1

    Figure 1. Structure of SMYD3, complexed with one of the compounds in this study. The N-terminal SET domain, the MYND domain, the post-SET domain, and the C-terminal region are colored in orange, yellow, pink, and white, respectively. The cofactor SAH is shown with a bond representation and the ligand (S01) in a ball-and-stick model. The Zn2+ ions are shown with sphere models and colored green.

    Figure 2

    Figure 2. Comparison of calculated binding free energies from two independent studies of the ligand-SMYD3 models performed on Blue Waters (bw, horizontal axis) and DNAnexus (DNA, vertical axis). The dashed line shows an ideal y = x regression. The standard errors are calculated using a bootstrapping method.

    Figure 3

    Figure 3. Comparison of calculated binding free energies and experimental binding affinity data from 1-traj ESMACS approach. The dotted line shows a linear regression. A correlation coefficient of 0.84 [CI: 0.65–0.93] is obtained for the entire set of compounds. The +1e charged compounds are shown in blue and electrostatically neutral ones in black.

    Figure 4

    Figure 4. Electrostatic interactions between the R2 group and the protein. The negatively charged residues GLU192, ASP241, and GLU294, and positively charged R2 group, along with positively charged LYS 329, form favorable electrostatic interactions.

    Figure 5

    Figure 5. Comparison between TIES-predicted relative binding affinities and experimental data for a total of 19 compound pairs. The long dashed line represents y = x, whereas dashed and dotted lines represent 1 kcal/mol and 2 kcal/mol ranges, respectively. The pairs involving S08 are highlighted in blue.

    Figure 6

    Figure 6. Distributions of experimental pIC50 values shown as histogram and kernel density curve. Four compounds from the current project (a–d) are used, which have been tested more than 100 times. A bin size of 0.05 is used. The dashed lines indicate the means of the experimental measurements.

  • References


    This article references 39 other publications.

    1. 1
      Mazur, P. K.; Reynoird, N.; Khatri, P.; Jansen, P. W.; Wilkinson, A. W.; Liu, S.; Barbash, O.; Van Aller, G. S.; Huddleston, M.; Dhanak, D.; Tummino, P. J.; Kruger, R. G.; Garcia, B. A.; Butte, A. J.; Vermeulen, M.; Sage, J.; Gozani, O. SMYD3 Links Lysine Methylation of Map3k2 to Ras-Driven Cancer. Nature 2014, 510, 283287,  DOI: 10.1038/nature13320
    2. 2
      Bottino, C.; Peserico, A.; Simone, C.; Caretti, G. SMYD3: An Oncogenic Driver Targeting Epigenetic Regulation and Signaling Pathways. Cancers 2020, 12, 142,  DOI: 10.3390/cancers12010142
    3. 3
      Van Aller, G. S.; Graves, A. P.; Elkins, P. A.; Bonnette, W. G.; McDevitt, P. J.; Zappacosta, F.; Annan, R. S.; Dean, T. W.; Su, D. S.; Carpenter, C. L.; Mohammad, H. P.; Kruger, R. G. Structure-Based Design of a Novel SMYD3 Inhibitor That Bridges the Sam-and Mekk2-Binding Pockets. Structure 2016, 24, 774781,  DOI: 10.1016/j.str.2016.03.010
    4. 4
      Su, D. S.; Qu, J.; Schulz, M.; Blackledge, C. W.; Yu, H.; Zeng, J.; Burgess, J.; Reif, A.; Stern, M.; Nagarajan, R.; Pappalardi, M. B.; Wong, K.; Graves, A. P.; Bonnette, W.; Wang, L.; Elkins, P.; Knapp-Reed, B.; Carson, J. D.; McHugh, C.; Mohammad, H.; Kruger, R.; Luengo, J.; Heerding, D. A.; Creasy, C. L. Discovery of Isoxazole Amides as Potent and Selective SMYD3 Inhibitors. ACS Med. Chem. Lett. 2020, 11, 133140,  DOI: 10.1021/acsmedchemlett.9b00493
    5. 5
      Schindler, C. E. M.; Baumann, H.; Blum, A.; Bose, D.; Buchstaller, H. P.; Burgdorf, L.; Cappel, D.; Chekler, E.; Czodrowski, P.; Dorsch, D.; Eguida, M. K. I.; Follows, B.; Fuchss, T.; Gradler, U.; Gunera, J.; Johnson, T.; Jorand Lebrun, C.; Karra, S.; Klein, M.; Knehans, T.; Koetzner, L.; Krier, M.; Leiendecker, M.; Leuthner, B.; Li, L.; Mochalkin, I.; Musil, D.; Neagu, C.; Rippmann, F.; Schiemann, K.; Schulz, R.; Steinbrecher, T.; Tanzer, E. M.; Unzue Lopez, A.; Viacava Follis, A.; Wegener, A.; Kuhn, D. Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects. J. Chem. Inf. Model. 2020, 60, 54575474,  DOI: 10.1021/acs.jcim.0c00900
    6. 6
      Ciordia, M.; Pérez-Benito, L.; Delgado, F.; Trabanco, A. A.; Tresadern, G. Application of Free Energy Perturbation for the Design of Bace1 Inhibitors. J. Chem. Inf. Model. 2016, 56, 18561871,  DOI: 10.1021/acs.jcim.6b00220
    7. 7
      Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M. K.; Greenwood, J.; Romero, D. L.; Masse, C.; Knight, J. L.; Steinbrecher, T.; Beuming, T.; Damm, W.; Harder, E.; Sherman, W.; Brewer, M.; Wester, R.; Murcko, M.; Frye, L.; Farid, R.; Lin, T.; Mobley, D. L.; Jorgensen, W. L.; Berne, B. J.; Friesner, R. A.; Abel, R. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015, 137, 26952703,  DOI: 10.1021/ja512751q
    8. 8
      Wan, S.; Bhati, A. P.; Zasada, S. J.; Coveney, P. V. Rapid, Accurate, Precise and Reproducible Ligand-Protein Binding Free Energy Prediction. Interface Focus 2020, 10, 20200007  DOI: 10.1098/rsfs.2020.0007
    9. 9
      Wan, S.; Tresadern, G.; Pérez-Benito, L.; Vlijmen, H.; Coveney, P. V. Accuracy and Precision of Alchemical Relative Free Energy Predictions with and without Replica-Exchange. Adv. Theory Simul. 2019, 3, 1900195  DOI: 10.1002/adts.201900195
    10. 10
      Wan, S.; Knapp, B.; Wright, D. W.; Deane, C. M.; Coveney, P. V. Rapid, Precise, and Reproducible Prediction of Peptide-MHC Binding Affinities from Molecular Dynamics That Correlate Well with Experiment. J. Chem. Theory Comput. 2015, 11, 33463356,  DOI: 10.1021/acs.jctc.5b00179
    11. 11
      Bhati, A. P.; Wan, S.; Wright, D. W.; Coveney, P. V. Rapid, Accurate, Precise, and Reliable Relative Free Energy Prediction Using Ensemble Based Thermodynamic Integration. J. Chem. Theory Comput. 2017, 13, 210222,  DOI: 10.1021/acs.jctc.6b00979
    12. 12
      Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E., 3rd Calculating Structures and Free Energies of Complex Molecules: Combining Molecular Mechanics and Continuum Models. Acc. Chem. Res. 2000, 33, 889897,  DOI: 10.1021/ar000033j
    13. 13
      Coveney, P. V.; Wan, S. On the Calculation of Equilibrium Thermodynamic Properties from Molecular Dynamics. Phys. Chem. Chem. Phys. 2016, 18, 3023630240,  DOI: 10.1039/C6CP02349E
    14. 14
      Vassaux, M.; Wan, S.; Edeling, W.; Coveney, P. V. Ensembles Are Required to Handle Aleatoric and Parametric Uncertainty in Molecular Dynamics Simulation. J. Chem. Theory Comput. 2021, 17, 51875197,  DOI: 10.1021/acs.jctc.1c00526
    15. 15
      Wade, A.; Bhati, A. P.; Wan, S.; Coveney, P. V. Alchemical Free Energy Estimators and Molecular Dynamics Engines: Accuracy Precision and Reproducibility. ChemRxiv 2021,  DOI: 10.26434/chemrxiv-2021-nqp8r .
    16. 16
      Bhati, A. P.; Coveney, P. V. Large Scale Study of Ligand-Protein Relative Binding Free Energy Calculations: Actionable Predictions from Statistically Robust Protocols. J. Chem. Theory Comput. 2022, 18, 26872702,  DOI: 10.1021/acs.jctc.1c01288
    17. 17
      Knapp, B.; Ospina, L.; Deane, C. M. Avoiding False Positive Conclusions in Molecular Simulation: The Importance of Replicas. J. Chem. Theory Comput. 2018, 14, 61276138,  DOI: 10.1021/acs.jctc.8b00391
    18. 18
      Wan, S.; Sinclair, R. C.; Coveney, P. V. Uncertainty Quantification in Classical Molecular Dynamics. Philos. Trans. R. Soc., A 2021, 379, 20200082  DOI: 10.1098/rsta.2020.0082
    19. 19
      Bieniek, M. K.; Bhati, A. P.; Wan, S.; Coveney, P. V. TIES 20: Relative Binding Free Energy with a Flexible Superimposition Algorithm and Partial Ring Morphing. J. Chem. Theory Comput. 2021, 17, 12501265,  DOI: 10.1021/acs.jctc.0c01179
    20. 20
      Sadiq, S. K.; Wright, D.; Watson, S. J.; Zasada, S. J.; Stoica, I.; Coveney, P. V. Automated Molecular Simulation Based Binding Affinity Calculator for Ligand-Bound HIV-1 Proteases. J. Chem. Inf. Model. 2008, 48, 19091919,  DOI: 10.1021/ci8000937
    21. 21
      Suruzhon, M.; Bodnarchuk, M. S.; Ciancetta, A.; Viner, R.; Wall, I. D.; Essex, J. W. Sensitivity of Binding Free Energy Calculations to Initial Protein Crystal Structure. J. Chem. Theory Comput. 2021, 17, 18061821,  DOI: 10.1021/acs.jctc.0c00972
    22. 22
      Pérez-Benito, L.; Keränen, H.; van Vlijmen, H.; Tresadern, G. Predicting Binding Free Energies of PDE2 Inhibitors. The Difficulties of Protein Conformation. Sci. Rep. 2018, 8, 4883,  DOI: 10.1038/s41598-018-23039-5
    23. 23
      Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes. J. Med. Chem. 2006, 49, 61776196,  DOI: 10.1021/jm051256o
    24. 24
      Case, D. A.; Cheatham, T. E., 3rd; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 16681688,  DOI: 10.1002/jcc.20290
    25. 25
      Zhang, X.; Perez-Sanchez, H.; Lightstone, F. C. A Comprehensive Docking and MM/GBSA Rescoring Study of Ligand Recognition Upon Binding Antithrombin. Curr. Top. Med. Chem. 2017, 17, 16311639,  DOI: 10.2174/1568026616666161117112604
    26. 26
      Wright, D. W.; Husseini, F.; Wan, S.; Meyer, C.; van Vlijmen, H.; Tresadern, G.; Coveney, P. V. Application of the ESMACS Binding Free Energy Protocol to a Multi-Binding Site Lactate Dehydogenase a Ligand Dataset. Adv. Theory Simul. 2019, 3, 1900194  DOI: 10.1002/adts.201900194
    27. 27
      Wright, D. W.; Wan, S.; Meyer, C.; van Vlijmen, H.; Tresadern, G.; Coveney, P. V. Application of ESMACS Binding Free Energy Protocols to Diverse Datasets: Bromodomain-Containing Protein 4. Sci. Rep. 2019, 9, 6017,  DOI: 10.1038/s41598-019-41758-1
    28. 28
      Wan, S.; Potterton, A.; Husseini, F. S.; Wright, D. W.; Heifetz, A.; Malawski, M.; Townsend-Nicholson, A.; Coveney, P. V. Hit-to-Lead and Lead Optimization Binding Free Energy Calculations for G Protein-Coupled Receptors. Interface Focus 2020, 10, 20190128  DOI: 10.1098/rsfs.2019.0128
    29. 29
      Wright, D. W.; Hall, B. A.; Kenway, O. A.; Jha, S.; Coveney, P. V. Computing Clinically Relevant Binding Free Energies of HIV-1 Protease Inhibitors. J. Chem. Theory Comput. 2014, 10, 12281241,  DOI: 10.1021/ct4007037
    30. 30
      Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable Molecular Dynamics with NAMD. J. Comput. Chem. 2005, 26, 17811802,  DOI: 10.1002/jcc.20289
    31. 31
      Beveridge, D. L.; Dicapua, F. M. Free-Energy Via Molecular Simulation - Applications to Chemical and Biomolecular Systems. Annu. Rev. Biophys. Bioeng. 1989, 18, 431492,  DOI: 10.1146/annurev.bb.18.060189.002243
    32. 32
      Genheden, S.; Ryde, U. How to Obtain Statistically Converged MM/GBSA Results. J. Comput. Chem. 2010, 31, 837846,  DOI: 10.1002/jcc.21366
    33. 33
      Sadiq, S. K.; Wright, D. W.; Kenway, O. A.; Coveney, P. V. Accurate Ensemble Molecular Dynamics Binding Free Energy Ranking of Multidrug-Resistant HIV-1 Proteases. J. Chem. Inf. Model. 2010, 50, 890905,  DOI: 10.1021/ci100007w
    34. 34
      Wan, S.; Bhati, A. P.; Skerratt, S.; Omoto, K.; Shanmugasundaram, V.; Bagal, S. K.; Coveney, P. V. Evaluation and Characterization of Trk Kinase Inhibitors for the Treatment of Pain: Reliable Binding Affinity Predictions from Theory and Computation. J. Chem. Inf. Model. 2017, 57, 897909,  DOI: 10.1021/acs.jcim.6b00780
    35. 35
      Wan, S.; Bhati, A. P.; Zasada, S. J.; Wall, I.; Green, D.; Bamborough, P.; Coveney, P. V. Rapid and Reliable Binding Affinity Prediction of Bromodomain Inhibitors: A Computational Study. J. Chem. Theory Comput. 2017, 13, 784795,  DOI: 10.1021/acs.jctc.6b00794
    36. 36
      Scheen, J.; Wu, W.; Mey, A.; Tosco, P.; Mackey, M.; Michel, J. Hybrid Alchemical Free Energy/Machine-Learning Methodology for the Computation of Hydration Free Energies. J. Chem. Inf. Model. 2020, 60, 53315339,  DOI: 10.1021/acs.jcim.0c00600
    37. 37
      Frost, C.; Thompson, S. G. Correcting for Regression Dilution Bias: Comparison of Methods for a Single Predictor Variable. J. R. Stat. Soc. 2000, 163, 173189,  DOI: 10.1111/1467-985X.00164
    38. 38
      Ruff, T. W.; Neelin, J. D. Long Tails in Regional Surface Temperature Probability Distributions with Implications for Extremes under Global Warming. Geophys. Res. Lett. 2012, 39, L04704,  DOI: 10.1029/2011GL050610
    39. 39
      Weitzman, M. L. Fat-Tailed Uncertainty in the Economics of Catastrophic Climate Change. Rev. Environ. Econ. Policy 2011, 5, 275292,  DOI: 10.1093/reep/rer006