Rapid estimation of catalytic efficiency by cumulative atomic multipole moments: application to ketosteroid isomerase mutants. Journal

We propose a simple atomic multipole electrostatic model to rapidly evaluate the effects of mutation on enzyme activity and test its performance on wild-type and mutant ketosteroid isomerase. The predictions of our atomic multipole model are similar to those obtained with symmetry-adapted perturbation theory at a fraction of the computational cost. We further show that this approach is relatively insensitive to the precise amino acid side chain conformation in mutants and may thus be useful in computational enzyme (re)design.


System set-up and parameters
The protein coordinates of the KSI dimer were taken from PDB entry 1OHP, chains A and B, with Asn38 mutated back to the wild-type catalytic base Asp38. In chain A, the intermediate of the KSI reaction was included, based on the coordinates of the co-crystallised inhibitor 5αestran-3,17-dione, whereas 5α-estran-3,17-dione was removed from the active site of chain B. Asp99 was treated as protonated in both chains (in line with its role in the enzyme and predicted pK a values from PropKa3.1 1 of 8.2 in chain A with the substrate bound and 9.2 in chain B with an empty active site), and Asp38 was treated as protonated only in chain A, based on the mechanism (Asp38 has abstracted the proton from C4 of the substrate to form the intermediate; see scheme in main manuscript). All other ionisable residues were treated in their standard protonation states (Asp/Glu negatively charged, Arg/Lys positively charged, His neutral). All three histidines in both chains were modeled as the tautomer protonated on Nε2 and the functional groups of the following residues were 'flipped' (rotated by 180°) as predicted by the Optimal Hydrogen Bonding Network 2 : Asn2, Asn104, His100 and His122 in chain A, and Asn19, Asn57 and Asn104 in chain B. All crystallographic waters were kept and additional water molecules were added using the solvate plugin to VMD 3 (with a 'buffer' for overlapping water deletion of 2.3) such that the edge of the waterbox was at least 11 Å away from any protein atom. The autoionize plugin was then used to replace 6 bulk waters with Na + to neutralise the system. The resulting PSF file (generated by the VMD plugin psfgen) was converted by the AmberTools14 utility chamber into a prmtop file that can be used by AMBER. This ensures that CHARMM force-fields are accurately interpreted by AMBER 4 . The CHARMM36 5 force-field was used for the protein, TIP3P 6 for water and CgenFF 7 parameters (atom types assigned using https://cgenff.paramchem.org) with CHELPG charges (obtained with the RED Server 8  Simulation and analysis -methods All simulations were performed with the AMBER package v. 14. Two independent simulations were performed for each mutant (with randomized positions of the Na + ions prior to initial minimisation). Minimisations were performed with sander (on single CPUs) and MD simulations with pmemd.MPI (on 8 CPUs). Each system was prepared for MD simulation by initial minimisation of all solvent, the substrate and all hydrogens (300 steps), a brief (50 ps) MD simulation of all solvent and finally a minimisation of all atoms (300 steps) with positional restraints on the protein Cα atoms of 5.0 kcal mol −1 Å −2 . The system was then heated (after random assignment of velocities at 25 K) to 298 K in 20 ps, using Langevin dynamics for temperature control (collision frequency of 1 ps), again with the positional restraints on the protein Cα atoms. The positional restraints were gradually released and the pressure equilibrated to 1 atm in 4 steps of 10 ps, using the Berendsen barostat (pressure relaxation time 1 ps). Finally, 1 ns simulations were performed in the NPT ensemble at 298 K and 1 atm (with Langevin dynamics for temperature control and the Berendsen barostat). Throughout all MD simulations, a one-sided restraint was applied to ensure that the distance between Asp38 Oδ2 and C4 of the substrate does not become larger than 2.65 Å, so that the conformations sampled are in line with the reaction mechanism. Further, a 2 fs timestep was used with SHAKE applied to bonds involving hydrogen, the cut-off for directspace non-bonded interactions was 8 Å and long-range electrostatics were treated using the Particle Mesh Ewald method. The 2 trajectories obtained from two 1 ns simulations (saved every 1 ps) were clustered based on the mass-weighted RMSD of the substrate, Asp38 (the catalytic base) and the residues at positions 14 & 99 in chain A. The hierarchical agglomerative algorithm was used and 4 clusters were requested, using the AmberTools program cpptraj 9 .

Convergence of Differential Intermediate State Stabilization (DISS) energy calculated by Cumulative Atomic Multipole Moments (CAMM)
To test whether the multipolar expansion of the differential intermediate state stabilization (DISS) in KSI converges, calculations were performed at different truncation levels L (L=L a +L b , the sum of ranks of multipole moments of monomer a and b, related to the inverse distance term in the multipole expansion via R −(L+1) (exponent truncation), see ref. 10 ). Calculations were performed for all KSI variants considered (Figures S1-S5), employing the whole (rather than the truncated) substrate (see Figure 3) and structures obtained with the minimal perturbation approach (see main text). The CAMM expansion of the DISS energy is close to convergence after the R -4 term (L=3) in most cases. Results diverge after R -6 or R -7 (L=5 or 6, respectively) in some cases, which is likely related to numerical errors in the determination of high-rank; see also further discussion in the main text. Overall, the truncation level R -5 (L=4) is a reasonable choice.      Table S1. Best-fit parameters, Pearson correlation coefficients ( P ) and Spearmann rank correlation coefficients ( S ) between differential intermediate state stabilization (DISS) and experimental apparent free energy barriers (calculated from the works of Kraut and Choi 12,13 , see Computational details). Calculations are based on the cluster centroid structures obtained from MD simulation (see Computational details).

Method
All mutants Without Y14S slope intercept

Comparison of DISS obtained with CAMM or with fixed point charges
Here, we compare the use of CAMM with the use of a fixed point charge model for calculating Differential Intermediate State Stabilization for KSI variants. Point charges were CHELPG charges as obtained with the RED Server 8 for the reactant and intermediate states. Both approaches have a similar trend, but it should be noted that the point-charge approach benefits from a significant cancellation of errors (Table S3); Note that DISS (defined as E int (IS) -E int (RS)) is dominated by the interaction energy with the intermediate state, which in turn is quite well described by a point charge model (due to its non-zero charge). This is reflected in the fact that the difference between the point charge (FF) and CAMM interaction energies is small this state (compared to the reactant state). The errors in interaction energies of the point charge model (compared to the CAMM model) are not systematic, indicating that changing the point charges used cannot easily improve the results.  Y14S) ). Figure S6.

Further comparison of CAMM with point charge approximations is provided in
Here, we consider reactants and Asp99; by increasing the separation between their centers of mass, we monitor Asp99 contribution to the DISS as a function of distance difference DX (with respect to the value from QM/MM study). As a reference we took the DMA multipole moments 14 . The results indicate that point charge models cannot reproduce charge anisotropy well in comparison to CAMM.  The differences between Chakravorty and Hammes-Shiffer's work 19 and our work arise mainly from different sizes of system used in both set of calculations (in our work, the Asp38 side-chain was included in the system considered for the interaction energy calculations, in contrast to Chakravorty and Hammes-Shiffer's work). However, applying our approach to point-charge model of whole proteins leads to values consistent with presented CAMM results and of magnitudes comparable to those from EVB study 19 (see Table S5). There are also minor technical differences impeding direct numerical comparison of these two approaches. Firstly, different PDB structures where used as a starting point for modeling (PDB code 1QJG in EVB study and 1OHP here). Table S5. Interaction energies (in kcal/mol) between androstedione and its environment calculated with point charges (structures obtained from clustering of MD simulations). The values are similar to those presented in Table S4 in the work of Chakravorty and Hammes-Schiffer 19 .

Testing the importance of many-body polarization
To examine the three-body effects associated with polarization, we chose the Tyr14-Tyr55 interaction as a test-case. The hydrogen bond donated by Tyr55 to the Tyr14 -OH may polarize this group and thus strengthen its O-H dipole. To include this effect, the wavefunction was calculated for both residues treated as one system and used for the determination of CAMMs. Subsequently, DISS calculations with this polarized Tyr14 system were compared with DISS calculations using including the individual, non-polarized tyrosine residues (Table S6). In case of the Y14S and Y14F mutants, the same strategy was applied, including Ser14,W TyrOH and Tyr55 in the former, and Phe14 and Tyr55 in the latter case. All three variants that contain the Tyr14-Tyr55 system show a similar influence of polarization: the differential stabilization decreases (i.e. stronger relative stabilization of the intermediate state) by about 0.5 kcal/mol compared to the non-polarized model. Thus, the error arising from the neglecting Tyr14-Tyr55 polarization is ~7% of the 'polarized' value. It is further found that the Tyr55 contribution is nearly constant among mutants (~ −1.5 kcal/mol) and we expect similar behavior from other residues common in all considered variants.
In Y14F, the polarization effect of Tyr55 increases the DISS (i.e. weaker relative stabilization of the intermediate state compared to the reactant state); it's worth noting that in this mutant, the value presented in Table S6 is dominated by Tyr55 (−1.35 kcal/mol). On the other hand, in Y14S the difference is large in comparison with other variants (about 1.2 kcal/mol). This mutation is more difficult to model using the DISS(CAMM) approach (see discussion in main text).

The influence of structure on SAPT results and mutual cancelling of interaction energy terms
In the main text, it was shown that the difference between DISS(CAMM) energies obtained for Minimal Perturbation (MP) structures and MD cluster centroids (MD) is small or at least nearly systematic. Here we present a similar comparison for other terms as defined by SAPT ( Figure S7). The most dramatic case is the D99L mutant, where the short-range penetration term is responsible for overestimation of Leu99 electrostatic (Δ (10) ) contribution. Other levels (Δ and Δ 0 ) are less affected by the structural change by mutual cancelling of those short-range terms (see Figure S8). However, this change is larger and less systematic than that observed in CAMM (see main text).

The effect of substrate truncation
We provide details of comparison between results obtained with full substrate and the one truncated to the two rings encompassing atoms participating in reaction (see Computational Details in main text). As can be seen in Table S7, the maximum error is smaller than 0.5 kcal/mol for overall DISS, with errors for DISS calculated with CAMM are the lowest (<0.25 kcal/mol).

Comparison of computational requirements of CAMM, HF and SAPT0 calculations
To indicate the difference in computational effort of CAMM, HF(SCF) and SAPT0 calculations, we compare the computer times required for evaluation of electrostatic interaction energies for homodimers within S22 training set 21 , see Figure S9. (Hereafter, A and N state for number of atoms and basis set functions in monomers, respectively.) Note that CAMM(R -5 ) calculations for this training set take only a few miliseconds, scaling approximately as O(A 2 ), a negligible value compared to the SAPT0 electrostatic term evaluation in the dimer basis set, scaling as O((2N) 4 ) . It should be mentioned that CAMM energy calculations of any new chemical species requires two SCF runs for each monomer, each one scaling as O(N 4 ), to determine multipole moments (see the SCF series in Fig. S9). However, with a library of pre-computed CAMMs for aminoacids and one initial SCF calculation for each ligand/transition state, evaluation of interaction energies with any number of amino acid residues (including e.g. different enzyme variants) will follow the CAMM curve rather than the SCF curve presented in Fig. S9.
All times presented in Figure S9 were evaluated using the same machine, containing two Intel® Xeon® E5-2630 v2 @ 2.60 GHz CPUs. SCF for monomers from S22 training set 21 were obtained using GAMESS 22 with 12 threads, whereas SAPT0 energies were calculated with PSI4 23 , using the same number of threads. All computations were done with aug-cc-pVDZ basis set. Figure S9. Comparison of computational times of evaluation of CAMM and SAPT0 energies in homodimers from S22 training set 21 . System size is measured as total number of basis set functions in dimer. Here, SCF series mean total time required to determine CAMMs for each monomer. Note that sole CAMM interaction energy calculation takes a few milliseconds.