Efficient In Silico Saturation Mutagenesis of a Member of the Caspase Protease Family

Rational-design methods have proven to be a valuable toolkit in the field of protein design. Numerical approaches such as free-energy calculations or QM/MM methods are fit to widen the understanding of a protein-sequence space but require large amounts of computational time and power. Here, we apply an efficient method for free-energy calculations that combines the one-step perturbation (OSP) with the third-power-fitting (TPF) approach. It is fit to calculate full free energies of binding from three different end states only. The nonpolar contribution to the free energies are calculated for a set of chosen amino acids from a single simulation of a judiciously chosen reference state. The electrostatic contributions, on the other hand, are predicted from simulations of the neutral and charged end states of the individual amino acids. We used this method to perform in silico saturation mutagenesis of two sites in human Caspase-2. We calculated relative binding free energies toward two different substrates that differ in their P1′ site and in their affinity toward the unmutated protease. Although being approximate, our approach showed very good agreement upon validation against experimental data. 76% of the predicted relative free energies of amino acid mutations was found to be true positives or true negatives. We observed that this method is fit to discriminate amino acid mutations because the rate of false negatives is very low (<1.5%). The approach works better for a substrate with medium/low affinity with a Matthews correlation coefficient (MCC) of 0.63, whereas for a substrate with very low affinity, the MCC was 0.38. In all cases, the combined TPF + OSP approach outperformed the linear interaction energy method.


■ INTRODUCTION
Traditional directed-evolution methods utilize a two-step protocol with an initial generation of a rich library by random mutagenesis 1 and then identifying those library members that show improvements in the desired functions. 2 Such a randomly generated library must be huge in order to be relevant, and both generating and screening such a library is expensive. Novel methods have advanced in recent years to substitute the random mutagenesis by knowledge-based design. 3 These modern design methods are fit to allow smaller libraries but preserve or even enhance their relevance. They do so by replacing the random components by information about the structure and function of protein sequences, usually supported by computational algorithms such as QM or MD calculations or machine-learning methods. 4−6 Rational methods can improve the productivity toward the engineered protein in two, usually in sequential steps: (1) locating potential target sites for mutation and (2) narrowing the list of possible amino acids for substitution.
We recently provided a successful example of such a rational design procedure. 7 In this work, statistical and computational methods were jointly applied to engineer human Caspase-2 (Casp-2). The task was to create a biochemical scissor that is fit to cleave fusion tags from a wide variety of proteins. In close proximity to the active site, two point mutations were located to yield a more promiscuous S1′ subsite. After locating potential target sites through a combination of statistical methods with structural information, changes in binding free energies upon mutation were assessed using MD simulations. In short, free energies were calculated with the thermodynamic integration (TI) approach 8 along progressive perturbations using a λ-dependent Hamiltonian of the system. The information-based hypotheses were confirmed experimentally through measurements of cleavage times and Michaelis Menten parameters. These alchemical methods−if done correctly−have been proven to be very accurate and reliable. 9−13 The drawback of these calculations is their high cost in terms of computational power.
Information-based mutagenesis, although more efficient than random mutagenesis, still leaves a huge part of the protein sequence space unknown. On the experimental side, efficient protocols have been described to introduce mutations with all possible amino acids, also known as saturation mutagenesis. 14−16 A novel protocol for in silico saturation mutagenesis was described recently. 17 Here, the non-bonded contributions to the binding free energy are split up into two independent simulation steps for the non-polar (Lennard− Jones, LJ) contributions and the polar (Coulomb) contributions. For the non-polar contributions, a reference state is designed such that it samples relevant conformations to multiple end states. This reference state does not represent a physical molecule and the amount of relevant configurations sampled can be enhanced through a soft-core potential energy function. 18,19 From this state, the free-energy differences to multiple end states can be calculated in a single step using the one-step perturbation (OSP) approach. 20−22 The polar contribution to the free energies are calculated in a two-step charging process where only both end states have to be described explicitly. To approximate a full thermodynamic− integration profile, a cumulant expansion to determine the second derivatives of the free energy is utilized in the thirdpower fitting (TPF) method. 23 Within this study, this in silico saturation mutagenesis method was used to explore the protein sequence space for two point mutations of E105 and G171. Both amino acid positions are in close proximity to the active site of Casp-2 ( Figure 1) and were found to impact the specificity of the S1′ pocket. 24 To develop a versatile protease that can cleave fusion tags from the N-terminus of any protein, the general aim was to identify mutations that enhance the promiscuity of the S1′ binding pocket. 7 To achieve this, we calculated changes in the binding free energy of a model substrate upon mutation of both sites while keeping the respective other site unmutated. All calculations were performed twice using two different P1′ amino acids in the substrate, Ile and Pro. These two amino acids were chosen because branched, apolar amino acids and Pro are known to be weak binders of the S1′ site as can be found in the Merops Database. 7,25 All predictions that are based on these free energies were validated by comparing to data from random saturation mutagenesis experiments. Furthermore, the predictions from this combined OSP + TPF approach were compared to predictions obtained from the linear interaction energy (LIE) approach.
■ METHODS MD Simulations. General Information. All simulations were run using the GROMOS11 molecular simulation software (http://www.gromos.net). 26 Molecular interactions were described through the GROMOS 54A8 force field. 27 For all simulations, water was treated explicitly and implemented by means of the three-site simple point charge model. 28 Simulations were carried out under periodic boundary conditions (PBCs) based on rectangular computational boxes with at least 0.8 nm between any protein atom and the nearest box wall. The equations of motion were integrated using the leap-frog scheme. 29 Bond vibrations were constrained using the SHAKE algorithm 30 with a relative geometric tolerance of 10 −4 . The center of mass translation of the computational box was removed every 2 ps. The temperature and pressure were maintained at 298.15 K and 1 atm by weak coupling 31 using a coupling time of τ T = 0.1 ps and τ P = 0.5 ps and an isothermal compressibility of 7.624 × 10 −4 (kJ mol −1 nm −3 ) −1 . 31 Electrostatic interactions were calculated using a Barker− Watts reaction field (BM) scheme 32,33 with a value of ϵ BW = 61. 34 Nonbonded interactions were calculated using a molecular twin-range charge-group cut-off scheme. The cutoff used for the short-range pairlist construction was set to 0.8 nm and the cutoff used for the long-range interactions was set to 1.4 nm. Interactions within the short range were calculated at every time step from a pairlist that was updated every 10 fs. At pairlist updates, interactions up to the long-range cutoff were computed and kept constant, as appropriate for simulations with the GROMOS force field. 35,36 Preparation of the Protein. The Caspase-2 crystal structure in complex with the inhibitor N-acetyl-L-leucyl-L-α-aspartyl-Lα-glutamyl-L-seryl-L-aspartic aldehyde (PDB ID:1PYO) 37 was retrieved from the PDB data bank (http://www.rcsb.org). 38 Caspase-2 was resolved as a functional dimer, with a disulfide bridge linking the two monomers. The inhibitor was extended after the C-terminus by a chain of the sequence Ile-Val-Ser-Ser and Pro-Val-Ser-Ser to span the entire active site using the MOE 2017 loop modeler. 39 From this loop modeler, the three structures with the highest scores were chosen as representative substrate starting structures. Subsequently, 120 ns long molecular dynamics (MD) simulations (40 ns per substrate starting structure) were performed. The last 30 ns of the individual simulations was used to find a representative substrate starting structure for the subsequent free-energy calculations. This was done by clustering using the algorithm by Daura et al. 40 with a cut-off distance of 0.23 nm. As a starting structure for all following simulations, one representative structure was chosen from the dominant cluster.
Regarding simulations of the reference states, all simulations were executed twice with two different reference states, one with two soft spheres (R2R) and one with a non-interacting dummy atom and one soft sphere (RDR) (Figure 2). These reference states (denoted R4 and R5 in Jandova et al.) were chosen as they showed the closest match to the more elaborate Figure 1. Snapshot of the Casp-2 active site with visualization of the entire substrate (orange), the P1′ site (green, here: Ile) and the two sites of point mutations E105 (red) and G171 (purple). While E105 is part of the substrate-binding cavity, the G171 site is not directly interacting but part of a binding loop that spans the entire binding cavity (purple stick representation of the backbone atoms).
TI method, relative to alternative reference state molecules. 17 The LJ parameters for the soft reference atoms were set to (C6) 1/2 = 0.27322 (kJ mol −1 nm 6 ) 1/2 and (C12) 1/2 = 0.056143 (kJ mol −1 nm 12 ) 1/2 . This yields an effective VdW radius of 0.662 nm for the reference atoms. An LJ soft-core parameter of 1.51 was set for the reference atoms. 18,19 The Cα−D and D−A bond lengths in RDR were set to 0.153 nm, while in R2R, the Cα−A bond length was set to 0.252 nm and the A−A bond length was set to 0.351 nm. The reference states were modeled into the protein at positions E105 and G171 such that the atoms of the backbone overlapped. This was done for both dimers. After energy minimization using the steepest descent algorithm, the entire system was equilibrated. The velocities were randomly assigned at 60 K and solute atoms were positionally restrained using a force constant of 2.5 × 10 4 kJ mol −1 . The system was heated up to 300 K in five discrete steps for 0.4 ns each, resulting in a total equilibration time of 2 ns. In each step, the force constant was lowered by 1 order of magnitude. The subsequent production runs were performed for 50 ns.
Free-Energy Calculations. Relative binding free energies for all point mutations Glu105Xxx and Gly171Xxx were calculated from a set of OSP + TPF calculations using a thermodynamic cycle ( Figure 3). Free energies of charging for amino acids bearing a net charge (as calculated from the TPF simulations) are artifacted raw free energies and need to be corrected using a set of correction terms. All employed methods are described in the following paragraphs.
One-step Perturbation. In order to predict the free-energy changes between the Hamiltonian of the reference states and the Hamiltonian of the end states (where "end state" refers to a physical amino acid), meaningful conformations of amino-acid side chains were fitted onto the trajectory with the reference states. These conformations of side chains were taken from a conformational library described in Jandova et al. 17 and used for both reference states, R2R and RDR. In the fitting procedure, the C α atoms of the reference state and the amino acid conformation were aligned and then the side chain of the amino acid conformation was rotated such that the distances between the center of geometry of the side chain and the soft atoms were minimized. This procedure was repeated for each of the 25,000 conformations from the reference state simulation and for five different conformations per amino acid, as taken from the conformational library. The free-energy changes between the reference state R and the individual neutral, physical states N i were calculated using the Zwanzig equation where H Ni refers to the Hamiltonian of the neutral, physical state N i , H R refers to the Hamiltonian of the reference state, and ⟨...⟩ R denotes an ensemble average from a simulation of the reference state. k B T is the Boltzmann constant multiplied by the absolute temperature. To take into account the different relative occurrences P i of the individual conformations, these were added to the free energies from OSP as Finally, the resulting free energies from all n = 5 individual physical states were exponentially averaged The procedure for Gly was different since this amino acid lacks a side chain. Here, free energies for removal of the side chains of the reference states were calculated using the OSP method to yield appropriate estimates. All free energies were calculated for both dimers of the protease and averaged. For comparisons against experimental data, the values of the two different tautomers of Histidine were also averaged.
Third-Power Fitting. The free-energy change between two end states, for example, with neutral and charged side chains, can be calculated formally exactly via the TI approach 8 with λ ∈ [0, 1] being the coupling parameter and ⟨...⟩ λ denotes ensemble averaging from a simulation at a discrete λ. The Hamiltonian of the system can be defined for any intermediate state between the neutral and charged end states ( N vs C ) using λ Figure 2. Structures of the two reference states used for the OSPs. Atom D is a non-interacting dummy atom while atom A represents a soft-core particle. Figure 3. Thermodynamic cycle to calculate meaningful relative binding free energies from OSP of a reference state (R) and thirdpower fitting between neutral (N) and charged (C) states. Free energies along the black arrows were calculated from the simulations (OSP: R → N; TPF: N → C). Free energies along the blue arrows are binding free energies between the Caspase-2 (Casp) and its substrate (sbstr). The free-energy differences along the green arrows can be calculated from the differences between the cyan and magenta arrows. Relative free energies between the mutants (A,B) of the protein can be calculated from the difference between the two green arrows. The free-energy difference between the two green arrows (non-physical path) is the same as between the two blue arrows (physical path).

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article This property can be calculated for several values for λ from discrete simulations.
TPF is a more efficient method that approximates the nonlinear response of the property in eq 5 between the neutral (N) and the charged (C) end states from simulations of the end states only through 23 with the parameters a, b, c, d fitted to the first and second derivatives of the free energy with respect to the neutral (λ = 0) and the charged (λ = 1) end states The quantity for ΔG N>C TPF was averaged over both dimers of the protease.
Linear Interaction Energy. The simulations that were executed for the TPF approach were analyzed to extract polar and nonpolar averages (Coulomb and VdW interaction energies). Binding free energies were then calculated using the approach of LIE. 41,42 These were obtained from the averages as with α being an empirical coefficient, which was found to be 0.161. 43 All energies were averaged over both dimers of the protease.
Corrections for Free Energies of Charging. The free energies calculated using the TPF procedure are artifacted raw values. The reason is that the electrostatic potentials deviate from the "correct" potentials since they were calculated using a non-Coulombic interaction function (Barker Watts Reaction Field Method 32 ) under PBCs. These deviations translate directly into free energies. In the case of charged amino acids, these artifacts do not cancel upon application of a thermodynamic cycle. Typically, these artifacts are of considerable size when free energies of ligand binding are calculated. 44,45 In these cases, a ligand was simulated in the bound state and in the unbound state, that is, free in solution. Because of the different solvation states, the deviations of the electrostatic potentials between a ligand free in solution and a ligand bound to a solvent-excluded host cavity are not similar and do not cancel upon application of a thermodynamic cycle. In this study, free energies of charging are calculated for point mutations inside the protein, with both legs of the thermodynamic cycle being the protein in the bound and in the unbound (protein without substrate, apo) states. The corrections were thus expected to be smaller in total size but still significant. An exact description of the applied methodology for the post-simulation corrections can be found elsewhere. 45 In short, the deviation of the "correct" charging free energies can be calculated using continuum electrostatics methods 46 and analytical models. 47 These corrections must account for (i) the deviation of the solvent polarization around the charged group of atoms due to the use of a microscopic system in combination with cut-off truncation and a reactionfield correction relative to the "correct" polarization in a macroscopic, non-periodic, and fully Coulombic environment, (ii) the deviation of the solvent-generated electric potential in a microscopic box under PBCs relative to the "correct" potential under full Coulombic, macroscopic, and non-PBCs, (iii) the inaccurate electrostatic interactions between the charged group of atoms and other solute atoms due to the usage of cut-off truncation in combination with a reaction field correction, and (iv) an inaccurate dielectric permittivity of the employed solvent model.
In Vitro Experiments. All chemicals were purchased from ROTH (Carl Roth GmbH + Co. KG, Karlsruhe, Germany) and were of analytical grade; primers were ordered from Sigma-Aldrich (Merck KGaA; Darmstadt, Germany) and HPLC-purified.
Saturation mutagenesis with degenerate primers, designed to create all possible 19 amino acid substitutions at one site in the protein, was performed with circularly permutated Caspase-2 (cp-Casp2) 48 with amino acid mutations E105V G171D (sequence can be found in the Supporting Information) in a pACYCDuet-1 vector as a template.
For site-specific random mutations, the NEB Q5 Site-Directed Mutagenesis Kit (New England BioLabs; Ipswich, MA, USA) was used. Mutations at positions V105 and D171 were inserted sequentially in two separate PCRs.
Primers were designed with the degenerate codon NNS at the site of mutation, which generates all 20 amino acids with 32 codons and reduces codon redundancy.
For the mutation at position V105, the forward primer TCGTTGTAAAnnsATGAGCGAGTATTG and the reverse primer TGAAATTCTGTACCCGGTG were used. The ligated product was purified and used as a template for the following mutagenesis to insert mutations at position D171. The forward primer CATTTTACCnnsGAAAAAGAACTG and the reverse primer AACATTGCTCAGAACCAG were used. Sequencing of the pooled gene library showed a clear preference for the nucleotide G at the degenerate position which only produced a reduced sequence space. An additional set of primers was used to exclude codons already found in the previous PCR. The forward primers CATTTTACChhc-GAAAAAGAACTG and CATTTTACChhgGAAAAA-GAACTG and for both the same reverse primer AA-CATTGCTCAGAACCAG were used.
Following this, a KLD (kinase, ligase, dpnI) reaction was obtained. NovaBlue heat shock competent cells (Novagen, Merck KGaA; Darmstadt, Germany) were transformed with the ligated product and the cells were diluted into an overnight culture which was used for DNA preparation. Sequencing of the pooled gene library from primers with NNS, HHC, and HHG codons was used to control the quality of the library before selection. All nucleotides were represented in the first two degenerate positions to theoretically produce all 400 possible variants.
Escherichia coli BL21(DE3) pyr cells that contained an essential substrate protein that was blocked with a Caspase cleavage tag 48 with P1′ Thr and Pro were transformed with the gene library. The selection was executed either in a liquid assay or as a plate assay. 49 The DNA of 161 single colonies was analyzed by sequencing, detecting combinations of mutations in active cp-Casp2 variants.
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article ■ RESULTS AND DISCUSSION Free-Energy Calculations. The Lennard−Jones contributions to the free energies were calculated from two independent simulations using two different reference states, RDR and R2R. Most of the pairwise results from both reference states showed differences in the free energies that are above the thermal energy (k B T). To evaluate the relevance of individual values, the percentage of contributing conformations was calculated for both reference states over all five individual structures that were used for the fitting procedure in the unbound and bound states. A contributing conformation was defined as a snapshot where comparison of the free energies with the energies after OSP gave ΔG > ΔH (Tables 1 and 2). In general, the, "smaller" reference state RDR (1 soft atom, bond length 0.153 nm) sampled more relevant conformations for amino acids with low molecular weights, while the, "bigger" reference state R2R (2 soft atoms, bond lengths 0.252 and 0.351 nm) was more appropriate for heavier amino acids. This is also reflected in the huge deviations between the free energies calculated from both reference states. For example, the calculated ΔΔG for Arg in the E105 site with Ile in the P1′ site is 110.2 kJ/mol for the state RDR, with the percentage of relevant conformations being 0.5%. However, the equivalent numbers for the R2R state are −1.9 kJ/mol and 16.2%. The least contributing frames from both reference states were counted for the hydrophobic and aromatic amino acids Phe, Tyr, and Trp, making it the most challenging to calculate meaningful free energies for these amino acids. In particular, the free energies obtained for Trp at position 105 with Ile at the P1′ site are seemingly artificially large (Table 1). For future work, a more specific larger reference state may be designed to improve predictions for these amino acids.
The Coulomb contributions to the free energies were calculated from the average energies and the fluctuations (rmsd) of the energies in the charged and the neutral states using the TPF method. For the four charged amino acids Arg, Asp, Glu, and Lys, a sole TPF analysis is not sufficient because of the different net charge of the system in the charged and the neutral states. In these cases, the TPF results are raw values that need to be corrected by a set of correction terms ΔG pol , ΔG dir , and ΔG dsm for both bound and apo states (see Tables 3  and 4). The total sizes of the corrections for both sites, E105 and G171, differ significantly in size. These differences can be explained structurally: while the site G171 is not directly interacting with the substrate but is highly solvated, the environments between both states, bound and apo, are highly similar and the main artifacts cancel. However, the site E105 is in close proximity to the binding site. In this position, this site is exposed to a higher negative-charge density (P3−P1 sites of the substrate bear negative charges) in case of the bound state compared to the apo state where the binding pocket is more solvated. This can be found in the bigger ΔG dsm values for the apo states compared to the bound states for the E105 site. Table 5 combines the OSP results from the respective reference state with higher percentages of relevant conformations with the TPF results to yield the final free energies of binding. The individual OSP results are shown for the reference states with a higher number of contributing frames. OSP results for Glu (105 site) and Gly (171 site) are shown for both reference states in order to be able to calculate relative free energies for the individual mutations Glu105Xxx and Gly171Xxx through cycle closure (see next section).
Comparison to Experiments. From the free energies of binding for the individual amino acids, relative free energies of binding for the mutations Glu105Xxx and Gly171Xxx were calculated and compared to the experimental observations (see Table 6). Note that comparison of these data has to be handled very carefully. A mutant will be found in the experiments if the effect of the point mutation renders the protease more active, unchanged, or only marginally less active. On the other hand, a negative change of the numerically derived binding free energies has to be interpreted as more favorable binding. It follows that an exact match of the simulation and experimental data cannot be expected; however, the overall pattern can be compared. Also, the search space covered in the experiment was 20 × 20 = 400 . 7 As a consequence, the data for these two amino acids will be compared within the following analysis. While it is unfortunate that an exact comparison cannot be made, we chose not to adjust our simulations after the validation experiments were performed but to compare how relevant the predictions are in a realistic experimental setting where the exact relevant experiments may not be accessible. All effects of the mutations in both sites were compared with Ile and Pro in the P1′ site. With Ile in the P1′ site, only three mutations for the 105 site were not found in the experiment− Asp, Lys, and Phe. Two of these mutations were calculated to be unfavorable, while Lys was calculated to be favorable for binding of the substrate (−11.4 kJ/mol). All amino acids that were found in the experimental random mutagenesis were also calculated to have favorable binding free energies. The situation is more complex with P1′ Pro. Here, only three mutations were found in the experiment: Ala, Met, and Val. All three amino acids were also attributed with favorable binding free energies. However, five amino acids were calculated to have favorable changes in free energies of binding, but were not found in the experiment. For mutations of the 171 site, nine amino acids were experimentally found with Ile in the P1′ site. Seven of these were also favorable in the free-energy calculations. Six of ten amino acids that were not found in the experiment showed unfavorable binding free energies. With P1′ Pro, again only three amino acids were found in the experiment, Asp, Glu, and Val. All three were also associated Journal of Chemical Information and Modeling pubs.acs.org/jcim Article with favorable free energies. 10 of 15 amino acids that were not found also had unfavorable free energies attributed to them. To validate the predictive power of the computationally derived free energies, Table 7 compares these data in three cross tables: two for the data with P1′ = Ile and P1′ = Pro, respectively; another cross table shows both data sets combined. From these numbers, the Matthews correlation coefficient (MCC) was calculated to validate the predictive power of the computational data. Both experiment and simulation come to a comparable conclusion for a majority of the amino acids: with Ile in the P1′ site, the predictive power of computational saturation mutagenesis is very good.
61% of the amino acids showed favorable free energies and were also found in the experiment. 22% were calculated to have unfavorable free energies attributed to them and were not found in the experiment. Only 17% of the amino acids were false positives or false negatives. The MCC was found to be 0.63. With Pro in the P1′ site, the predictive power was found to be less: 22% of the amino acids had favorable free energies attributed to them but were not found in the experiment. However, 47% of the amino acids were predicted to have unfavorable free energies and were also not found in the experiment. The MCC for the Pro data was found to be 0.38. A possible explanation for the higher rate of false positives in the Pro data is the following: experimental findings reflect the effects on binding and cleavage, while the computational data focus on free energies connected to substrate binding only. With Pro being a highly unfavorable substrate for the S1′ pocket, the ligand may be able to bind but the probability of reaching an orientation in a catalytically active pose remains low. In the combined data set, 42% of the amino acids showed favorable free energies and were found experimentally; 40% showed unfavorable free energies and were not found experimentally. The remaining set of amino acids had favorable free energies attributed but were not found−or vice versa. For the combined data, the MCC was found to be 0.55. The data show that 76% of the predictions by this method are correct. For the incorrect predictions, it is more likely to predict false positives than false negatives. This reflects the fact that the calculation of binding affinities does give an insight into the actual binding process but gives no information if cleavage is likely to happen.
Comparison to the Linear Interaction Energy Approach. The predictions based on the OSP + TPF approach were evaluated against another established end-point method. Hence, the trajectories simulated for the TPF approach were reused to extract polar and nonpolar energies to calculate binding free energies via the LIE approach. Mind that trajectories for TPF were only generated for polar amino The corresponding correction terms were calculated for both mutation sites, E105 and G171, as well as both P1′ amino acids, Ile and Pro. The total correction for the TPF results was deduced from the corrections in the bound and the unbound states of the protease. All values are reported in kJ/mol.  Table 9. The combined OSP + TPF approach outperformed the LIE method in every case. While the LIE approach requires approximately half the simulation time compared to the OSP/ TPF approach, an MCC value close to zero was obtained for three of the four combinations of mutation sites and the P1′ residues. Only for the mutation site G171 and the Pro residue at P1′ is an MCC value of 0.29 obtained, which is still considerably worse than the value obtained with the OSP/TPF approach, at 0.56.

■ CONCLUSIONS
In the presented work, an efficient method for in silico saturation mutagenesis was qualitatively validated against experimental data. Two sites in the binding pocket were screened in human Caspase-2. The entire procedure was repeated using two substrates with different amino acids at the P1′ position. Nonpolar contributions to the free energies of binding were calculated from two different reference states, each in the bound and the unbound protease and the electrostatic contribution was added using the TPF method. After judicious application of a thermodynamic cycle, relative binding free energies compared to the natural form of the protease were obtained. These free energies were compared to data from experimental saturation mutagenesis and selection with an auxotrophic E.coli strain. The overall agreement was very good, with the MCC being 0.55. It is obvious that it is much more likely that the in silico saturation mutagenesis predicts false positives than false negatives. This reflects that a ligand that is a good binder does not necessarily undergo cleavage. Although the intrinsic reactivity of the mutated protease is believed to be the same toward substrates with different P1′ sites, our method is not fit to discriminate bound states that undergo cleavage from bound states where cleavage is unlikely. However, this is acceptable since the general aim of this method is to narrow the sequence space of a protein of interest a priori. It can be concluded that free energies that were derived through the efficient OSP + TPF approach can be used as a powerful predictor to qualify the effect of single point mutations on protease−substrate cleavage. Thus, we believe