Tuning Proton Transfer Thermodynamics in SARS-CoV-2 Main Protease: Implications for Catalysis and Inhibitor Design

The catalytic reaction in SARS-CoV-2 main protease is activated by a proton transfer (PT) from Cys145 to His41. The same PT is likely also required for the covalent binding of some inhibitors. Here we use a multiscale computational approach to investigate the PT thermodynamics in the apo enzyme and in complex with two potent inhibitors, N3 and the α-ketoamide 13b. We show that with the inhibitors the free energy cost to reach the charge-separated state of the active-site dyad is lower, with N3 inducing the most significant reduction. We also show that a few key sites (including specific water molecules) significantly enhance or reduce the thermodynamic feasibility of the PT reaction, with selective desolvation of the active site playing a crucial role. The approach presented is a cost-effective procedure to identify the enzyme regions that control the activation of the catalytic reaction and is thus also useful to guide the design of inhibitors.

: Normalized distribution of the distance between the backbone oxygen of Cys145 and the side chain N δ of Asn28 in the MD simulation with His41D in the apo state. The vertical line shows the corresponding distance in the crystal structure. Figure S2: Normalized distribution of the distance between the guanidine carbon of Arg40 and the gamma carbon of Asp187 in the MD simulation with His41D in the apo state. The vertical line shows the corresponding distance in the crystal structure.

S-3
The Perturbed Matrix Method The MD-PMM approach is a hybrid quantum/classical theoretical-computational approach based on molecular dynamics (MD) simulations and on the pertubed matrix method (PMM. S6,S7 As commonly done in hybrid multiscale approaches, S8-S11 also for the investigation of enzyme catalysis, S12-S18 the portion of the system in which the chemical event takes place is treated quantum mechanically (the quantum center, QC) while the rest of the system is treated classically and atomistically and exerts an electrostatic perturbation on the QC electronic states. The main difference with other hybrid methods is that in the MD-PMM, the whole system (including the QC) phase space is sampled by classical MD simulations, allowing an extensive sampling of the QC and environment configurational space. The electrostatic perturbation of the environment is included a posteriori: the electronic properties of the isolated QC (unperturbed properties) are calculated quantum-chemically in vacuum (i.e., in the gas phase) and then, for each configuration generated by all-atoms classical MD simulations of the whole system, the electrostatic effect of the instantaneous atomistic configurations of the environment is included as a perturbing term within the QC Hamiltonian operator.
This allows to take into account the effect of the fluctuating perturbing environment (the solvent and the part of the solute which is not treated at the quantum level) on the quantum properties of the QC.
For each configuration of the whole system obtained from the MD simulation, the effect of the external environment on the QC eigenstates is included by building and diagonalizing the perturbed electronic Hamiltonian matrix H constructed in the basis set of the unperturbed Hamiltonian eigenstates of the QC. Indicating with V and E the perturbing electric potential and field, respectively, exerted by the environment on the QC: where H 0 is the QC unperturbed electronic Hamiltonian (i.e., as-obtained considering the isolated QC) and q T ,μ and φ 0 j are the QC total charge, dipole operator and unperturbed electronic eigenfunctions, respectively, I is the identity matrix and the angled brackets indicate integration over the electronic coordinates.
At each frame of the MD simulation, the perturbed electronic Hamiltonian matrix is constructed and diagonalized, providing a continuous trajectory of perturbed eigenvalues and eigenvectors to be used for evaluating the QC instantaneous perturbed quantum observable of interest as, in the present case, the QC ground state energy in the protonated and deprotonated states. More details on the method can be found in the original articles. S6,S19 For the more specific task of investigating the proton transfer thermodynamics, the free energy change associated to the PT reaction between Cys145 and His41 is calculated. Given the PT reaction: the energy variation upon PT (the PT energy) is calculated with the MD-PMM approach for each configuration obtained from the MD simulations in both the reactant and the product states as defined in Eq. 3.
In the MD simulations, Cys145 and His41 sample a highly variable range of relative conformations, including configurations in which a direct (hydrogen bond) HB between the sulfur of Cys145 and the ε nitrogen of His41 involving the proton to be exchanged is present.
In the absence of a direct HB between the catalytic dyad residues, Cys145 and His41 are treated as separate QCs in the MD-PMM, while for the configurations in which this HB is present, Cys145 and His41 are treated as a unique QC.
When treated separately, the two QCs interact with each other as well as with their en-S-5 vironment. The perturbed states of Cys145, either in the protonated or in the deprotonated condition, are obtained considering the corresponding QC as perturbed by the electric field provided by His41 as well as the rest of the atomic-molecular system, both treated within the semiclassical approximation. The same procedure is performed for His41, the corresponding QC of which is perturbed by Cys145 and the rest of the atomic-molecular system In the configurations in which a direct Cys145-His41 HB is formed, and therefore a single QC is considered, the relative orientation of the two residues is scarcely variable. The most probable relative orientation of Cys145 and His41, as provided by each simulation condition, is used to perform QM calculations of the complex, with the proton either on Cys145 or on His41. The perturbed states of the complex, in both protonation states, are then obtained S-6 considering the single QC as perturbed by the electric field provided by the rest of the atomic-molecular system, treated within the semiclassical approximation. This again allows the calculation of the perturbed energy variation upon PT at each MD configuration i.e., the time evolution of the PT energy ∆ε sing .
The final time evolution of the PT energy ∆ε is then obtained by considering ∆ε sep for the configurations in which no direct Cys145-His41 HB is present, and ∆ε sing for the configurations in which the HB is present.
Then, the Gibbs free energy change ∆G 0 associated to the PT can be calculated as: In the above equation ∆H is the QC-environment whole energy change upon PT, with ∆ε the corresponding QC perturbed electronic ground state energy change. The angle brackets subscripts R and P indicate that both the energy change as well as the averaging are obtained either in the reactant (R) or product (P ) ensemble as defined in Eq. 3, and the approximation ∆H ∼ = ∆ε is used, i.e., the environment internal energy change associated with the QC reaction is disregarded (being exactly zero when considering typical MD force fields).
From Eq. 5 it follows that −k B T ln e −β∆ε R and k B T ln e β∆ε P provide the upper and lower bounds of ∆G 0 and hence it can be written: In this last equation the perturbed electronic ground state energy change as well as the ensemble averages are evaluated via the previously described MD-PMM approach, i.e. by diagonalizing at each MD frame the perturbed electronic Hamiltonian matrices (Eq. 1).

S-7
As a further test, we compared the free energy estimated obtained by using the above outlined procedure (i.e. a single QC when the Cys145-His41 HB is present and two separate QCs when this HB is not present) to the one estimated by using two separate QCs for all the frames of the MD simulations. We obtain from the two approaches identical results within the estimated errors.
The PT reaction free energy was also calculated within the linear response approximation, i.e. by assuming a Gaussian distribution for the energy change upon PT and considering the average of the mean values of the PT energy obtained in the two ensembles along the MD trajectories. S20,S21 The results are in qualitative agreement with those obtained by explicitly calculating the reaction free energy ∆G 0 (see Eq. 6) providing estimates for the free energy change upon PT of 40 and 49 kJ/mol for His41D and His41E, respectively, with a standard error of ≈ 5 kJ/mol. The full calculation of ∆G 0 , being based on an exact relation (see Eq. 6), provides a more accurate result than the one that can be obtained within the linear response approximation. However, the latter is less affected by inaccuracies due to finite-sampling issues. The qualitative agreement between the results obtained with the two approaches enhances therefore the reliability of the computed estimates.
For the calculation of the PT reaction free energy in the presence of the inhibitor N3 we did not use the full free energy calculation (with Eq. 6) because the configurational sampling of the non-covalent complex between the protein and the inhibitor (100 ns) was not as extended as for the apo system (almost 1 µs). As a reliable estimate of the free energy with Eq. 6 requires a very extended sampling, in the presence of N3 we use the energy change upon PT to estimate the reaction free energy, i.e. we use the linear response approximation, S20 that is less affected by inaccuracies due to sampling problems. Then, to improve the free energy estimate, we shifted the computed values obtained in the presence of N3 within the linear response approximation by the same energy difference that is obtained in the apo state between the linear response approximation and the full free energy calculation with Eq. 6 (i.e., by -9 kJ/mol, see above). This shift takes into account the deviations from S-8 the linear response approximation due to the instantaneous fluctuations of the PT energy and to the reorganization energies in the reactant and product ensemble, S21 which can be reasonably assumed approximately equal in the apo state and in the presence of the inhibitor.

Quantum mechanical calculations
To calculate the free energy change corresponding to the proton transfer reaction between the NAMD 2.13 was used for simulations with the N3 inhibitor. S48,S49 As in the apo simulations, we used the CHARM36m force field, the TIP3P water model, a 2-fs timestep. S50,S51 In addition, we used the CGenFF program in order to obtain CHARMM-compatible CGenFF force field parameters for N3. S52-S54 In order to use a longer 2-fs timestep, all covalent bonds with hydrogens were kept rigid. For long range van der Waals interactions a 1.2 nm cutoff was employed, and a smooth decay to zero was ensured by applying a smoothing function from 1-1.2 nm. For long-range electrostatic interactions the particle-mesh Ewald method was used, S55 as in the apo simulations. Pressure and temperature were kept constant at biologically relevant values of 1 bar and 310 K using a Langevin thermostat and barostat, respectively. S56 The 7BQY S57 structure was used as a starting point for MD M pro with N3 . Because this structure is missing the C-terminal residues, we used the older N3-bound structure (PDB entry 6LU7) for the C-terminal positions. The structure was solvated and ionized with 0.15 M NaCl in VMD. S58 After minimization, water and ions were equilibrated for 1 ns while the protein and inhibitor, were restrained with a force constant of 2 kcal/mol/Å 2 .
In a second equilibration step only the protein backbone was restrained with the same force S-11 constant for 4 ns. Subsequent unrestrained simulations were run in triplicate for 20 ns for each protonation state, and used for the analysis.

Additional structural analyses
In the MD simulations in the apo state reactant ensemble, Cys145 and His41 sample a highly variable range of relative conformations, with an average distance between the sulfur of Cys145 and the ε and δ nitrogen of His41 of ≈0.38 nm and ≈0.50 nm, respectively (see Figure S3). In addition, the side-chain proton of Cys145 only seldom points toward the ε nitrogen of His41 (see Figure S4). In a small number of configurations in the MD trajectory with His41D, the relative position of Cys145 and His41 is however compatible with a direct HB between the sulfur and the ε nitrogen involving the proton to be exchanged (i.e., the sulfur proton). In the MD simulations in the apo-state product ensemble (i.e., the ionic couple) the relative configurations of Cys145 and His41 compatible with a direct S-N ε HB are much more frequent (see Figure S5). S-12

S-17
Contribution of protein residues and water molecules to the proton transfer energy: the His41E apo state in comparison with the 13bbound state As reported in Figure S8C, the residues that most relevantly contribute to the PT energy for the 13b bound complex are the same that are observed in the apo state (with both His41D, see main text Figure 2A, and His41E, see Figure S8A) and in the presence of N3 (see main text, Figure 2C): Arg40, Asp 48, Glu55, Lys61, Glu166, Asp187, Arg188 and Ser1 of the other monomer. It can be observed, comparing Figure S8A and Figure 2A in the main text, that in the His41D apo state the total contribution of the solvent is almost negligible, while in the His41E apo state the solvent exerts a positive contribution disfavoring the PT reaction.
In Figure S8E the difference ∆(qV) between the single residue contribution obtained from the MD in the presence of 13b and that obtained in the apo (∆(qV ) = qV (13b)-qV (apo)) is also reported, highlighting the protein regions that more relevantly contribute to the variation of the energy change upon PT in the presence of the inhibitor. In Figure   S8E the contribution of Glu166 and Arg188, already observed in the presence of N3 and discussed in the main text, can be observed. Two additional relevant contributions from Arg40 and Asp187 can be also observed, which are not present for the N3-bound complex.
These contribution arise from a different arrangement of Wcat (vide infra) that brings these two residues, that interact via a salt bridge, closer to His41. As in the case of the His41D apo state and the N3-bound state, the contributions of Wcat, Wdyad and the small water wires are analyzed separately in the PT energetics analysis (see Figure S8B,D,F).In the 13bbound state the contribution of Wcat is essentially the same as in the apo state (see Figure   S8, panels B and D). However, this contribution is opposite to the one observed in the His41D apo state (see Figure 2B in the main text). The analysis of the HB network involving Wcat along the MD simulations with His41D and His41E reveals relevant differences. In the His41D S-18 simulation the HB network observed in the crystal structures is essentially maintained in the MD (see Figure S9). In contrast, in the MD simulations with His41E the interaction between Wcat and His41 Nδ is maintained, the one between Wcat and His164 Nδ is only partially maintained and the one with Asp187 side chain oxygens is lost (see Figure S9). In fact, as a consequence of the protonation of His41 at Nε, Wcat changes its orientation, assuming a new configuration that prevents the simultaneous formation of the HBs with Asp187 and His164 (see representative structures in Figure S9). This rotation implies a different electrostatic effect on the side chain of His41, favoring the PT reaction in the His41D reactant state and disfavoring it in the His41E reactant state. The negative peak due to Asp187 and the negative peak due to Arg40 in Figure S8E arise from the fact that in the MD simulation in the presence of 13b, the interaction between Wcat and His164 is lost while the one with Asp187 is recovered. The protonation of His41 at Nε also determines a different arrangement of Wdyad in the His41E apo state, that differently from what observed in the His41D apo state, slightly favors the PT reaction. A small contribution of the wires can also be observed.
In the presence of 13b, both Wdyad and the wires are expelled form the active site, leading to an overall positive contribution of the solvent.
S-19 Figure S8: A and C: qV is plotted for each protein residue and all the water molecules as an additional virtual residue SOL for the apo state (A) and in the presence of the inhibitor 13b (C). qV is the mean value along the MD trajectories of the contribution due to the electrostatic potential to the PT energy. The residues featuring an absolute value of qV higher than 20 kJ/mol are labelled in the figure. The residues with a negative contribution exert an electrostatic effect that favors the PT reaction, while the opposite is true for the residues with a positive contribution. The contributions of the residues of the catalytic dyad (His41 and Cys145) are not included in the plot. E: ∆(qV) = qV(13b)-qV(apo) is plotted for each protein residue and SOL. The residues featuring an absolute value of qV higher than 10 kJ/mol are labelled. The contributions of the residues of the catalytic dyad (His41 and Cys145) are not included in the plot. The residues with a negative contribution are those that contribute to lower the PT energy in the presence of the inhibitor with respect to the apo state while the opposite is true for the residues with a positive contribution. B, D and F: dissection of the contribution of the solvent (SOL): the contribution of Wcat, Wdyad, the molecules forming the wires and the rest of the water molecules are reported together with the total solvent contribution for the apo state (B), in the presence of 13b (D) and for the difference ∆(qV) = qV(13b)-qV(apo) (F).
S-20 Figure S9: Normalized distribution of the distance between the oxygen atom of Wcat and His41 N δ (black), His164 N δ (red), Asp187 C γ (magenta) and His41 backbone N (blue) in the MD simulation with His41E (A) and His41D (B). The vertical lines show the corresponding distances in the crystal structure. Representative structures of the different hydrogen bonding pattern are also shown. S-21