Dynamical Nonequilibrium Molecular Dynamics Simulations Identify Allosteric Sites and Positions Associated with Drug Resistance in the SARS-CoV-2 Main Protease

The SARS-CoV-2 main protease (Mpro) plays an essential role in the coronavirus lifecycle by catalyzing hydrolysis of the viral polyproteins at specific sites. Mpro is the target of drugs, such as nirmatrelvir, though resistant mutants have emerged that threaten drug efficacy. Despite its importance, questions remain on the mechanism of how Mpro binds its substrates. Here, we apply dynamical nonequilibrium molecular dynamics (D-NEMD) simulations to evaluate structural and dynamical responses of Mpro to the presence and absence of a substrate. The results highlight communication between the Mpro dimer subunits and identify networks, including some far from the active site, that link the active site with a known allosteric inhibition site, or which are associated with nirmatrelvir resistance. They imply that some mutations enable resistance by altering the allosteric behavior of Mpro. More generally, the results show the utility of the D-NEMD technique for identifying functionally relevant allosteric sites and networks including those relevant to resistance.


■ INTRODUCTION
The ongoing COVID-19 pandemic has led to worldwide efforts to develop antiviral drugs effective against the SARS-CoV-2 coronavirus. The SARS-CoV-2 main protease (M pro ) cleaves viral polyproteins into functional units essential for viral replication and pathogenesis and is a validated drug target. 1 M pro hydrolyzes the amide bond between P1 and P1′ residues in the conserved sequence: [P4 = Ala/Val/Pro/Thr]-[P3 = X]-[P2 = Leu/Phe/Val]-[P1 = Gln]-[P1′ = Ser/Ala/Asn], where X is any residue. 2−4 Due to the essential role of M pro , the apparent absence of human proteases with a similar substrate selectivity, and the high sequence identity (96%) between SARS-CoV and SARS-CoV-2 M pro s, inhibitors targeting M pro enzymes are likely useful for treatment of many coronavirus diseases. 5−8 The covalently acting M pro inhibitor PF-07321332 (nirmatrelvir), the active ingredient in Paxlovid, is the first oral COVID-19 drug that has been authorized for emergency use by the FDA. 9,10 Another M pro inhibitor in clinical-stage development is S-217622 (ensitrelvir), a noncovalently binding, nonpeptidic inhibitor that was discovered following virtual screening and medicinal chemistry optimization. 11 Crystal structures 4,12−14 and molecular simulations 15−20 have provided insights into the M pro structure, dynamics, and interactions with its substrates and other ligands. Molecular simulations of M pro have identified cryptic pockets, 21 informed on mechanisms of inhibition, 22−24 and helped to suggest potential lead compounds for M pro inhibitor design. 25,26 We have identified conserved interactions between M pro and its 11 native substrates, each modeled as an 11-mer P6-P5′ peptide. 20 However, how the binding of substrates or inhibitors induces changes in the M pro structure and its dynamics has been unclear. Changes in dynamics and conformational behavior have been shown to be the cause of mutation-induced drug resistance for some targets. 27−30 For instance, mutations distal to the active site of the HIV-1 protease have been shown to confer resistance to darunavir through alterations in conformational dynamics. 31 Given the relatively rapid rate at which SARS-CoV-2 mutates, M pro targeting drug resistance is a major concern. 8,32,33 A detailed understanding of how M pro recognizes its natural substrates thus may aid the development of next-generation inhibitors. 12,34 Here, we describe the application of dynamical-nonequilibrium molecular dynamics (D-NEMD) 35−38 simulations to investigate the response of M pro to the instantaneous removal of a substrate in one of the active sites in the dimer. The D-NEMD approach aims to identify in the protein any structural changes, which refer to conformational rearrangements, as well as dynamic changes, which relate to changes in fluctuations, following a perturbation. D-NEMD simulations are emerging as a useful tool to probe signal propagation and allosteric effects, 38 e.g., they have identified a general mechanism of signal propagation in nicotinic acetylcholine receptors; 39,40 communication pathways between allosteric and active sites in clinically relevant β-lactamases; 30 and modulation of SARS-CoV-2 spike protein behavior by pH 41 and ligand occupancy in the fatty acid binding site. 42−44 Notably, in this context, the pathways identified in the β-lactamases contain positions that differ between enzymes with different spectra of antibiotic breakdown activity. 30 In the D-NEMD approach, the system is sampled first by equilibrium molecular dynamics (MD) simulations. Configurations from these trajectories provide starting points for multiple short nonequilibrium simulations, through which the effect of a perturbation can be determined using the Kubo− Onsager relation. 37,38 Running a large number (usually tens to hundreds) of nonequilibrium simulations allows the statistical significance of the response of the protein to the perturbation to be tested. 37 The results here identify structural pathways involved in M pro communication, both within and between subunits of the dimer. We use a substrate sequence representing the nsp8/9 P6-P5′ substrate (Ser-Ala-Val-Lys-Leu-Gln|Asn-Asn-Glu-Leu-Ser, where "|" denotes the cleavage site), which we have previously characterized by experiment and simulation, and refer to as s05 according to previous nomenclature. 20 s05 was selected because it is a conserved substrate in coronaviruses, 4 and its P4-P1 sequence is reported to be optimal for M pro activity among natural amino acid sequences. 1

Equilibrium Simulations of the M pro -Substrate Complex
Initially, five independent equilibrium MD simulations of 200 ns (an aggregate of 1 μs) were carried out for homodimeric M pro noncovalently complexed with a single 11-mer peptide substrate s05 (Figure 1a,b) using established protocols. 20 s05 occupies the active site, which is formed predominantly by residues of the M pro chain denoted as chain A (ChA), while the other (chain B, ChB) active site was simulated without a substrate. In subsequent descriptions, all residues relate to M pro chain A (Ser1-Gln306), except chain B residues, which are indicated by prime (Ser1′-Gln306′).
Analysis of root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values, protein secondary structure content, and M pro -substrate interactions confirms the stability of the M pro dimer-s05 complex, with the system considered to be well equilibrated from t = 50 ns onward in all simulations (Section S1). The 12 M pro -substrate hydrogen bonds (HBs) previously found to be conserved across 11 native substrates are also observed here (Figure 1b and Figures S1.9−S1.11). 20 These equilibrium trajectories were then used as the basis for D-NEMD simulations.

Immediate Effect of Substrate Removal
To analyze the response of the M pro dimer to substrate removal, snapshots were extracted every 5 ns from t = 50 to 195 ns from all five equilibrium simulations (a total of 150 conformations), and the s05 peptide was instantaneously annihilated (see Methods section). The time-dependent response of the protein to the annihilation of s05 was  Figure S1.9) shown as orange dotted lines. s05, the M pro catalytic dyad, and other HB-forming M pro residues are shown as black, brown, and blue sticks, respectively (hydrogens omitted). (c) Initial sub-picosecond response of M pro to s05 removal in terms of Cα deviation represented on a white-yellow-red scale. All figures were generated using PyMOL. 45 monitored using the Kubo−Onsager relation 37,38 by subtracting the position of each Cα atom between the nonequilibrium and equilibrium trajectories at equivalent time points ( Figure  S2.1). Cα movements were used, employing the same approach as in previous investigations of other systems, as they are less susceptible to thermal fluctuations than those of sidechain atoms, and they normally reflect the most substantial responses in the protein. 38 The subtraction approach enables a clear identification of the response via cancellation of most random fluctuations between the equilibrium and nonequilibrium trajectories. 30,39 Note that the perturbation (in this case, the deletion of the s05 peptide) is not intended to simulate the physical process of substrate binding or dissociation, but rather to trigger a fast response from M pro as it adapts to s05 removal. Such a perturbation forces the system out of equilibrium, thus creating the driving force for conformational changes to occur. Similarly, the simulation timescale does not reflect the real timescale of conformational changes. 39 Soon after the perturbation, a clear response from M pro is observed, notably in residues directly involved in substrate binding ( Table S2.1). The statistical significance of the response is supported by determination of standard errors of the mean (SEM) ( Figure  S2.3), as performed in previous investigations of other systems. 30,42 At t = 0.25 ps, the most perturbed regions include Gly23-Thr26, which forms HBs with the substrate P2′ and P4′ residues ( Figure 1b); Asn142-Cys145, which constitutes the oxyanion hole loop that stabilizes the scissile amide carbonyl oxygen; Met165, which forms the S2 hydrophobic pocket; Glu166, whose backbone forms two strong HBs with the substrate P3 backbone; and Ala191, which is adjacent to Thr190 that binds the substrate P4 residue via a HB. Hence, as expected, the residues with the largest initial deviations are those originally in direct contact with, or very close to, the substrate, and these residues are likely to rearrange to accommodate the substrate.
Notable initial responses are also observed in residues belonging to chain B (Figure 1c-B and Figures S2.2 and S2.3). Chain B residues that show a deviation >15 pm at t = 0.50 ps include Ser1′, Gly2′, Asn214′, Gly215′, Ser301′, and Gly302′. These residues are in the vicinity of chain A regions that also show significant responses, such as Phe140-Leu141 and Tyr118-Gly120 ( Figure S2.4). These responses highlight the importance of inter-subunit dynamics, where structural and dynamic changes occurring in one are swiftly propagated to the other. For example, the structural importance of the Nterminus (Ser1′-Gly2′) of chain B is consistent with its role of forming part of the S1 pocket that accommodates the conserved P1 Gln substrate residue. 5,6 Asn214′ and Gly215′ are further away from the substrate binding site; notably, the N214A variant of SARS-CoV M pro is much less active than the wild-type enzyme against a fluorogenic substrate peptide, despite being similar in dimerization affinity and structure, with the inactivation proposed to be driven by changes in protein dynamics. 15,46 Finally, the observed perturbations in Ser301′ and Gly302′ of the C-terminal tail of chain B are consistent with crystallographic observations that this region adopts distinct conformations upon the binding of a substrate or the peptidomimetic inhibitor N3. 13,47 Overall, the D-NEMD analysis highlights the communication between chains A and B and, in agreement with reported experimental and crystallographic data, identifies three structurally and dynamically important regions of chain B.
At 0.75 ps, the structural responses intensify and propagate further from the substrate binding site into both chains A and B (Figure 1c-C and Figure S2.3). These responses include signal transmission in chain A from Ser46 to the whole Ser46-Leu50 helix and then to the Tyr54-Arg60 helix above the active site. There is also transmission through the helical regions in chain B away from the dimer interface. These results show communication between the subunits of the dimer.

Propagation of the Signal to Allosteric Sites
The conformational changes at the dimer interface are of particular interest. While the initial response (Figure 1c) in chain B is concentrated at dimer interface regions with little secondary structure, such as Asn214′-Gly215′ and Ser301′-Gly302′, notable response in the Gln244′-Thr257′ kinked helix is observed across the picosecond timescale ( Figure 2). The helix residues Pro252′, Leu253′, and Gln256′; the C-terminal tail residues Cys300′, Ser301′, and Gly302′; Ile213′, which is adjacent to the perturbed Asn214′-Gly215′ region; and the earlier highlighted chain A residues Tyr118, Leu141, and Asn142 all form an allosteric site (colored orange in Figure  2a), which binds small molecules as demonstrated by crystallography. 7,48,49 One such compound is pelitinib, which has high antiviral activity and does not bind at the active site. 48 It is striking that the D-NEMD simulations, which were initiated by substrate removal, identify this allosteric site. Our results show a direct connection between the allosteric site and the active site via conformational motions. Based on these D-NEMD results, it may be proposed that the presence of a ligand at this allosteric site will modulate enzyme dynamics, including those affecting the rate and/or affinity of binding of substrates. Such a ligand may also alter how dynamic and structural changes are communicated within the protein, a likely factor in allosteric inhibition.
As the time following substrate removal increases across the picosecond and nanosecond timescales, the structural response in residues in terms of Cα deviations increases and spreads across the M pro dimer (Section S3). Perturbed allosteric regions are found in both M pro chains, and except for the region around Gln189-Ala191, which has higher deviation in chain A than B, the deviations in the two chains are similar ( Figures S3.2−S3.4). Some of the most perturbed residues (e.g., Asn277 and Gly278 in both chains at t = 1 ns) also display high mobility in the equilibrium simulations ( Figure  S1.3). This is unsurprising because more flexible regions require less energy to change their positions. 36,50

Coherent Displacements in Residues upon Substrate Removal
Complementarily to the above approach, which averages scalar deviations across replicas, the displacement vector of Cα atoms between the equilibrium and nonequilibrium trajectories at equivalent time points can be averaged. The resulting signals indicate the average direction of the response of the residues upon the perturbation; however, it is important to note that the responses of residues that are dynamically affected in multiple directions or in a disordered manner are not apparent in this analysis.
The response of M pro to substrate removal calculated from Cα displacement vectors is shown in Figure S4 Table  1). These regions include Thr26-Leu27 in the S1′-S2′ pockets; Asn142-Gly146 in the oxyanion hole loop, where the largest displacement is observed for Gly143; His163 and Met165 whose sidechains form parts of the S1 and S2 pockets, respectively; and Ala191-Gln192, which lie behind the substrate P5-P4 residues in the substrate complex.
At t = 10 ps (Figure 3a-B and Table 1), additional displacements are observed in the Ser46-Leu50 and Tyr54-Arg60 helices and their connecting loop above the active site in chain A. Displacements in residues surrounding Ala191 and Gly143 increase in magnitude, with the Arg188-Gln192 loop moving outward into the substrate S5 and S4 pockets and the Asn142-Gly143 loop closing inward into S1. These motions collectively reflect a closure of the M pro substrate binding site upon substrate removal.
From t = 10 to 100 ps (Figure 3a-C), a reduction in the vector response is observed in certain regions, such as the Ser46-Leu50 helix, suggesting that there is no longer a consistent directional difference in the motions between the equilibrium and nonequilibrium trajectories. By contrast, the displacements in the Gly143 oxyanion hole loop and in the Val186-Gly195 loop persist. In the latter case, the response is initially centered at Ala191 but gradually propagates downstream to residues as far as Gly195 (Figure 3a-C and Table 1 and Figure S4 These observations likely reflect long-lived structural alterations to accommodate the substrate. The conformational adaptation of the Val186-Gly195 loop observed here echoes structural studies of M pro complexed with substrate peptides: the loop has crystallographically been found to be conformationally variable, which enables the binding of diverse substrate sequences. 12 Overall, the D-NEMD approach provides a statistically meaningful prediction of how each M pro region dynamically adapts to bind its substrate. The collective and coherent motion is not limited to the active site but extends well beyond.

■ CONCLUSIONS
Selection pressure caused by treatments that target M pro , such as nirmatrelvir, drives the emergences of drug-resistant mutations in SARS-CoV-2 M pro . 24,34,54,55 Such mutations would be expected to weaken the binding affinity of nirmatrelvir but still allow M pro to perform its biological function by binding, recognizing, and hydrolyzing its substrate sequences in the viral polyproteins. The effect of such mutations on dynamics is of particular interest. Indeed, M pro mutations that confer high levels of resistance to nirmatrelvir have been reported, including the combined L50F + E166A + L167F substitutions, 52 L50F + E166V substitutions, 53 and naturally observed variations including S144M/F/A/G/Y, M165T, E166G, H172Q/F, and Q192T/S/L/A/I/P/H/V/ W/C/F (Figure 3b). 54 Some of these mutated residues, such as Glu166 and Met165, are in direct contact with the substrate P3/P1 and P2 residues, respectively, and these interactions are exploited by nirmatrelvir. 9 The effect of mutations that are more distant from the active site, such as those involving Leu50 and Gln192, on substrate and inhibitor binding, however, is less clear. The D-NEMD results here indicate that these residues are involved in allosteric response to active site binding.
Our D-NEMD results (Figure 3a and Table 1) highlight the dynamic behavior of the oxyanion hole loop around Gly143. The dynamics of this loop will probably be affected by the observed mutation of Ser144 due to alterations in size, polarity, and HB interactions. Further away from the catalytic site, the simulations identify Leu50 and its associated helix (Ser46-Leu50), as well as Gln192 and the loop of which it forms part, as being affected by the loss of substrate. While the exact influence of these mutations on binding affinities and reactivities of substrates or inhibitors remains to be investigated, this study has showcased the ability of the D-NEMD approach to provide valuable insights into timeordered protein structural changes and pinpoint positions of drug-resistant mutations. The evaluation of the perturbationinduced responses also enables the identification, and potentially prediction, of allosteric sites that play major roles in regulating enzyme activity.
The D-NEMD approach is emerging as a useful tool to investigate signal propagation, identify communication networks, and characterize allosteric effects. 38 As demonstrated by this study, D-NEMD simulations will be useful in predicting allosteric sites and mutation sites relevant to drug resistance.

Equilibrium MD
MD simulations were performed with GROMACS (v 2019.2) 56 using the AMBER99SB-ILDN forcefield. 57 Dimeric M pro (PDB 6YB7; 58 1.25 Å resolution; chains A and B) in a noncovalent complex with the s05 peptide was constructed and prepared as described previously, including the setup of histidine residues (Table S5.1) and the neutral catalytic dyad. 20 The complex was placed in a rhombic dodecahedral box with at least 1.0 nm separation from box edges, solvated with TIP3P water, 59 neutralized with sodium ions (86,304 atoms in the solvated system), and minimized until the maximum force was below 1000 kJ mol −1 nm −1 . From the minimized system, five replicas were initiated by assigning random velocities at 298.15 K. Each replica was subjected to 200 ps (1 fs step) NVT and 200 ps (1 fs step) NPT equilibration at 298.15 K and 1.0 bar, before a production run of 200 ns (2 fs step), during which coordinates were saved every 100 ps and velocities saved every 1 ns. Temperature coupling to the protein and non-protein at 298.15 K was achieved using a velocity-rescaling thermostat with a stochastic term, with a time constant of 0.1 ps. 60 Pressure was maintained at 1.0 bar with a Parrinello−Rahman barostat with a time constant of 2 ps. 61,62 Long-range electrostatics were treated using smooth particle mesh Ewald with a 1 nm cutoff. 63,64 For van der Waals interactions, a 1 nm cutoff was used. For these 5 × 200 ns equilibrium simulations, analysis was performed using GROMACS tools (v 2019.2) 56 and DSSP (v 2.0.4). 65,66 Hydrogen bonds were defined based on a combined distance (d D-A ≤ 3.5 Å) and angle (∠(H-D-A) ≤ 30°) criteria.

D-NEMD
From each of the five equilibrium trajectories, coordinates and velocities were extracted every 5 ns from t = 50 up to 195 ns, providing a total of 5 × 30 = 150 configurations for D-NEMD initiation. For the nonequilibrium simulations, the s05 peptide was instantaneously removed. No new molecules were added in its place. The nonequilibrium simulations were then started from these conformations. To capture events happening over different timescales, four length scales were chosen for the restarted MD simulations (2 fs step) with different saving frequencies of the protein coordinates: (i) over 2 ps saving every 0.05 ps; (ii) over 200 ps saving every 0.5 ps; (iii) over 1 ns saving every 10 ps; (iv) over 5 ns saving every 100 ps (only conducted for the nonequilibrium propagation as the resulting time points coincided with the equilibrium trajectories). Residues corresponding to and adjacent to mutation sites are colored cyan and orange, respectively (see Figure 3b).

D-NEMD Analysis
At equivalent delays after perturbation, the M pro structure from the nonequilibrium propagation was fitted onto the corresponding equilibrium structure based on the 612 M pro Cα atoms, before calculation of the scalar deviation and equilibrium-to-nonequilibrium displacement vector of every Cα atom. The response of each Cα atom was determined using the Kubo−Onsager approach, comparing the equilibrium and nonequilibrium simulations at equivalent points in time. 37,38 For each time, the Cα atom scalar deviation and displacement vector were averaged over the 150 replicas. According to the Kubo−Onsager relation, the averages yield estimates of the macroscopic time-dependent response of the system. The statistical significance was assessed by the standard error of the mean (SEM; N = 150). Average deviations and displacements were visualized using the M pro -s05 complex structure following energy minimization and prior to any MD simulations, with the aid of PyMOL (v 2.3.0) 45 and the modevectors.py script. 51 ■ ASSOCIATED CONTENT