Open-Boundary Molecular Mechanics/Coarse-Grained Framework for Simulations of Low-Resolution G-Protein-Coupled Receptor–Ligand Complexes

G-protein-coupled receptors (GPCRs) constitute as much as 30% of the overall proteins targeted by FDA-approved drugs. However, paucity of structural experimental information and low sequence identity between members of the family impair the reliability of traditional docking approaches and atomistic molecular dynamics simulations for in silico pharmacological applications. We present here a dual-resolution approach tailored for such low-resolution models. It couples a hybrid molecular mechanics/coarse-grained (MM/CG) scheme, previously developed by us for GPCR–ligand complexes, with a Hamiltonian-based adaptive resolution scheme (H-AdResS) for the solvent. This dual-resolution approach removes potentially inaccurate atomistic details from the model while building a rigorous statistical ensemble—the grand canonical one—in the high-resolution region. We validate the method on a well-studied GPCR–ligand complex, for which the 3D structure is known, against atomistic simulations. This implementation paves the way for future accurate in silico studies of low-resolution ligand/GPCRs models.


■ INTRODUCTION
Membrane proteins constitute as much as 60% of the overall proteins targeted by FDA-approved drugs. 1 Among them, Gprotein-coupled receptors (hGPCRs) 2,3 are the largest family, comprising more than 800 members, 4 and the most important one from a pharmaceutical perspective. 5−7 Unfortunately, the lack of experimental structural information for most (more than 90%) hGPCRs 8,9 along with low sequence identity (SI) across these proteins (often below 30% 10,11 ) has hampered rational design efforts for many highly promising hGPCR drug targets. For instance, 99% hGPCRs have an SI < 30% with bovine rhodopsin, 12 the first solved GPCR structure. 13 This condition strongly limits the predictive power of computeraided structural predictions because traditional homology modeling techniques applied to proteins with low SI with their template may lead to rather inaccurate models. 14,15 Allatoms molecular dynamics (MD) simulations, very successful in refining high-quality homology-based models, 16,17 may fail to improve the predictions and even lead to partial unfolding 18−21 if one starts from structures with highly inaccurate side-chain orientations as encountered in very low resolution protein models. An alternative strategy to address this issue consists of including the minimum number of degrees of freedom of the system for the specific problem that one has in mind, leaving out unreliable information that could bias the simulation results. 18,22,23 In this context, hybrid multiscale simulation methods, transcending a single, uniform resolution, represent a highly optimized approach to predict ligands poses; 24−27 on one hand, a high-resolution, atomistic description of the region of interest (which includes the binding site) allows unveiling of the interactions between the receptor and the ligand; on the other hand, the surrounding environment can be described in a coarse-grained, less computationally intensive way.
Here, we describe and validate a novel hybrid multiscale approach, the open-boundary molecular mechanics/coarsegrained (OB-MM/CG) scheme, tailored for the study of lowresolution GPCR−ligand complexes. It involves a concurrent multiscale simulation of both the protein and the water molecules ( Figure 1).
(i) The dual-resolution representation of the protein 28 is based on a scheme previously developed in our team, 22,28−30 successfully applied to predict binding poses in several lowresolution GPCR−ligand complexes. 18,19,23,31 It aims at preserving an atomistic description of the binding pocket and of the neighboring residues (MM region, described with the Gromos force field 32 ) while leaving to a coarse-grained description the rest of the protein 22 (CG region, where each residue is modeled through a CG bead centered on the C α and interacting with the other beads through a Go ̅ -type potential 33 ). An intermediate region (I) between the MM and the CG domains ensures the coupling between the two levels of description and the protein backbone integrity. I residues are described atomistically, and their C α and C β atoms interact with CG beads according to the CG potential. The selection of the atoms in the MM, I, or CG regions does not require updates during the simulation. The presence of the membrane is implicitly modeled as a series of boundary potentials ( Figure 1): two repulsive potentials placed at the level of the lipids' heads prevent penetration of water molecules in the membrane space, while a softened Lennard-Jones-like potential (2-1) acting on transmembrane residues' C α mimics the interactions between the protein and the membrane.
(ii) As opposed to the original MM/CG scheme, where hydration around the binding site is taken into account by introducing a droplet of atomistic water molecules confined through a repulsive potential preventing water evaporation ( Figure SI 1), in the new implementation we adopt a multiscale representation of the solvent, based on an adaptive resolution scheme 34−37 in the Hamiltonian formulation (H-AdResS). 38 This allows water molecules to freely diffuse between fully atomistic (MM w ) and coarse-grained (CG w ) domains, maintaining a uniform density between the two while changing on-the-fly their resolution. The MM w regions, where atomistic water is described by the SPC/E potential 39 (V w   MM   ), are shaped as hemispheres capping the extracellular and intracellular domains of the protein, whereas coarse-grained water is present outside of these boundaries in a bigger box (CG w ). Coarse-grained water is modeled by beads coinciding with the center of mass (CoM) of each molecule, interacting through a potential, V w CG , derived from fully atomistic simulation of pure water at the same state point through iterative Boltzmann inversion. 40 The smooth coupling between the two domains is achieved by introducing an intermediate hybrid region (HY), where the potentials describing the MM w and CG w models are linearly combined through a switching function λ α = λ(R α ), which depends on the instantaneous position R α of the CoM of the α water molecule. Specifically, λ α smoothly changes from 1 to 0 when moving from MM w to CG w boundaries. A correction term is added to water molecules in the HY region to ensure uniform density across the MM w , HY and CG w domains. This term preserves a constant chemical potential, 41 leading to the simulation of a grand canonical ensemble in the high-resolution region. In this respect, the CG w region can be regarded as a reservoir of coarse-grained water molecules, and we call the overall setup open-boundary (OB)-MM/CG (see the Theory section for a more complete description of the method).
Compared to the previous MM/CG method, the new OB-MM/CG enables thus a priori rigorous ligand binding free energy calculations. 41 Moreover, by ensuring the free diffusion of water between the binding cavity and the CG W reservoir, it prevents possible artifacts due to the solvent confinement in the hemisphere while keeping the number of the additional degrees of freedom smaller than a fully atomistic simulation. Given the recognized role of water for ligand binding in membrane receptors 42−47 and the overall interplay between water dynamics and protein properties, 48 this is likely to improve the description of interactions between the ligand and the protein. In this paper, we tackle in particular this point.
It is worth noticing that the applicability of H-AdResS for the simulation of water solvating biomolecules has been recently assessed 49 using as test systems fully atomistic cytoplasmic proteins in the center of a spherical MM w region (OB-MM). However, it has never been applied in the presence of a dual-resolution protein and, importantly, never in the presence of a discontinuity, here represented by the repulsive walls mimicking the membrane, which break the spherical symmetry of the MM w region of H-AdResS tested in the previous OB-MM implementation. 49 The higher pressure attained by the CG w model with respect to the atomistic one requires a correction of the repulsive membrane potential in order to impose the correct virial pressure on the CG w domain 50−52 and guarantee a uniform density profile in the vicinity of the membrane planes. Details on the derivation of this correction term are given in the Theory section.
The paper is organized as follows. After the Theory section, where the OB-MM/CG method is presented in detail, and a

Article
Methods section with the simulation details, we validate the approach by comparing structural and dynamical properties computed with the OB-MM/CG scheme with those from allatom MD simulations of a GPCR−ligand complex, namely, the β2-adrenergic receptor in complex with its inverse agonist Scarazolol. 53 This validation represents the necessary preliminary step toward future usage of the method for the calculation of binding free energies, on which we are currently working. For comparison, we present results obtained from simulations performed using the original MM/CG scheme as well.

■ THEORY
The total Hamiltonian of the system reads K represents the total kinetic energy. We now describe the terms associated with the potential energy of the system. V p is the protein's potential energy 22 The first term on the right-hand side of eq 2 describes the MM and interface (I) regions of the protein, along with the ligand. ). V p CG represents the CG protein potential energy. The mapping employed for the coarse-graining procedure reduces each residue to a singe CG bead, centered on the C α . V p CG is a Go ̅ -type potential. 33 It integrates out the degrees of freedom defining the chemical nature of the CG residues while preserving the folded structure of the protein and its characteristic fluctuations by stabilizing the native contacts. It includes both bonded and nonbonded interactions between CG protein beads; the former are modeled as a quartic potential and the second as a Morse-type potential. 22 The same scheme is applied to model the interactions between the CG and I regions. In particular, the nonbonded potential is applied here not only between C α 's but also between C α belonging to CG residues and C β belonging to I residues.
According to the H-AdResS 38 scheme, water is described by the potential energy term V w , which includes the contributions from the atomistic water molecules in the MM w region (V w MM ), and those from the coarse-grained water molecules in the CG w region (V w CG ). V w reads 38 As pointed out in the introduction, V w MM is described by the SPC/E water potential energy, 39 while V w CG is obtained through iterative Boltzmann inversion. 40 The switching function λ α = λ(R α ) depends on the position R α of the CoM of the α water molecule. 38 Its value is 1 in the MM w region, 0 in the CG w region, and smoothly changes from 1 to 0 in the HY region. The term ( ) λ Δ α represents a correction that is applied independently on each water molecule in the HY region. It has the double purpose of (1) removing on average the effect of the spurious force that emerges as a consequence of the linear combination between V w MM and V w CG38,41 and (2) correcting for the density imbalance across the MM w and CG w regions. 54 Specifically, Here ΔF(λ α ) and Δp(λ α ) represent, respectively, the Helmholtz free energy difference and the pressure difference between a system at hybrid resolution λ α and the CG w system at resolution λ α = 0. N is the number of water molecules, and ρ 0 is the reference density of water. Correspondingly, in each subdomain, the pressure attains the value at which that model (either atomistic or coarse-grained) gets the target density. 54 The term V p/w describes protein/water interactions. Only MM w water molecules are in contact with the protein. The interactions between atomistic water and the atomistic region of the protein are described by the standard atomistic force field, while the interactions between atomistic water and CG residues are modeled with a Lennard-Jones potential so as to reproduce the excluded volume around the C α of the CG residues. 55 The last term in eq 1, V surf , is the potential mimicking the membrane effects. Specifically, it contains a term aiming at reproducing the interactions between the protein and the membrane and a term preventing water penetration in the transmembrane space. The first is represented by a softened Lennard-Jones-like potential (2-1), with the minimum at a distance r p from a virtual surface enveloping the transmembrane portion of the protein, acting on the C α of the transmembrane protein residues and on each atom of TRP and TYR residues. 22 This term constrains the shape of the protein while providing a good degree of flexibility. The second one is a repulsive term proportional to 1/r, where r is the distance from virtual planar walls coinciding with the heads of the lipid bilayer. 22 These virtual walls represent boundaries that introduce spatial inhomogeneity in the solvent. Because of the higher pressures attained in the CG w region with respect to the atomistic one, the water confinement achieved by applying the repulsive force mimicking the membrane is less effective in the CG w region than that in the MM w one. Therefore, closer to the repulsive membrane surface, CG w water expands more than MM w water, leading to density fluctuations in the vicinity of the membrane upon passing from CG w to MM w regions (Figure 2a,c). This requires the introduction of a correction of the repulsive boundary potential in order to impose the correct virial pressure on the CG w domain. 50−52 The correction of the repulsive force is proportional to the gradient of the difference between the target density profile (resulting from an atomistic simulation) in the direction perpendicular to the membrane, ρ t (r), and the current CG w density profile, ρ CG (r) 56 where r is the distance to the planar wall. In this way, a uniform depletion layer in proximity of the repulsive planes of the membrane is guaranteed (Figure 2b,c), and the uniform bulk density across MM w , HY, and CG w is preserved ( Figure SI 2).
In the latter, only small fluctuations in the HY regions, smaller than 5%, are observed.
Here we stress that the differences between the new OB-MM/CG scheme and the previous MM/CG lie in the boundary conditions and in the solvent representation, while the hybrid scheme used for the protein−ligand system is identical. Comparing term by term the MM/CG Hamiltonian

Journal of Chemical Theory and Computation
Article with the OB-MM/CG one (eq 1), the potential energy internal to the protein, V p , is identical; the potential term for water, V w , reduces to the MM w term, V w,α MM , with λ α = 1 (no coarse-grained water is present in the MM/CG); the V surf term contains an additional boundary potential, proportional to 1/r, to confine water in the hemispherical domain (placed above the binding cavity) and prevent its evaporation. Moreover, the repulsive potential from the membrane planes is uniform in the xy-plane, as opposed to the analogous term in the OB-MM/CG scheme.
■ METHODS System Setups. The system simulated is the human β2adrenergic receptor (β2-AR) in complex with its inverse agonist S-carazolol (PDB code: 2RH1). 53 The third intracellular loop (ICL3, between residues 231 and 262), lacking in the PDB file, was modeled using the Web server HHPred 57 in conjunction with MODELLER, 58 as in ref 22. The system was first equilibrated by all-atoms MD. To this aim, the protein was inserted into a bilayer of 1-palmitoyl-2-oleoyl-phosphatidylcholine (POPC), which represents the most abundant phospholipid in animal cell membranes. 59 After having solvated the box with SPC/E water molecules, 39 the total size of the system amounted to ∼232000 atoms. The protein is described using the Gromos43a1 force field, 60 while the parameters for the POPC are those by Berger, Edholm, and Jaḧnig. 61 The fully atomistic setup differs from a previous atomistic simulation of the same system 62 (employed in the validation of the MM/CG approach 22 ) in the choice of the protein force field and in the composition of the membrane. The atomic charges for the ligand have been derived by RESP fitting 63−65 using the Gaussian03 package, 66 as in ref 62. Details on the simulation parameters for the equilibration step are reported in the next section.
In the OB-MMCG setup, performed using the equilibrated structure, the radius of each MM w hemisphere, r MM , is set to 3.4 nm, while the width of the HY region, r HY , is 1.2 nm. The former is chosen so that the distance between the atomistic protein and the boundary of the MM w region is at least 1.6 nm, as suggested in ref 49; the latter, instead, corresponds to the cutoff used for the nonbonded interactions, so that water molecules at the border of the MM w region do not interact directly with CG water molecules. The x,y-coordinates of the center of the MM w hemisphere correspond to the x,ycoordinates of the CoM of the protein. The distance between the membrane-mimicking planes is 52 Å. As in previous MM/ CG works, 22 Comparisons are made both with an all-atoms MD, in order to validate the OB-MM/CG simulation, and with the original version of the MM/CG in order to assess possible differences in structural and dynamical properties. The setup for the allatoms production runs is the same as that for the fully atomistic equilibration. In the MM/CG simulation, the protein is coarse-grained in the same way as in the OB-MM/CG case. The radius of the hemisphere, r hemi , is slightly bigger than r MM ; in this way, a similar number of atomistic water molecules (∼2200) can be accommodated with the right density when the potential preventing evaporation is switched on. The planar walls are placed as in the OB-MM/CG system. The total number of particles in the MM/CG system is ∼8100.
Equilibration and Production Simulations. The equilibration simulation, fully atomistic, has been performed with Gromacs 5.1.2. 67 First, the all-atoms system underwent 1661 steps of minimization using the steepest descent integrator. Subsequently, the temperature was slowly raised to 300 K using the Nose−Hoover thermostat; 68 once the final temper-

Journal of Chemical Theory and Computation
Article ature was reached, we ran a 300 ns simulation in the NPT ensemble, fixing the pressure at 1 bar by means of the Parrinello−Rahman barostat, 69 with a time constant of 5.0 ps. The pressure coupling was isotropic in the x,y-directions and independent in the z-direction. In the NPT simulation, a temperature of 300 K was controlled through a leapfrog stochastic dynamics integrator, 70 with an inverse friction constant of 0.5 ps. For all of the simulations, a 2 fs integration time step was used, constraining the hydrogen-containing bonds with the LINCS algorithm. 71 A cutoff of 1.2 nm was used for electrostatics and van der Waals interactions. Periodic boundary conditions were used with the minimum image convention. The equilibrated system was used as a starting structure for the OB-MM/CG, MM/CG, and fully atomistic production runs. All of these simulations were performed with the same parameters as the equilibration but in the NVT ensemble. The OB-MM/CG and MM/CG simulations were performed using customized versions of Gromacs 4. 72 For the calculation of structural properties, two 10 ns replicas of the production run were performed on each system with a sampling time of 10 ps. For calculation of dynamical properties, two replicas, each 2 ns long, were performed with a sampling time of 0.05 ps.
Postprocessing Analyses. Visual inspection of the trajectories was made using VMD 1.9.2. 73 Data analyses were performed with GROMACS utilities, VMD, and in-house scripts. The 10 ns replicas were used for the calculation of structural properties and the 2 ns replicas for dynamical properties. Water properties were calculated on the MM w regions of the OB-MM/CG and MM/CG simulations and on the corresponding, hemispherical volume of the all-atom system (with an approach similar to that in refs 22, 49, and 74). The probability distribution of the water tetrahedral order parameter was calculated as in ref 75. The reorientation time correlation function (tcf) of the water OH bonds was defined as ⟨P 2 [u(0) · u(t)]⟩, where P 2 is the second-order Legendre polynomial and u is the vector along the OH bond. The characteristic reorientation times were computed as the integral of the corresponding tcfs. Receptor ligand hydrogen bonds were identified assuming that an H-bond is formed when the distance is less than 3.5 Å and the angle is less than 30°. Hydration of the binding site has been assessed by monitoring the number of water molecules at a maximum distance of 5 Å from the ligand. The reorientational tcf of the ligand in the binding pocket was computed with respect to the angle between the vector crossing the carbazole ring of the ligand and the z-axis (defined as the axis perpendicular to the membrane plane).

■ RESULTS AND DISCUSSION
Our approach is validated by comparing structural and dynamical properties computed with the OB-MM/CG scheme with those from all-atoms MD simulations of the human β2adrenergic receptor 76 (β2-AR) rhodopsin-like (or "Class A") hGPCR. β2-AR is an important target for a variety of drugs, including the FDA-approved antiasthma agonist salbutamol. 77 We focus on the complex with the inverse agonist S-carazolol [4-(2-hydroxy-3-isopropylamino-propoxy)-carbazole] 78−80 (Chart 1), for which experimental structural information (PDB code 2RH1 53 ) and MD studies 62,81 have been reported. The same analyses have been carried out using the original MM/ CG scheme as well in order to assess possible differences. In the following analysis, hydration water refers to the water molecules in the hemispherical MM w region of MM/CG and OB-MM/CG simulations and to those in an equivalent hemispherical region in the case of all-atoms MD ( Figure SI  3).
Structural properties of the hydration water, such as the oxygen−oxygen and oxygen−hydrogen pair radial distribution functions ( Figure SI 4) and the tetrahedral order parameter 75 (Figure 3), show that the OB-MM/CG is in closer agreement with fully atomistic simulations than the MM/CG. We observe here that also the average orientation times of hydration water molecules are similar (1.88 ± 0.10 and 1.82 ± 0.03 ps for OB-MM/CG and all-atoms MD, respectively, as opposed to 2.07 ± 0.03 ps for MM/CG). These are slightly higher values than that for all-atoms MD of pure water using the same force field as the one used here (SPC/E) 39 (1.7 ps 82 ). This can be related to the presence of the protein, which is known to slow down the dynamics of water molecules in its first hydration shells. 83 In accordance with the experimental structural study 53 and the reported all-atoms MD simulations, 62,81 the carbazole ring of the ligand forms, in our simulations, hydrophobic interactions with the side chains of residues Trp286 (6.48 according to Ballesteros−Weinstein numbering 84 ), Phe289 (6.51), and Phe290 (6.52) (Figure SI 5a,c), as well as directed H-bonds with Asp113 (3.32), Ser203 (5.42), and Asn312 (7.39) (Figure SI 5b,d). The ligand−Asn312 interaction is at times mediated by a water molecule, as in previous MD studies. 62 Another water molecule bridges the ligand and residue Tyr316 (7.43). In the MM/CG simulation, in addition to the mentioned interactions, Ser204 (5.43) competes with Ser203 for binding to the ligand. Histograms of the H-bond distances involving residues Asp113, Ser203, and Asn312 as Chart 1. Chemical Structure of the Inverse Agonist S-Carazolol a a The orange dashed line represents the vector coplanar to the carbazole ring that is used for calculation of the reorientational tcf ⟨P 2 (cos θ)⟩. Figure 3. Histograms of the tetrahedral order parameter q tet for the water molecules above the binding site, calculated using all-atoms, MM/CG, and OB-MM/CG approaches.

Journal of Chemical Theory and Computation
Article obtained from OB-MM/CG, all-atom MD, and MM/CG are displayed in Figure 4. The OB-MM/CG scheme reproduces fairly well the atomistic case, both in terms of the average and variance of the distances, which are instead slightly underestimated by the original MM/CG in the case of Asp113 and Asn312. In the case of Ser203, the overall distribution obtained from MM/CG is shifted to larger distances than that in fully atomistic or OB-MM/CG, very likely because of the competitive interaction with Ser204.
We also investigate ligand dynamics through the tcfs relevant to the reorientational dynamics of the vector crossing its carbazole ring ( Figure 5). In the system under study, we observe that the reorientational time scales estimated by atomistic MD and OB-MM/CG are consistent among them, while MM/CG produces faster relaxation dynamics. The plateau values of the tcf, related to the degree of restriction of the reorientational dynamics, show that both atomistic MD and OB-MM/CG lead to less restricted dynamics compared to MM/CG. These differences are consistent with the root-meansquare fluctuation (RMSF) values of C α atoms of the residues stabilizing the ligand through hydrophobic interactions or hydrogen bonds ( Figure 6). They show that indeed these residues can in general undergo smaller fluctuation in MM/CG simulations, indicating more rigid binding compared to atomistic MD or OB-MM/CG. The spreading of ϕ and ψ Ramachandran angles estimated through the PAD ω parameter 85 (Figure SI 6), on the other hand, shows that these residues maintain comparable torsional plasticity in the three simulation schemes. Regarding hydration properties, Figure 7, illustrating the number of water molecules in the binding cavity, clearly proves that the binding pocket hydration in the OB-MM/CG simulation is more representative of the fully atomistic case with respect to the MM/CG simulation. Therefore, although the original MM/CG scheme reproduces ligand poses fairly well, as proved here and for a plethora of protein−ligand complexes including GPCRs, 18,19,22,23,30,31 the more realistic hydration achieved in our system through OB-MM/CG can explain the observed improvements in the description of the binding configurations explored by the ligand−protein complex and of the binding site flexibility in particular.

■ CONCLUSIONS
This paper introduced a novel hybrid multiscale approach, named OB-MM/CG, for the simulation of membrane protein−ligand complexes. The method couples concomitant atomistic/coarse-grained representations of the protein and the solvent, while the cell membrane is represented implicitly through a series of confining potentials. Specifically, the representation of the protein−ligand complex (based on the MM/CG scheme) aims at reducing the number of degrees of freedom of residues far from the binding site, making the approach tailored for low-resolution protein models where the inaccurate orientations of side chains would introduce artifacts in all-atoms simulations. On the other side, the implementation of H-AdResS for the representation of hydration water leads to the simulation of a rigorous statistical ensemblethe grand canonical onein the region at atomistic resolution, allowing a more accurate description of hydration and ligand− protein interactions and, in principle, opening the possibility of binding free energy calculations.
In this paper, we validated the OB-MM/CG method on a well-studied GPCR, the β2-adrenergic receptor, in complex with its inverse agonist S-carazolol. Structural and dynamical properties of both the solvent and the complex in the OB-MM/CG simulations are in good agreement with results from fully atomistic simulations. Moreover, some analyses (namely,

Notes
The authors declare no competing financial interest.

Journal of Chemical Theory and Computation
Article