Machine Learning-Based Modeling of Olfactory Receptors in Their Inactive State: Human OR51E2 as a Case Study

Atomistic-level investigation of olfactory receptors (ORs) is a challenging task due to the experimental/computational difficulties in the structural determination/prediction for members of this family of G-protein coupled receptors. Here, we have developed a protocol that performs a series of molecular dynamics simulations from a set of structures predicted de novo by recent machine learning algorithms and apply it to a well-studied receptor, the human OR51E2. Our study demonstrates the need for simulations to refine and validate such models. Furthermore, we demonstrate the need for the sodium ion at a binding site near D2.50 and E3.39 to stabilize the inactive state of the receptor. Considering the conservation of these two acidic residues across human ORs, we surmise this requirement also applies to the other ∼400 members of this family. Given the almost concurrent publication of a CryoEM structure of the same receptor in the active state, we propose this protocol as an in silico complement to the growing field of ORs structure determination.

tiple sequence alignments built on state-annotated data (i.e., using only active or inactive conformations for GPCRs). Since our intention was to model the apo receptor (i.e., without ligands), we therefore chose the inactive conformation. In practice, such model was obtained from GPCRdb 5 (https://gpcrdb.org), which maintains an updated database of GPCR models generated with AlphaFold-MultiState.
SwissModel (SM). A homology model was retrieved from the SwissModel repository. 9 The template used was the human neuropeptide Y receptor type 2 (PDB structure 7X9B, chain D), which has a sequence identity of 18.12% with hOR51E2. Unfortunately, hORs exhibit a low sequence identity ( < 20%) with other class A GPCRs and thus a better template cannot be found. 10 The SM model covers residues 22-307.

System preparation
Initially, we preprocessed all the structures obtained via ab initio and homology modeling algorithms by using the Protein Preparation Wizard implemented in Schrödinger Maestro version 2022-3, 11 which automatically assigns amino acid protonation states based on their microenvironment. Two exceptions were represented by D69 2.50 and E110 3.39 , that were kept in the charged state. All the structures prepared in this way were further processed via the online interface of CHARMM-GUI 12,13 (https://charmm-gui.org/). First, we built a disulphide bond between C96 3.25 and C178 45.50 . Then, we defined a cubic simulation box S2 with dimensions (100Å) × (100Å) × (120Å), with the receptor in the center, embedded in a POPC membrane. The lipid bilayer and the receptor were solvated in water with a NaCl concentration of 150 mM, in line with standard experimental and physiological conditions for GPCRs. The protein, lipids, and ions were parameterized using the CHARMM36m force field, 14 while water was described with the TIP3P 15 model.
• step 6: 100 ns of MD with a time step of 2 fs, restraining the protein backbone (k = 50 kJ/mol/nm 2 ); this step is 10-fold longer than in the standard CHARMM-GUI protocol.
• step 7: 100 ns of MD with a time step of 2 fs, restraining the protein backbone (k = 5 kJ/mol/nm 2 ); this is a completely new step, which increases the length of the restrained equilibration.
The last, production step (8) consists of 2.5 · 10 8 steps with a time step of 2 fs (500 ns) of unrestrained MD simulation. For both van der Waals and short-range interactions, a cutoff distance was set at 10Å and long-range electrostatic interactions were computed via the smooth particle mesh Ewald summation method. 16 Temperature was kept at 310 K via velocity rescale thermostat 17 with a coupling time of 0.2 ps on three groups: protein, membrane, and water and ions. Pressure was kept constant at 1 bar with the semi-isotropic cell rescale barostat 18 with a coupling time of 0.5 ps. All the simulations were performed using GROMACS 19 version 2021.2; three replicas were run for each model.

A 100 activation index
A 100 is an estimator of the activation state of class A GPCR structures, 20 defined as a weighted sum of distances between C α atoms of selected residues, namely: where d(X, Y ) is the Euclidean distance between residues X and Y . In equation (1) the amino acid names are based on the generic residue numbering proposed by Ballesteros and Weinstein 21 (BW) for class A GPCRs. In the case of hOR51E2, the BW numbering was taken from GPCRdb 5 (see Table S1). We would like to note here that, because of the lack of conservation of some class A GPCR motifs in ORs (see Table S3), the nature of some residues is different, yet the BW position is the same. Thus, the A 100 activation index for hOR51E2 is defined as: This estimator has been implemented in PLUMED 2.8. 22,23 As shown in Figure S1, the A 100 index remains around a value of 11 throughout the entire 500 ns simulation for all six systems, regardless of the replica considered.  Figure S2: Time evolution of the RMSD of the heavy atoms of the whole protein for the eighteen MD simulations of sodium-bound hOR51E2 performed in this work.  Comparison of the in silico models with sodium ion bound We assessed the similarity of the conformations explored during the MD simulations by calculating the mutual backbone RMSD and using Multidimensional Scaling (MDS), a nonlinear dimensionality reduction approach to visualize the information contained in the RMSD matrix into a 2D projection, 24 shown in Figure S3. We qualitatively confirmed the same results as in Figure 1: AF and OF are superimposed, while all the other models are wellseparated.
In addition to clustering analysis reported in the main text, we also performed a structurebased sequence alignment of the cluster centroid structures shown in Figure 3, using the STAMP algorithm implemented in the MultiSeq tool 25 of VMD 26 (version 1.9.4a57). Such alignment ( Figure S4) reveals that TM6 contains the largest number of mismatches within the helix, followed by TM7. This is in line with the differences in the TM6-TM7 interface described in the main text (see Figure 4). Figure S3: MDS-based similarity representation of all the sodium-bound hOR51E2 trajectories. S11 Figure S4: Structure-based multiple sequence alignment of the cluster centroid structures shown in Figure 3. The MSA was visualized using ESPript tool. 27 Residues identically aligned in all six models are displayed with white characters and enclosed in a box colored according to their physicochemical properties, residues showing a mismatch in one model are shown within a grey box and font color based on their properties, and gaps and residues mismatched in two or more models are shown in black font.

experimental structure of hOR51E2
We used the cryo-EM structure of hOR51E2 in the active state (PDB 8F76) to validate the two best inactive models according to our MD-based protocol, AF/OF and AF in (i.e. the centroid structures of clusters 1 and 4 in Figure 2). We first calculated the backbone RMSD of the seven transmembrane (TM) helical bundle, using the definition of the TM helices listed in Table S1 and the experimental structure as reference. The obtained RMSD values are 2.46 A and 3.54Å for the AF/OF and AF in models, respectively. However, such overall RMSD reflects both structural differences due to the quality of the models and conformational changes upon receptor activation. Indeed, the experimental structure was solved in the active state, whereas the computational models correspond to distinct inactive states (see Figure S1).
Hence, we additionally quantified the structural changes due to the different activation state of the three structures using the 7x7 RMSD matrix of the individual TM helices, as defined in reference. 28 Such matrix was obtained with VMD 26 (version 1. for activation of hOR51E2 29 in particular and class A GPCRs in general. 30 In this regard, it is worth noting that the AF/OF models were trained on both active and inactive GPCR experimental structures, whereas AF in only on inactive GPCR structures. In addition, we analyzed (i) the residue-residue contacts at the interface between TM6 and TM7 and (ii) the side chain orientation of the residue lining the ligand binding site based S13 on the cryo-EM structure. Such comparisons are reported in the main text and revealed a better qualitative agreement of the AF in model with the experimental structure. Figure S5: 7x7 RMSD calculation of AF/OF (Left) and AF in (Right) cluster centroids vs. the experimental hOR51E2 cryoEM structure (PDB 8F76).