Binding Mechanism of Riboswitch to Natural Ligand Elucidated by McMD-Based Dynamic Docking Simulations

Flavin mononucleotide riboswitches are common among many pathogenic bacteria and are therefore considered to be an attractive target for antibiotics development. The riboswitch binds riboflavin (RBF, also known as vitamin B2), and although an experimental structure of their complex has been solved with the ligand bound deep inside the RNA molecule in a seemingly unreachable state, the binding mechanism between these molecules is not yet known. We have therefore used our Multicanonical Molecular Dynamics (McMD)-based dynamic docking protocol to analyze their binding mechanism by simulating the binding process between the riboswitch aptamer domain and the RBF, starting from the apo state of the riboswitch. Here, the refinement stage was crucial to identify the native binding configuration, as several other binding configurations were also found by McMD-based docking simulations. RBF initially binds the interface between P4 and P6 including U61 and G62, which forms a gateway where the ligand lingers until this gateway opens sufficiently to allow the ligand to pass through and slip into the hidden binding site including A48, A49, and A85.


Table of contents Section S1
Multicanonical MD algorithm S3 Section S2 McMD-based dynamic docking simulations S3 Section S3 McMD-based dynamic docking analysis S4 Section S4 Binding mechanism analysis S5 Fig. S1 2D structure of the ligand riboflavin S6 Fig. S2 Multicanonical potential energy distribution of the riboswitch -riboflavin dynamic docking S7 Fig. S3 3D structure of picked representative configurations  and the experimental structure S8 Fig. S4 Density distribution of RBF in 3D around the RNA S10 Fig. S5 3D structure of picked representative configurations  and  S11 Fig. S6 RMSF analysis of the RNA S13 Fig. S7 Dynamic Cross Correlation analysis of the RNA S14 Fig. S8 Density distribution of Mg 2+ in 3D around the RNA S15 Fig. S9 Overview of binding pathway structures obtained from the multicanonical ensemble S17 Table S1 Convergence of McMD dynamic docking pre-run simulations S18 Table S2 McMD-based dynamic docking results using subsets of the simulation data S19 Table S3 RMSDs of riboswitch's residues during binding of the ligand RBF along λ S21 Table S4 System & simulation parameters for Riboswitch -RBF binding simulations.S23 Supplementary References S24 Section S1: Multicanonical MD algorithm We used our own developed McMD-based dynamic docking method that has been thoroughly described in various previous papers, [1][2][3][4][5][6][7][8][9] but here we will shortly review the algorithm.The probability distribution of the potential energy of the multicanonical ensemble is defined by the following equation: where E is the potential energy, T0 the simulation temperature, n(E) the density of states and Zmc the partition function: W(E) is a weighting function to modulate the probability distribution Pmc in order for it to become constant and enables the system to take a random walk along the target energy range, and is defined as follows: where R is the gas constant and Pc the canonical energy distribution at T0.During the McMD simulations, this weighting function is used to scale the forces by a factor of ∇W(E), where the multicanonical temperature Tmc, which corresponds to T0/∇W(E), is restricted to a specific target range between Tlow to Thigh, which we generally set at 280 K and 700 K, respectively.Multiple iterations of sampling are required to estimate the correct bias that enables a random walk along a wide energy range, where the weighting function is updated between iterations using:  () =  () + ln  (,  ) (S4) After obtaining a flat potential energy distribution, a production run is executed to sample phase space.Due to the bias applied during the McMD simulations, the resulting multicanonical ensemble must be reweighted to obtain the canonical distribution at room temperature.A multicanonical distribution can be reweighted to a canonical distribution at any given temperature T within the flat energy range using the following equation:

Section S2: McMD-based dynamic docking simulations
To prevent unfolding at high temperatures during the McMD simulations and the initial high temperature dissociation simulation, we employed distance restraints that restrain the hydrogen bonds formed within the RNA (Fig. 1), primarily between the base pairs, using a force constant of 1 kcal/mol/Å/Å and a flat bottom distance of 4.5 Å, except for the hydrogen bonds in the terminal bases (at P1), where a force constant of 10 kcal/mol/Å/Å was used.We used the McMD algorithm described in Section S1, with 30 parallel trajectories initialized with different random seeds for the initial velocity from the initial structure constructed as described in section 1 of the Methods.After initialization, first a 2 ns simulation at Thigh (= 700 K) was performed for each parallel trajectory to randomize the ensemble with the above-described weak distance restraints in place.From here, the initial bias was estimated using Eq.S3 and subsequent iterations of increasing simulation lengths were executed, updating the bias using Eq.S4 between iterations, until a sufficiently flat potential energy distribution had been obtained corresponding to a wide Tmc range of Tlow -Thigh, where Tlow = 280 K, followed by several more iterations for equilibration (Table S1).In total, the pre-run lasted for 27.16 µs (905.24ns per trajectory).Finally, a 60 µs (2 µs per trajectory) production run was executed to sample the structures including bound and unbound states, which were saved at 5 ps intervals, producing 12.0 x 10 6 structures, with the potential energy distribution shown in Fig. S2.

Section S3: McMD-based dynamic docking analysis
We used the same analyses techniques as we did in our recent works. 10First, we performed PCA on a distance array derived from the structures.The array consists of riboswitch -riboflavin pairs and riboflavin -riboflavin pairs.From the riboswitch, the backbone phosphate atom, as well as a representative atom from each base was taken (G/A: N1, U/C: N3), while from the ligand the atoms as indicated in Fig. S1 were taken.The distance based approach does not require prior superposition of the structures unlike the quasi-harmonic approach, 4,11 while taking periodic boundary conditions into account and being a more sensitive approach to detecting intermolecular contacts along the entire surface between the interacting molecules.The structures are then projected onto the first two principal components, and the probability of each bin i on the landscape is calculated as  = ∑   , 300  using each structure j within bin i.The free energy as the Potential of Mean Force (PMF) is finally calculated as  = −  for each bin, giving the 2D FEL after normalizing its minimum to zero.
After the PCA, we performed K-means clustering on the PCA coordinates of the structures, using K = 1000 clusters and a selected number of PC coordinates, so that the sum of the contribution to their variance exceeds 90 %, here corresponding to PC1-PC8, with each PC contributing 37.02%, 21.34%, 16.52%, 9.92%, 2.74%, 1.36%, 1.09%, 0.94% to the configurational variance.For each cluster k', one representative structure was selected and the clusters were then ranked based on the relative free energy at 300 K of the clusters (cluster free energy, CFE), which was calculated as  = −  , where  = ∑   , 300 K using each structure j belonging to cluster k'.Then, using these 1000 representative structures, all-to-all R-value analysis. 12,13was performed.Here, the R-value is a distance-based approach measuring the intermolecular contacts compared to a reference configuration, and is based on the Q-value, 14,15 which measures the fraction native contacts of proteins.Starting from the most stable cluster k'=1 in order of their free energy contribution, similar representative structures in terms of their R-value (R > 0.7) were grouped together and their clusters merged.After re-calculating the CFE of the new clusters, we used a CFE cutoff value of 2.5 kcal/mol to distinguish between potentially interesting structures and less stable ones.This gives us k clusters and their corresponding representative structures rk.
Representative complex structures rk within the CFE cutoff obtained from the multicanonical ensemble were further refined using canonical (NVT) MD simulations at 300 K.For each representative structure, ten 100-ns MD simulations at 300 K were performed (with different random seeds for the initial velocity), with only restraints on the COM of the riboswitch present to prevent translation and rotation of the molecule.Then, refined complex structures qk were picked by taking the nearest-to-average structure from the final 40 ns of the canonical MD simulations.In addition, ten 100-ns canonical MD simulations at 400 K were performed to compare the relative stabilities of the binding configurations, like we have done before. 10ction S4: Binding mechanism analysis We used our previously developed pathing method 12 to construct a binding pathway starting from the equilibrated configuration q4.As we did not use a cylinder to restrain the sampling region of the ligand, we did not have a reaction coordinate to use for the pathing method.Previously, 8 we developed a naïve method to estimate the optimal unbinding direction  ⃑ of a ligand, given its size, the shape of the pocket and the center of the pocket, and applied it in the same way that we had applied it for our previous work docking Bcl-xL with two medium-sized ligands. 16Then, using q4 and the obtained reaction coordinate from the naïve method, we used our pathing algorithm to construct the binding/unbinding pathway.In short, the reaction coordinate is split into pre-defined windows along  ⃑ .In each window, one representative structure was picked from the multicanonical ensemble that is similar in terms of its R-value to of the picked structure from the previous window, where for the initial (λ = 0) window, the structure q4 was used.This produces a smoothly connected pathway of structures along the dissociation direction  ⃑ starting from the structure q4 to the outside state (Fig. S9).        .Density distribution of Mg 2+ in 3D around the RNA.A 100 x 100 x 100 grid (each cell has a dimension of 0.963776 x 0.841263 x 0.60446 Å) was created, in which the presence of Mg 2+ was measured for the reweighted multicanonical ensemble (300 K).The density was min-max normalized, with the isosurfaces at densities 0.1, 0.05 and 0.01 shown as a red surface, green mesh and blue semi-transparent surface.Also shown is the apo conformation of the RNA along with the aptamer sub-domains colored in red, orange, yellow, green, blue and magenta for P1-P6, respectively.(a) front view like Fig. 1     a The McMD-based dynamic docking ensemble was re-analyzed using subsets of the simulation data, starting from the PCA to the analysis of the top ranked (less than 1.0 kcal/mol) structures rk, with the statistics of those structures shown in the table.E.g., for the 25 % (15 µs) dataset, 500 ns of each trajectory (30 parallel trajectories) was used for the PCA, after which those snapshots were re-clustered using those PC coordinates, to finally obtain the representative structures.
Table S3.RMSDs of riboswitch's residues during binding of the ligand RBF along λ.

Fig. S1 .
Fig. S1.2D structure of the ligand riboflavin.Chemical structure of the ligand, with the atoms used to calculate the distance matrix for the PCA indicated.Atoms indicated with a "*" were paired with atoms from the riboswitch, while atoms indicated with a-d were paired with each other.

Fig. S2 .
Fig. S2.Multicanonical potential energy distribution of the riboswitch -riboflavin dynamic docking.Potential energy probability distribution (PMcMD(E)) as sampled during the production run.Also shown are the reweighing canonical distributions (Pc(E, T)) at 300 K, 500 K and 700 K in blue, yellow and red, respectively.

Fig. S3 .
Fig. S3.3D structure of picked representative configurations  and the experimental structure.Representative structures from the dynamic docking simulations rk (colored by subdomains, white for the rest) and the experimental structure (cyan, PDB ID 3F4G) are shown with the sidechains of the nearby residues as their front view.

Fig. S4 .
Fig. S4.Density distribution of RBF in 3D around the RNA.A 100 x 100 x 100 grid (each cell has a dimension of 0.963776 x 0.841263 x 0.60446 Å) was created, in which the presence of the ligand (via its center of mass) was measured for the reweighted multicanonical ensemble (300 K).The density was min-max normalized, with the isosurfaces at densities 0.05, 0.01 and 0.001 shown as a red surface, green mesh and blue semi-transparent surface.Also shown is the apo conformation of the RNA along with the aptamer sub-domains colored in red, orange, yellow, green, blue and magenta for P1-P6, respectively.Finally, the center of mass of the ligand in each configuration is labeled by the corresponding rk.(a) front view like Fig. 1 (b) top view with the molecule rotated 90° about the x-axis.

Fig. S5 .
Fig. S5.3D structure of picked representative configurations  and  .Representative structures from the dynamic docking simulations rk (colored by subdomains, white for the rest) and equilibrated structures qk (cyan) are shown with the sidechains of the nearby residues are shown as their front view.

Fig. S8
Fig. S8.Density distribution of Mg2+  in 3D around the RNA.A 100 x 100 x 100 grid (each cell has a dimension of 0.963776 x 0.841263 x 0.60446 Å) was created, in which the presence of Mg 2+ was measured for the reweighted multicanonical ensemble (300 K).The density was min-max normalized, with the isosurfaces at densities 0.1, 0.05 and 0.01 shown as a red surface, green mesh and blue semi-transparent surface.Also shown is the apo conformation of the RNA along with the aptamer sub-domains colored in red, orange, yellow, green, blue and magenta for P1-P6, respectively.(a) front view like Fig.1(b) top view with the molecule rotated 90° about the x-axis.
Fig. S8.Density distribution of Mg2+  in 3D around the RNA.A 100 x 100 x 100 grid (each cell has a dimension of 0.963776 x 0.841263 x 0.60446 Å) was created, in which the presence of Mg 2+ was measured for the reweighted multicanonical ensemble (300 K).The density was min-max normalized, with the isosurfaces at densities 0.1, 0.05 and 0.01 shown as a red surface, green mesh and blue semi-transparent surface.Also shown is the apo conformation of the RNA along with the aptamer sub-domains colored in red, orange, yellow, green, blue and magenta for P1-P6, respectively.(a) front view like Fig.1(b) top view with the molecule rotated 90° about the x-axis.

Fig. S9 .
Fig. S9.Overview of binding pathway structures obtained from the multicanonical ensemble.Top view of 19 picked configurations along the binding/unbinding pathway, with the riboswitch and riboflavin colored by their λvalue from the bound state (λ = 0 Å, blue) to the outside state (λ = 14 Å, red) in a blue-red gradient.

Table S1 .
Convergence of McMD dynamic docking pre-run simulations.a measured as the standard deviation of the log-probability values corresponding to the energies within the multicanonical range and the simulation length per trajectory (N=30).After achieving a standard deviation of less than 0.2 with a simulation length of 40 ns (iteration #29), five more equilibration iterations at 80 ns are performed to finalize the pre-run.

Table S2 .
McMD-based dynamic docking results using subsets of the simulation data.a Shown are the RMSDs in Å with respect to r4 (and the experimental structure in parentheses) for the nucleotides in and around the pocket for the structures shown in Fig.S8. α

Table S4 .
System & simulation parameters for Riboswitch -RBF binding simulations.Pre-run simulation time 0.90525 µs per trajectory Production-run simulation time 2.0 µs per trajectory