GradNav: Accelerated Exploration of Potential Energy Surfaces with Gradient-Based Navigation

Exploring the potential energy surface (PES) of molecular systems is important for comprehending their complex behaviors, particularly through the identification of various metastable states. However, the transition between these states is often hindered by substantial energy barriers, demanding prolonged molecular simulations that consume considerable computational resources. Our study introduces the gradient-based navigation (GradNav) algorithm, which accelerates the exploration of the energy surface and enables proper reconstruction of the PES. This algorithm employs a strategy of initiating short simulation runs from updated starting points derived from prior observations to effectively navigate across potential barriers and explore new regions. To evaluate GradNav’s performance, we introduce two metrics: the deepest well escape frame (DWEF) and the search success initialization ratio (SSIR). Through applications on Langevin dynamics within Müller-type PESs and molecular dynamics simulations of the Fs-peptide protein, these metrics demonstrate GradNav’s enhanced ability to escape deep energy wells and its reduced reliance on initial conditions, as denoted by the reduced DWEF values and increased SSIR values, respectively. Consequently, this improved exploration capability enables more precise energy estimations from simulation trajectories.


Introduction
The reconstruction of potential energy surface (PES) based on molecular simulation is important for modeling and understanding molecular systems, especially proteins.Exploring this energy surface through molecular simulations enables an understanding of the intricate behaviors of complex molecular systems.A crucial aspect of these explorations is the identification of multiple metastable states, which manifest as local minima or potential wells in the energy surface. 1 These states are essential for gaining insights into the characteristics of atomic and molecular systems.3][4][5] Additionally, identifying the boundaries of metastable states in superheated crystals and supercooled liquids plays a crucial role in understanding nucleation behavior. 6,7All-atom molecular simulations, however, often face a significant challenge: they tend to get trapped in deep potential wells of lower energy states. 8,9This entrapment necessitates excessively long simulation trajectories to thoroughly escape these wells and continue exploration across the energy landscape.Such prolonged simulations demand substantial computational resources, rendering them less feasible for many applications.
To address these challenges, numerous enhanced sampling methods have been introduced. 8These methods typically modify the physical parameters of the simulation to facilitate the exploration of the energy surface, such as by applying bias potentials or elevating the temperature.For example, bias potentials are utilized in metadynamics, 10,11 umbrella sampling, 12 and variationally enhanced sampling (VES) 13 to help the system escape from potential energy wells and sample a broader range of configurations.Techniques such as replica exchange molecular dynamics (REMD) 14 and temperature-accelerated dynamics (TAD) 15,16 increase the temperature of the simulation to accelerate the occurrence of rare events, thereby enabling the study of processes that occur over longer timescales.However, the introduction of bias or temperature perturbations requires careful correction to recover unbiased physical properties, as these modifications can alter the fundamental nature of the events being studied or lead to different mechanisms being observed. 17,18 response to these limitations, we propose an observation-driven algorithm designed to accelerate the exploration of molecular systems while adhering strictly to the original physics of the system.This algorithm leverages the concept of restarting short chunks of simulations, guided by previous observations, to systematically optimize the starting point away from previously explored regions.By updating the initial points based on the gradient of observation density, this method systematically directs the exploration away from regions with high observation concentration.This facilitates quicker escape from deep potential wells, thereby enabling the investigation of nearby regions for the discovery of additional metastable states.Importantly, this methodology preserves the original physical settings of the system, guaranteeing the authenticity of each simulation chunk and thus ensuring a reliable investigation of the energy surface.By relying exclusively on observations from previous simulations to guide the exploration towards unexplored states, without imposing any artificial biases to the physical settings, our approach offers a cost-effective and physically consistent strategy for navigating the energy surface.
0][21][22][23] For example, autoencoders have demonstrated success in translating the Brownian dynamics trajectories of a 2D energy surface into a latent space representation. 24Additionally, supervised machine learning has been employed to pinpoint suitable collective variables for study. 25Furthermore, flow-based Boltzmann generators have effectively mapped the structure of the bovine pancreatic trypsin inhibitor (BPTI) protein into a latent space, while retaining distributional characteristics from the original space. 26Successful mapping of atomic systems into latent space preserves their real-space distribution.
Hence, areas densely populated in real space should correspond to similarly dense regions in latent space.This indicates the feasibility of applying observation-driven exploration within latent space.
This study presents the observation density gradient-based navigation (GradNav) algorithm, designed to enhance the exploration of potential energy surfaces by facilitating the escape from deep potential wells.The algorithm is designed to accelerate the exploration process across molecular energy surfaces while maintaining the original physical parameters of molecular simulations-such as potential energy and temperature-unchanged.This approach guarantees that the resulting trajectory is a true reflection of the system's inherent physical behavior.Furthermore, we introduce two metrics to evaluate the algorithm's ability to escape from deep potential wells and its robustness against variations in initialization points.Utilizing these metrics, we assess the algorithm's effectiveness with model systems under Langevin dynamics within both Müller potential energy surface and its modified version. 27The algorithm's validity is further confirmed through its application to a real-world example: the molecular dynamics simulation of the Fs-Peptide protein.[30]

Framework
The observation density gradient-based navigation (GradNav) algorithm introduces a datadriven method for effectively navigating the potential energy surface, specifically designed to address the common challenge in molecular simulations: escaping deep potential wells once trapped.Unlike ordinary molecular simulations that rely on prolonged simulation runs to escape potential wells, this approach employs repeated initiations of molecular simulations to facilitate a more dynamic exploration of the energy surface.Utilizing the observation density gradient, this method intelligently directs simulations toward less explored, potentially more insightful areas.As illustrated in Figure 1, the algorithm incorporates two iterative loops of molecular simulation: the outer loop runs molecular simulations with relatively long frames to calculate the observation density gradient and determine the boundaries of potential wells; the inner loop conducts shorter, exploratory simulations to discover new areas without using excessive computational power.Although the duration of the outer loop simulations exceeds that of the inner loop ones, these frames are considerably shorter compared to conventional extended molecular simulations.
The algorithm starts with a molecular simulation in an outer loop, which runs for a relatively long frame sequence, often exceeding 100 frames.This phase aims to accumulate sufficient trajectory data for calculating observation density, spatial boundaries of observations, and a centroid representing their average location.In this study, we designated 300 to 500 frames for this purpose.While this number varies by system, exceeding 100 frames generally suffices for these calculations.The boundary defined in the outer loop run is used to determine if subsequent simulations manage to identify a new potential well or a metastable state.
Upon the centroid's placement within this boundary, the algorithm's inner loop is activated.This indicates a failure to detect multiple potential wells, thereby necessitating the initiation of a new simulation for further exploration.The inner loop begins with molecular simulations of excessively short duration.These simulations are conducted iteratively, with an update rate that gradually escalates, until the centroid of a subsequent run extends past the boundary defined by the preceding outer loop run.The update of the initialization point of the simulation is defined as follows: A new initialization point x i n+1 is determined using the final point from the preceding run x l n , by applying the update rate -

Escaping the Deep Potential Well
The GradNav algorithm is primarily designed to expedite the escape of molecular simulations from deep potential wells effectively.To quantify this escape capability, we introduce a metric denoted as the deepest well escape frame (DWEF), which measures the number of frames required for the simulation to exit the deepest potential well.We conduct comparisons using Langevin dynamics simulations of a single particle within the Müller potential and its modified versions.The Müller potential is a widely recognized potential energy surface, extensively studied for its characteristics, especially in evaluating the efficacy of algorithms designed to identify reaction paths or metastable states. 24,26,27It features three minima: two are relatively deep, and one, located between the two, is comparatively shallow.
Modifications to the Müller potential include creating a deeper valley amidst two comparatively shallow metastable states, 27 aiming to test the algorithm's ability to navigate across the deepest valley to reach these shallower wells.Further details on the potential energy surface are provided in the Methods section.
Simulations initiate from the initial positions situated at the deepest potential wells within each energy surface.As these simulations progress, we track the number of frames needed to escape these wells.The trajectories generated from both Langevin dynamics simulations and the application of the GradNav algorithm are depicted in Figure 2. Ordinary The process of escaping the deepest well involves iterative re-initiation of inner loop runs, ultimately leading to the discovery of a nearby metastable state.This method involves updating the initial positions for the simulations within the inner iteration loop.The updates are directed away from the deep potential wells, aligning with the direction of the negative

Initialization Sensitivity
Ordinary molecular simulations often become trapped within deep potential wells, making the simulations sensitive to initial starting points.However, the GradNav algorithm enhances the exploration of energy landscapes, thereby diminishing this sensitivity to the initial setups.
Consequently, gaining holistic insights into the energy surface becomes feasible, irrespective of where the simulation begins.To evaluate this reduced sensitivity, we introduce a metric named the search success initialization ratio (SSIR).This metric is calculated as the ratio of the total number of successful identifications of potential wells across iterations i N i success to the product of the total number of potential wells in the energy surface N wells and the number of initializations N init , across the energy surface: The iteration of initial points is conducted using a grid spacing of 0.8, with each iteration spanning 10,000 frames.The number of successful potential well identifications for each initial point is denoted by distinct colors in Figure 4.For Langevin dynamics simulations, the SSIR values are calculated to be 50% for the Müller potential and 39% for the modified Müller potential energy surfaces.The implementation of the GradNav algorithm leads to a substantial increase in SSIR values, reaching 100% and 94% for the Müller and modified Müller potential energy surfaces, respectively.This significant improvement underscores the GradNav algorithm's ability to lessen the dependency on initial positioning, thereby streamlining the need for comprehensive initialization procedures.

Potential Energy Surface Reconstruction
In specific potential energy surfaces, the distribution of trajectories adheres to the Boltzmann distribution.This distribution is a fundamental concept in statistical mechanics that describes the probability of a system being in a particular state as a function of that state's energy.The Boltzmann distribution can be expressed as follows: 31,32 Here, p i denotes the probability of the system being in state i, ε i represents the energy of state i, k B is the Boltzmann constant, and T symbolizes the thermodynamic temperature.
In the context of Langevin dynamics simulations involving a single particle, the "system" in question refers to this individual particle.
Utilizing this relationship allows for the estimation of the energy based on the probabilities extracted from the particle's trajectory data. 33

Pseudo Molecular Dynamics
To evaluate the GradNav algorithm with a real-world system, we use molecular dynamics trajectories from the Fs-Peptide protein, building upon earlier demonstrations with model systems.For this validation, we employ an existing trajectory dataset in a manner akin to pseudo molecular dynamics simulations, bypassing the execution of actual molecular dynamics simulations.This dataset consists of 28 trajectories, comprising a total of 280,000 frames, illustrating a variety of behaviors and topologies. 28The topology of the Fs-Peptide protein is visualized in Figure 6.While certain trajectories demonstrate protein folding, others lack this characteristic behavior (see Supporting Information S2).Detailed information on the simulation settings and dataset specifics is provided in the Methods section.
The pseudo molecular dynamics process initiates with the selection of a single trajectory, starting from its initial frame.The simulation's progression is emulated by collecting subsequent frames as per the designated number in either an outer or inner loop run.A new potential starting point on the energy surface is then proposed based on the last frame and the GradNav algorithm's update equation.Points within a cutoff radius of 0.02 from the last frame are identified, from which one is randomly chosen as the new starting point.If no points are located within this radius, the closest point is selected instead.Figure S2 in Supporting Information illustrates both the initial points tentatively proposed by the update rule and the actual initial points selected from the dataset.This step is followed by  the continuation of the process for a predetermined number of frames, defined as the frame count for each loop, from this new starting point.
This methodology mirrors the initialization of the starting point based solely on identified collective variables, which serve as axes in the surface.Given that only collective variables are known for the new point, it may not be feasible to specify a unique molecular system solely on these collective variables.Our approach, therefore, involves random initialization based on the updated collective variables, notably the O-C-N angle in the alanine 9 (ALA9) residue and the CB-CD-NE angle in the arginine 20 (ARG20) residue.The random selection from candidate points adheres to the generation of a new initial structure with constraints on these two angles, indicating that the simulation may re-initiate from any structure that aligns with the newly updated angles for ALA9 and ARG20.With the dataset encompassing 280,000 frames across 28 trajectories, it ensures structural diversity.

Validation with Fs-Peptide
We demonstrate the ability to escape deep potential wells and the reduced sensitivity to the initial point using Fs-Peptide trajectories, as shown in Figure 7.A density map in the background, derived from a single trajectory (specifically, number 15 in the dataset), showcases stable protein folding behavior.While this map, originating from one trajectory, may not fully capture the comprehensive energy surface, it offers a preliminary estimation.Likewise, in real-world scenarios, acquiring a complete understanding of the energy surface is often infeasible.The trajectories are analyzed using two collective variables: the angle formed within O-C-N atoms in the ALA9 residue and the CB-CD-NE angle in the ARG20 residue, where the former being correlated to the Fs-Peptide's folding behavior 29 (see Figure 6).The escape from a deep potential well is marked by a notable shift in the ALA9 angle, transitioning from approximately 2.5 rad to about 1.6 rad.On the other hand, the identification of all metastable states in SSIR remains the same as in previous cases, involving probing all density wells within the energy surface.
We assess the performance of GradNav against a specific molecular dynamics trajectory that is unable to escape a region around an ALA9 angle ranging from 2.2 rad to 2.8 rad.
To ensure a fair comparison, GradNav initiates from the same starting point and follows the trajectory of the molecular dynamics simulation.Figures 7 (a total frame count of 10,000, which mirrors the frame count per molecular dynamics trajectory.When relying solely on molecular dynamics simulations, the SSIR is 69%.This rate increases to 95% with the application of GradNav, highlighting its diminished dependence on the initial setup.After obtaining the trajectory across the comprehensive energy surface, the energy curve can be successfully reconstructed.The molecular dynamics trajectory, denoted as MD confined in Figure 8, remains confined within the vicinity of an ALA9 angle of 2.5 rad.In contrast, the molecular dynamics simulation labeled as MD transition demonstrates a transition between two regions of ALA9 angle, exhibiting two peaks in the distribution.Figure 8(b) shows that while the trajectory trapped in a potential well fails to effectively reconstruct the energy estimate curve, the comprehensive trajectory that explores both potential wells enables successful reconstruction.The trajectory derived from GradNav accurately captures both potential wells.However, there is a discrepancy in the potential well depths between the molecular dynamics and GradNav trajectories.Upon identifying all potential wells, their depths can be accurately determined through targeted molecular simulations in those specific regions.
A key focus of molecular dynamics simulations of proteins is understanding their folding behavior, which significantly influences the protein's functional characteristics.illustrates the root mean square deviation (RMSD) of each frame relative to a reference frame in an unfolded state, providing insight into the folding process.For example, a specific molecular dynamics trajectory demonstrates a stable transition towards a folded state, as evidenced by a jump in the RMSD plot.The trajectory produced by the GradNav algorithm exhibits frequent alternations between unfolded and folded states.This behavior is expected, given that GradNav generates its trajectory through multiple simulation restarts, offering a perspective not of a continuous path but rather an exploration across a broad region of the energy surface.This exploration reveals multiple potential folding states for further examination.GradNav enables the identification of various folded and unfolded states of the Fs-Peptide and provides initial conditions for each re-initiation of the simulations.This underscores its utility in probing the protein's conformational landscape, making GradNav an effective tool for gathering candidate conformations for a detailed analysis of folding dynamics.

Model Potential Energy Surface
In this study, we utilize the Müller potential along with its modified versions to define the potential energy V acting on a single particle in the Langevin dynamics simulations.The first model energy surface explored in this study is the widely studied Müller potential, characterized as the sum of four Gaussian functions, defined as follows: To enhance the evaluation of our algorithm across a broader range of scenarios, we introduce a variation to the Müller potential by incorporating an additional term, V add .
This modification introduces a more complex challenge for the simulation in exploring all potential wells, as the additional term creates a deeper valley, further distancing two shallow metastable states.Consequently, for a comprehensive exploration of all minima, the simulation must successfully navigate out of the deep valley in both directions: towards the upper right and the lower left.

Update of Initial Points in Fs-Peptide Trajectories
We assess the implementation of GradNav in Fs-Peptide molecular dynamics simulations through a pseudo molecular dynamics approach.In this method, a new starting point is proposed, followed by the selection of an actual starting point located within a cutoff radius from the newly proposed point.Given that the dataset includes 280,000 trajectory frames, thoroughly covering the molecular space with diverse topologies, we consistently identified points within the cutoff radius, as depicted in Figure S2.
Figure S2: Proposed vs. actual initial points.The proposed initial points are updated using the GradNav algorithm's update rule, while the actual initial points are selected from the trajectory dataset based on a cutoff radius of 0.02.

Figure 1 :
Figure 1: Overview of the GradNav Algorithm.a Illustration of the update rule: the update rate is increased if the centroid of the subsequent trajectory falls within the previously determined boundary.This process repeats until a new potential well is identified.b Flowchart detailing the steps of the GradNav algorithm.

Figure 2 :
Figure 2: Trajectories generated using Langevin Dynamics (LD) and the GradNav algorithm, starting from the deepest valley.

Figure 3 :
Figure 3: Simulation starting point updates.Upper panels display results from Müller potential, while lower panels feature results from modified Müller potential.Panels a and c (left) demonstrate updates from the end of one run to the start of the next, marked by arrows.Panels b and d (right) show how update rates change across inner loop iterations and reset to zero when a new well is found.For clarity, updates from the first 5,000 frames are depicted.

Figure 4 :
Figure 4: Count of successful potential well identifications by initial position.The left panels, a and c, depict the results obtained using Langevin dynamics (LD).In contrast, the right panels, b and d, illustrate the outcomes derived from GradNav.

Figure 5
displays the estimated energy surfaces for both the Müller and modified Müller potentials: Müller potential results are shown in the left panels, while the modified Müller potential results appear in the right panels.The top panels, (a) and (b), display trajectory points on the energy curve extracted from cross-sections, while the cross-section creation is detailed in the inner panels.Panels (c) and (d) feature a density histogram of these points.Probabilities from the histogram are converted into energy estimates via Equation 3, with adjustments made to match the lowest energy state, E 0 .Trajectories from Langevin dynamics simulations in both potential energy surfaces often find themselves localized within deep potential wells, confining the distribution of points to these regions.In contrast, GradNav simulations are capable of exploring all potential wells, offering a comprehensive view of the energy surface.As a result, energy curve estimates derived from Langevin dynamics simulation data only capture a single, deep potential well where the trajectories are trapped.On the other hand, GradNav provides a holistic reconstruction of the energy curve across the entire cross-sections for both potentials.This underscores the necessity for an expansive exploration to accurately estimate the energy surface from observed data.Additionally, capturing trajectories that span various metastable states is essential for effectively training deep learning models tasked with meticulously mapping these trajectories onto the latent space.26

Figure 5 :
Figure 5: Boltzmann distribution of trajectories.The left panels depict results from the Müller potential and the right panels from the modified Müller potential.Panels a and b show trajectories on the energy cross-section.Panels c and d present the distribution histogram of trajectories.Panels e and f illustrate energy estimates derived using the Boltzmann distribution equation.The energy of state i, denoted as ε i , is represented by E(X) within the energy surface.

Figure 6 :
Figure 6: Topology visualization of Fs-Peptide. a Amino acid sequence in Fs-Peptide protein.Unfolded b and folded c states of Fs-Peptide protein with dynamics of ALA9 (stable switch) and ARG20 (unstable switch) amino acids within the protein.

Figure 7 :
Figure 7: Trajectories across the energy surface with initial points and the count of successfully identified metastable states.Panels a and c show the molecular dynamics (MD) trajectory, labeled as number 4 in the dataset, while panels b and d display outcomes from GradNav, starting from the same points as MD.

Figure 8 :
Figure 8: Estimations of energy derived from the distribution of trajectories.Panel a presents a histogram of trajectory distribution.Panel b illustrates the reconstructed energy estimations based on the trajectory histogram.MD confined corresponds to trajectory number 4, and MD transition to trajectory number 15.

Figure 9 :
Figure 9: Comparison of root mean square deviation between MD simulation and GradNav algorithm trajectories, showcasing structural variation over time.MD trajectory here is the same as MD transition .

Table 1 :
Hyperparameters and simulation settings for different systems β(1+vn/k) |∇ρ| ∇ρ.Here, ρ denotes the observation density, β and k serve as hyperparameters, and v n represents the update stride at iteration n, which regulates the progressive intensification of the iteration.γactsas a conditional parameter, becoming one if the centroid lies within the boundary, and zero otherwise, effectively resetting the update rate to zero.The initial point updates are intentionally directed from highly populated regions towards areas with lower population densities, aligning with the direction of the negative observation density gradient (∇ρ).The magnitude of these updates increases linearly with the number of inner loop iterations, representing the attempts to identify other metastable states.If the centroid remains within the initial boundary, the update stride is incrementally increased by one, linearly raising the update magnitude.Upon the centroid successfully moving beyond the boundary, the update step is reset to zero, allowing a new outer loop simulation and exploration cycle to commence.Simulations in both the inner and outer loops proceed over a predetermined number of frames.The hyperparameters β and k govern the increase of the update stride, influencing the linear increase's intercept and slope, respectively.The specific hyperparameters used in each case are detailed in Table1.It should be noted that the hyperparameters have not been optimized for maximal exploration efficiency, as the primary objective of this paper is to illustrate the concept of the newly proposed algorithm.This iterative re-initiation enables the GradNav algorithm to explore the potential energy surface efficiently without imposing any artificial bias on the simulations.Hence, each simulation run reflects the outcomes of the precise physical settings.Additionally, by col-lecting the starting points of each run, it is possible to determine the moment of transition to another metastable state.