Accurate Memory Kernel Extraction from Discretized Time-Series Data

Memory effects emerge as a fundamental consequence of dimensionality reduction when low-dimensional observables are used to describe the dynamics of complex many-body systems. In the context of molecular dynamics (MD) data analysis, accounting for memory effects using the framework of the generalized Langevin equation (GLE) has proven efficient, accurate, and insightful, particularly when working with high-resolution time series data. However, in experimental systems, high-resolution data are often unavailable, raising questions about the impact of the data resolution on the estimated GLE parameters. This study demonstrates that direct memory extraction from time series data remains accurate when the discretization time is below the memory time. To obtain memory functions reliably, even when the discretization time exceeds the memory time, we introduce a Gaussian Process Optimization (GPO) scheme. This scheme minimizes the deviation of discretized two-point correlation functions between time series data and GLE simulations and is able to estimate accurate memory kernels as long as the discretization time stays below the longest time scale in the data, typically the barrier crossing time.


Introduction
A fundamental challenge in the natural sciences involves the creation of a simplified yet accurate representation of complex system dynamics using a low-dimensional coordinate. For instance, in spectroscopy, atomic motions are observed solely through the polarization induced by the electromagnetic field, resulting in spectra. 1 In the case of molecules in fluids, the myriad of interactions with the solvent are often reduced to a one-dimensional diffusion process. 2,3 In numerous studies, [4][5][6][7][8] the folding of a protein is described by a one-dimensional reaction coordinate. These diverse fields all share the common approach of projecting the complete many-body dynamics of 6N atomic positions and momenta onto a few or even a single reaction coordinate. Starting from the deterministic kinetics of a Hamiltonian system, the projection procedure yields a stochastic description based on the generalized Langevin equation (GLE), [9][10][11] which, in the case of a one-dimensional coordinate x(t) and its corresponding velocity v(t), reads

m v̇(t) = −∇U (x(t)) − ∫₀ᵗ ds Γ(t − s) v(s) + F R (t),   (1)

where m is the effective mass of the coordinate x. The potential of mean force U (x) is directly available from the equilibrium probability distribution ρ(x) via U (x) = −k B T ln ρ(x), where k B is the Boltzmann constant and T the absolute temperature. As a direct consequence of the dimensionality reduction, non-Markovian effects arise.
12 In the GLE, the memory kernel Γ(t) weights the effect of past velocities on the current acceleration. Stochastic effects, represented by the random force F R (t), are linked to the memory function via the fluctuation-dissipation theorem in equilibrium, ⟨F R (0)F R (t)⟩ = k B T Γ(t). When the relaxation of the environment governing Γ(t) is sufficiently fast, Γ(t) approaches a delta kernel, and the Langevin equation emerges from the GLE. Considerable efforts have been dedicated to this problem. 12 In recent works, the memory function Γ(t) was extracted from time-series data of proteins of biological relevance, allowing the non-Markovian description of a protein's folding kinetics in a non-linear folding landscape. Memory effects were found to be highly relevant, both in model systems 13 and real proteins. 14,15 Multiple methods exist to extract memory functions from MD data. [16][17][18][19] While Volterra equations offer good accuracy when high-quality time-series data is available, it is unclear if they remain efficient when the observations of the system are sampled with long discretization times. A recent research endeavour used an iterative scheme to approximate the memory kernel by adapting a trial kernel with a heuristic update based on the velocity autocorrelation function. 20 Another work parametrized memory kernels by fitting correlation functions to an analytical solution of the GLE. 21 In order to include the long time scales of the system dynamics, the fit extended to both the two-point correlation and its running integral. Both methods share the limitation of not being applicable to a non-linear potential energy function U (x). A recent paper not suffering from such a limitation used a maximum-likelihood model to estimate the GLE parameters that best fit the given MD data.
22 In a different work on polymer solutions, star polymers were coarse-grained to single beads interacting via a non-linear U (x). A GLE system was set up to mimic the star polymers' kinetics. The simulation parameters of the GLE system were iteratively changed using Gaussian Process Optimization (GPO) such that the coarse-grained and MD velocity autocorrelations were most similar. 23 The same idea was used to estimate a joint memory kernel over multiple temperatures. 24 Here, we consider the effects of temporal discretization, motivated by the fact that data is always discretized. For MD simulations, archived data often only contains the atomic positions at time intervals of hundreds of picoseconds to nanoseconds, as in the case of the data from the Anton supercomputer. 25 When considering experimental data, measurement devices limit the time step of the observations, typically at the microsecond scale. 26,27 In a prior publication, discretization effects were examined within the framework of data-driven GLE analysis. The GLE, without a potential, was solved analytically. To deal with discretization effects, the discretized mean-squared displacement and velocity autocorrelation functions were computed, allowing for the direct fitting of the memory kernel. 28 The present work investigates how a GLE with a non-harmonic potential can be parametrized given discretized data by considering a highly non-linear molecular dynamics test system.
The Volterra-based approach is shown to be remarkably resilient to time discretizations.
Where the Volterra approach ceases to function, we demonstrate that Gaussian Process Optimization is a suitable method to obtain memory kernels from discrete time-series data. By matching correlation functions computed from subsampled data, we present a method to deal with the discretization effects and extend the GLE analysis to non-linear data at higher discretizations. The choice of correlation functions involves some flexibility, demonstrating the broad applicability of our approach. For a small alanine homopeptide used as a test system, the Volterra method is suitable for discretization times that reach the memory time of about 1 ns. In comparison, the GPO method extends the range to discretization times up to the folding time of about 58 ns.

Results and Discussion
We investigate the effect of data discretization starting from a 10-µs-long MD trajectory of alanine nonapeptide (Ala 9 ) in water, which was established as a sensitive test system for non-Markovian effects in our previous work. 14 As in our original analysis, the formation of the α-helix in Ala 9 is measured by the mean distance between the H-bond acceptor oxygen of residue n and the donor nitrogen of residue n + 4,

x(t) = (1/3) Σ_{n=2}^{4} |r N,n+4 (t) − r O,n (t)|.   (2)

In the α-helical state, x has a value of approximately 0.3 nm, the mean H-bond length between nitrogen and oxygen. The potential of mean force U (x) in Fig. 1C

Volterra Equations
To extract memory kernels from time-series data, the GLE in Eq. 1 is multiplied with v(0) and averaged over space or time. By using the relation ⟨F R (t)v(0)⟩ = 0, 9,10 one obtains the Volterra equation 14,19

m (d/dt) C vv (t) = −C ∇U v (t) − ∫₀ᵗ ds Γ(s) C vv (t − s),   (3)

where C vv (t) is the velocity autocorrelation function and C ∇U v (t) is the correlation between the gradient of the potential and the velocity. By integrating Eq. 3 from 0 to t, we derive a Volterra equation involving the running integral over the kernel, G(t) = ∫₀ᵗ ds Γ(s), and insert mC vv (0) = C ∇U x (0) 14 to obtain

m C vv (t) = C ∇U x (t) − ∫₀ᵗ ds G(s) C vv (t − s),   (4)

with C ∇U x (t) being the correlation between the gradient of the potential and the position.
Computing the memory kernel directly from Eq. 3 is possible 29,30 but prone to instabilities. 17 Extracting G(t) using Eq. 4 and computing Γ(t) via a numerical derivative improves the numerical stability. 17,31 The discretization and solution of Eq. 4 is discussed in section of the Supporting Information. We fit Γ(t) extracted from the full-resolution data at ∆t = 1 fs using least squares to a multiexponential of the form

Γ(t) = Σ_{i=1}^{5} (γ i /τ i ) exp(−t/τ i ).   (5)

The fitted memory times τ i and friction coefficients γ i are presented in Table 1. The fitting involves both Γ(t) and G(t), as elaborated in the Methods section, and accurately captures the MD kinetics, similar to our previous work. 14 In order to estimate the impact of the non-Markovian effects on the kinetics, we turn to a heuristic formula for the mean-first passage times τ MFP of a particle in a double well in the presence of exponential memory functions. 13,32,33 Validated by extensive simulations, the heuristic formula accurately described the non-Markovian effects occurring in the folding of various proteins. 15 For a single exponential memory function, the heuristic formula identifies three different regimes by comparing the single memory time τ to the diffusion time scale τ D = γ tot L²/k B T, which is the time it takes a free Brownian particle to diffuse over a length L in the reaction-coordinate space. The first regime is the Markovian limit, where τ ≪ τ D and non-Markovian effects are negligible. The second regime is a non-Markovian regime where τ D /100 ≲ τ ≲ 10τ D , in which a speed-up of τ MFP compared to the Markovian description is observed. The third regime occurs when τ ≳ 10τ D , where τ MFP is slowed down compared to the Markovian description due to the non-Markovian memory effects.
To compute τ D , we take L = 0.22 nm, the distance between the folded state at x = 0.32 nm and the barrier at x = 0.54 nm, and the total friction γ tot = Σ i γ i , and obtain τ D = 6.8 ns. The τ i values in Table 1 span times from τ 1 = 7 fs ≪ τ D up to τ 5 = 5.7 ns ≈ τ D . In a previous work, 15 the first moment of the memory kernel, τ mem = ∫₀^∞ ds s Γ(s) / ∫₀^∞ ds Γ(s), was proposed as the characteristic time scale for a multi-scale memory kernel. For the memory kernel in Table 1, we find τ mem = 1 ns, correctly predicting the non-Markovian speed-up of τ MFP that a previous study demonstrated for Ala 9 . 14 In this work, we will establish τ mem as the limit for the discretization time ∆t beyond which the Volterra method ceases to produce accurate results.
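For a multiexponential kernel of the form of Eq. 5, both time scales reduce to closed expressions: ∫₀^∞ Γ(s) ds = Σ i γ i and ∫₀^∞ s Γ(s) ds = Σ i γ i τ i , so τ mem = Σ i γ i τ i / Σ i γ i . A minimal sketch in Python (the function names are ours, not taken from the paper's code):

```python
import numpy as np

def tau_mem(gammas, taus):
    """First moment of a multiexponential kernel
    Gamma(t) = sum_i (gamma_i/tau_i) exp(-t/tau_i):
    tau_mem = int s Gamma(s) ds / int Gamma(s) ds = sum(g_i*t_i)/sum(g_i)."""
    gammas, taus = np.asarray(gammas), np.asarray(taus)
    return np.sum(gammas * taus) / np.sum(gammas)

def tau_D(gammas, L, kBT):
    """Diffusion time scale tau_D = gamma_tot * L^2 / (kB*T)."""
    return np.sum(gammas) * L**2 / kBT
```

For a single exponential component, tau_mem reduces to its decay time τ, as expected.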
In the following, the full-resolution kernel obtained for a time step of ∆t = 1 fs will serve as a reference for results using a higher ∆t. Comparing the extracted G(t) with the corresponding fit according to Eq. 5 (red line) in Fig. 2F shows no significant differences in the long-time behaviour.

Table 1: Fitted memory function parameters for ∆t = 1 fs according to Eq. 5. The fits for ∆t > 1 fs are shown in section in the Supporting Information.

Fig. 2C shows oscillations of the extracted Γ(t) for t < 1 ps, which are discarded by the exponential fit. As we will show later, they do not play a role in the kinetics. For both Γ(t) in Fig. 2B and C vv (t) in Fig. 2A, the oscillations disappear for ∆t ≥ 0.1 ps, indicating that they are caused by sub-picosecond molecular vibrations. Moreover, the value of Γ(t) for t < 1 ps is consistently attenuated as ∆t increases, mirroring the same trend observed in C vv (0), as illustrated in the inset of Fig. 2A. In contrast, Γ(t) for t > 1 ps in the inset of Fig. 2B shows an exponential decay that is well preserved for all ∆t < 1 ns. The running integral G(t) in Fig. 2D stays mostly unchanged for discretizations ∆t < 1 ns. This demonstrates that the Volterra extraction scheme is accurate for discretization times below the mean memory time, i.e. for ∆t < τ mem = 1 ns.
The multiexponential kernel in Eq. 5 allows for the efficient numerical simulation of the GLE by setting up a Langevin equation where the reaction coordinate x is coupled harmonically to one overdamped, auxiliary variable per exponential component 10,34 (see section in the Supporting Information). Utilizing this simulation technique, Fig. 2G compares profiles for the mean-first passage times τ MFP originating from both the folded and unfolded states. For ∆t ≤ 10 ps, the τ MFP values obtained from the GLE simulations closely align with those derived from MD simulations, thereby manifesting the precise correspondence between the non-Markovian GLE description and the kinetics observed in the MD simulation. In Fig. 2E, we present the asymptotic limit lim t→∞ G(t), representing the total friction coefficient γ tot of the system, estimated by summing the individual γ i values obtained from the exponential fits. When ∆t ≥ 1 ns, we find that G(t) does not show a plateau value in the long-time limit. Consequently, this leads to a notable discrepancy between the τ MFP profiles presented in Fig. 2G and their MD counterparts for ∆t ≥ 1 ns. Combining the information provided in Fig. 2D, E, and G, it becomes evident that the extracted profile of G(t), the total friction γ tot , and the folding times τ MFP all deviate significantly from the MD reference data when the discretization time approaches the memory time τ mem . As a result, we assert that the Volterra extraction becomes inadequate when the discretization time exceeds the memory time τ mem .

Gaussian Process Optimization
So far, we have demonstrated that the Volterra equation can be used to extract a consistent memory kernel for a wide range of discretization times up to ∆t ≈ τ mem . The resulting GLE system captures the underlying kinetics faithfully, when judged by τ MFP , for discretizations below the memory time scale τ mem , but fails when exceeding it. Given that the discretization time may exceed the dominant memory time scale in typical experimental settings, an improved method is clearly desirable. We describe a scheme that allows the extraction of Γ(t) for ∆t that significantly exceeds τ mem . For this, we use a matching scheme between the discretized time correlation functions of the MD reference system C MD (n∆t) and of the GLE C GLE (n∆t, θ) via the mean-squared loss

L(θ) = (1/N) Σ_{n=1}^{N} [C MD (n∆t) − C GLE (n∆t, θ)]².   (6)

The type of correlation function will be specified later. The loss is evaluated over N samples, where N is determined based on the decay time of the correlation (see Table 4). In an iterative optimization, the friction and memory-time parameters in Eq. 5 that serve as the GLE parameters θ = (γ 1 , τ 1 , . . ., γ 5 , τ 5 ) are updated, and the GLE is integrated using a simulation time step δt chosen small enough that discretization effects are negligible. For the sake of comparability, we maintain a constant mass value of m = 31.4 u, derived from the MD data using the equipartition theorem according to m = k B T /⟨v²⟩. In fact, the precise value of m has no significant influence on the method's outcome, since it can be accommodated within the kernel. Furthermore, the system's inertial time τ m = m/γ tot = 0.09 fs is markedly shorter than all other relevant time scales, leading to an overdamped system in which the mass value is irrelevant. To find the best θ, the choice of the optimizer is crucial.
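The loss of Eq. 6 itself is simple to evaluate; a minimal sketch (the function name is ours) of the mean-squared deviation between two discretized correlation functions:

```python
import numpy as np

def correlation_loss(c_md, c_gle):
    """Mean-squared deviation between discretized correlation functions
    (cf. Eq. 6): L = (1/N) * sum_n [C_MD(n*dt) - C_GLE(n*dt)]^2."""
    c_md, c_gle = np.asarray(c_md), np.asarray(c_gle)
    n = min(len(c_md), len(c_gle))  # evaluate over the N available samples
    return np.mean((c_md[:n] - c_gle[:n]) ** 2)
```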
The loss L, defined in Eq. 6, is inherently noisy due to the stochastic integration of the GLE and possesses, in general, many local minima in a high-dimensional space. Faced with such a task, common gradient-based or simplex methods fail. 35,36 [37][38][39] Given the computational cost of a converged GLE simulation, we choose Gaussian Process Optimization (GPO) [40][41][42] as a method to minimize L. GPO builds a surrogate model of the real loss L that incorporates noise [43][44][45] and allows for non-local search 46,47 (see section in the Supporting Information).
[50] In principle, any correlation function can serve as an optimization target.

Figure 4: A The total friction γ tot = Σ i γ i obtained via the Volterra scheme (orange) is constant for discretizations ∆t < 1 ns. For ∆t higher than the memory time τ mem = 1 ns, it decreases until the extraction fails. Gaussian Process Optimization (GPO, blue) estimates the correct friction for much higher ∆t. The horizontal grey line shows γ MD tot , the total friction from the MD data. B The folding and unfolding mean-first passage times from the GLE simulations of kernels extracted at different discretizations, given by the mean time it takes the system to get from x = 0.32 nm to x = 0.98 nm and back. The MD folding times, τ MD fold = 58 ns and τ MD unfold = 26 ns, are indicated as horizontal grey lines; τ mem = 1 ns and τ MD fold as vertical grey lines. The GPO estimates the correct folding and unfolding times up to ∆t ≈ τ MD fold , significantly higher than the Volterra scheme.

Fig. 2A shows that the velocity autocorrelation function C vv (t) decays to zero after about 1 ps, while Fig. 3B and G show that C xx (t), the autocorrelation of the mean-free position x(t) − ⟨x⟩, decays over about 50 ns. With such a difference in the decay times of the two correlations, we define two losses based on Eq. 6: L v , using C vv (t), and L x , using C xx (t), anticipating that the two correlations probe different scales of the dynamics. Furthermore, we define a linear combination of L v and L x , L vx = αL v + L x , to test if including both correlations in the loss function improves the quality of the GLE parameters. The parameter α is selected for each ∆t to achieve a balanced weighting between the two losses and is tabulated in Table 4. Fig. 4 provides a comparison of the performance of the Volterra and GPO approaches across various discretizations. This comparison focuses on the overall friction and the folding and unfolding mean-first passage times, as these observables are not included in the GPO optimization process. As shown in the previous section, the applicability of the Volterra method is limited to discretizations below the memory time τ mem = 1 ns. Remarkably, the GPO approach can surpass the boundary set by the memory time and estimates folding times with good accuracy for discretizations up to ∆t = 40 ns. This limit roughly corresponds to the mean time it takes the system to fold, τ MD fold = 58 ns, which is given by the mean-first passage time from the unfolded state at x = 0.98 nm to the folded state at x = 0.32 nm. For the highest discretization time tested, ∆t = 240 ns, the GP optimization still finds meaningful folding times while underestimating the total friction.

Conclusions
We investigate the effect that time discretization of the input data has on memory extraction.
As a specific example, we consider MD time-series data of the polypeptide Ala 9 . Computing a memory kernel via the inversion of the Volterra equation (Eq. 4) requires the velocity autocorrelation and the potential gradient-position correlation function. The velocity autocorrelation changes significantly as a result of increasing time discretization, and with it, a surrogate kernel is obtained that differs from the full-resolution kernel. Our key finding is that, given a discretization time lower than the characteristic memory time, the Volterra approach is still able to compute a kernel that reproduces the kinetics of the MD system. Here, we define the characteristic memory time τ mem via the first moment of the memory kernel, taking into account all decay times of the kernel, and find 1 ns for Ala 9 . By extracting the memory kernel from MD trajectories at different discretizations, we show that the Volterra approach is able to reproduce the kinetics when the discretization time ∆t is below τ mem .
To also cover the important regime when ∆t > τ mem , we introduce a Gaussian Process Optimization scheme based on matching discretized time correlation functions of the reference and the GLE system.We test losses based on the velocity and position autocorrelation functions, for which GPO obtains memory kernels very similar to the Volterra scheme and is able to reproduce the reaction-coordinate dynamics and the folding times.
We demonstrate the effectiveness of GPO for discretization times up to the folding time of τ MD fold = 58 ns, about 50 times higher than the highest discretization for which the Volterra approach is applicable.As elaborated in previous works, [13][14][15] memory can affect the kinetics of protein barrier crossing on time scales far exceeding the memory time, up to the longest time scale of the system.Therefore, the presented GPO approach is expected to extend the applicability of the non-Markovian analysis to a wide range of discretized systems not suitable for the Volterra method.
In fact, the GPO analysis is not limited to data from MD simulations but can be used whenever encountering highly discretized experimental data. The application to data from single-molecule experiments [51][52][53] is a promising avenue for future research.

Methods
The MD simulation data is taken from our previous publication; see Ref. 14 for details. The MD simulation has a simulation time step of δt = 1 fs, while all GLE simulations use a time step of δt = 2 fs. In the computation of the hb4 coordinate (Eq. 2), the distances are computed between the oxygens of Ala2, Ala3, and Ala4 and the nitrogens of Ala6, Ala7, and Ala8, where Ala1 is the alanine residue at the N-terminus of the Ala 9 polypeptide.
All analysis code is written in Python 54 or Rust. 55 Table 4 shows the weights α for the loss L vx , which includes C vv (t) and C xx (t). The memory kernels are fitted using the differential evolution algorithm implemented in the Python package 'scipy' 56 by minimizing a mean-squared loss including both the kernel and the running integral over the kernel, L = α mem L Γ + L G , where L Γ is the mean-squared loss of the kernel and L G is the mean-squared loss of the running integral of the kernel. The resulting kernels and values for α mem are shown in Table 2.
The GPO is performed using the 'GaussianProcessRegressor' implemented in the Python package 'scikit-learn', 57 using ten optimizer restarts. When computing the loss L, the correlation functions are evaluated over a finite number of sample points N, always beginning with t = 0. The number of sample points N is given in Table 4. To maximize the expected improvement in Eq. 18 or the standard deviation in Eq. 19, we use the 'L-BFGS-B' algorithm.

Supporting Information

Subsampling of the Potential
For the computation of the correlation between the gradient of the potential and the position, C ∇U x (t), we estimate the density ρ(x) from the trajectory via a histogram and compute the potential using U (x) = −k B T ln ρ(x). Section explains how to compute C ∇U x (t) from U (x).

Discretization of the Volterra Equation
Starting from Eq. 4, we discretize all functions of time and use the trapezoidal rule for the integral, obtaining 14

G(n∆t) = 2/(∆t C vv (0)) [ C ∇U x (n∆t) − m C vv (n∆t) − ∆t Σ_{j=1}^{n−1} G(j∆t) C vv ((n − j)∆t) ],   (7)

with G(0) = 0, which is solved iteratively for increasing n. Regarding the computation of C vv (t), Section tests different numerical differentiation schemes to compute the velocities v(t) from the positions x(t).
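The resulting iteration can be sketched in a few lines of Python, assuming the integrated Volterra equation in the form m C vv (t) = C ∇U x (t) − ∫₀ᵗ ds G(s) C vv (t − s); function and variable names are ours:

```python
import numpy as np

def extract_G(c_vv, c_gradU_x, m, dt):
    """Invert the trapezoidal discretization of the integrated Volterra
    equation, m*C_vv(n) = C_gradUx(n)
                          - dt*[sum_{j=1}^{n-1} G_j*C_vv(n-j) + 0.5*G_n*C_vv(0)],
    iteratively for the running integral G of the memory kernel (G_0 = 0)."""
    N = len(c_vv)
    G = np.zeros(N)
    for n in range(1, N):
        # sum_{j=1}^{n-1} G_j * C_vv(n-j); empty for n = 1
        conv = np.dot(G[1:n], c_vv[n - 1:0:-1])
        G[n] = 2.0 / (dt * c_vv[0]) * (c_gradU_x[n] - m * c_vv[n] - dt * conv)
    return G
```

A convenient self-check is a round trip: generate C ∇U x from a prescribed G via the forward relation, then verify that the inversion recovers G.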

Fits of the Memory Kernels
We fit the memory functions for different discretizations ∆t according to the five-component exponential fit in Eq. 5. Table 2 shows the resulting memory parameters (γ 1 , τ 1 , . . .γ 5 , τ 5 ).
The fits (coloured) are compared to the memory kernels (grey) calculated using Eq. 7 in Fig. 6.
At discretizations of ∆t ≥ 8 ns, which are higher than those considered in this paper, the Volterra method experiences a complete breakdown, signified by the occurrence of negative values of G(t > 0). As negative frictions are unphysical, we do not show the corresponding G(t) in Fig. 6 or in the main text. We plot the fits of the memory kernels shown in Table 2. The coloured lines compare the fit and the running integral over it to the numerical data being fitted (grey lines).
Table 2: We show all memory times τ i and the corresponding friction coefficients γ i for the five-component exponential memory-kernel fit following Eq. 5 of the MD data of Ala 9 subsampled at different time steps ∆t. The fits are plotted in Fig. 6. We also show the total friction γ tot = Σ i γ i . The weights α mem are chosen such that the mean-squared losses of the kernel and of the running integral are approximately of the same order. Values are given in units of ps for τ i and ∆t, u/ps for γ i , and atomic mass units u for m.
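Such a fit can be sketched with 'scipy', minimizing a weighted sum of the mean-squared losses of the kernel and of its running integral, as described in the Methods. The parameter bounds below are illustrative placeholders, not the values behind Tables 2 and 3:

```python
import numpy as np
from scipy.optimize import differential_evolution

def multi_exp_kernel(t, params):
    """Gamma(t) = sum_i (gamma_i/tau_i) exp(-t/tau_i); params = (g1, t1, g2, t2, ...)."""
    g = np.asarray(params[0::2])[:, None]
    tau = np.asarray(params[1::2])[:, None]
    return np.sum(g / tau * np.exp(-t[None, :] / tau), axis=0)

def running_integral(kernel, dt):
    """Trapezoidal running integral G(t) with G(0) = 0."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (kernel[1:] + kernel[:-1]) * dt)])

def fit_kernel(t, gamma_num, n_comp=2, alpha_mem=1.0, seed=0):
    """Fit a multiexponential to a numerical kernel by minimizing
    L = alpha_mem * L_Gamma + L_G (kernel plus running-integral loss)."""
    dt = t[1] - t[0]
    G_num = running_integral(gamma_num, dt)
    def loss(p):
        gam = multi_exp_kernel(t, p)
        return (alpha_mem * np.mean((gam - gamma_num) ** 2)
                + np.mean((running_integral(gam, dt) - G_num) ** 2))
    bounds = [(1e-3, 10.0), (1e-2, 10.0)] * n_comp  # illustrative (gamma_i, tau_i) ranges
    return differential_evolution(loss, bounds, seed=seed, tol=1e-8).x
```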

Markovian Embedding
The generalized Langevin equation

m v̇(t) = −∇U (x(t)) − ∫₀ᵗ ds Γ(t − s) v(s) + F R (t)   (8)

can be simulated by the following set of coupled Langevin equations,

m v̇(t) = −∇U (x(t)) − Σ i (γ i /τ i ) [x(t) − y i (t)],   (9)
γ i ẏ i (t) = (γ i /τ i ) [x(t) − y i (t)] + η i (t),   (10)

where the random forces η i fulfil the fluctuation-dissipation theorem ⟨η i (t)η j (t′)⟩ = 2k B T γ i δ ij δ(t − t′). The form of Γ(t) is given by

Γ(t) = Σ i (γ i /τ i ) exp(−t/τ i ).   (11)

Solving for y i (t) in Eq. 10 and inserting into Eq. 9 yields a solution which is identical to Eq. 8 with a random force which satisfies the fluctuation-dissipation theorem. The mass m is computed via the equipartition theorem, m = k B T /⟨v²⟩.
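The Markovian embedding above, a coordinate x coupled harmonically with constants k i = γ i /τ i to overdamped auxiliary variables y i , can be integrated with a simple Euler-Maruyama sketch. This is an illustration under our own naming, not the production integrator; a production code would use the time steps quoted in the Methods:

```python
import numpy as np

def simulate_gle(grad_U, m, gammas, taus, kBT, dt, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama integration of the Markovian embedding:
    m*dv = -grad_U(x)*dt - sum_i k_i*(x - y_i)*dt,  k_i = gamma_i/tau_i,
    gamma_i*dy_i = k_i*(x - y_i)*dt + eta_i,  <eta_i eta_j> = 2*kBT*gamma_i*delta_ij."""
    rng = np.random.default_rng(seed)
    g, tau = np.asarray(gammas, float), np.asarray(taus, float)
    k = g / tau
    noise_amp = np.sqrt(2.0 * kBT * g / dt)  # discretized white-noise amplitude
    x, v = x0, 0.0
    y = np.full(len(g), x0)
    traj = np.empty(n_steps)
    for i in range(n_steps):
        v += (-grad_U(x) - np.sum(k * (x - y))) / m * dt
        x += v * dt
        y += (k * (x - y) + noise_amp * rng.standard_normal(len(g))) * dt / g
        traj[i] = x
    return traj
```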
GPO is a widely known method, which will be briefly discussed here; for a more thorough explanation, the reader is referred to Ref. 40. The quality of a set of GLE parameters θ = (γ 1 , τ 1 , . . ., γ 5 , τ 5 ) over the parameter space given in Table 3 is characterized by the loss L in Eq. 6 via a GLE simulation. To enhance the efficiency of the GPO for a wide range of the input parameters θ, we use a logarithmic embedding θ = (log 10 γ 1 , log 10 τ 1 , . . ., log 10 γ 5 , log 10 τ 5 ).
We assume that for a set of n simulation parameters Θ = [θ 1 , . . ., θ n ] T with results L(Θ) = [L(θ 1 ), . . ., L(θ n )] T , the loss follows a Gaussian distribution,

L(Θ) ∼ N (µ(Θ), σ(Θ, Θ)),

where N (µ, σ) is a multivariate normal distribution with mean vector µ and covariance matrix σ. The process generating the set Θ is a Gaussian process with the mean function µ(θ i ) and the covariance function k(θ i , θ j ). The mean function is set to the scalar value

µ = (1/N) Σ i L(θ i ),

where N is the dimension of Θ. To better model data containing noisy samples, a scalar noise parameter σ is added to the diagonal elements of the covariance matrix,

σ(Θ, Θ) = k(Θ, Θ) + σI,

where I is the identity matrix. The complexity of the loss function the process can represent is determined by the choice of k(θ i , θ j ). For k(θ i , θ j ), we choose the radial basis function

k(θ i , θ j ) = s exp(−||θ i − θ j ||²/(2l²)) + c,

where ||·|| is the Euclidean distance. To determine the parameters s, l and c, the negative log-likelihood of the observed loss is minimized, i.e. arg min s,l,c −log p(L(Θ)).
The noise level σ is kept constant at 0.005.
To predict the loss value at samples not observed so far, the data vector Θ is split into the observed samples Θ (where L(Θ) is available) and unobserved samples Θ * . The resulting covariance matrix has four blocks: the covariances of Θ and Θ * with themselves and with each other. The distribution can be conditioned on the observed samples Θ and their losses L(Θ), yielding a normal distribution over L(Θ * ) with mean

µ L (Θ * ) = µ + σ(Θ * , Θ) σ(Θ, Θ)⁻¹ (L(Θ) − µ)

and standard deviation

Σ L (Θ * ) = diag[ σ(Θ * , Θ * ) − σ(Θ * , Θ) σ(Θ, Θ)⁻¹ σ(Θ, Θ * ) ]^{1/2}.

To propose new samples, the loss space is explored or exploited. When exploiting, we maximize the expected improvement

EI(θ) = (L best − µ L (θ) − ξ) Φ(z) + Σ L (θ) ϕ(z),  z = (L best − µ L (θ) − ξ)/Σ L (θ),   (18)

where L best is the best loss observed so far, ϕ is the probability density function and Φ the cumulative distribution function of the standard normal distribution. To encourage sampling around the θ with the best loss, we smooth the expected improvement by adding the parameter ξ = 0.05.
When exploring, we maximize the standard deviation of the Gaussian process, Σ L (θ i ), in Eq. 19. In each GPO run, we begin with five samples drawn uniformly over the sample space given by the bounds in Table 3; then we explore for 25 iterations, followed by 270 iterations in which we alternately explore and exploit.
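A single exploitation step can be sketched with 'scikit-learn'. This is a simplified illustration under our own names: the acquisition is maximized over a candidate grid rather than with 'L-BFGS-B', and the kernel hyperparameters are left to the regressor's built-in likelihood optimization:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expected_improvement(mu, sigma, best, xi=0.05):
    """EI for minimization with exploration offset xi (cf. Eq. 18):
    EI = (best - mu - xi)*Phi(z) + sigma*phi(z), z = (best - mu - xi)/sigma."""
    z = (best - mu - xi) / np.maximum(sigma, 1e-12)
    return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def gpo_step(theta_obs, loss_obs, candidates, noise=0.005):
    """Fit a GP surrogate to observed (log-embedded) parameters and losses,
    then return the candidate with maximal expected improvement."""
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                                  alpha=noise**2, normalize_y=True,
                                  n_restarts_optimizer=10)
    gp.fit(theta_obs, loss_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma, np.min(loss_obs))
    return candidates[np.argmax(ei)]
```

The exploration step would instead return the candidate maximizing the predicted standard deviation sigma.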
Table 4 shows all results of the GP optimizations that are shown in Fig. 3. We show the sum of the frictions, γ tot = Σ i γ i , the mean first-passage time from x = 0.98 nm to x = 0.32 nm, τ fold , the mean first-passage time from x = 0.32 nm to x = 0.98 nm, τ unfold , and the weights α for L vx , with L vx = αL v + L x . The folding and unfolding times are not comparable when computed from data at different discretizations, so we compare τ MFP computed from a GLE simulation trajectory at the full resolution of ∆t = 2 fs. As in the main text, τ fold and τ unfold are computed as a mean over the ten best GPO runs. α is determined such that the magnitudes of L v and L x are roughly equal. The table also shows the numbers of samples N v and N x used for the GPO losses, as given in Eq. 6.

Comparison of τ MFP starting from the folded state at x S = 0.32 nm and the unfolded state at x S = 0.98 nm as a function of the endpoint of τ MFP , x F , for GPO runs using L v , L x and L vx at ∆t = 2 ps and ∆t = 10 ns. We show the mean of the ten best GPO runs (coloured lines) and the standard deviation over the ten best runs (shaded area). Here, τ MFP is computed from full-time-resolution data, i.e. ∆t = 2 fs. The MD reference is indicated by the black broken line.

Interpolation of the Potential
To compute the correlation between the gradient of the potential and the position, C ∇U x (t), we first estimate the probability density ρ(x) via a histogram using n bins bins. Next, we compute the potential U (x) = −k B T ln ρ(x). To compute a trajectory ∇U [x(t)], we compute a numerical gradient of U (x) and interpolate it such that it can be evaluated at every frame x(t). Finally, we correlate the trajectories ∇U [x(t)] and x(t), yielding C ∇U x (t). Here, we compare how C ∇U x (t) and G(t) are influenced by the choice of n bins . We also compare a linear to a cubic-spline interpolation. Fig. 8B and F show that, for both interpolations, the mean of ∇U [x(t)], ⟨∇U ⟩, is far from zero when n bins is low. This contradicts the fact that the simulation is stationary. With ⟨∇U ⟩ far from zero, both C ∇U x (t) and G(t) depend on the value of n bins . When n bins is increased, ⟨∇U ⟩ converges to zero, and both interpolations agree on the shape of C ∇U x (t) in Fig. 8A and E, G(t) in Fig. 8D and H, and Γ(t) in Fig. 8C and G, independent of n bins . This demonstrates that a consistent interpolation is possible.
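The pipeline of this paragraph, histogram, logarithm, numerical gradient, and interpolation onto the trajectory, can be sketched as follows (linear interpolation via 'numpy.interp'; the function name is ours):

```python
import numpy as np

def grad_U_trajectory(x_traj, kBT=1.0, n_bins=600):
    """Estimate U(x) = -kBT*ln(rho(x)) from a histogram of the trajectory,
    take a numerical gradient on the bin centers, and linearly interpolate
    it onto every frame x(t), yielding the trajectory gradU[x(t)]."""
    counts, edges = np.histogram(x_traj, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0  # avoid log(0) in empty bins
    U = -kBT * np.log(counts[mask])
    gradU = np.gradient(U, centers[mask])
    return np.interp(x_traj, centers[mask], gradU)
```

As a sanity check, for samples from a Gaussian density the estimated potential is harmonic, so ∇U [x] should follow x itself.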
For the main text and in Appendix section , we use a linear interpolation with n bins = 600. We compare how the correlation between the gradient of the potential and the position C ∇U x (t), the kernel Γ(t), and the running integral of the kernel G(t) depend on the number of interpolation points n bins and the interpolation type of the potential. The dependence on n bins vanishes when ⟨∇U ⟩ is close to zero. We show C ∇U x (t) (A), ⟨∇U ⟩ (B), Γ(t) (C) and G(t) (D) for different n bins using a linear interpolation. We show the same for a cubic-spline interpolation, i.e. C ∇U x (t) (E), ⟨∇U ⟩ (F), Γ(t) (G) and G(t) (H). In D and H, the black dashed line indicates γ tot = Σ i γ i from Table 1.

Discretized Velocity
To compute the velocities v(t) from the positions x(t), we compare two different differentiation schemes. First, we define a central gradient of order n,

v (n) i = [x((i + n)∆t) − x((i − n)∆t)] / (2n∆t).

Second, we define smoothed positions, where we smooth the data by taking the running average over the two adjacent points,

x̃ i = [x((i − 1)∆t) + x((i + 1)∆t)] / 2.

We define smoothed positions of order m by applying the smoothing m times. The smoothed gradient of order m is defined by smoothing the positions m times and then applying the central gradient of order 1. We note that the smoothed velocities of first order equal the central gradient of second order, ṽ (m=1) i = v (n=2) i . Figures 9 and 10 show that the integrals over the kernel G(t) that oscillate the least are obtained by the central differences of first order, v (n=1) i . Accordingly, we use v (n=1) i in the main text. In all cases, we compute the kernel Γ(t) from G(t) via a numerical gradient using central differences of second order, which alleviates the oscillations at low ∆t.
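The central gradient of order n is one line of 'numpy'; for a quadratic trajectory it is exact at the interior points, which makes a convenient check:

```python
import numpy as np

def central_gradient(x, dt, n=1):
    """Central difference of order n, v_i = (x_{i+n} - x_{i-n}) / (2*n*dt),
    returned for the interior points i = n, ..., len(x) - n - 1."""
    x = np.asarray(x)
    return (x[2 * n:] - x[:-2 * n]) / (2.0 * n * dt)
```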

Figure 1: I-III Representative snapshots for different values of the mean hydrogen-bond distance reaction coordinate of Ala 9 , x, defined in Eq. 2. A Multiple folding and unfolding events occur within a 450 ns trajectory segment. B A single folding event. The orange circles indicate the time series discretized at ∆t = 1 ns. C The potential landscape U (x) for Ala 9 , computed from the trajectory at full resolution. The folded state (I) forms a sharp minimum at x = 0.32 nm. A local minimum is found at x = 0.62 nm (II). The unfolded state forms a broad minimum around x = 1.0 nm (III).
The potential U (x) displays several metastable states along the folding landscape. Fig. 1A shows a 450 ns long trajectory. To test how time discretization affects the memory extraction, frames of the trajectory are left out to achieve an effective discretization time step ∆t. Such a discretized trajectory (orange data points, for ∆t = 1 ns) is compared in Fig. 1B to the time series at full resolution. The potential U (x) is always estimated from a histogram of the entire data to separate time discretization from effects arising due to the undersampling of the potential (see section in the Supporting Information).
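The frame-skipping step can be sketched in one helper (`discretize` is our name; the paper simply leaves out frames):

```python
import numpy as np

def discretize(x, dt_full, dt_target):
    """Mimic a coarser sampling by keeping every k-th frame of x(t)."""
    stride = max(1, int(round(dt_target / dt_full)))
    return x[::stride], stride * dt_full  # trajectory and effective time step
```

Note that the potential U (x) is still estimated from the full-resolution trajectory, not from the subsampled one.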

Figure 2: Memory extraction by the inversion of the Volterra equation 4 for different discretization times ∆t, using MD data of Ala 9 . A Velocity autocorrelation C vv (t). B Memory kernel Γ(t), from numerical differentiation of G(t). C Multi-exponential fit of Γ(t) computed for ∆t = 1 fs (grey) compared to the numerical data (dark red). The fitted parameters are shown in Tables 1 and 2. D Running integral over the memory kernel G(t). E Total friction γ tot , computed from the exponential fits of the kernels. The vertical grey line indicates τ mem =
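The multi-exponential fit of Γ(t) in panel C can be illustrated with a linear least-squares sketch for fixed decay times τ_i (a simplification of ours; the paper's fit also optimizes the decay times):

```python
import numpy as np

def fit_amplitudes(t, gamma, taus):
    """Fit Gamma(t) ~ sum_i a_i exp(-t / tau_i) for fixed decay times taus.

    With the taus held fixed, the amplitudes enter linearly and can be
    obtained by ordinary least squares."""
    A = np.exp(-t[:, None] / np.asarray(taus)[None, :])  # design matrix
    a, *_ = np.linalg.lstsq(A, gamma, rcond=None)
    return a
```

The total friction then follows from the fit as γ tot = Σ_i a_i τ_i for this parametrization.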

Figure 3: To visualize the Gaussian Process Optimization (GPO), we plot the mean of observables over the ten best optimization runs. We compare GPO results using the loss L v (blue), based on C vv (t); L x (orange), based on the autocorrelation of the position fluctuation x(t) − ⟨x⟩; and L vx (green), a linear combination of L v and L x . For ∆t = 2 ps, we compare the observables A C vv (t), B C xx (t), C τ MFP , D the normalized kernel Γ(t)/Γ(0) and E the normalized running integral G(t)/G(0) to the MD reference (black, broken line). Equally, for ∆t = 10 ns, we show F C vv (t), G C xx (t), H τ MFP , I Γ(t)/Γ(0) and J G(t)/G(0). The kernels in D, E, I, and J, parametrized by Eq. 5, are plotted time-continuously.
in the Supporting Information. For every GP optimization, 300 different θ values are evaluated via an 18-µs-long GLE simulation each. The ten θ samples with the lowest loss form the basis for the following analysis. When optimizing the loss function L v with a discretization of ∆t = 2 ps, Fig. 3A illustrates that L v (blue) accurately replicates the MD reference for C vv (t), whereas L x (orange) exhibits discrepancies. Conversely, in Fig. 3B, L x perfectly reproduces C xx (t), while L v struggles to do so. Remarkably, the combined loss function L vx (green) successfully aligns with both reference correlations simultaneously. To evaluate the quality of the GLE parameters, Fig. 3C provides a comparison of the mean first-passage times τ MFP between GLE results from GPO solutions and the MD reference. We calculate τ MFP for GPO-based GLE simulations and the MD reference using identical discretizations ∆t. Notably, we observe that L v fails to align with the MD reference, whereas both L x and L vx exhibit consistency with it. This outcome underscores the insufficiency of C vv (t) in capturing the complete kinetics of the barrier crossing. A comparison of τ MFP between L vx and L x reveals a slightly better correspondence to the MD reference for L vx , signifying that the inclusion of C vv (t) improves the optimization. Examining the obtained kernels in Fig. 3D-E, all loss functions yield kernels that largely conform to the exponential fit of the MD reference but exclude the first memory component with a decay time of approximately τ 1 ≈ 7 fs. Both L x and L vx correctly identify the plateau of G(t), while L v underestimates it, leading to an underestimated τ MFP . Next, we evaluate the performance of the GPO for discretization times exceeding τ mem . In Fig. 3F-J, we show the results for ∆t = 10 ns, demonstrating that the GPO approach yields similar results for all losses. The discretized C vv (t), C xx (t) and τ MFP are in perfect agreement with the MD reference. The kernels agree for all but the lowest times. To confirm that the increased discretization used for the τ MFP computation does not introduce any bias into the results, we perform an additional comparison of τ MFP computed at the full time resolution of ∆t = 2 fs (see Fig. 7 in the Supporting Information).
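The loss functions can be sketched as mean-square deviations between discretized autocorrelation functions of the GLE and MD trajectories (a minimal sketch; the relative weighting in L vx and the use of finite-difference velocities are our assumptions):

```python
import numpy as np

def autocorr(a, n_lags):
    """Autocorrelation of a stationary series for lags 0 .. n_lags-1."""
    a = a - a.mean()
    return np.array([np.mean(a[:len(a) - t] * a[t:]) for t in range(n_lags)])

def gpo_loss(x_gle, x_md, n_lags=50, weight=0.5):
    """Combined loss L_vx from deviations of discretized C_vv and C_xx."""
    v_gle, v_md = np.diff(x_gle), np.diff(x_md)  # finite-difference velocities
    L_v = np.mean((autocorr(v_gle, n_lags) - autocorr(v_md, n_lags)) ** 2)
    L_x = np.mean((autocorr(x_gle, n_lags) - autocorr(x_md, n_lags)) ** 2)
    return weight * L_v + (1.0 - weight) * L_x
```

In the GPO scheme, a loss of this form is evaluated for each sampled parameter set θ, with both trajectories compared at the same discretization ∆t.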
Fig. 5 shows how the estimated U (x) would behave if frames were left out to achieve an effective time step ∆t. In the main text and all further sections, the potential U (x) is always estimated from a histogram of the entire data.

Figure 5: When generating a trajectory at an effective time step ∆t from the 10 µs trajectory of Ala 9 , the estimated potential visibly changes for ∆t ≥ 100 ps. We show the potential for all ∆t used in the main text. All memory extractions in the main text use the potential at full resolution, i.e. ∆t = 1 fs.

Figure 6: Memory extraction by the inversion of the Volterra equation 4 for different discretization times ∆t, using MD data of Ala 9 . We plot the fits of the memory kernels shown in Table 2. The coloured lines compare the fit and the running integral over it to the numerical data being fitted (grey lines).
Fig. 7 repeats the mean first-passage time calculations from Fig. 3, computing τ MFP not at the discretization time ∆t of the GP optimization but at the full resolution of the GLE simulation, i.e. ∆t = 2 fs.
Table 4: GPO results from Fig. 3 and Fig. 4.

Figure 7: Comparison of τ MFP starting from the folded state at x S = 0.32 nm and the unfolded state at x S = 0.98 nm as a function of the endpoint of τ MFP , x F , for GPO runs using L v , L x and L vx at ∆t = 2 ps and ∆t = 10 ns. We show the mean of the ten best GPO runs (coloured lines) and the standard deviation over the ten best runs (shaded area). Here, τ MFP is computed from data at full time resolution, i.e. ∆t = 2 fs. The MD reference is indicated by the black, broken line.
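τ MFP can be estimated directly from a (discretized) trajectory by averaging first-passage times from the start state to the endpoint; a minimal sketch for x S < x F (our naming, not necessarily the paper's exact protocol):

```python
import numpy as np

def mean_first_passage_time(x, dt, x_start, x_end):
    """Average time to first reach x_end after last being at x_start.

    Assumes x_start < x_end; the clock restarts on every visit to the
    start state and stops at the first crossing of the endpoint."""
    passage_times, t0 = [], None
    for i, xi in enumerate(x):
        if xi <= x_start:
            t0 = i                              # (re)start clock at start state
        elif t0 is not None and xi >= x_end:
            passage_times.append((i - t0) * dt)  # first passage completed
            t0 = None                            # wait for next start visit
    return np.mean(passage_times) if passage_times else np.nan
```

Evaluating this estimator on the same trajectory at different strides shows how the discretization ∆t enters the τ MFP comparison.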

Figure 8: Memory extraction by the inversion of the Volterra equation 4 using MD data of Ala 9 . We compare how the correlation between the gradient of the potential and the position C ∇U x (t), the kernel Γ(t) and the running integral of the kernel G(t) depend on the number of histogram bins n bins and the interpolation type of the potential. The dependence on n bins vanishes when ⟨∇U ⟩ is close to zero. We show C ∇U x (t) (A), ⟨∇U ⟩ (B), Γ(t) (C) and G(t) (D) for different n bins using a linear interpolation. We show the same for a cubic spline interpolation, i.e. C ∇U x (t) (E), ⟨∇U ⟩ (F), Γ(t) (G) and G(t) (H). In D and H, the black, dashed line indicates γ tot = Σ_i γ i from Table 1.

Figure 9: Memory extraction by the inversion of the Volterra equation 4 for different discretization times ∆t, using MD data of Ala 9 . Comparison of the memory kernels for central differences (Equation 21) of orders n = 1 to n = 3. The left column shows the velocity autocorrelation function C vv (t). The centre column shows the kernel Γ(t). The right column shows the running integral over the kernel G(t).

Figure 10: Memory extraction by the inversion of the Volterra equation 4 for different discretization times ∆t, using MD data of Ala 9 . Comparison of the memory kernels for smoothed differences (Equation 24) of orders m = 2, 3. The left column shows the velocity autocorrelation function C vv (t). The centre column shows the kernel Γ(t). The right column shows the running integral over the kernel G(t). We do not show order m = 1, as it is equivalent to the central gradient of order n = 2 shown in Fig. 9 (see Eq. 25).