Kernel-Based Machine Learning for Efficient Simulations of Molecular Liquids

Current machine learning (ML) models aimed at learning force fields are plagued by their high computational cost at every integration time step. We describe a number of practical and computationally efficient strategies to parametrize traditional force fields for molecular liquids from ML: the particle decomposition ansatz to two- and three-body force fields, the use of kernel-based ML models that incorporate physical symmetries, the incorporation of switching functions close to the cutoff, and the use of covariant meshing to boost the training set size. Results are presented for model molecular liquids: pairwise Lennard-Jones, three-body Stillinger–Weber, and bottom-up coarse-graining of water. Here, covariant meshing proves to be an efficient strategy to learn canonically averaged instantaneous forces. We show that molecular dynamics simulations with tabulated two- and three-body ML potentials are computationally efficient and recover two- and three-body distribution functions. Many-body representations, decomposition, and kernel regression schemes are all implemented in the open-source software package VOTCA.

Triplets are aligned in a plane: we rotate a triplet configuration m such that r m ab is oriented along the z axis with r m ab,z > 0 and the r mac vector lies in the positive half plane with r macy > 0 where r m ab is chosen as the longest of the two pair distances connecting the reference particle a, as shown in Figure S1.
FIG. S1. Illustration of the rotation of one triplet m into the yz plane, such that the longer of the two pair distances, rm ab , is oriented along the z-axis with rm ab,z > 0, while rm ac lies in the positive half plane with rm acy > 0. The x axis comes out of the page.

B. Hessian of the scalar kernel
As mentioned in the main text, we state here the general form of a Hessian kernel for triplets for any descriptor q m . The general structure of a Hessian kernel is: For a scalar kernel of the form the second derivatives lead to the following form: where δ α,β is the Kronecker delta and α and β denote the components of the descriptors q m and q l .
The gradient of descriptor component q m,α with respect to the position vector of the central particle o of the sample (which can be either particle a, b, or c of triplet m, see Fig.1  wherer m ab = (δ mo,m b −δ mo,ma )r m ab /r m ab . Note that the Kronecker deltas only change the sign of the kernel element, depending on whether the central particle of the sample, o, coincides with particle a or b. Due to δ α,β , Eq. (S1) reduces to: Inserting Eq. (S4) then leads to: This is the form of a Hessian kernel with general descriptors q m and q l .

C. Conservative rotation kernel
As stated in the main text, one can enforce conservativeness of the predicted force of the explicit rotation kernel. This is done by explicitly imposing the structure of the Hessian kernel in the fixed reference frame of the rotated triplets (see Fig. S1).
This means that the 3 × 3 dimensional local kernel matrixk ml instead of being simply the scalar Gaussian kernel k ml (q m , q l ) times unit 3 × 3 (see equation (18) of the main text) now has a more complicated structure: where cos θ l = [(r m ab · r mac ) / (r m ab r mac )] is the angle between the two interparticle vectors r m ab and r mac . This expression can be obtained by substituting the descriptor q (3) m = (r m ab , r mac , r m bc ) into Eq. S6.

S2. TWO-BODY LENNARD-JONES FLUID
Fluid simulations with only two-body interactions are run using a Lennard-Jones (LJ) potential where LJ = 0.15535 kcal/mol and σ LJ = 3.166 Å denote the well depth and particle size, respectively. M is the total number of all pairs in the system with a pair distance r m ab smaller than the cutoff r cut = 6.0 Å. The cubic simulation box of length L box = 14.39 Å contains 100 particles. This implies a particle density of 0.3356 nm −3 . Periodic boundary conditions are applied. We use a time step of ∆t = 2 fs, and a chain of 3 Nosé-Hoover thermostats 1,2 with a damping parameter of 200 fs. 3 After an equilibration of 2 ns, the atom positions and forces are recorded every 2 ps within a 20 ns production run. The recorded data contains 10, 000 frames of 100 atoms each, totaling 1, 000, 000 configurations. A 10-fold splitting of the data is used for the kernel predictions.
Hyperparameters for the kernels were optimized to σ = 0.3 Å, σ = 0.5 Å, and σ = 0.5 Å for the SO(3) integration kernel, Hessian kernel, and explicit rotation kernel, respectively. The regularization strength is set to λ = 1 · 10 −10 . In each case, training of the ML model is performed using cross validation: N configurations randomly picked from the 10 independent data sets. The generalization error is estimated by out-of-sample kernel predictions. The mean prediction error of each sample ∆F is averaged over the three Cartesian components, ∆F x , ∆F y , and ∆F z . These generalized errors are averaged over the 10 independent data subsets, where error bars refer to the standard error of the mean.

S3. THREE-BODY STILLINGER-WEBER FLUID
Fluid simulations with explicit three-body interactions are run using a Stillinger-Weber (SW) potential. For all particles being of the same type this reads as where f (θ m ) is a tabulated angular term. The angle θ m is defined as the one between the distance vectors r m ab and r mac (see Fig.1 of the main text): θ m = arccos [(r m ab · r mac ) / (r m ab r mac )]. M is the total number of triplets of a system including permutations, as explained in section "Three-body Stillinger-Weber fluid" and Fig. 1 of the main text. The parameter r cut = 3.7 Å is the three-body short-range cutoff which we choose in a way that the three-body potential is fully switched on in the first coordination shell. This implies that the contribution of a triplet m to the total three-body interaction potential (Eq. S10) is only unequal to zero whenever both interparticle distances r m ab and r mac are smaller than r cut = 3.7 Å. Setting σ SW = 1, the parameter γ = 0.8 Å controls the steepness of the switching. The parametrization is taken from a concurrent two-and three-body parametrization of a one-site water model developed by some of us. 4 We run simulations in a cubic box of length L box = 31.0648 Å containing 1, 000 particles, corresponding to a system density of ρ = 0.990 g /cm 3 . Two-body interactions are modeled using linear-interpolation tables with a short-range cutoff r (2) c = 12 Å. We employ a time step of 1 fs and a chain of three Nosé-Hoover thermostats 1,2 with a damping parameter of 200 fs when integrating the equations of motion. 3 After an equilibration of 1 ns, the positions and forces are recorded every 1 ps within a 9 ns production simulation. To isolate the three-body contributions, we perform a trajectory rerun with only three-body interactions switched on. We obtain a dataset of 9, 000 frames each containing 1, 000 atoms, totaling 9, 000, 000 configurations. A 5-fold splitting of the data is used for the kernel predictions.

S4. CUTOFFS AND SWITCHING FUNCTIONS
In section "Cutoffs and switching functions" of the main text, we depict a force scan on partilce a of a test triplet with a specific orientation (see Fig.4(a) of the main text). The detailed description of the choosen orientation is as follows: We exemplarily depict the force prediction f (3) ma,x in the cartesian x direction on particle a of a sample triplet m oriented as follows (see inset of Fig.4(a)): the interparticle vector r m ab is oriented along the x-axis with r m ab,x > 0. The vector r mac is oriented in the xy-plane such that r mac,y > 0. We fix the angle between r m ab and r mac , θ bc = 110 • . We then vary the distances r m ab and r mac simultaneously from r min = 2.5 Å to r max = 3.7 Å and display f (3) ma,x .

S5. COVARIANT MESHING
Here, we describe in detail the exact procedure of binning the force pair or triplet kernels to a kernel matrix of fixed size (see section "Covariant meshing" of the main text) To do so, we precompute the pair or triplet kernel elements on a fixed grid,k bins , limiting the number of kernel elements to M bins ×M bins for training. Due to the covariance of the force kernels, one can choose an arbitrary reference orientation of the binned pair and triplet representations. One then calculates the Hessian kernel elements for these reference representations.
For the two-body kernel, we choose a linear binning with M bins between r bins min and r bins max and an orientation of the pair vector r bins m ab along the z-axis with r bins m ab,z > 0. For the three-body kernel, we choose a uniform binning on a 3-dimensional grid. The triplet orientation is such that the interparticle vector r bins m ab is oriented along the z-axis with r bins m ab,z > 0 and the distance vector r bins mac is located within the yz-plane with r bins mac,y > 0. This fixes the orientation of the third interparticle vector r bins m bc . We then vary r bins m ab in M r steps from length r bins min to r bins max . The distance vector r bins mac is varied from length r bins m ab to r bins max which is sufficient due to the ordering of the representation. Finally, we vary the angle between the distance vectors r bins m ab and r bins mac , θ bins m = arccos r bins m ab · r bins mac / r bins m ab r bins mac , in M θ steps from θ bins min to θ bins max . This implicitly fixes the third interparticle distance r bins m bc . This means, the total number of bins for triplets is M bins = M θ M r (M r + 1) /2. The fixed orientation of the binned pairs and triplets implies that the pairs and triplets of each sample have to be rotated to this orientation. To do so, we introduce a new mapping matrixL bins connecting the N configurations to the M binned pairs or triplets: Again, each matrix elementL bins im is a 3 × 3 matrix. The index i denotes the N configurations and m denotes the M bins pairs or triplets in the set of binned representations. Introducing an additional index i l , denoting the pairs or triplets of configuration i,L bins im has the following structure: is the rotation matrix that rotates pair or triplet i l to the fixed orientation of the binned representation m. In addition, we apply a Gaussian smearing by means of weights w tot i l m with 0 < w tot i l m ≤ 1. This implies that one pair or triplet i l is mapped to multiple binned representations.
For pairs, the weights are calculated due to a one dimensional Gaussian distribution of the difference in pair distances of the actual pair i l , r i l,ab and each binned pair m, r bins m ab : For triplets the weights are a product of three Gaussians: We truncate the Gaussian distributed weights such that all w tot ≥ 0.001 to keep a certain degree of sparseness of L bins . In addition, we normalize the weights such that m w tot i l m = 0 for each i l . We obtained the best results for smearing widths of σ r = dr/1.8 and σ θ = dθ/1.8 with dr = r bins max −r bins min /Mr and dθ = θ bins max −θ bins min /M θ . This now implies:K =L binskbinsL bins . (S15) Analogously, training of the ML model is achieved by Prediction of a set of forces F * leads to F * =L * L binsk * bins α. (S17) Note thatk * bins is covariant, meaning that the force prediction does not need a posteriori rotations to the global frame.

S6. PROJECTION ON TABULATED FORCE FIELDS
The tabulation of the force fields is done in a straightforward way. For pairs, we tabulate the magnitude of the pair force on a fixed grid. We construct a pair vector with orientation along the positive x-axis, and vary its length in M out steps from r min to r max . We then evaluate the kernel matrixk * between each grid point and all training pairs. We finally tabulate the pair force prediction f * according to equation (8) of the main text. For triplets, we construct the table grid as follows: the interparticle vector r out m ab is oriented along the z-axis with r out m ab,z > 0 and the distance vector r out mac is located within the yz-plane with r out mac,y > 0. This fixes the orientation of the third interparticle vector r out m bc . We then vary r out m ab in M r steps from length r out min to r out max . The distance vector r out mac is varied from length r out m ab to r out max which is sufficient due to the ordering of the representation. Finally, we vary the angle between the distance vectors r out m ab and r out mac , θ out m = arccos r out m ab · r out mac / r out m ab r out mac , in M θ steps from θ out min to θ out max . This implicitly fixes the third interparticle distance r out m bc . This means, the total number of bins for triplets is M out = M θ M r (M r + 1) /2. The three-body interactions need to be further projected onto the interparticle vectors. The Hessian kernel ensures that the three force vectors on all particles m a , m b , and m c of each tabulated triplet m, f ma , f m b , and f mc , lie within the plane defined by r m ab , r mac , and r m bc . We can use this to project the forces For each m ∈ {1, . . . , M out } we tabulate the 6 coefficients f ma1 , f ma2 , f m b1 , f m b2 , f mc1 , and f mc2 . Due to symmetry reasons, as explained in section "Three-body Stillinger-Weber fluid" and Fig. 1 of the main text, the following relations hold: We implement a new pair style called 3b/table in the LAMMPS simulation package, 5 which we make available in a repository. 6 It uses the same cutoff scheme as for the SW system before: the three-body forces are evaluated for all triplets with r m ab < r cut and r mac < r cut . Because the tabulated force field assumes that r mac > r m ab , we may need to swap the particles involved during force evaluation.

S7. COARSE-GRAINING
Coarse-grained (CG) models are parametrized from a reference SPC/E water simulation. 7 Boxes of 1, 000 water molecules are first equilibrated for 10 ns in the N P T ensemble at T = 300 K and P = 1.0 bar, followed by a 10 ns N V T production run at the average density of the preceding NPT simulations, ρ = 0.998 g /cm 3 . We employ a 1 fs time step and a chain of three Nosé-Hoover thermostats 1,2 with a damping parameter of 200 fs when integrating the equations of motion. 3 N P T simulations are run using an additional Nosé-Hoover-type barostat with a damping parameter of 2 ps. Electrostatic interactions are treated with a particle-particle particle-mesh solver with an accuracy of 10 −5 . 8 We apply a cutoff r cut = 10 Å to construct short-ranged interactions. The positions and forces are recorded every 1 ps. This leads to a dataset of 10, 000 frames with 1, 000 molecules each, totaling 10, 000, 000 configurations. A 10-fold splitting of the data is used for the learning procedure. Each configuration of a molecule embedded in its environment is then mapped to a single CG bead.
In Fig. S2, we compare the two-body force kernel predictions for 10 4 configurations with 1000 bins and a switching function, Eq. (24) of the main text, with d = 0.4Å. One can see that for pair distances greater than 6 Å, the pair force as determined by force-matching is close to zero. The ML force as average over 10 independent training sets is close to zero, as well, though it shows slightly larger fluctuations.
In Fig. S3, we compare the liquid water distribution functions for three different simulations: the original atomistic simulations, and the Hessian-kernel model with and without a switching function. All schemes use a cutoff for the three-body forces of r cut = 3.7 Å. This figure complements Fig. 6 of the main text and shows that the use of switching is essential in this case.