Solvation Free Energies from Machine Learning Molecular Dynamics

The present work proposes an extension to the approach of [Xi, C; et al. J. Chem. Theory Comput.2022,18, 6878] to calculate ion solvation free energies from first-principles (FP) molecular dynamics (MD) simulations of a hybrid solvation model. The approach is first re-expressed within the quasi-chemical theory of solvation. Then, to allow for longer simulation times than the original first-principles molecular dynamics approach and thus improve the convergence of statistical averages at a fraction of the original computational cost, a machine-learned (ML) energy function is trained on FP energies and forces and used in the MD simulations. The ML workflow and MD simulation times (≈200 ps) are adjusted to converge the predicted solvation energies within a chemical accuracy of 0.04 eV. The extension is successfully benchmarked on the same set of alkaline and alkaline-earth ions.


■ INTRODUCTION
−3 While implicit solvent models have been often used in first-principles (FP) molecular dynamics (MD) simulations, 4,5 they generally cannot capture specific molecular effects that may be important at the solute/ solvent interface.Conversely, an explicit treatment of all solvent molecules is often computationally intractable.Alternatively, hybrid approaches involve treating a few inner solvation layers explicitly, and the rest of the solvent implicitly. 6,7Such approaches receive a rigorous statistical treatment within the quasi-chemical theory (QCT) framework of Pratt and coworkers, 8−10 whereby solvation properties are expressed as a sum of free-energy contributions, each one involving a fixed number of solvent molecules within the inner solvation shell.In turn, the free-energy contributions can be calculated by various methods, including thermodynamic integration, 11 and stochastic sampling and relaxation of solvation structures. 10,12,13owever, those methods may remain challenging in terms of computational intensity or accuracy.Alternatively, in a recent work, 14 Xi et al. have used an approach to directly calculate the free energy of the hybrid system from first-principles molecular dynamics (FPMD) simulations.Their predictions of alkaline (earth) ion solvation free energies were in good agreement with the experiment.However, their FP trajectories were limited to 15 ps, preventing them from converging the average total energy within less than 0.2−0.3eV.
In the present work, after contextualizing this approach within the appropriate QCT framework, we introduce a machine learning extension to perform longer MD simulations and improve energy convergence.This extended approach is satisfactorily benchmarked for the same set of cations.
■ METHODOLOGY General Approach.Given an ion in solution, the inner shell region is defined as the set of points within a distance r c from the ion, and the outer environment as the rest of space (Figure 1).The probability to find exactly n solvent molecules in the inner shell at equilibrium is denoted by P i (n) when the ion interacts with the solvent, and by P 0 (n) when the interaction is fictitiously switched off (noninteracting ion).Within QCT, one shows that the ion solvation free energy can be expressed as 10 i k j j j j j y where k is Boltzmann's constant, T is the temperature, and ΔG i (n) is the solvation free energy with an inner shell constrained to contain exactly n solvent molecules.In practice, this formula is useful if the probability distributions P i (n) and P 0 (n) have sufficient overlap so that their ratio can be reliably estimated for some value of n.This is the case for the small solutes considered in the present study, while for larger solutes, other formulas are typically used. 10irst, the probability ratio in eq 1 is addressed.In principle, P i (n) and P 0 (n) must be obtained at a fixed chemical potential of the solvent molecules, as achieved for instance via grand canonical Monte Carlo simulations. 15Here, instead, we find an estimate of the probability distributions by running MD simulations in the canonical ensemble.The ion is placed at the center of a cubic box filled with 57 water molecules, representing a concentration of ≈1 M, and the volume of the simulation box is adjusted to the absolute molar volume of the ion as given by ref 16.A FPMD simulation of 1 ps (equilibration) + 10 ps is carried out in periodic-boundary conditions (PBC) at 300 K.Then, for an inner shell strictly enclosed within the simulation box, the probabilities P i (n) and P 0 (n) are obtained by counting the corresponding configurations over the 10 ps sampling period.The validity of this approximation is discussed a posteriori (see Results section).
Next, the term ΔG i (n) is addressed.It is by definition obtained as where G ion+water (n) and G water (n) are, respectively, the free energies of the ion + water system and water system subject to the n-molecules inner shell constraint, and G ion is the free energy of the ion in vacuum.Referring to the n water molecules of the inner shell as the water cluster, we further decompose the free energies into where E X and S X are the enthalpy and entropy of the system X (ion + water cluster, or water cluster only) in vacuum, ΔG solv X is its solvation free energy in the outer environment, and ZPE ion aq is the zero-point energy of the solvated ion.Likewise, the unsolvated ion free energy is obtained as G ion = E ion + ZPE ion vac − TS ion , where E ion , ZPE ion vac and S ion are, respectively, its enthalpy, zero-point energy, and entropy.
In their work, 14 Xi et al. use an inner water cluster of 18 molecules, shown to encompass the first and second solvation layers of the ion.They obtain E X + ΔG solv X as the average total energy ⟨E tot X ⟩ of a FPMD simulation.The total energy E tot X of each MD step is calculated as the DFT energy of the explicit system X embedded in a continuum dielectric medium, capturing the effect of the outer solvation environment.The continuum model of Andreussi et al. 5 is used as implemented in the Environ package 17 of Quantum ESPRESSO. 18In the MD simulation, a confining potential wall is applied around the water cluster to prevent its diffusion into the outer solvation environment.The radius of this spherical wall is dynamically adjusted to maintain the wall pressure around a preset value.The entropies S X and zero-point energy ZPE ion aq are calculated based on the vibrational spectra obtained from the force−force correlation functions in a quasi-harmonic approximation.Finally, the energy of the unsolvated ion is calculated by DFT, and for a monatomic ion, ZPE ion vac = 0 and its entropy is obtained from the perfect gas translational partition function.
The present work follows the same general approach but with some notable differences as listed below: • E X alone is calculated as the average total energy ⟨E tot X ⟩ of the MD simulation, while ΔG solv X is treated separately (see second bullet).To allow for longer MD simulation times, the total energy E tot X is obtained from a machine-learned (ML) potential energy of the explicit system (see details below).
• Instead of using the continuum dielectric medium, the effect of the outer environment is treated analytically.ΔG solv X is taken as the sum ΔG hyd X + ΔG diel X , where ΔG hyd X is the free energy of hydration, originating from hydrogen bonds created around the cluster, and ΔG diel X is the free energy change coming from dielectric screening by the outer solvation environment.Following Born's equation (in atomic units), 19 i k j j j j j y where z is the electric charge of X and ϵ r is the relative permittivity of the solvent in the bulk (here taken to be 78).As a decoupling hypothesis, ΔG hyd X is assumed, at a given density, depend only on the cluster size (and not on its charge).Then, for a given cluster radius, + G hyd ion cluster and ΔG hyd cluster cancel out in the expression of G ion+water (n) − G water (n).
• In line with the QCT approach, MD simulations are performed at a fixed radius r c of the potential wall acting on the oxygen atoms of the water molecules.The value of r c = 5.05 Å is used to set the density of the 18-molecules water cluster to 1 kg/L.Machine-Learned Total Energy Function.The total energy function E tot X (ν), where ν is any configuration of the system X, is ML from a set of DFT calculations.For the latter, the Perdew−Burke−Ernzerhof (PBE) exchange correlation functional 20 is used here in combination with pseudopotentials of the SSSP PBE efficiency 1.1.2library for ionic cores. 21The system is placed in a 16 Å × 16 Å × 16 Å box, and the electrostatic potential of periodic images is removed by the parabolic electrostatic correction of Dabo et al. 22 implemented in Environ.
The machine learning architecture is the E(3)-equivariant graph neural network (NN) Nequip of Batzner et al. 23 Two NN models (NN-1 and NN-2) are used, with respective cutoff radii of the convolution filter of 5 and 6 Å, and respective numbers of interaction blocks of 2 and 3.The models are similar for all other parameters: a maximum rotation order of 2; a multiplicity of the features of 8; no odd parity; a "default" radial NN comprising 8 basis radial functions, 3 layers, and 64 hidden neurons.Each NN model is trained over 100 epochs with the Adam optimizer, putting equal weights on forces and the total energy per atom in the cost function.The optimized model is then exported and used in LAMMPS 24 to perform MD simulations.
The overall ML workflow (Figure 2) consists of 3 main stages: • Stage 1 (first row): an initial FPMD trajectory is produced in PBC with Quantum ESPRESSO.A high temperature (700 K) is used to widen the sampling of bond length distributions and thus improve the robustness of the subsequent NN.From this trajectory, 150 random snapshots of the inner water cluster are extracted, and recalculated with DFT in open-boundary conditions (OBC) using the parabolic correction. 22This data set is then used to train the first model NN-1.• Stage 2 (second row): a MD trajectory of 200 ps is produced with LAMMPS at a temperature of 300 K using the NN-1 model.From this trajectory, 1500 random snapshots are recalculated with DFT in OBC, and then used to train the more accurate NN-2 model.

■ RESULTS
The methodology has been applied to 5 alkaline (Li + to Cs + ) and 5 alkaline-earth (Be 2+ to Ba 2+ ) cations.
For n = 18 and r c = 5.05 Å, we find P 0 (n) ≈ 0.24, and the P i (n) values are reported in Table 1.According to eq 1, the P 0 (n)/ P i (n) ratio then contributes a maximum free energy shift of −0.02 eV.Although approximate, this result indicates that given the small size of the solute, the P 0 (n) and P i (n) distributions largely overlap, and for the n and r c values chosen here, the corresponding free energy contribution is small.In the following, we thus approximate ΔG solv ion ≈ ΔG i (n).For ΔG i (n), the calculation workflow has been converged within a chemical accuracy of 0.04 eV, by checking that (i) the mean LAMMPS and DFT absolute energy difference is less than 0.04 eV over the last cross-validation set, and (ii) the mean total energy ⟨E tot X ⟩ of the last MD simulation is converged within 0.04 eV.
The calculated solvation energies are reported in Table 2. Our results are in general good agreement with the previously calculated values of Xi et al. and experimental values.For alkaline-earth cations, the agreement with experiment is also improved, leaving only Be 2+ and Mg 2+ slightly above the reported experimental range.

■ CONCLUSIONS
The hybrid solvation method of Xi et al. has been re-expressed in the QCT framework and extended/modified to run longer MD simulations at a fraction of the computational cost.The extension consists mainly in using a FP-based ML force field in the MD, leveraging the data-efficient O(3)-equivariant NN Nequip architecture.Another modification is to treat the outer solvation environment analytically, in contrast with the implicit continuum medium of the original approach.The extended approach is tested by running 200 ps MD simulations on alkaline(-earth) cations and converging the mean energy within chemical accuracy (0.04 eV).The calculated solvation free energies are in good agreement with previously calculated values and experimental values.

Figure 1 .
Figure 1.Hybrid solvation model comprising an explicit inner shell region and an implicit outer environment.

Figure 2 .
Figure 2. Computational workflow to produce a MD trajectory converged with respect to cross-validation DFT calculations.

•
Stage 3 (third row): a new MD trajectory of 200 ps is produced using the NN-2 model.A new selection of 150 random snapshots is recalculated with DFT.If the LAMMPS energies compare well with DFT energies, the workflow is considered converged, and the relevant thermodynamic quantities are extracted from the last MD simulation.

Table 1 .
P i (n) Values for Alkaline(-Earth) Ions with n = 18 and r c = 5.05 Å

Table 2 .
Calculated and Experimental25−31Absolute Values of the Free Energies of Solvation a The MRE is the mean relative error of the calculated value vs the average experimental value.The same standard state correction is applied as in eq 20 of ref 14.MD trajectories, energies and forces used to obtain the solvation energies of Table2are available on Materials Cloud (DOI: 10.24435/materialscloud:a0-jh). a