To Pair or not to Pair? Machine-Learned Explicitly-Correlated Electronic Structure for NaCl in Water

The extent of ion pairing in solution is an important phenomenon to rationalize transport and thermodynamic properties of electrolytes. A fundamental measure of this pairing is the potential of mean force (PMF) between solvated ions. The relative stabilities of the paired and solvent shared states in the PMF and the barrier between them are highly sensitive to the underlying potential energy surface. However, direct application of accurate electronic structure methods is challenging, since long simulations are required. We develop wave function based machine learning potentials with the random phase approximation (RPA) and second order Møller–Plesset (MP2) perturbation theory for the prototypical system of Na and Cl ions in water. We show both methods in agreement, predicting the paired and solvent shared states to have similar energies (within 0.2 kcal/mol). We also provide the same benchmarks for different DFT functionals as well as insight into the PMF based on simple analyses of the interactions in the system.

Understanding the nature of ion pairing and the solvation structure of electrolyte solutions is a fundamental challenge in the quest to design efficient next generation energy storage devices. 1 For example, ionic conductivity and redox stability are predominantly influenced by the electrolyte solvation structure. 2,3Moreover, an understanding of the solvation behaviour of ions is crucial to ensure a uniform and stable solid-liquid interphase to limit dendrite formation, which currently presents a significant challenge with respect to efficiency and safety of electrochemical devices. 4More generally for transport properties, Peng et al. showed that the diffusion of sodium ions at interfaces is significantly affected by their hydration number. 5he potential of mean force (PMF) between an ion pair in solution provides a direct window into the ion pairing behaviour and solution structure of electrolytes.However, for the prototypical electrolyte solution of NaCl in water, no experimental benchmark exists.Moreover, from a computational modelling perspective, it is highly sensitive to the underlying potential energy surface 6,7 and there is currently no quantitative or qualitative consensus on the PMF of NaCl in water.For example, two of the best-established classical force-fields for NaCl in water -the Joung-Cheatham (JC) 8 and Smith-Dang (SD) 9 models -both disagree significantly in their PMFs, a) Electronic mail: am452@cam.ac.uk b) Electronic mail: cs2121@cam.ac.uk with the JC model predicting thermodynamically unfavourable ion pairing. 10,11In general electrolytes are very challenging to model, where on top of the already arduous situation for bulk water, 12 they have the additional complication of ion-water and ion-ion interactions that need to be accurately described. 13Therefore, it is clear that to rationalise the PMF, it is imperative to first have a model that accurately describes the structure of NaCl in water.
The past decades have seen pioneering work in the development of classical force field models to describe ions in water, with the most recent Madrid scaled charge models for ions giving excellent agreement with experiment for a range of structural and dynamical properties. 14owever in general, such models can typically lack accuracy beyond the property or phase space to which they have been parameterised.It has been suggested that to simultaneously and faithfully capture dynamical and structural properties of electrolytes, a model that explicitly treats its electronic structure is desirable. 15,16til now, the workhorse method of ab initio computational chemistry -density functional theory (DFT) -has been the method of choice for studying these systems, offering an acceptable balance of accuracy and computational overhead.While the solvation structure of water around ions has been the subject of numerous previous ab initio simulation studies, [17][18][19][20][21][22] there is currently no consensus on a sufficiently accurate model to describe NaCl in water.In fact both Duignan et al. and Panagiotopoulos et al. have recently highlighted the urgent need for accurate ab initio models for electrolytes to capture their dynamics 16 and collective properties. 6or example, tried and tested exchange-correlation (XC) functionals for liquid water, such as the generalised gradient approximation (GGA) revPBE-D3 are not guaranteed to perform well when ions are added. 19,20,23This is because standard GGAs tend to perform poorly for interactions involving charges, overestimating electrostatic contributions to binding energies 24 along with their wellknown delocalisation error, 25,26 both arising from the self-interaction error.Moreover, the interplay of the electronic structure method and nuclear quantum effects has been shown to be highly sensitive for liquid water, [27][28][29] but their impact on electrolyte solutions has not been exhaustively explored so far. 30,31Inclusion of more complex ingredients into the density functional approximation following Perdew's Jacob's Ladder 32 may improve the description of water-water 33,34 or ion-water 35 interactions, however a model that faithfully captures the behaviour of ions in solution for the right reasons remains elusive.
Going beyond DFT, correlated wavefunction-based methods such as the Random Phase Approximation (RPA) and second-order Møller Plesset Perturbation Theory (MP2) are expected to perform well for electrolyte systems.These methods naturally incorporate van der Waals interactions and do not suffer from delocalisation error. 368][39] While Duignan et al. hint that RPA could outperform lower rungs of Jacobs Ladder for the case of the solvated sodium ion, 20 routine application of these high-level methods in condensed phase simulations has been sporadic.This is primarily because of their high computational cost to implement, with canonical scaling behaviour between O(N 4 ) and O(N 5 ), albeit reducedscaling variants also exist. 40Meanwhile, to obtain statistically converged properties to probe the structure of electrolyte solutions including radial distribution functions, densities and solvation free energies, even DFT becomes extremely computationally challenging.While valuable developments in electronic structure code algorithms and computer hardware 38,41 have significantly increased the accessibility of these methods, it would be highly desirable to confine them to a small number of single-point energy (and force) calculations rather than finite temperature simulations which typically require hundreds of thousands of such computations.
Fortunately machine learning potentials provide a gateway to perform simulations at ab initio levels of theory with a decrease in several orders of magnitude in their computational cost. 42,43Machine learning models have been successfully trained and applied to study bulk water at various levels of theory, [44][45][46][47][48] including recent neural network models for bulk water with MP2 49,50 and RPA. 51While the natural next step is to add ions to the water, [52][53][54] this brings additional complexity to the con-figuration space to be explored and so the training set must be judiciously chosen to reflect this.To this end we use a previously developed automated active learning framework 55 and an initial training set describing NaCl dissolution in water 56 to generate MLPs at various levels of electronic structure theory for Na and Cl ions in water.The forthcoming discussion will simply refer to all of these models by the name of the reference method to which they have been trained.We believe this is valid since we have rigorously benchmarked the capability of the MLPs to reproduce their underlying reference method (see Supplementary Information (SI)).We first explore the capabilities of different DFT XC functionals and correlated wave-function methods to accurately describe the structure of ions in water, showing that both RPA and MP2 generally reproduce experimental densities and radial distribution functions for both bulk water and solvated ions.We then use these wavefunction based methods as a benchmark against which to compare the predictions of DFT and classical force-fields for the potential of mean force of a NaCl ion pair in water.We provide simple metrics based on the constituent interactions to rationalise the overall shape of the PMF, while also providing insight into the ability of DFT to capture these interactions.In all cases, the simulations performed are well-converged with respect to statistical sampling and are not significantly impacted by finitesize effects, highlighting a major advantage of the MLP approach over ab initio methods.We finish with an outlook on the potential of these models for describing more complex situations such as confined electrolytes and for computing dynamical properties.
In Figure 1, we compare with experiment the performance of various DFT XC functionals, along with two wave function based methods MP2 and RPA, in predicting the radial distribution functions (RDFs) of both bulk water and Na and Cl ions in water.The DFT functionals have been chosen to span the various levels of Jacob's Ladder, with each level incorporating increased complexity into the XC functional description.Specifically we have chosen the GGA revPBE-D3, 60,61 the van der Waals inclusive optB88-vdW, 62 the meta-GGA r 2 SCAN 63 and the hybrid revPBE0-D3. 61Beyond DFT, we consider RPA and MP2, with both classical and quantum nuclei.Inclusion of nuclear quantum effects have been shown to be necessary to obtain accurate agreement with experiment for liquid water structural and dynamical properties, including RDFs, diffusion coefficients and spectroscopy. 27,33e first consider the performance of the RPA and MP2 wavefunction-based methods.Overall, out of all methods tested, RPA performs best in describing the structure of both bulk water and Cl and Na ions in water (   19 Also, the experimental Cl-O RDF is only available from a KCl solution. 58While there are discrepancies beyond the first peak, the cation should not significantly influence the first solvation peak in the large separation limit.The challenges of accurately modelling bulk water and ions in solution for DFT is apparent when comparing the different DFT XC functional predictions for the RDFs with experiment.None of the functionals tested can simultaneously describe water and ions with the same accuracy as the best performing correlated wavefunction method, RPA.As has been previously observed, revPBE-D3 performs very well for liquid water, 27,65 however it has been shown that this is in part due to a fortuitous cancellation of errors and this breaks down upon inclusion of NQEs. 27Moreover, while it also performs well for the Cl-O RDF, it significantly underestimates the first peak of the Na-O RDF.Inclusion of a fraction of exact exchange via the hybrid revPBE0-D3 functional shows similar good performance for bulk water and Cl-O, but does not improve the performance for Na-O.In contrast, r 2 SCAN significantly overstructures bulk water, predicting a first RDF peak for O-O approximately 20% greater than experiment, yet shows good agreement with experiment for both ion types in water.Various density corrections to the r 2 SCAN functional have shown to improve its description of liquid water, 66 which warrant further tests on their suitability for aqueous electrolytes.Similar to r 2 SCAN, the van der Waals inclusive optB88-vdW functional also overstructures bulk water, but accurately predicts the ion-water RDF for both Na-O and Cl-O.The poor performance for both revPBE-D3 and revPBE0-D3 for Na-O can be potentially ascribed to the fact that the D3 correction does not account for changes in dispersion due to charge-transfer effects (i.e., the formation of ions).Cations have been shown to have a significantly different polarizability (i.e.dispersion) than their neutral atom counterpart compared to anions, 67 thus explaining the greater impact on sodium.More sophisticated treatment of these cases such as the D4 correction, 68 methods incorporating iterative Hirschfeld partitioning, 69 and using van der Waals inclusive methods (as shown here with optB88-vdW) may alleviate this problem.It has also been suggested that simply neglecting the D3 correction for the problematic cation interactions can improve agreement with experiment. 70curately predicting the density is another important benchmark of the ML models and their underlying electronic structure reference method.Figure 2 (a) compares the concentration dependent density prediction of RPA and MP2 with experiment.This is a stern test of the electronic structure method but also the MLP quality, covering a significant concentration range.Both MP2 and RPA show a similar qualitative increase in density with concentration of NaCl as observed in experiment.MP2 is within 1 % of the experimental values, while RPA more significantly overestimates the density by approximately 5 %.Both RPA and MP2 MLP predictions for the bulk water density are also in close agreement to previous ab initio predictions of 0.994 g/cm 337 and 1.020 g/cm 339 for RPA and MP2 respectively, and are within 5% of the experimental value.There is again a wide spread in DFT predictions, with most functionals tested overestimating the density of the 2 M NaCl solution apart from revPBE0-D3 which underestimates the density by 5% compared to experiment, while r 2 SCAN is in good agreement with experiment.In comparison, RPA slightly overestimates the density, while MP2 is in very good agreement with experiment.It should also be noted that while classical force-field models can accurately reproduce experimental density predictions, particularly in low concentration regimes, 71 its agreement arises because the density is a property to which the water model has been explicitly fitted. 72t higher salt concentrations, away from the regions explicitly used in the parameterisation, force-field predictions also deviate from experiment. 71To summarise, although RPA gives excellent structural properties for NaCl in water, it is slightly outperformed by MP2 in terms of the density response with increasing NaCl concentration.However, MP2 yields poorer agreement with experiment for structural properties than RPA.Nevertheless, the overall commendable all-round performance of RPA and MP2 for all cases of ions and water suggests them as reliable methods with which to study NaCl in water.Of course the enhanced computational cost for initial training set computations for both RPA and MP2 over DFT should be mentioned, however this is a one-off investment during training.Once a model is obtained, it can then be applied in simulations with a reduction of several orders of magnitude in computational costwhere the saving is greater for models based on more expensive methods.
Going beyond experimentally accessible properties, the potential of mean force (PMF), denoted here as w(r) is a fundamental property of the electrolyte governing the extent of ion pairing in solution and capturing collective solvent motion.The key features of this quantity for Na + and Cl -in water are two local minima corresponding to the so-called contact-ion pair (CIP) and solvent-shared ion pair (SSIP), with free energies U CIP and U SSIP respectively.These are separated by a barrier (TS) along the inter-ionic separation coordinate as illustrated in Figure 3 (a), where U b is defined as U TS − U CIP .The relative stabilities of these minima (∆U SSIP−CIP = U SSIP − U CIP ), along with the barrier height (U b ) are crucial to understanding the kinetics of ion-pair association and dissociation. 74However as previously mentioned, the PMF is highly sensitive to its underlying potential energy surface. 6,7Among both DFT and classical models, there is a major unresolved question regarding the relative stabilities of the CIP and SSIP, 10,73,75,76 with some classical models predicting a more stable CIP by up to 4 kcal/mol compared to DFT, which generally predicts almost degenerate CIP and SSIP states.It should also be noted that the error bars on the DFT predictions are typically much greater, due to the significant computational effort required to converge the PMF.Despite commendable efforts to resolve the PMF using ab initio methods, 7,73,75,77 it is computationally intensive to statistically converge, requiring long simulation times (∼ 2-3 ns) and several replicates to obtain statistical error bars.Therefore, in absence of experimental benchmarks a reference PMF based on high-level electronic structure method is particularly valuable to the community.It can provide clear atomistic insight to explain experimental observations of ion-pairing, and allow comparison among theoretical models -both ab initio and classical.
The MLPs developed in this work are ideal for efficient and statistically converged calculation of the NaCl PMF at a given level of theory.We compute the PMF via the Na-Cl RDF of a 1 M NaCl solution comprising six ion-pairs in a box of 332 waters.This highlights another advantage of the MLP approach, where the potential energy surface along the inter-ionic separation can be sufficiently sampled through standard molecular dynamics simulations, without the requirement for enhanced sampling methods such as thermodynamic integration (we show in the SI the equivalence of both methods).Moreover, these models have enabled sampling of the PMF out to the large-separation limit, providing valuable insights into the ion-pair binding energies.
Overall, over 200 ns of MD simulations were performed, which is significantly beyond the capabilities of ab initio methods.Figure 3 compares the PMF predictions for MLPs at the same levels of electronic structure theory described above, as well as some commonly used classical force fields.Statistical convergence and finite size effects have been carefully investigated and details are given in the SI.We first note that the correlated wavefunction methods MP2 and RPA are in very good agreement, predicting that the CIP and SSIP have very similar stabilities.They both predict that the two states are within ∼ 0.2 kcal/mol of each other.This resolves the so far unanswered question regarding the relative stabilities of the CIP and SSIP states.Moreover, of all the electronic structure methods tested, MP2 and RPA predict the largest barrier between the CIP and SSIP states of roughly 1.6 kcal/mol.The excellent agreement between both high-level methods suggests this PMF can now be used as a benchmark to compare other methods.
Similar to the RDFs and densities shown above, DFT predictions also vary significantly (although not to the same extent as classical force-fields), and reinforce the considerations for selecting a suitable functional to study electrolyte systems.While all methods are in relatively good agreement (within 0.2 kcal/mol) for the relative stabilities of the CIP and SSIP states, there is a larger range in the predictions of the barrier from SSIP and CIP and the depth of the CIP state.The free energy barrier between the CIP and SSIP for both r 2 SCAN and optB88-vdW is similar at approximately 1.25 kcal/mol.Meanwhile, although revPBE-D3 and revPBE0-D3 predict similar barrier heights for the CIP to SSIP transition, revPBE-D3 shows a shift to slightly larger distances for the locations of the CIP, SSIP and transition state.Given the wide-ranging performance of the DFT methods tested here with respect to various aspects of the electrolyte structure (shown in Figures 1 and 2), it is perhaps unsurprising that they also struggle to concur on the PMF, a property that depends sensitively on the collective interactions of sodium and chlorine with the surrounding water.
We now rationalise these trends in the PMFs by a heuristic analysis of the subtle balance of interactions that qualitatively match our findings.We focus on the two key interactions, namely the ion-ion and ion-water interactions.We compute the binding energy of an ion pair in vacuum up to 3 Å (before the transition to SSIP), to understand ion-ion (II) effects.For ion-water (IW) effects, we compute the energy of interaction for an ion pair in water from equilibrium periodic snapshots taken from the points of interest along the PMF (CIP, SSIP, TS).Details of these calculations as well as further comparisons are given in the SI. Figure 4 shows the relationships between these constituent interactions and the PMF observables.First, in Figure 4 (a), there is a strong correlation between the depth of the CIP minimum in the PMF and the difference of the ion-water and ion-ion interac-tions.We show in the SI that considering the ion-ion interactions alone is not the best descriptor for the CIP well depth.However we find that subtracting the ion-water interactions from the ion-ion interactions as shown here effectively captures the trend of CIP well depth across the functionals.All functionals except optB88-vdW show the same ordering for both properties (within error bars), suggesting the CIP region is primarily governed by a delicate interplay between ion-ion and ion-water interactions, whereby stronger ion-ion (ion-water) interactions strengthen (weaken) the CIP well depth.However, the deviation of optB88-vdW from this trend suggests that there still may be important yet subtle effects for example from water-water interactions in the close contact region.This analysis builds on work by Duignan et al. 20,78 who compute interaction energies from clusters extracted from simulations of cations in water.In Figure 4 (b) there is again a clear correlation between the barrier going from the CIP to SSIP states and the strength of the ion(-pair)-water interaction energy.Here, within error bars, the largest transition barrier corresponds to the largest ion-water interaction energy.We rationalise this observation by the fact that going from the SSIP to CIP state requires a rearrangement of the ions' solvation shell.A lower ion-water interaction energy will thus result in a more facile transition from one minimum to the other, thereby lowering the barrier between the two states.
These qualitative metrics can also be further used to understand the performance of the various electronic structure methods employed here.Firstly, for the ionion interactions, MP2 and RPA both shows the strongest binding compared to DFT. 79,80 With respect to the ion-water interactions, MP2 and RPA also exhibit the strongest interactions, with the accuracy of the former having been verified by previous hydrated sodium cluster analysis. 81Meanwhile, revPBE-D3 and revPBE0-D3 demonstrate the weakest ion-water interactions.We show in the SI that while the Na + -H 2 O interaction is overbound due to the D3 (as discussed previously), there is an underbinding in the Cl --H 2 O interaction.This arises from the revPBE exchange enhancement factor causing stronger repulsion, tending to underbind, 82 which then dominates the overall ion-water interaction.In agreement with analyses on hydrated sodium ion clusters by Duignan et al., 20 we find that r 2 SCAN (or SCAN in their work) gives better agreement to RPA than revPBE-D3.
Finally, the largest discrepancies in PMFs from all the classes of methods tested are in the classical forcefield predictions.The Smith-Dang/SPCE model 9 is similar to the MP2 and RPA prediction, with the position of the SSIP slightly shifted to larger distances and a larger barrier by roughly 0.4 kcal/mol.However, the Joung-Cheatham/SPC/E and HMN/SPC/E models significantly differ from all the other tested methods.JC predicts a more stable SSIP by almost 1 kcal/mol, while HMN/SPCE predicts a more stable CIP by over 2 kcal/mol, and a barrier of over 4kcal/mol between CIP and SSIP.As shown in the analysis of interactions above, given the sensitive dependance of the PMF on the interplay of interactions, this perhaps explains why classical force fields, which are typically fitted to observables rather than the interactions may struggle to converge on the PMF.
The machine learning approach used in this work offers a high level of accuracy and efficiency for studying electrolyte solutions, yielding high-quality, well-converged structural properties.We compute the disputed potential of mean force of the NaCl ion pair in water using correlated wavefunction methods.We find that correlated wavefunction methods are in very good agreement for the description of structural properties and densities.Importantly, they also agree that the CIP and SSIP are essentially degenerate (within 0.2 kcal/mol).DFT can reproduce these results with larger variations depending on the chosen functional, where we do not identify a functional that delivers convincing performance throughout the chosen properties.Compared to the scale of force field predictions, in particular for ion-pairing, these differences remain small and candidates such as revPBE-D3 or r 2 SCAN are expected to deliver a good compromise when simulating extended system sizes.Going forward, the remaining differences among DFT XC functionals and even the discrepancies that remain among the wavefunction based methods point towards a need for an unambiguous benchmark quality model for these systems.Given the recent successes by Chen et al. 83 and Daru et al. 84 in performing CCSD(T) -considered the 'gold-standard' of quantum chemistry-based molecular dynamics for bulk water, there is an exciting opportunity to build on this work for the case of ions in water.Their machine learning models accurately predict structural and dynamical properties of liquid water compared to experiment.Using these approaches to train a model for water including ions is therefore a promising prospect to obtain a high quality potential energy surface for this system that can be used to further benchmark the lower level methods, and also to be directly used for simulations.
Furthermore, we have explored the effect of the various interactions of the system on the PMF, providing insight into the delicate interplay of ion-ion and ion-water interactions.The simple metrics can now be used to benchmark other electronic structure or forcefield methods, using for example future CCSD(T)-based models as references.From a more general perspective, our work highlights how ML potentials can now be used to efficiently screen various levels of theory directly on properties of interest, as has also been recently shown for the case of perovskite phase transitions. 85eyond simply comparing levels of theory, the PMFs and models generated in this work can next be used to provide valuable insights into experiment and computationally measured dynamical properties of electrolytes.Obtaining accurate dynamical properties remains a fron-tier of research in the field of electrolyte simulations.There has been significant work using ab initio data to parameterise continuum scale models of electrolytes to access quantities such as osmotic coefficients and activities, along with transport properties such as diffusion coefficients and conductivity. 6This work shows that using MLPs offers an alternative yet complementary approach in the quest to obtain a fully ab initio description of electrolytes.In Figure S.10 of the SI, we report the water self-diffusion coefficients as a function of concentration relative to pure water for RPA and MP2 in good agreement with experiment, in particular for MP2.We note that at the accurate electronic structure level employed in this work other factors such as the quantum nature of the nuclei 29 start to become a dominating factor to reach quantitative agreement with experiment, as shown recently for pure water. 27,45The here developed machine learning potentials combined with path integral molecular dynamics enable to address this issue in future studies via efficient sampling of the potential energy surface.From a fundamental perspective, building on the work of Geissler et al., a high level reference model is imperative to establish a quantitative understanding of the kinetics of ion pair dissociation. 74Similarly, a reliable model capable of accurately describing the bulk structure of electrolytes is primed to target dynamical properties of interest, including activity and osmotic coefficients, along with transport properties such as diffusion coefficients and conductivity.Relating the PMF to Onsager transport coefficients, which quantify correlations in ion motion 86,87 would be highly insightful to rationalise various transport phenomena, and an interesting first step would be to compute standard ion pair association energies. 88In particular, the ionic conductivity as a measure of the quantity of current an electrolyte can transport is vital for the development of next generation energy storage devices. 89Another important consideration and open question for these problems is the influence of long-range effects, given the large simulation cells required to compute these properties and the long-range electrostatic interactions between ions.More sophisticated long-range schemes than that used in this work to name but a few include Refs.90-92 and a comparison of their performance would be highly insightful.Moreover, recent equivarient graph-based architectures including MACE, 93 NEQUIP 94 and GRACE 95 are by construction longer range and are again highly promising to address the question of long-range interactions in an efficient manner.Beyond the bulk, confinement of electrolytes leads to interesting physics and unexpected phenomena that are highly relevant to a range of applications, including blue energy harvesting 96 and desalination. 97,98n particular, extension of our current models to explore the intriguing phenomenon of confinement-induced ion pairing is very attractive.Recent work has shown the assembly of confined ions in solution into long chains under an electric field, resulting in the so-called 'memristor' effect, 99 for which accurate atomistic insights are urgently needed. 100

METHODS
Machine learning potential MLPs provide a direct mapping between a structure and its energy and forces, bypassing the computationally costly step of solving the Schrödinger equation for each timestep of a simulation.All of the models in this work are based on the seminal work of Behler and Parrinello, 101 where we train a committee of eight neural network models 47 for Na and Cl ions in water on forces and energies computed at various levels of electronic structure theory.The model is systematically trained over three generations.The first comprises a general training set common to all models obtained from previous work 56 for NaCl ions in water.We then use an active learning procedure 55 screening different solution concentrations to augment the training set for each model to ensure relevant configurations for a given level of theory are included to give the second generation model used in production simulations.The third generation ensures the large ion-ion separation limit is accessible, by performing another active learning scheme on non-isotropic simulations cells or transfer learning depending on the specifc model.To address the question of long-range forces in our models, the final model is trained only on short range energies and forces, where the longrange electrostatic term (based on fixed point charges assigned to each atom, and computed by particle mesh Ewald summation) has been subtracted.This Coulombic baseline is then added on during simulations to give the full forces and energies.Complete technical details of the models' training and validation are given in the SI.Molecular dynamics simulations All classical simulations were carried out using the CP2K/Quickstep code at a constant temperature of 300 K maintained using the Canonical Sampling Through Velocity Rescaling (CSVR) thermostat 102 and with a 1 fs timestep.
PMF: The PMF calculations were carried out with a 1.0 mol/kg NaCl solution comprising 6 NaCl ion pairs in 332 waters.This was first equillibrated in the NpT ensemble to obtain the equillibrium density for a given level of theory.To ensure sufficient statistical sampling, between five to ten uncorrelated configurations were then sampled from this trajectory and used as independent initial conditions.Production simulations were then performed for at least 2 ns in the NVT ensemble at the previously computed equilibrium density, after which the radial distribution functions were obtained.The potential of mean force w(r) between Na and Cl was computed using the relation w(r) = −k B T ln g(r Na−Cl ), where k B and T are the Boltzman constant and temperature, with g Na−Cl (r) being the radial distribution function between Na and Cl ion pairs.The final result was given by the average of these 10 runs, with the standard deviation providing an error estimate.We show in the SI that this approach is fully consistant to performing constrained molecular dynamics with thermodynamic integration and does not suffer from finite size effects.
RDFs: The RDF calculations for bulk water were obtained from simulations of 126 water molecules in the NVT ensemble at the experimental density.RDFs for ions were obtained from simulations of a single ion pair and 95 water molecules.Path integral simulations for RPA and MP2 were performed using ring polymer molecular dynamics with 16 replicas, using the PILE thermostat. 103Electronic structure Electronic structure calculations were all carried out using the CP2K/Quickstep code.A plane-wave cutoff of 1200 Ry was required to converge the PMF (see SI for detailed convergence tests), and a TZ quality basis set was used for each model.Further details of specific electronic structure settings for various levels of theory are given in the SI, including convergence tests for the various electronic structure settings.
Grimme's D3 dispersion correction.S10 The revPBE0 calculations were performed using the auxiliary density matrix method, S11 to reduce the cost of computing the exact exchange component.
Figure S1 shows the convergence of the forces on the four atom types with respect to plane-wave cutoff.A large cutoff is required to converge the forces on Na.Using the Gaussian and Augmented Plane Wave (GAPW) method S12 resolves this issue, with forces converging after 400 Ry.However, the GAPW method is not available for wavefunction methods RPA and MP2.We therefore consider the effect of these stochastic forces on the sodium atoms with respect to the property of interest in this work -the potential of mean force of Na and Cl in water.Figure S2 shows the PMF for 3 machine learning potential (MLP) models at revPBE-D3 level of theory using the GPW method with increasing plane-wave cutoff.This property is well-converged with PW cutoff, with an acceptable error of approximately 0.1 kcal/mol.Additionally, Figure S3 shows that again the target property, the PMF is not affected by using the GPW method over the GAPW method.Therefore in summary, all DFT calculations were performed using the GPW method with 1200 Ry plane-wave cutoff.

Correlated wave-function theory
Random-phase approximation (RPA) and second-order Møller Plesset perturbation theory (MP2) were performed in CP2K S1 to generate both total energies and nuclear gradients.
We used DFT with the PBE functional as the starting point for the RPA correlation energy calculations.The resolution-of-identity (RI) techniques was used for these methods.S13 We used triple-zeta (TZ) quality correlation consistent basis sets for H and O (taken from CP2K) as well as Na and Cl.Auxiliary basis sets for the RI integral operations were generated using the automatic auxiliary basis of Stoychev et al.S14,S15 for Na and Cl, with the defaults (from CP2K) used for H and O.A planewave cutoff of 1200 Ry was again used.We used the GTH-HF pseudopotentials from Goedecker-Teter-Hutter S2 for all of the atoms.Figure S4 shows the force convergence on the atoms with respect to the GPW integral cutoff.A cutoff of 300 Ry was shown to be well converged within 0.00001 meV/ Å.The calculation is also sensitive to the number of quadrature points and so 20 were used, which has an error of < 1 mHartree according to literature.

Automated work flow
The procedure for developing the committee neural network potentials (C-NNP) used in this work was followed as described in Ref S17.Overall 6 models were trained to describe NaCl ions in water at different levels of electronic structure theory (revPBE-D3, optB88-vdW, r 2 SCAN, revPBE0-D3, MP2 and RPA).The training of the individual models was divided into three generations, and is graphically depicted in Figure S5.The first generation comprised a common training set for all models, adapted from previous work.S18 Specifically, only configurations corresponding to ions in solution were used, since this previous model also pre-optimised r 2 SCAN weights (rather than randomly seeded weights as is conventionally done) using the relevant training set from the second generation.The specific benefits of this approach for our work are two-fold, typically requiring less training data and also easing the fitting procedure.S20

Details of Model
The chemical environment around each atom was described using a general set of atomcentered symmetry functions.S21 There are 10 radial and 4 angular functions for each pair and triple of atoms, following Ref.S17.All symmetry functions used a cutoff function of angular cosine form with a cutoff radius of 12 Bohr.The committee was comprised of 8 NNP members, of identical architecture with 2 hidden layers and 25 neurons in each layer.In all cases, random sub-sampling was performed to introduce variability between the committee members, where 10% of the total set of structures were discarded.The weights and biases of the NNPs were optimised using the n2p2 code.S22 Individual models during active learning were optimised for 15 epochs, while the final C-NNP model used in simulations was optimised for 50 epochs.
We have explicitly incorporated long-range effects beyond the cutoff of the symmetry functions (12 Bohr) of the machine learning potential.The predicted energy can in general be written as a sum of short range and long range contributions (E sr and E coul respectively): The long-range model was thus trained on the difference between the standard short-ranged model and the Coulomb contribution, calculated using point charges of +/-1 respectively for Na and Cl and using TIP3P model parameters for water.S23 We used this model in all production simulations, where the Coulomb contributions were explicitly included via particle mesh ewald summation.Details on the validation of the final models are presented in the next Section.

Validation
We validate the ability of the model to reproduce its underlying reference method based on a validation set of 100 configurations.These comprise a scan along the inter-ion separation coordinate up to 6 Å (giving a computationally reasonable box size for RPA and MP2 reference calculations).The force and energy RMSE for each model are shown in Table S1 for each model, and a representative correlation plot for r 2 SCAN in Figure S6.These errors compare favorably with our previous model   The standard Yuh and Hummer correction for finite-box size S25 was added, using the SPC/E viscosity η = 0.66 cP.

Convergence tests
This section describes several tests to ensure our simulation protocol was statistically converged and did not suffer from finite-size effects.Sampling -thermodynamic integration vs RDF To ensure that all regions along the inter-ion separation coordinate were sufficintly sampled, we compare a thermodynamic integration scheme as described in S26 to the RDF approach used in this work.Figure S8 shows the PMF obtained from RDF and thermodynamic integration (TI) for the RPA model, showing the equivalence of the RDF approach for a 1 M system and thermodynamic S11 integration for 1 ion pair within their own statistical errors.

Simulation time
One of the major advantages of machine learning potentials is the much greater timescales accessible during simulations than for standard ab initio methods.Here we compute the PMF using one of the MLPs over a time period of 500 ps, for 2 replicates, a time period at the extreme upper end of that accessible for ab initio simulations (Note that for the hybrid and wavefunction methods even this timescale is unfeasible).In figure S9, we compare this with the ease in which the PMF can be converged over 10 replicates each of over 3 ns with the MLP.

Diffusion Coefficients
The water self-diffusion coefficients were computed as a function of concentration for the MP2 and RPA models.Figure S10 shows the computed diffusion coefficients for MP2 and RPA compared with various values from literature.In all cases the self-diffusion coefficient D has been normalised with respect to the self-diffusion coefficient in bulk water with no ions (D 0 ).single ion in water was also computed in a similar manner, where the interaction energy of the ion of identity X with water E XW is given by: where again E sys is the complete interacting system, E W is the energy of the water with the ion removed and E G is the energy of the gas-phase ion.Table S2 shows the interaction energy of an individual Na + /Cl -ion with water.revPBE-D3 has the strongest Na + -water interaction since it is a GGA, with both delocalisation error and the D3 correction resulting in overbinding.S29 Inclusion of Hartree-Fock exchange in revPBE0-D3 reduces this overbinding.
RPA and MP2 show very similar behaviour with moderate Na + -water interaction but strong Cl --water interaction.Meanwhile most DFT functionals find it hard to agree with RPA or MP2 on the Cl --water interactions.This is a well known problem since it is difficult to localise the extra electron.This issue is worst for GGAs (revPBE-D3), but is improved with meta-GGA (r 2 SCAN) and vdW-inclusive (optB88-vdW) functionals S30 Additional RDFs Figures S13, S14 figuration space to be explored and so the training set must be judiciously chosen to reflect this.To this end we use a previously developed automated active learning framework55 and an initial training set describing NaCl dissolution in water56 to generate MLPs at various levels of electronic structure theory for Na and Cl ions in water.The forthcoming discussion will simply refer to all of these models by the name of the reference method to which they have been trained.We believe this is valid since we have rigorously benchmarked the capability of the MLPs to reproduce their underlying reference method (see Supplementary Information (SI)).We first explore the capabilities of different DFT XC functionals and correlated wave-function methods to accurately describe the structure of ions in water, showing that both RPA and MP2 generally reproduce experimental densities and radial distribution functions for both bulk water and solvated ions.We then use these wavefunction based methods as a benchmark against which to compare the predictions of DFT and classical force-fields for the potential of mean force of a NaCl ion pair in water.We provide simple metrics based on the constituent interactions to rationalise the overall shape of the PMF, while also providing insight into the ability of DFT to capture these interactions.In all cases, the simulations performed are well-converged with respect to statistical sampling and are not significantly impacted by finitesize effects, highlighting a major advantage of the MLP approach over ab initio methods.We finish with an outlook on the potential of these models for describing more complex situations such as confined electrolytes and for computing dynamical properties.In Figure1, we compare with experiment the performance of various DFT XC functionals, along with two wave function based methods MP2 and RPA, in predicting the radial distribution functions (RDFs) of both bulk water and Na and Cl ions in water.The DFT functionals have been chosen to span the various levels of Jacob's Ladder, with each level incorporating increased complexity into the XC functional description.Specifically we have chosen the GGA revPBE-D3,60,61 the van der Waals inclusive optB88-vdW, 62 the meta-GGA r 2 SCAN 63 and the hybrid revPBE0-D3.61Beyond DFT, we consider RPA and MP2, with both classical and quantum nuclei.Inclusion of nuclear quantum effects have been shown to be necessary to obtain accurate agreement with experiment for liquid water structural and dynamical properties, including RDFs, diffusion coefficients and spectroscopy.27,33We first consider the performance of the RPA and MP2 wavefunction-based methods.Overall, out of all methods tested, RPA performs best in describing the structure of both bulk water and Cl and Na ions in water (Figure1(a), (d) and (f) respectively), accurately reproducing experimental RDFs.For bulk water, the first peak height of the O-O RDF with RPA and classical nuclei (b) is overestimated by approximately 16 %.Inclusion of nuclear quantum effects (a) reduces the height of the first peak

Figure 2 (
b) compares the DFT and the wave-function MLP predictions of the density of a 2 M NaCl solution with the experimental value.

4 F
FIG. S6.Correlation plot for r 2 SCAN C-NNP predicted forces and corresponding reference DFT forces, with light grey line showing a perfect correlation coefficient of 1.
Figure S7 shows the PMF of the large 24.82 Å cubic box comprising 6 NaCl ion pairs and 332 waters used in this work and the 12.42 Å cubic box of one ion pair and 62 waters used in validation tests.

2 FIG. S7 .
FIG.S7.Comparison of finite size effects for small and large box sizes comprising a single ion pair and 1 M solution respectively.
FIG.S8.Comparison of thermodynamic integration and RDF approaches for the RPA model.
FIG. S9.Comparison of PMF obtained from MLP simulations for 500 ps vs fully converged MLP simulations of 10 replicates of 3 ns each.

FIG. S12 .
FIG.S11.Na + Cl -ion-ion interaction curve aligned at 3.5 Å for the various electronic structure methods used in this paper.
S18with RMSE values for both forces(37.0

TABLE S1 .
Summary of force and energy RMSE for each ML model.