Polarisation corrections and the hydration

: Classical nonpolarizable water models play a crucial role in computer simulations due to their simplicity and computational e ﬃ ciency. However, the neglect of explicit polarization can jeopardize their accuracy and predictive capabilities, particularly for properties that involve a change in electrostatic environment (e.g., phase changes). In order to mitigate this intrinsic shortcoming, highly simpli ﬁ ed analytical polarization corrections describing the distortion of the molecular dipole are commonly applied in force ﬁ eld development and validation. In this paper, we perform molecular dynamics simulations and thermodynamic integration to show that applying the current state-of-the-art polarization corrections leads to a systematic inability of current nonpolarizable water models to simultaneously predict the experimental enthalpy of vaporization and hydration free energy. We go on to extend existing theories of polarization and combine them with data from recent ab initio molecular dynamics simulations to obtain a better estimate of the real contribution of polarization to phase-change energies and free energies. Our results show that for strongly polar molecules like water, the overall polarization correction is close to zero, resulting from a cancellation of multipole distortion and purely electronic polarization e ﬀ ects. In light of these ﬁ ndings, we suggest that parametrization of classical nonpolarizable models of water should be revisited in an attempt to simultaneously describe phase-change energetics and other thermodynamic and structural properties of the liquid.


INTRODUCTION
Water is ubiquitous in chemical, biological, geological and industrial processes, making it the most extensively studied substance. It is also an extremely complex liquid, often demonstrating anomalous behavior. 1−6 This combination of importance and curiosity has sparked numerous computational studies of liquid water (see, e.g., refs 7−10 and references therein). In particular, the solvation properties of water are of great interest for a vast range of applications. The key thermodynamic property to describe solvation processes in water is the Gibbs free energy of hydration (ΔG Hyd ), and the calculation of ΔG Hyd of organic molecules is a huge area of research in computational chemistry. 11−13 In this paper, we set out to perform a systematic test of several classical nonpolarizable water models for their ability to predict the free energy of hydration of water, which is directly related to, but distinct from, its free energy of vaporization. 14 In doing so, we uncovered what we believe to be a fundamental inconsistency in the way polarization effects are currently handled in the calculation of phase-change energies and free energies. This prompted us to reassess the theoretical foundations of analytical polarization corrections for the case of water hydration, with implications for future model development.
At the highest level of theory, water can be described using ab initio quantum mechanical (QM) methods. 15−24 However, despite continued increase in computational power, representing liquid water at the appropriate level of detail using a fully QM treatment remains extremely challenging. 25 As such, the vast majority of computational studies that involve water, usually as a solvent, rely on the classical approximation. Within classical models of water, two main approaches may be distinguished: polarizable and nonpolarizable models. The latter describe the electrostatic interactions of water through a fixed set of point charges or, more rarely, multipole moments, which cannot respond to changes in the environment surrounding the molecule (i.e., polarization is not explicitly accounted for). Polarizable models, on the contrary, are able to respond on-the-fly to changes in the electrostatic environment by inclusion of polarizable point dipoles, fluctuating charges, or Drude oscillators. 26 This comes at a cost, however, and polarizable models can increase the computational cost by 3 to 10 times over that of similar fixed-charge pairwise additive models. 27 In many applications of interest, such as protein folding, 28−30 electrolyte solutions, 31−33 and, of most relevance to this paper, hydration free energy calculations, 34−36 the majority of the system is occupied by water molecules, and therefore, there is an inherent need to keep water models as computationally simple as possible.
The most popular nonpolarizable models of water make use of the point-charge approximation, differing in the total number of interaction sites and in the values of the parameters. Early models like SPC 37 and TIP3P 38 were composed of three interaction sites, with point charges on the oxygen and hydrogen atoms and a single Lennard-Jones (LJ) site on the oxygen. An alternative approach, initiated with TIP4P, 39 displaced the negative charge along the bisector of the HOH angle creating an additional interaction site ( Figure 1). As such, a generic three-site model has five independent tunable parameters: the charge of the oxygen (q O ; since the molecule has to be neutral, the charge of the hydrogen atoms, q H , is not independent), the well-depth (ε) and van der Waals radius (σ) for the Lennard-Jones potential, the oxygen−hydrogen bond length (L O−H ), and the hydrogen−oxygen−hydrogen bond angle (θ). The extra parameter for four-site models is the distance between the oxygen interaction center and the interaction center for the oxygen charge, the M-site, (L O−M ). Nevertheless, these models still have many similarities; they have fixed point charges, and their bonds and angle are rigid. Other alternatives have been proposed, such as five-site models that place two negative charges near the positions of the oxygen lone-pair electrons. 40 However, the additional level of complexity does not appear to bring a significant improvement in overall performance, 41 so only three-site and four-site models are considered further.
Regardless of the number and arrangement of the interaction sites, water models can be (rather arbitrarily) classified according to their parametrization approach into three categories. "Generation I" models, such as SPC, TIP3P, and TIP4P, were parametrized to match a small number of structural and thermodynamic properties; typically, the radial distribution functions (RDF), bulk density, and enthalpy of vaporization. The fact that they are still widely used today, particularly in biomolecular simulations, 42 attests to their success in balancing computational speed with a rather good overall description of liquid water. Nevertheless, several shortcomings of these models have become apparent over the years and have led to several reparametrization efforts.
A particularly important breakthrough in the context of this paper, which is discussed in more detail later, was proposed by Berendsen et al. 43 They developed a simple expression to implicitly correct for polarization effects in enthalpy of vaporization calculations by approximating the energy cost of distorting the dipole moment of the water molecule from its equilibrium gas-phase value to a larger value more appropriate to the liquid phase. The SPC/E model developed by Berendsen et al. 43 constitutes a reparametrization of the earlier SPC model by taking into account this polarization correction and led to improvements in the description of several properties, like the self-diffusion coefficient. Another issue with earlier models is that they were parametrized without the use of special techniques like Ewald sums to account for longrange electrostatics. It is well known that neglecting long-range electrostatics can lead to significant artifacts, particularly for polar molecules, 44−46 and thus Ewald sums and their derivatives 47 have become standard in modern molecular simulations. Indeed, later tests of Generation I models revealed systematic deviations from experimental data when Ewald sums were employed in the calculations, which motivated researchers to reparametrize these earlier models by running simulations that made use of Ewald sums; examples are TIP4PEw 48 and TIP4P/2005. 49 These models also made use of the polarization expression of Berendsen et al. 43 to correct the enthalpy of vaporization. We will collectively describe "Generation II" models as those that were still parametrized to match a restricted set of data but have taken into account polarization corrections on the enthalpy of vaporization during the parametrization process.
Finally, we will classify as "Generation III" models those more recent efforts that considered a much larger set of data in the parametrization process, including the static dielectric constant, often using advanced parameter fitting techniques. 42,50−52 These models also make use of the Berendsen expression to implicitly account for polarization costs in the enthalpy of vaporization. Table 1 summarizes all the models considered in this work, together with their parameters and classification.
A common feature of all the nonpolarizable models described above is that they possess a fixed dipole moment (μ, cf. Table 1) that is significantly higher than the equilibrium

Model
Generation  43 were the first to propose an analytical expression for the energetic polarization effect when moving a molecule from the gas to the liquid state. More specifically, they accounted for the internal selfpolarization energy, which represents the energy difference between the equilibrium configuration of the molecule in the gas and liquid phases. Using this polarization distortion correction, a molecular dynamics (MD) simulation to calculate the hydration free energy can now be represented as a thermodynamic cycle, as seen in Figure 2. This cycle relates results from simulations to those from experiment. In this case, the real process (experimental, ΔG Hyd ) can be decomposed into two separate steps: (i) While in the gas phase (G), the molecule's electronic distribution is polarized from its original state (represented here by the gas-phase dipole moment, μ G ) to a state that is representative of the bulk liquid (μ L ). (ii) The polarized molecule is then surrounded by other molecules, leading to a full liquid state (L). The first step corresponds to the cost of the intramolecular rearrangement of the molecule (ΔG Dist ), while MD simulations using classical nonpolarizable models yield the free energy due to the interactions with surrounding molecules (ΔG MD ). Berendsen et al. argued that the energy cost of rearranging the geometry of the molecule is dominated by the change in the dipole moment between the two phases. For a force field with fixed charges, this distortion correction takes the form 43 where μ L and μ G are the respective dipoles of the liquid and gaseous states, and α is one-third of the sum of the trace of the dipole−dipole polarizability tensor. Despite being formulated more than 30 years ago, this is still the most commonly used polarization correction for nonpolarizable water models in molecular simulations. It is in itself an approximation since it is only the leading term of a summation over all multipoles, and its use implies the assumption that the leading dipole term will dominate. Swope et al., 67,68 however, have challenged this assumption and shown that, specifically for water, the difference between using only the dipole and an expansion up to quadrupoles is approximately 15% (0.4 kJ/mol in their case). The work of Leontyev and Stuchebrukhov made two important contributions to polarization corrections. 69,70 First, they offered a possible explanation for why the polarization corrections used in the parametrization of water models may be incorrect. 70 They posited that to correctly calculate the distortion correction the actual experimental liquid-phase dipole moment should be used. Generally, in the then absence of reliable estimates for the dipole of the liquid, μ L was estimated from the dipole moment of the model itself, which, as we have argued above, is much lower than the real value. Since the distortion correction scales with the square of the change in the dipole, this discrepancy has a large impact. For example, the SPC/E model has a dipole of 2.35 D, 43 whereas the experimental value is around 2.9 D 55 and the gas phase value is 1.86 D. 54 Using the model value we would obtain a correction of 5.1 kJ/mol; however, application of the experimental value results in a correction of 23 kJ/mol. For comparison, ΔH vap = 44 kJ/mol, 7 so this discrepancy is approximately half the total enthalpy change of the process. Even worse, the free energy of self-solvation (hydration) of water is only −26 kJ/mol. 71 Leontyev and Stuchebrukhov also proposed a second polarization correction, 69 which is of opposite sign to the distortion correction, based on the electronic screening of a solvated molecule. As already mentioned, charges on nonpolarizable water models are usually lower than indicated by experimental and ab initio studies. This is because classical molecular simulations cannot account for the response of the purely electronic degrees of freedom of the solvent. In practice, classical simulations assume a relative permittivity of unity, i.e., equal to a vacuum for calculating electrostatics using Coulomb's law. Under the Born−Oppenheimer approximation, however, it is more realistic to think of molecules (i.e., nuclei) immersed in a bath of electrons. 70 This leads to a higher relative permittivity of the surrounding medium, thus effectively reducing the charges. Leontyev and Stuchebrukhov proposed that this effect can be accounted for by using the high-frequency dielectric constant of water, which describes the purely electronic response of the liquid phase, and thus managed to approximately reconcile the model dipole magnitudes with those of experimental studies through a simple scaling law: 69 where q is the charge, and ε ∞ is the high-frequency dielectric constant, which is commonly expressed as the square of the fluid's refractive index measured at the Sodium D-line frequency. 72 Reducing the magnitude of the charges works for a single environment (e.g., a pure liquid) but is not valid when a change of said environment occurs. The model charges will not alter their magnitude to reflect the new level of screening, and hence, this will not be captured in the difference between two MD simulations (say, in the gas and liquid states). For this reason, Leontyev and Stuchebrukhov suggested a correction to try and quantify this difference analytically. They defined this as the energy of solvating a water molecule (in its liquid-phase configuration) in a purely electronic continuum, i.e., taking the molecule from vacuum (ε 0 ) to a solution of electrons (ε ∞ ). To this end, they applied a simple solvation model to quantify the correction. The approach of Leontyev and Stuchebrukhov can be seen as a thermodynamic cycle in Figure 3. It it similar to that shown in Figure 2 but now includes the extra step of solvating the polarized molecule in the purely electronic continuum (ΔG El ). Their results for water 69 showed that the two correction terms (distortion and electronic) almost canceled each other out, leaving an overall correction of around 5 kJ/mol. This correction turns out to be almost identical to that proposed by Berendsen et al. 43 in the development of SPC/E and close to those of other simple water models. They then theorize that the good overall performance of these water models is due to cancellation of error rather than to a rigorous treatment of polarization corrections.
In this paper, we bring together the various improvements presented in the work of Swope et al. 67,68 and Leontyev and Stuchebrukhov 70 into one unified correction scheme and as a result attempt to answer the following questions: (1) What is the relative magnitude of the individual components of polarization and to which extent do they cancel out? (2) Is it necessary to include higher order multipoles in polarization corrections or is using only the leading dipole term sufficient? (3) How sensitive is the magnitude of the polarization correction to different values of the input variables (e.g., multipole moments, polarizabilities)? In particular, we have extended the method of Leontyev and Stuchebrukhov to calculate the electronic screening correction up to quadrupoles and combined it with the expressions of Swope et al. 67 for the distortion correction (also up to quadrupoles). After a detailed sensitivity study of these expressions, we have evaluated the significance of polarization corrections in the calculation of free energies of hydration and heats of vaporization using Molecular Dynamics (MD) for the 11 water models shown in Table 1. We end the paper with some general conclusions and recommendations for future development of classical nonpolarizable molecular models.

THEORY AND METHODS
2.1. Molecular Dynamics Simulations. Hydration free energies (ΔG MD ) for each of the classical nonpolarizable water models presented in Table 1 were calculated using MD and thermodynamic integration (TI). This followed closely the procedure described previously, 73,74 and the reader is referred to those publications for further technical details. MD simulations were carried out with Gromacs 75,76 using the leapfrog algorithm 77 with a time step of 2 fs. Simulations were performed in the NPT ensemble, with the temperature controlled at 298 K using a Langevin stochastic dynamics thermostat 78 and the pressure controlled at 1 bar using the Parrinello−Rahman barostat. 79 A cutoff of 1.2 nm was used for Lennard-Jones (LJ) interactions, together with long-range dispersion corrections for energy and pressure. Long-range electrostatics were handled with the PME method, 47 also using a cutoff of 1.2 nm for the real part of the summation. Water molecules were kept rigid using the SETTLE algorithm. 80 All simulations were carried out with at least 900 water molecules in cubic boxes with periodic boundaries.
Each system was first equilibrated for 5 ns, from which pure liquid properties were calculated after discarding the first 1 ns. In particular, the total potential energy of the liquid was sampled and used to calculate the enthalpy of vaporization (ΔH Vap ) of water according to In this equation, E Liq is the molar potential energy of the pure liquid, E Vap is the potential energy of the vapor (it is strictly zero for all the models considered here, as they contain no intramolecular contributions to the energy), R is the ideal gas constant, T is the temperature, and C QM are quantum vibrational corrections. The value for C QM (0.29 kJ/mol) at ambient temperature was taken from the work of Horn et al. 48 and was added to both ΔH Vap and ΔG MD . From the sampling period of the pure liquid simulations, we have extracted 50 evenly spaced configurations, which were used as starting points for TI calculations, as described previously. 73,74 These 50 configurations hence served as an ensemble since ensemble simulations have been shown to be more precise and offer more robust uncertainty estimates 81,82 The usual procedure of decoupling solute−solvent interactions using a coupling constant, λ, was applied, with LJ and electrostatic interactions decoupled in two separate stages.  83 with the same parameters as reported previously 73,74 was employed in the LJ decoupling simulations to avoid instabilities in simulations close to λ = 1.0. Each starting configuration was run for 200 ps, with the gradient of the Hamiltonian being computed over the  last 180 ps of each trajectory. This was then averaged over all starting configurations (i.e., over a total of 9 ns for each λ value) and finally integrated over all λ points according to

Journal of Chemical Theory and Computation
It has been shown 84 that integrating the Hamiltonian gradient over the λ variable using the standard trapezoidal rule can lead to systematic errors in the free energy. In this work, we have used cubic splines 85 to obtain a more accurate hydration free energy from the individual simulation points. We calculated the standard error on the Hamiltonian gradient for each λ value by sampling over the 50 individual configurations. This was then propagated for the solvation free energy and used to calculate the 95% confidence interval, which we report as error bars in our data tables (Tables S3 and S5) and plots. All the averaging and integration was carried out using Gromacs utilities and in-house scripts. The MD input and output files used in this work can be found at the University of Strathclyde data repository (DOI: 10.15129/35c601f1-d21e-4cb2-9f40-8410709438e6).
In order to compare vaporization enthalpies and hydration free energies with experimental data, we applied the thermodynamic cycle depicted in Figure 3. The next two sections describe how the two polarization contributions were estimated.
2.2. Multipole Distortion Correction. Swope et al. 67,68 proposed a set of equations that expands the distortion polarization correction up to quadrupoles and in the process tested whether the usual approximation of truncating the series at the leading dipole term is suitable. They based their correction on a representation of the molecular electron density in terms of a multipole expansion. The full derivation can be found in their first paper from 2010, 67 and therefore, only the key concepts will be elucidated here. In Einstein notation, their correction is where V are the relevant derivatives of the external electric potential, whereas α, A, and C are the dipole−dipole, dipole− quadrupole, and quadrupole−quadrupole polarizabilities, respectively. The multipoles are linked to the electric field vectors and polarizabilities by where Θ represents the quadrupole moments as a column vector. These can then be represented in matrix form as Hence, knowing the change of the multipoles upon entering the liquid phase and the polarizability tensor, one can solve for the electric field vectors and then obtain the distortion free energy change of polarization. However, a final step is necessary for the application of this set of equations. Although there are an equal number of equations and unknowns (12), due to symmetry several are dependent. Hence, Swope et al. 67 applied a spherical coordinate system to make the system of equations solvable, reducing the number of both equations and unknowns to eight. The above equations provide an estimate of the distortion energy term. In order to apply them to estimate f ree energies, Swope et al. 68 made the argument that This assumption is accompanied by a caveat, namely, that the "polarisation cost is approximately constant over the range of thermally accessible conformations on either 1) the unpolarised quantum chemical potential surface or 2) the polarised fixed charge potential surface". 68 Swope et al. thoroughly analyzed this assumption in Appendix B of their paper and found, at least for their chosen set of molecules/side chains, that the assumption held well. In the least successful case, acetamide, this was attributed to the high restructuring free energy cost of that particular molecule. Since the focus of our study, water, does not have a complicated geometry, and eq 9 was valid for the most similar molecule in their test set (methanol), we do not foresee any difficulties with this assumption.
Solving the above set of equations therefore requires three inputs: the gas-phase multipoles, the liquid-phase multipoles, and the polarizability tensor. The former can be taken from experiment or high-level QM calculations. Assuming that the polarizabilities are the same in the gas and liquid phases allows one to also obtain the polarizability tensor from accurate ab initio calculations. This leaves the liquid-phase multipoles to be determined, which is a well-known challenge. 16 In their first paper, Swope et al. 67 validated their expressions by comparing distortion energies obtained directly from QM calculations in a dielectric continuum model with the results of their analytical treatment and found generally good agreement. We have tested the precision of our implementation of the distortion correction equations by successfully replicating the calculation for water presented in that paper, starting from the original data, kindly provided by one of the authors. 67 In their subsequent paper, 68 Swope et al. used their methodology to compare solvation free energies from corrected classical MD simulations with experimental data. In so doing, and similarly to Berendsen et al., 43 they used the effective multipole moments of the classical models as an estimate of the liquid phase moments. As discussed above, this can make a significant difference to the magnitude of the positive correction. Hence, we believe that a re-evaluation of this method with more realistic liquid multipole moments is in order.
2.3. Electronic Polarization Correction. The previous two sections describe how we accounted for the free energy contributions due to intermolecular (classical, nonpolarizable) interactions with the surrounding medium (via MD and TI) and due to intramolecular polarization effects (via the distortion correction). The missing term, as postulated by Leontyev and Stuchebrukhov,69,70 is the intermolecular nuclear−electron interactions, i.e., those between the purely electronic continuum and the molecule being solvated. Since

Journal of Chemical Theory and Computation
Article DOI: 10.1021/acs.jctc.8b01115 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX they defined this as the free energy of solvating a water molecule in a solution of electrons, they applied continuum solvation models to quantify this contribution. More precisely, the simplest solvation model, the Born model, 86 was used because it allows for an analytical solution. The Born model provides an expression for the free energy of solvation of an ion in a spherical cavity surrounded by a uniform infinite dielectric continuum. Kirkwood 87,88 later showed that the Born model was the leading term in an expansion over all multipoles where M is any multipole with order l and component m, R is the radius of the cavity, and ε is the static dielectric constant of the continuum. The static dielectric constant describes the response of the fluid to an electric field in the limit of zero frequency (i.e., a static electric field). As such, both nuclear and electronic responses are included in this quantity. However, in our case, we are not interested in the nuclear response of the solvent but rather in the purely electronic response due to the presence of a solute molecule. Since the nuclei are much more massive than the electrons, they are much slower and therefore cannot respond to the field when the frequency is very high. As such, the relevant quantity for our purposes is the infinitefrequency dielectric constant (ε ∞ ), which captures only the response of the electronic degrees of freedom of the solvent. Thus, our equation becomes Leontyev and Stuchebrukhov 69 only included the dipole term in their work, i.e., l = 1, but here we will extend this treatment to include also quadrupole contributions. One problem with the Born/Kirkwood expression is that the term R (cavity radius) is not strictly defined, as well as being assumed to be spherical. The choice of cavity radius is largely arbitrary, with three common choices being (i) the radius of a sphere of volume equal to the molecular volume of the pure liquid, (ii) the molecular length plus the van der Waals radii of the outermost atoms, and (iii) adding a constant term (0.5 Å) to the value obtained from the pure liquid molar volume. 89 Additionally, some authors, such as Luo et al., 89 have attempted to find a unique theoretical determination of the cavity radius. However, this radius is only unique for a given set of multipoles, i.e., a specific quantum calculation. In general, consideration of multipoles of higher magnitudes leads to smaller cavity radii.
One way to introduce a degree of self-consistency is to use a modified version of the Onsager reaction field model, 90 developed by Barker and Watts. 91 Usually, this is used to make the Born model self-consistent by replacing the liquidphase dipole with the gas-phase dipole plus a distortion of this dipole induced by an electric field. However, in our case (similarly to Leontyev and Stuchebrukhov 69 ), we want to use the reaction field to make the cavity radius (R) self-consistent. The two necessary equations are The first merely expresses the dipole in solution (μ)a sa function of the permanent (i.e., gas phase) dipole (μ 0 ) plus a distortion through an electric field (F) multiplied by the isotropic polarizability (α). The second is the reaction field equation. Equating the two fields gives us This way the cavity radius is an analytical function of its environment. For simplicity, this equation only includes the dipole but could potentially be expanded to obtain an estimate of the cavity radius which includes quadrupoles. A final point has to be made here about the choice of dielectric constant in eq 16. In the case of the cavity radius calculation, the static dielectric constant of the fluid is used in order to provide an appropriate estimate of the electric field caused by the surrounding solvent.

RESULTS AND DISCUSSION
Our first objective is to assess the performance of classical nonpolarizable water models in reproducing the free energy of hydration and the enthalpy of vaporization. Clearly, a consistent model should be able to predict both properties to a good degree of accuracy. The results of our MD calculations for all the water models in Table 1 are reported in Tables S2−S5 of the Supporting Information, broken down into their individual LJ and electrostatic components. Figure 4 shows the deviations between MD results and experimental  Table 1). Generation I models are represented by solid circles, Generation II models by open squares, and Generation III models by shaded diamonds. The different colors represent different approaches to polarization corrections: no correction (blue points) and Berendsen's correction (red points).

Journal of Chemical Theory and Computation
Article DOI: 10.1021/acs.jctc.8b01115 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX F data for each of those properties, plotted against each other with horizontal and vertical error bars. In this plot, a model showing perfect agreement with experiment would be located at the origin. To obtain the red points in Figure 4, we have included the distortion correction proposed by Berendsen at al., 43 i.e., eq 1, using the dipole moments of each model as a proxy for the liquid-phase dipole in the calculation of both ΔH Vap and ΔG Hyd . This corresponds to the state-of-the-art in the development of nonpolarizable water models. 42,50−52 It is worth noting that, by convention, ΔH Vap is positive, as it is measured as a change from liquid to vapor (hence, it implies an increase in energy), while ΔG Hyd is negative, going from the vapor to the liquid. Therefore, a model that yields a liquidphase potential energy that is, say, too negative, will overestimate ΔH Vap and likely underestimate ΔG Hyd . For the same reason, when the distortion correction is added to ΔH Vap , it takes a negative sign, while it is positive when added to ΔG Hyd .
As expected when the Berendsen correction is applied (red line), the Generation I models perform rather poorly, significantly overpredicting ΔG Hyd and underpredicting ΔH Vap . This is mainly because although ΔH Vap was a target in the parametrization of those models the distortion correction was not accounted for in this process. Conversely, models of Generations II and III predict ΔH Vap rather well (the exception is TIP4P-Ice, which was not designed to do so). However, all of these models (again with the exception of TIP4P-Ice) systematically overestimate the hydration free energy, albeit not by a very large amount. What is most striking in this analysis, however, is that the performance of all the models falls on a straight line (R 2 = 0.9905) that does not pass through the origin. This suggests that, using the current approach for force field development, no classical nonpolarizable water model will be able to simultaneously predict the enthalpy of vaporization and the free energy of water hydration. In fact, the intercepts of the linear fit shown in Figure 4 tell us that the models that give the best performance for ΔH Vap overestimate ΔG Hyd by about 2 kJ/mol, while a hypothetical model that would be tuned to match ΔG Hyd would overestimate ΔH Vap by about 3 kJ/mol. Although these are not very large values, they are much larger than the uncertainty of the calculations (about ±0.2 kJ/mol). Of course, our analysis is not exhaustive, and it is always possible to argue that a model could be found within the current paradigm that could fit both properties simultaneously. However, we argue that the analysis of Figure 4 is compelling enough to warrant a rethink of the way polarization corrections are currently handled in the parametrization of classical nonpolarizable models.
Further insight is obtained by analyzing the blue points in Figure 4. These were obtained by comparing simulations to experiments without applying any polarization corrections to either property. This corresponds to the paradigm used in earlier model development (prior to 1987) and unsurprisingly leads to good performance of the Generation I models in predictions of ΔH Vap (and correspondingly poor performance by the more recent models). What is perhaps surprising here is that a straight line fit leads to an intercept very close to zero (0.51 kJ/mol). In particular, the SPC model is able to predict both energetic properties rather satisfactorily. Of course, it is now well known that these Generation I models present serious deficiencies in their description of other structural and thermodynamic properties of water (e.g., temperature of maximum density, phase diagrams, diffusion coefficient). 9 This leaves us with a paradoxnewer models are now able to provide a very good description of most water properties 42,49,52 but lead to systematic deviations in the hydration free energy, while older models are able to predict the latter quite well but lead to a generally poor description of liquid water. In this context, it is interesting to note that despite decades of advances in water model development the majority of simulations of biomolecular systems and of solvation free energy calculations still make use of rather outdated Generation I models like SPC or TIP3P. 42 To address this apparent paradox, we delve deeper into the issue of polarization corrections. As discussed in the methodology, application of more rigorous procedures 67−70 requires data on the polarizability tensors of water in the gas phase, as well as estimates of the multipole moments of water in the liquid phase (at least up to the quadrupole). The latter, in particular, are subject to a great degree of uncertainty, due to the well-known difficulty in separating the contributions of each molecule in a QM simulation of a condensed phase. 16 Indeed, there are relatively few QM studies that report both molecular dipoles and quadrupoles for liquid water, and most of these have been collected by Niu et al. in their recent paper. 92 Due to the simple geometry of water, all of its constituent atoms can be placed along one plane, with the hydrogens placed symmetrically across the bisector. Since water has one dipole, this means that only one component of the dipole vector will be nonzero. In our representation, the dipole is oriented along the z-axis. For the quadrupoles (we use the traceless version, in keeping with the work of Swope et al.), this means that only the diagonal components will be nonzero, i.e., three components. However, since the trace (the sum of the diagonal) must be equal to zero, this leaves us with only two independent quadrupolar components, thus, three independent parameters over both dipoles and quadrupoles. The QM data collected by Niu et al. are reproduced in Table 2, together with the experimental values for water in the gas phase. We also include results calculated by applying Wannier functions to individual water molecules in the trajectories of ab initio molecular dynamics (AIMD) simulations of Schienbein and Marx 24 (see SI Section 7 for details of these calculations). The results shown in Table 2 for the liquid multipoles can be split into two groups: hybrid quantum mechanics/molecular mechanics (QM/MM) studies (entries 3−5) and AIMD (entries 2, 6, and 7).
A significant degree of variability can be seen in the multipole moments for the liquid, with dipole moments between 2.43 and 2.95 D. Although there is still some debate in the literature about the magnitude of the dipole moment of liquid water, evidence from a variety of theoretical approaches is converging onto values between 2.8 and 3.0 D. The early AIMD studies of Parrinello and co-workers 16,17 made use of maximally localized Wannier functions to isolate the contributions of each molecule in the liquid, leading to dipole moments in good agreement with estimates obtained from experimental X-ray scattering 55 and from the experimental refractive index using a mean-field approach. 95 Although it was subsequently shown that the nonlocal exchange-correlation functionals used in early AIMD studies lead to an overstructured liquid phase with incorrect density, 25 more recent simulations using hybrid and dispersion-including functionals yield quite similar values for the dipole moment. 23,24,65 Dipole moments around 3.0 D were also obtained from MP2 calculations on large water clusters 56 and from MD simulations using an ab initio dipole moment surface. 96 Given this mounting evidence, we consider that our calculations based on the very recent and highly accurate simulations of Schienbein and Marx 24 represent the best currently available estimates of the multipole moments of liquid water.
Since the full polarizability tensor cannot be obtained directly from experiment, the results of quantum mechanical studies were used. In particular, we have gathered data from three separate studies where relatively high levels of theory were employed. 97−99 For completeness, we have also considered the data set used by Swope et al., 67 kindly provided by one of the authors, even though it made use of a comparatively low level of theory. All the nonzero values of the polarizability matrices used in this work can be found in SI Table S6. Overall, we have tested six different polarizability matrices, calculated using three different computational approaches (DFT, MP2, and Coupled Cluster). Together with the six different sets of multipole moments, this results in 36 individual data sets.
First of all, we considered the effect of the method and basis set size for calculating the polarizability tensor of water. The overall polarization corrections for all multipole sets and polarization tensors are shown in Table 3,w h i l et h e corresponding individual components are shown in Tables S7−S10. From the latter, it is clear that the distortion and electronic corrections largely cancel out, leaving an overall correction of much smaller magnitude than the individual terms. Furthermore, Table 3 shows that the two results calculated with small basis sets (B3LYP † and CCSD(T) ‡) are outliers, which confirms the importance of using a large basis set in the calculation of polarizabilities. 100 This leaves four sets, all calculated with the same high-level basis set (d-aug-cc-pVQZ). These remaining sets show a converging trend in the value of the overall correction as the level of theory increases. The difference between the highest and lowest values for the overall correction varies between 0.7 and 1.5 kJ/mol, although this would be significantly smaller (0.2 to 0.6 kJ/mol) were we to discard CCSD* as an outlier. In any case, it can be seen that the error introduced from the polarization tensor is fairly small, provided an adequate level of theory is employed. From this point on, we only consider that of Loboda et al. 99 (CCSD(T) ◊), since it is calculated at the highest level of theory with the largest basis set.
Next, we look at the breakdown of the components of the overall correction for all our multipole sets in Table 4. What is immediately clear is that the multipole set is a much larger source of uncertainty than the polarization matrix, with the difference between the largest and smallest corrections being 5.8 kJ/mol. Additionally, the first four entries in Table 4 are quite different to the last two, which is unsurprising, since the last two entries have dipoles (2.88 and 2.95 D, respectively) that are much closer to the experimental value (2.9 D). 55 Furthermore, it can be seen that the overall correction is a near cancellation of the two individual contributions, especially for the last two entries. Another aspect worth mentioning is that the last two sets of multipoles, those closest to the experimental estimate, yield values for the cavity radius in the Onsager model that are quite close to the value estimated from the experimental liquid density, 1.554 Å. This affords an additional degree of consistency to our approach for estimating the polarization correction.  Although both effects cancel out to a significant extent, for almost all cases, the negative contribution is larger in magnitude, leading to an overall negative correction. As described above, the magnitude of the correction tends generally to decrease as the multipole moments of the liquid phase increase. However, because there are three variables that can change independently between different rows (one dipole and two quadrupoles), it is not easy to discern clear trends from Table 4. In order to better elucidate the interplay between the two competing correction factors, we have simplified the calculation by assuming a relationship between the two quadrupole moments and the dipole moment ( Figures  S1 and S2). This allows us to obtain an approximate estimate of the polarization corrections using only the dipole moment as a free variable. The results using the polarizability matrix of Loboda 99 are plotted in Figures 5 and 6. It is important to emphasize that this procedure was carried out merely for mathematical convenience, and we are not suggesting that any physical dependence of the quadrupole magnitudes on the value of the dipole moment actually exists. Figure 5 shows how the magnitude of both corrections depend on the multipole moments of the liquid. The positive correction is equal to zero at a dipole magnitude equal to 1.86 (the start of the x-axis in this case) since this is equal to the gas phase dipole; i.e., no polarization has yet occurred so the multipoles are not yet distorted. The negative contribution is also zero since there are no other molecules to interact with the gas-phase molecule and produce a reaction field. This corresponds physically to a cavity of infinite size (eq 16), which would cause the continuum correction to tend to zero. One interesting result is the point of intersection between the positive and negative corrections, i.e., where the overall correction is exactly zero. In our case, the crossover is at 3.07 D, which is slightly higher than the experimental value of 2.9 D. 55 Again, in qualitative agreement with Leontyev and Stuchebrukhov, 69 this strongly suggests that if realistic liquid multipole moments are used the overall correction will be relatively small in magnitude.
In Figure 6, the overall correction is broken down into the individual multipole components (dipoles (μ) and quadrupoles (Θ)), using the polarizability matrix of Loboda. 99 The decomposition is as follows: If only the dipole contributions are used, the correction becomes positive for dipoles larger than 2.73 D. However, the quadrupole contribution is strongly negative up to high values of the dipole moment (intercept at 3.28 D), shifting the intercept of the overall correction to 3.07 D. This implies that, at dipoles close to the experimental value of 2.9 D, the change in magnitude of the dipole hinders solvation, whereas the change in the quadrupole promotes it. It also explains why our overall polarization correction is small and negative, while Leontyev and Stuchebrukhov, 69 who considered only dipole contributions, obtained a small positive correction in their analysis.
Up until now, we have worked only in terms of the average values of the liquid-phase multipole moments. In reality, a range of fluctuating geometries will be present in the liquid, leading to a distribution of multipole magnitudes. Again, the AIMD trajectories of Schienbein and Marx 24 allow us to calculate the distributions of dipoles and both quadrupole moments in liquid water (see SI Section 7 for details). The analysis scripts can also be found at the University of Strathclyde data repository (DOI: 10.15129/35c601f1-d21e-4cb2-9f40-8410709438e6). The results are shown in Figure 7, where it can be seen that the distribution is nearly symmetric for all multipoles.
In order to test whether using the average multipoles is a good proxy for the whole multipole distributions, the polarization correction was calculated for all 12,928 individual water molecules in the AIMD trajectory and then averaged. The aggregated results of the distribution analysis are shown in Table 5, along with the equivalent calculation using the average multipoles. For each case, we considered two different ways of calculating the cavity radius: (i) applying eq 16 to each individual set of multipoles and (ii) using a fixed value obtained from the experimental liquid density (i.e., 1.554 Å).   Table 5 show that using the average multipoles as a proxy for the full distribution produces results which are more negative by about 5 kJ/mol, which is a significant discrepancy. Since the full distribution is a better representation of the physical reality of liquid water, we believe that these results are a better estimate of the necessary polarization correction. Additionally, we can see that using the fixed rather than the individually calculated cavity radius produces results that are more negative by about 2 kJ/mol. This is due to the generally smaller cavity radius in the fixed version, increasing the negative correction.
From Table 5, it can also be seen that although the dipole− dipole interactions for the multipole distortion component are indeed the largest contribution, they are by no means dominant; they are generally a little more than twice as large as the quadrupole component. Nevertheless, the sum of the dipole−quadrupole and quadrupole−quadrupole interactions (i.e., the missing terms in a dipole-only treatment) is in almost all cases larger than the overall corrections, i.e., extremely important due to the delicate balance between the positive and negative corrections. For the electronic correction, this effect is even more pronounced; the quadrupole contributions are equal or larger than the dipole ones. Additionally, using only dipole terms, the overall correction in every case would be more positive, whereas including quadrupoles makes the overall corrections more negative. This questions the conventional wisdom that accounting only for dipole−dipole interactions is sufficient. Indeed, it also begs the question whether excluding multipole moments above quadrupoles (as in our case) is also a reasonable assumption.
Finally, in Table 6, we compare the results for the overall polarization correction when different approximations are used: (i) Only dipoles are considered, and the distortion correction is estimated using eq 1 (row 1). (ii) The electronic solvation correction as per Leontyev and Stuchebrukhov 69,70 is added, but only using dipoles (rows 2 and 3). (iii) Finally, the full treatment, up to quadrupole moments, is employed (rows 4 and 5, respectively). In all cases, the whole multipole distribution of Schienbein and Marx 24 was used, and the final values are the average of the correction over all molecules and frames. We can see that using up to quadrupoles with both terms results in a lower correction, regardless of the method chosen to calculate the cavity radius. In both cases, it can also be seen that using the fixed cavity radius, calculated from the liquid density, produces a slightly more negative result; the dipole-only correction is reduced from 1.6 to 1.0 kJ/mol, and the dipole and quadrupole correction from 0.8 to −1.4.
Having analyzed in detail the effect of each variable on the calculation of polarization contributions, we are now in a position to identify our best estimate for the overall polarization correction and revisit the MD results shown in Figure 4. As discussed above, our best estimate is obtained using the polarizability tensor at the highest level of theory (Loboda's 99 ) and the multipole distribution of Schienbein and Marx. 24 As can be seen from Table 6, our final value could be either 0.8 or −1.4 kJ/mol, depending on which method for defining the cavity radius is chosen. Since the cavity radius is not a strictly defined property, it is hard to judge which   method is more "correct", and as such, we shall consider both in our final analysis. We applied each of these corrections to the enthalpy of vaporization and hydration free energy results calculated from MD simulations for the 11 nonpolarizable water models, and we obtain the points shown in Figure S5. Fitting a linear trend through each of these data sets, we obtained the dashed and dotted lines shown in Figure 8. The straight line fits yield slightly positive intercepts, 0.13 kJ/mol for the fixed cavity radius and 0.74 kJ/mol for the variable cavity radius. The trend lines obtained by applying the original Berendsen correction (intercept = 2.0 kJ/mol) and applying no polarization correction (intercept = 0.52 kJ/mol) are also shown in Figure 8 for comparison. The difference between the two best estimates of the polarization correction gives an idea of the uncertainty of our procedure. Taking this into account, it is clear that the polarization correction for water should be close to zero, corresponding to almost complete cancellation of the distortion and electronic contributions. This leads to an intercept close to the origin in Figure 8. In contrast, as discussed previously, applying the Berendsen correction with model dipoles, as done in most water model parametrization efforts, leads to a systematic deviation between enthalpy of vaporization and hydration free energy predictions. The implication of our analysis is that it should be possible, at least in principle, to design a new water model that is able to simultaneously predict both these properties, while at the same time yielding an accurate description of other liquid properties.

SUMMARY AND CONCLUSIONS
In this work, an analytical method for calculating polarization corrections for MD simulations has been presented. This is based on two competing corrections to describe the electronic changes that molecules undergo when leaving the gas phase and entering the liquid phase. These are, namely, the cost (always positive for hydration) of distorting the electron density from one appropriate for the gas phase to that present in the condensed liquid phase and the energetic advantage (always negative for hydration) arising from the interaction between the molecule and the surrounding polarized electron clouds of the other molecules in the fluid. Our example here is based on water due to the large amount of data available for this molecule. We confirm that the overall polarization correction is a near cancellation of the above two fairly large competing factors, which explains why early water models have been able to achieve good performance while neglecting them.
In the present work, multipoles up to quadrupoles have been used, whereas usually only dipoles are applied. It has been shown that the contribution of quadrupoles is not negligible in comparison to that of the water dipole, ranging from 50% to 100% of the value of the dipole correction for both positive and negative corrections. We have also shown that the impact of the polarizability tensor on the value of the polarization correction is relatively small, provided a high enough level of theory is used in the underlying QM calculations. Additionally, we have calculated the correction for a distribution of liquidphase water molecules (based on the AIMD data of Schienbein and Marx 24 ); in this way, we have shown that using the average multipoles underestimates the value of the correction, making it more negative. In general, the near cancellation of the two contributions described above means that any errors in the calculation of either component will be magnified in the final result. Taking all these factors into account, we obtain an overall polarization correction that is close to zero, with an estimated uncertainty of around ±1 kJ/mol.
Although our treatment represents a clear advance over existing approaches, it still relies on several assumptions, the validity of which requires future testing. For instance, we have assumed that the gas-phase polarizability tensor can be applied also in the liquid phase. Although this is a commonly used approach, it is unclear to what extent the polarizabilities will change due to the presence of surrounding molecules. Furthermore, we have truncated our multipole expansion at the quadrupoles and have shown that the contribution of the latter is quite significant. Similarly, it is possible that even higher order multipoles will have a non-negligible contribution to polarization. Finally, we have employed a very simple continuum model to calculate the electronic polarization contribution, motivated by the need for an analytical expression. This is likely a reasonable approximation for water, as it is a small and nearly spherical molecule. However, much more accurate continuum methods exist, 101 and their application to the calculation of the electronic polarization term should be the subject of future research.
We applied our theory to the results from MD simulations of 11 different nonpolarizable water models. Our results showed that the usual Berendsen correction actually makes it more difficult to parametrize a model that accurately predicts both the heat of vaporization and the hydration free energy of water. In contrast, a value of the correction close to zero, as determined using our approach, eliminates this systematic deviation, even when uncertainty is taken into account. This suggests that, at least in principle, it should be possible to parametrize a water model that can provide an accurate description of the liquid phase, while predicting both energetic properties accurately. Confirming this assertion is left for future work.
The main drawback of our procedure is that it relies on rather complex QM calculations; high levels of theory are required to obtain accurate gas-phase polarizability tensors, while obtaining a distribution of multipole moments in the  Table 1 using different values of the polarization correction: Berendsen's original correction using model dipoles and our corrections (using either a fixed or a variable cavity radius) obtained from the multipole distributions calculated from the work of Schienbein and Marx. 24 Journal of Chemical Theory and Computation Article DOI: 10.1021/acs.jctc.8b01115 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX K liquid phase is highly nontrivial. Although accurate estimates of both quantities were available for water, this is very much an exception, arising from the practical relevance of this molecule. Extending the approach proposed here to other systems therefore remains rather challenging, particularly in the context of force field development. However, our results suggest that for strongly polar molecules, the overall polarization correction should be close to zero. As such, we propose that a reasonable approximation would be to parametrize such models by neglecting energetic polarization corrections altogether. This is certainly better than including a positive correction based on the current state-of-the-art, which, as shown, leads to systematic deviations in phase-change energies. For less polar molecules, however, the electronic component will dominate over a much smaller distortion, leading to an overall negative correction. 70 In this case, a simple approach for estimating polarization corrections needs to be applied. We plan to assess the impact of these corrections for nonpolar molecules in the near future.