Effect of the QM Size, Basis Set, and Polarization on QM/MM Interaction Energy Decomposition Analysis

Herein, an Energy Decomposition Analysis (EDA) scheme extended to the framework of QM/MM calculations in the context of electrostatic embeddings (QM/MM-EDA) including atomic charges and dipoles is applied to assess the effect of the QM region size on the convergence of the different interaction energy components, namely, electrostatic, Pauli, and polarization, for cationic, anionic, and neutral systems interacting with a strong polar environment (water). Significant improvements are found when the bulk solvent environment is described by a MM potential in the EDA scheme as compared to pure QM calculations that neglect bulk solvation. The predominant electrostatic interaction requires sizable QM regions. The results reported here show that it is necessary to include a surprisingly large number of water molecules in the QM region to obtain converged values for this energy term, contrary to most cluster models often employed in the literature. Both the improvement of the QM wave function by means of a larger basis set and the introduction of polarization into the MM region through a polarizable force field do not translate to a faster convergence with the QM region size, but they lead to better results for the different interaction energy components. The results obtained in this work provide insight into the effect of each energy component on the convergence of the solute–solvent interaction energy with the QM region size. This information can be used to improve the MM FFs and embedding schemes employed in QM/MM calculations of solvated systems.


■ INTRODUCTION
Simulating polar environments accurately is challenging at the quantum level but crucial in order to reproduce experimental data. For instance, many biochemical processes take place under strong polar conditions, where not only the closest surrounding solvent molecules take part actively but more distant molecules also influence the process by creating a nonnegligible electrostatic potential that changes the intra-and intermolecular interactions established. 1−5 The long-range character of these electrostatic interactions requires the consideration of a large number of atoms in the simulations, including a significant part of the environment. Thus, application of pure quantum mechanical (QM) methods is not feasible and these must be replaced by hybrid quantum mechanics/molecular mechanics (QM/MM) methods. 6−8 Pure molecular mechanics (MM) methods can also be employed to deal with systems including thousands of atoms, but the price to pay in terms of accuracy is sometimes too high, especially when dealing with processes where electrons have to be explicitly described. Particularly delicate are the studies of chemical transformations that involve, for instance, bond breaking or formation, photochemical activation, or strongly correlated systems. In these situations, a quantum mechanical treatment is strongly recommended or even mandatory. More accurate approaches employed to take into account environmental effects are hybrid QM/QM techniques, which combine two levels of QM theory, such as the polarizable density embedding 9 and frozen density embedding. 10 However, they are not extensively used in the simulation of large systems because of their computational cost. When the environment to be modeled is isotropic, e.g., solvents, continuum approaches are a very good alternative because of their low computational cost and ability to describe polarizing effects. 11 However, in many cases, explicit interactions, such as hydrogen bonding, are not properly described.
The fundamental idea behind QM/MM methods is the partition of the full system into two regions: a small one, the target, and a larger one, the surroundings, which are described at QM and MM levels, respectively. Obviously, the QM region includes the most important part of the system, where the process takes place, whereas the MM region contains the remaining atoms. Therefore, the full system Hamiltonian gets separated into three parts: QM, MM and QM-MM ones, the latter accounting for the interactions established at the interface between both regions. It is for this last term in particular, where the different schemes developed for the implementation of QM/MM methods differ. The QM/MM approaches 12 may be grouped within either the additive scheme, 13,14 where a term describing the interaction between the QM and MM regions is explicitly constructed, or the subtractive scheme, 15,16 where the explicit construction of a QM-MM interacting term is avoided. The most popular subtractive schemes are the integrated molecular orbital molecular mechanics (IMOMM) 15 method and its subsequent improvements, all of them belonging to the family of "our own n-layered integrated molecular orbital and molecular mechanics" (ONIOM) methods 17 by Morokuma et al. 18 The original formulation of the subtractive scheme computed the interaction between the different layers only at classical level by using a classical force field (FF). However, later formulations of the subtractive scheme allow a more accurate description, as will be explained below.
Even though the subtractive scheme is computationally simpler, the additive scheme is often preferred due to its higher flexibility. In the additive schemes, the interaction between the QM and the MM regions may be treated at four different levels: using mechanical, 15,16 electrostatic, 13,19−21 polarizable, 22−24 or flexible boundary 25,26 embedding schemes. The mechanical embedding treats the interface interactions at the MM level, which does not allow including the polarization of the wave function of the QM region by the MM charges, a factor extremely important for systems subjected to strong polar environments. The electrostatic embedding is by far the most employed one due to its good compromise between accuracy and computational simplicity. In this approach, the polarization of the QM region by the electrostatic potential created by the MM atomic charges is accounted for. In the polarizable embedding, mutual polarization between the QM and MM regions is taken into account, self-consistently or not. In addition to the electrostatic interaction with the MM atomic charges, the interaction between the induced dipole moments of the MM region and the electric field created by the QM region is introduced in the Hamiltonian. For this reason, a polarizable FF must be employed to describe the classical region. It must be recalled that the MM atomic charges in electrostatic FFs are fixed and different than those of polarizable FFs since they have been parametrized without atomic dipoles. For obvious reasons, polarizable embedding schemes are more computationally demanding than the electrostatic ones.
Finally, flexible boundary embedding schemes are even more sophisticated approaches based on the principle of electronic chemical potential equalization, allowing both partial charge transfer and self-consistent polarization between the QM and MM regions. 25,26 Although these schemes improve the accuracy, they also increase the computational cost significantly. The higher computational cost of polarized embedding schemes, combined or not with the flexible boundary approach, is the main reason the electrostatic schemes are still preferred for most dynamic and static simulations.
As mentioned above, original subtractive ONIOM methods are based on the mechanical embedding scheme (ONIOM-ME), and thus, its main drawback in QM/MM calculations is the lack of polarization effects of the QM wave function by the MM region. This can be circumvented by adding the MM electrostatic potential in the QM calculation, which, in the context of ONIOM methods, is known as electronic embedding (ONIOM-EE). To avoid the computation of the electrostatic interactions between layers twice, this term is also computed classically and subtracted from the total energy. The accuracy improvement introduced by the ONIOM-EE method obviously entails a rise in computational cost with respect to ONIOM-ME.
Either for additive or subtractive schemes a crucial step in the setup of QM/MM calculations involving strong polar environments is the selection of the QM region size. If the target system involves small or medium size solutes (formed by a few tens of atoms) surrounded by solvent molecules, the QM region can be enlarged by just transferring more solvent molecules from the MM region. This way, nonbonded interactions are included in the QM-MM interface, thus avoiding one of the main problems of the QM/MM methods, namely, the truncation of the QM region through covalent bonds and the subsequent, often required, redistribution of charges within the atoms of the interface. Then, the crucial question here is how many solvent molecules must be incorporated into the QM region to accurately describe the intermolecular solute−solvent interactions? This work addresses this question from a rigorous point of view.
Previous works have already dealt with the problem of the QM size for the study of different properties, such as absorption spectra, 27−30 excitation energies, 31 charge transfer processes, 32 NMR shieldings, 33 proton transfer, 34,35 enzyme catalysis, 36−39 and interaction energies. 40 In the particular case of activation energies and/or reaction energies, a strong dependence of the optimum QM region size with the QM level and MM embedding scheme employed has been found. Thus, a relatively small number of atoms (lees than 100) is required when semiempirical methods are employed, 35,36 whereas a much larger number of atoms in the QM region is required when DFT methods are applied, 37−39 even more than 1000. 34 This slow convergence of reaction energies with the QM region size in enzymatic reactions has been related to the charge transfer between the active site and the surrounding protein, which cannot be addressed by improving the MM force fields. A faster convergence is obtained when going from mechanical to electrostatic embedding 31 and with the use of flexible boundary embedding schemes. 25,26 However, the problem of the QM region size convergence was never approached for the study of long and short-range solute−solvent interaction energies, at least using a rigorous QM energy decomposition analysis (EDA) reformulated under the action of a MM solvent environment. This will be referred to as QM/MM-EDA. The EDA scheme employed here was developed and implemented some years ago at the QM level. 41,42 Its extension to hybrid QM/MM methods was recently presented and applied to the study of lipid membrane permeability. 43 Herein, this QM/MM-EDA scheme is employed to investigate the QM size required to accurately describe long-and short-range interactions between an ionic solute and a highly polar solvent. As paradigm of strong polar solvents, we have chosen water, whereas cationic, anionic, and zwitterionic solutes have been chosen in order to account for the three different types of ionic molecules. A previous study of the effect of the QM region size on intermolecular interaction energies calculated with QM/MM methods was carried by Fox et al. 40 Although focused on the same issue, two important differences between the present work and that performed by Fox et al. should be remarked. The first one is that Fox et al. compared, for different sizes of the QM region, the interaction energy between fragments defined within the QM region but polarized by an MM electrostatic embedding, whereas in our work one fragment (the solute) is defined within the QM region but the other fragment (the solvent) contains all the solvent molecules included in the QM and MM regions. Thus, the interactions of the solute with the solvent molecules included in the MM regions are also included in the interaction energy. The second important difference is that Fox et al. only focused on the total interaction energy, whereas the different interaction energy terms (electrostatic, Pauli repulsion, induction and dispersion) are also analyzed in our study through the QM/MM-EDA scheme mentioned above. Additionally, there have been studies where an EDA scheme has been applied to the study of interactions between QM and MM regions and where the influence of the QM or MM region size on these interactions has been assessed. 44−46 The important difference of these works with respect to our work is that the fragments considered always belonged to one of the regions (QM or MM), whereas we consider QM/MM hybrid fragments in our QM/MM-EDA scheme.
It must be remarked that understanding the role of each energy term on the convergence of the interaction energy with the QM region size in QM/MM calculations will help to correct deficiencies in the MM FFs and embedding schemes. It has been found very recently that, in the permeation of chlorinated dioxins through lipid membranes, the dispersion energy is the term with the slowest convergence and not the electrostatic energy as could be expected in general. 47

■ THEORETICAL BACKGROUND: QM/MM-EDA
In this section, the EDA employed in this work is first introduced at QM level, and then the particular case of a hybrid QM/MM level is addressed. The equations at QM level were previously introduced elsewhere. 41,42 The readers are referred to these works to extend their knowledge on this perturbational EDA scheme based on electron deformation densities.
For an interacting quantum system formed by two closedshell fragments A and B, the interaction energy is defined as the difference between the AB complex energy and the energies of the isolated fragments at the complex geometry, as indicated in eq 1 by the superscript AB.
Due to the use of finite basis sets, the fragment energies in eq 1 are also calculated with the basis set of the complex in order to correct the basis set superposition error following the counterpoise method. 48 The AB complex energy may be written in terms of the one-electron density, the one-electron density matrix and the exchange-correlation density, In eq 2, ρ(r) and ρ(r, r′) r′=r represent the one-electron density and density matrix, respectively, ρ XC (r 1 , r 2 ) represents the exchange-correlation density, ν̂N represents the nuclei potential operator, and r and R represent the electron and nuclei positions, respectively. Operators, densities and density matrices may be decomposed into terms from a hypothetical noninteracting system (unperturbed fragments) and the interacting terms (eqs 3−6). Thus, the nucleus−nucleus repulsion energy term and the nuclei potential operator are given by, where the contributions from the Pauli repulsion and the electron polarization, Δρ Pau (r) and Δρ pol (r), have been included separately in eq 5 as well as the interfragment exchange and polarization, ρ X,AB (r 1 , r 2 ) and Δρ XC (r 1 , r 2 ), in eq 6. The first two terms in eqs 5 and 6 correspond, respectively, to the one-electron and exchange-correlation densities of the unperturbed fragments. An equivalent partition to that of eq 5 applies to the one-electron density matrix. Unperturbed fragment potentials, vÂ and vB, can be also defined by merging the nuclei potential and the electron potential operators, Thus, introducing eqs 3−8 into eq 2, removing the energies corresponding to the unperturbed fragment energies, and grouping those with the same physical origin, the interaction energy gets decomposed into electrostatic (elec), exchange (exch), repulsion (rep), and polarization (pol) energies, Even though repulsion is a functional of Δρ Pau and exchange is a functional of ρ X,AB , both arise from the same fundamental quantum mechanical principle, the Pauli exclusion principle. Thus, these energy terms arise from the Fermi correlation between same-spin electrons of different unperturbed fragments, so that they are generally merged into one term called Pauli energy, E Pau . On the contrary, polarization includes contributions from induction and dispersion forces, and can be separated exactly into induction and dispersion energies at second-order perturbation theory (PT) level. 49 Therefore, combining our EDA scheme with second-order PT, the induction energy can be extracted from the rest of polarization. 42 In the Rayleigh−Schrodinger PT (RSPT), the induction energy 49 and the first-order correction to the electron density, 42 the one required for the calculation of the second-order energy, are given by eqs 14 and 15, where the notation ρ A n0 and ρ B m0 has been introduced to denote induced transition densities within fragments A and B from ground state configuration 0 to excited state configurations m or n. In the equations above, denominators include the energies corresponding to these transitions.
As explicitly shown in eq 15, the first-order correction to the electron density is split into contributions from fragments A and B. This electron density correction represents the polarization density at second-order, so that substituting eq 15 in eq 13 the induction energy may be extracted. However, it is more straightforward to define the charge-induction energy, E ch-ind , as the sum of the second and third terms of eq 13 and work with it from now on, To extract the induction energy (eq 14) from the chargeinduction energy (eq 16), three straightforward steps must followed. First, introduce eq 15 into eq 16, Second, identify E ind (eq 14) as one-half of the first two terms of eq 17 and reorder the terms, and third, replace back the definition of first-order density (eq 15) to obtain the final expression for E ind , which corresponds to one-half of the charge-induction energy minus the intrafragment energies. Equation 19 may be rewritten as, which reproduces the classical result where half of the induction energy is wasted in the induction process. Once induction is obtained, the rest of polarization corresponds exclusively to dispersion at second-order PT. However, if the intermolecular interaction is large, then second-order PT is no longer a good approximation, and higher order corrections are required. These higher order corrections introduce more induction and dispersion but also mixed terms, so that the partition of polarization into pure induction and dispersion contributions is not reliable.
In the QM/MM-EDA scheme employed in this work, the potential created by the MM region in the QM region is introduced in the calculation of the total interaction energy and energy components. Notice that the center of the intermolecular interaction must be described at QM level, and so, the fragments must be partially, or even completely, described quantum mechanically. For instance, in the present study the solute molecule is fully described at QM level whereas the solvent is partially described at QM and MM levels. In QM/MM-EDA calculations with an electrostatic embedding, 43 the MM potential at the QM region, V̂F F (r), includes only the potential created by fixed atomic charges. The form of this potential is given by, (21) where q i and R i represent the atomic charges and their positions and NA MM denotes the number of atoms in the MM region. In our QM/MM-EDA scheme, V̂F F (r) is merged with the nuclei potential of the corresponding fragment so that it is included in the calculation of the electrostatic, repulsion and polarization energies (eqs 10, 12, and 13). In the present study, where the MM region is fully included within the solvent (fragment B), only the nuclei potential of this fragment is replaced by the QM/MM potential.
It must be noticed that the MM atomic charges also interact electrostatically with the nuclei of the solute molecule. Besides the replacement of ν̂N In addition to the polarization of the QM region by the MM region, which was accounted for by QM/MM energy calculations performed in previous works, this QM/MM-EDA scheme also introduces explicitly the MM potential in the EDA step, providing a correct interaction energy decomposition, which is not obtained otherwise.
On the other hand, in an electrostatic embedding including induced atomic dipole moments (polarizable FF) V̂F F (r) incorporates the potential created by the induced dipoles in the QM region. The form of the potential is now given by, where μ i represents the atomic dipole. Herein, each dipole is replaced by a pair of opposite induced charges, δ i + and δ i − , located on the atom, separated by a distance d i and aligned with the dipole vector. eq 24 is then rewritten as, |d i | must be much shorter than the shortest distance between a QM and a MM atom, so that the values of the MM potential within the QM region do not change when going from eq 24 to eq 25. With this very small value of d i , the induced charges have to be large in order to reproduce the values of the induced dipole moments. These large induced charges yield a huge total electrostatic energy at the MM region due to the intraatomic electrostatic interactions between the positive and the negative charge of each pair. This interaction is introduced by the model but has no physical sense. However, neither the interaction energy between the solute and the solvent nor the later EDA is affected by the model of induced charges as only the MM electrostatic potential at the solute region is taken into account for these calculations.

■ COMPUTATIONAL DETAILS
The systems here studied consist on three ionic solutes surrounded by a water environment; the ammonium cation, zwitterionic glycine and the formate anion. As we intend to investigate the full dynamic process instead of a static situation, previous to the QM/MM calculations, molecular dynamics (MD) simulations were carried out to generate an ensemble of geometries to perform a subsequent statistical analysis of the different interaction energy terms. For the MD simulations the solvent molecules were placed on a truncated octahedron with a distance of 22 Å separating any molecule from the edge of the simulation box. MD simulations were run with AMBER20, 50 modeling solutes with the GAFF(GAFF2), 51 and solvent molecules with the TIP3P 52 and POL3 53 water models as implemented in the PMEMD.CUDA 54,55 and SANDER modules, respectively. Replacing the TIP3P electrostatic FF by the POL3 polarizable FF allowed to evaluate the effect of introducing atomic dipoles in the MM region on the optimal QM region size. A minimization of 2000 steps switching from steepest descent to conjugated gradient algorithms after 10 steps was followed by heating to room temperature during a 0.5 ns thermalization step in the NVT ensemble employing a Langevin thermostat with a collision frequency of 1 ps −1 . A classical MD simulation lasting 10 ns with a time step of 2 fs in the isothermal−isobaric (NPT) ensemble was then performed to obtain the set of geometry snapshots, each of them saved every 10 ps. The Berendsen barostat 56 was used at pressure 1 bar and a relaxation time of 2 ps. Periodic boundary conditions were used during the simulation. The charge neutrality required by the Ewald particle summation 57,58 was achieved by including Na + and Cl − as counterions 59 for the formate anion and the ammonium cation, respectively. The cutoff for both Lennard-Jones potential and the real-space Ewald particle summation was set to 12 Å. Bond distance involving hydrogen atoms were restrained by the SHAKE algorithm. 60 In order to characterize the optimal QM region size for QM/MM calculations, a single snapshot was randomly selected from the last 5 ns of the MD simulation. Then, the QM/MM calculations were performed with Gaussian16, 61 using the M062X 62 functional and the cc-pVDZ basis set, later increasing the basis set to a cc-pVTZ for the whole system. 63−67 The QM region was defined with the solute at its center and the number of closest water molecules was increased from 10 to 240, with the remaining solvent molecules included as point charges assigned by the TIP3P model. The MoBioTools toolkit 68 (https://github.com/ mobiochem/MoBioTools) was used for the definition of each region. To perform the EDA, three single point QM/MM calculations were carried out for each system without periodic boundary conditions but including the whole environment in the MM layer, corresponding to the full system (QM region and MM point charges) and the fragments on which to compute the interaction energy. Fragment 1 consisted of the QM solute in the presence of the basis set of the QM water molecules, whereas fragment 2 included all QM water molecules as well as the basis set from the solute surrounded by the MM electrostatic embedding. Basis functions of absent atoms are aimed at accounting for the basis set superposition error (BSSE) by means of the counterpoise correction (CPC). 69,70 Finally, the EDA-NCI program (https://github. com/marcos-mandado/EDA-NCI) was used to perform the energy decomposition analysis. The full theoretical background of the QM/MM-EDA scheme applied in this work 41,42 was included in the previous section. To introduce vibrational sampling and compare the energy components obtained with  both the TIP3P and POL3 FFs, 100 snapshots were selected from the last 5 ns of the MD simulation each 50 ps so that they were not correlated, and thus, a broad region of the configurational space is covered by the sampling protocol. The induced atomic dipoles of the MM region are directly taken from the MD simulation, as are the fixed atomic charges. It is important to highlight that, within this electrostatic QM/ MM embedding scheme including induced atomic dipoles, the mutual polarization between the solvent and the solute is computed at classical level during the MD simulation, but then, the induced dipoles of the solvent are fixed in the subsequent QM/MM computations of the interaction energy.

■ RESULTS AND DISCUSSION
In this section, the convergence of solute−solvent interaction energies and its different components is evaluated for the three different solutes, namely, the ammonium cation, zwitterionic glycine and the formate anion. Given the long-range nature of electrostatic interactions, the polarity of each studied system is likely to impact the convergence of the interaction energy and its components. In this sense, one can tackle the problem of QM/MM convergence either by improving the description of the QM region or that of the MM embedding. For this reason, the initial QM calculations performed with the M062X functional and the cc-pVDZ basis set are supplemented by the larger cc-pVTZ basis. Regarding the MM region, a polarizable FF (POL3) 53 is introduced to further improve the description of the MM water molecules provided by the TIP3P electrostatic model.
Interaction Energy Convergence with QM Region Size. Initially, the behavior of the total interaction energy and its different components will be analyzed as a function of QM region size at the M062X/cc-pVDZ level for a single snapshot selected from the last 5 ns of the MD trajectory. As the surrounding MM potential will impact significantly the electron density distribution, specially for QM regions of small size, the effect of the MM region on the obtained results is also evaluated. For each solute, the deviations of the interaction energy components with respect to the values computed for a QM region containing 240 water molecules (reference values can be seen in Table 1) is calculated for various QM region sizes. An identical procedure is performed in the absence of the MM electrostatic embedding with results presented in Figure 1 (for full scale representations of the total and electrostatic interaction energy components check Figure  S1 in the Supporting Information). A QM region containing 240 water molecules corresponds to quasi-spheres of radii around 13−14 Å.
Ammonium and glycine show full convergence within the ±1 kcal/mol range for a QM sphere containing 160 and 70 QM water molecules (corresponding to a maximum distance of 11.3 and 9.3 Å between the atoms of the QM water molecules and solute's center of mass), respectively, for all interaction energy components under the QM/MM-EDA approach (Figure 1a,c). On the contrary, no convergence is found for formate (Figure 1e) or for any of the solutes when full QM calculations, i.e., in the absence of the electrostatic embedding, are performed. As expected, the electrostatic component is the most difficult to converge and given its magnitude with respect to the remaining components, the total interaction energy follows a similar trend. In agreement with this observation, the zwitterionic form of glycine requires the smallest QM region to obtain converged energies. ) are greatly increased to −60.8 kcal/mol, −52.1 kcal/mol, and −59.2 kcal/mol for each solute. The Pauli and dispersion components show the best convergence trends within the studied range as their short-ranged nature suggests. Indeed, within the QM/MM approach, it only takes 10, 30, and 20 QM water molecules to converge both components for the cationic, zwitterionic and anionic solutes, respectively. Larger QM regions are required to obtain similar results under a full QM scheme (Figure 1). The induction component deviates far more than dispersion for all solutes and accordingly does the polarization term. The slow convergence of the induction term is somewhat in between that of the short-range components and that of the long-range electrostatic component.
Overall, it can be seen that the introduction of a MM electrostatic embedding improves the convergence of the interaction energy and its components when comparing with the setup considering only the QM region. Nonetheless, a significant amount of water molecules is still needed to be treated quantum-mechanically for the total interaction energy to converge at reasonable accuracy, especially for systems where the long-range electrostatic interactions are likely to predominate.
Comparing the results obtained by the QM and the QM/ MM methods, no agreement within the studied range of QM region sizes (Figure 2) is achieved. Indeed, the differences in the total and electrostatic interaction energies are quite large and, thus, they are not represented in the figures for the ionic species for scale reasons (for full scale representations check Figure S2 in the Supporting Information). The reason for this large disagreement between the QM and QM/MM calculations can be partially found on the presence of counterions in the case of the QM/MM calculations. Since charge neutrality in the simulation box is required to compute the electrostatic interactions through the Ewald summation 57,58 as the MD simulation is performed, it was necessary to add counterions for both charged solutes. Those ions are later embedded within the MM region once the QM/MM partition of the system is performed. The removal of the MM region carried out for the full QM calculations thus leads to a significant change in the electrostatic interaction. Also, it should be emphasized that these magnitudes are not converged for the ionic QM systems (Figure 1). Removing the counterions from the QM/MM systems reduces both total and electrostatic energy differences with the QM systems from −75.0 to −58.2 kcal/mol and from −72.8 to −55.9 kcal/mol, respectively for the anion with the smallest QM water shell. A similar reduction from −116.3 to −100.8 kcal/mol and from −74.9 to −59.4 kcal/mol is found for such energy components for the cation with the QM shell containing 10 water molecules.
A common feature in all solutes can be found in the Pauli repulsion as it shows good agreement between QM and QM/ MM calculations even for QM regions containing only 10 to Journal of Chemical Information and Modeling pubs.acs.org/jcim Article 30 water molecules. The short-range nature of this component may be the reason for the harmony between models. Concerning polarization, it can be seen that it follows a similar trend to the Pauli repulsion, with an early convergence to the ±1 kcal/mol range. However, as Figure 2, parts a and c, make clear, this convergence results from the opposite trends shown by induction and dispersion since polarization is, indeed, sum of them both. For the neutral solute (Figure 2b) there is far better agreement for the polarization components, meeting when 60 water molecules are included in the QM partition.
In the context of a QM/MM calculation, one can aim at refining the obtained results by either improving the QM or the MM parts of the system. Regarding the former, both a systematic increase of the employed basis set and/or a better description of electron correlation by means of double-hybrid functionals, or even post-HF methods would improve the wave function of the QM region. Thus, DSD-BLYP, 71 B2GPPLYP, 72 and B2PLYP, 73 all of them corrected for dispersion, 74,75 would be an excellent choice for the study of noncovalent interactions as suggested in the literature. 76 As for the MM part of the system, the fixed-charge TIP3P model does not allow for any sort of polarization, a relevant feature in the study of intermolecular interactions. Thus, a polarizable FF would better describe the charge distribution within the MM region by means of introducing atomic induced dipoles. This 2-fold strategy is followed from here on, assessing its impact on both the previously obtained results and their convergence.
Effect of the Basis Set. The impact of a larger basis set (cc-pVTZ) on the different interaction energy components and the converged trends with QM region size is now assessed. For this purpose, the focus will be on systems with smaller QM regions, from 10 to 100 QM water molecules, as these differ the most from the reference value and are more susceptible to any change in the QM wave function. A QM region containing 100 water molecules corresponds to quasi-spheres of radii around 10−11 Å.
The total interaction energy behaves in a different way for each solute ( Table 2). The ammonium cation shows an increase of 8.6 kcal/mol unlike glycine and the formate anion where an energy decrease of −2.2 kcal/mol and −4.8 kcal/mol, respectively, is observed for a QM region containing 100 water molecules. It should be noted that all calculations performed so far have been carried out under the CPC, 48,77 and thus, its proneness to overcorrection for the BSSE must be taken into account. 69,70,78−80 In this sense, the increase in the total interaction energy may reflect the reduction of the BSSE with the larger basis set. On the other hand, the anionic character of formate (and to a lesser extent that of glycine) requires the use of a larger basis set for its description, hence the energy decrease for these systems when the cc-pVDZ basis is augmented to a cc-pVTZ.
Regarding the values obtained for the interaction energy components, it can be seen that the increase of the basis set affects each of them differently. The electrostatic component shows a significant increase for all solutes, specially for the ammonium cation (Table 2), where it rises by 14.3 kcal/mol for a QM sphere of 100 water molecules when compared to the cc-pVDZ value. On the other hand, increments of 7.8 and 4.5 kcal/mol are found for glycine and formate. Polarization follows the opposite trend, with energy decreases with respect to the computed cc-pVDZ values of −5.4 kcal/mol, −7.8 kcal/ mol, and −5.6 kcal/mol for the ammonium, glycine, and

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article formate. As expected, this trend is also followed by the two components of the polarization energy, induction and dispersion, both showing a general decrease except for the induction component in formate, where a slight increase of 0.7 kcal/mol is found. As for the Pauli repulsion, there is no significant variation when comparing results from different basis sets for ammonium, whereas for glycine and formate this component decreases with the larger basis set by −2.1 kcal/ mol and −3.7 kcal/mol, respectively. The convergence of each energy component computed with the larger cc-pVTZ basis set is compared in Figure 3 with the previous results obtained with the cc-pVDZ basis. The MMembedded QM sphere with 100 water molecules is now considered as the reference value as we aim at comparing the convergence between both basis, not determining the convergence point. In fact, according to Figure 1, only for glycine do all the interaction energy components converge in the ±1 kcal/mol range with less than 100 waters molecules in the QM region.
From Figure 3, an improvement in the convergence of the total interaction energy when using the cc-pVTZ basis can be observed, except for glycine, where no significant change is noticed. As for the smallest QM region (containing 10 water molecules), energy deviations of −2.3 kcal/mol, −7.8 kcal/mol and −1.0 kcal/mol are found for ammonium, glycine and formate, respectively, as opposed to −5.5 kcal/mol, −7.6 kcal/ mol and −3.8 kcal/mol when using the cc-pVDZ basis. Nonetheless, this improvement is not followed by the different components as their differences with respect to the reference value are increased to a greater or lesser extent when compared to the cc-pVDZ results. Hence, a larger basis set does not seem to result in an improvement of the convergence of the energy components. It is important to note that all trends represented so far correspond to a single randomly selected MD snapshot and, as such, the general tendencies here observed may vary when considering a sufficiently large sample of MD snapshots to ensure representability.
Regarding the absolute values for each component, there is a qualitative improvement when increasing the basis set. Focusing on the induction energy in Table 2, positive values appear for QM spheres from 10 to 30 QM water molecules when the cc-pVDZ basis set is considered for the ammonium cation. A positive value in the induction energy may reflect a limitation of the EDA scheme. Since the partition of polarization energy is based on second-order perturbation theory, this energy term can be exactly decomposed into induction and dispersion for weak interactions where secondorder is sufficient to obtain a good approximation to the correct interaction energy. However, the intermolecular interactions between ions and polar solvents are strong and higher order terms may contribute significantly, so the pure partition of polarization into induction and dispersion may lead to unexpected results. In any case, the separate analysis of the convergence of these energy terms with the QM region size still provides useful information about the quality of the QM/ MM scheme as they depend in different ways on the polarization density and on the intramolecular and intermolecular parts of the Hamiltonian. Additionally, these positive induction energies may stem also from a poor description of the potential created by the MM region or by the QM wave function. The former is addressed in the next subsection by including polarization into the MM region. The latter is partially improved when using the cc-pVTZ basis set, as can be seen in Table 2. The positive values for induction become negative for the QM region sizes of 30 and 20 water molecules and decrease for the smallest QM region of 10 water molecules. Polarizable Force Field. At this point, the effect of including induced atomic dipoles both for the MD simulations and the EDA will be evaluated. Herein, the pure electrostatic TIP3P FF is replaced by the polarizable POL3 FF. The procedure described previously is now followed, with a focus on QM regions containing 10 to 100 water molecules for the assessment of energy convergence, obtaining the representations shown in Figure 4. It can be inferred from it how the use of the POL3 FF does not produce a general improvement in the convergence of the interaction energy components, the electrostatic energy in particular. This component converges slightly better in ionic systems, but slightly worse for zwitterionic glycine (Figure 4). In order to introduce vibration sampling in the model and compare the energy components obtained with both FFs in a more rigorous way, a first study on the convergence of the sample means with respect to the number of sampled geometries was carried out for glycine solvated by a QM water sphere containing 100 molecules. Results are summarized in Figure 5, where a sample containing 20 or more geometries was found to be sufficiently converged, as fluctuations of the sample means are in the ±1 kcal/mol range with respect to the reference when 20 or more geometries are sampled. Thus, Figure 6 represents the normal probability distributions obtained with a sample of 20 geometries from each MD trajectory considering a QM region containing 100 water molecules.
It can be observed that the distribution functions obtained for the different energy terms show a displacement toward higher or lower values depending on the studied system. Sample means (Table 3) from distributions obtained with the POL3 FF take lower values for all components in glycine; the same happens for the formate anion to a lesser extent, whereas the ammonium cation shows the opposite behavior. An exception to this behavior is found for induction in the ammonium cation where a shift toward lower values arises when using the polarizable FF, leading to an improvement with respect to the TIP3P FF results for this component, where positive values where observed for the smaller QM regions.
For all three solutes, the electrostatic component is the most important, followed by the Pauli energy, whereas dispersion and induction are the least relevant terms with values that vary greatly from system to system. In fact, it can be appreciated how the electrostatic component accounts for most of the interaction energy in ionic solutes, with the polarization components playing less of a role, especially for the ammonium cation, in agreement with its less polarizable character. This is no longer the case for glycine, where it can be seen how polarization contributes significantly to the interaction energy.
Regarding the width of the distribution functions (Figure 6), it is worth mentioning the qualitative correlation between the relative standard deviation (RSTD) and the short or longrange character of each component. There is a direct relation between the average value of each distribution and its STD, however, when calculating the respective RSTDs, it can be seen how the short-range Pauli repulsion corresponds to the largest values, whereas the long-range electrostatic or the total interaction energy show the smallest ones as it can be inferred from Table 3. From the relatively large STDs, especially for short-range interactions, the importance of the configurational sampling, in this case performed with classical MD, is restated for the analysis of solute−solvent interactions. Interestingly, the use of a polarizable FF results in a broadening of the distributions for ammonium and particularly for glycine and its Pauli term, whereas the opposite is true for the formate anion.
Focusing now on the averaged values of the interaction energy components, it can be seen from both Figure 6 and Table 3 that there is no clear pattern regarding the distributions obtained with different FFs. For glycine and the formate anion, a displacement toward lower values appears when using the polarizable FF, whereas the opposite is true for the ammonium cation. The limitations when implementing polarizable FFs for the EDA-QM/MM analysis are reflected on the induction component, where negligible differences in its distributions are observed for the formate anion, whereas a displacement toward lower values appears for ammonium and glycine. The QM/MM-EDA scheme employed here, where the induced atomic dipoles of the MM solvent region are directly taken from the MD simulation of the whole system, does not allow to include in the interaction energy the effect of the changes in the induced atomic dipoles of the MM region due to the solute, an effect that it is certainly at the heart of the induction energy. Thus, it is not an unexpected finding that the induction energy values barely change from the pure electrostatic to the polarizable FF when the QM region is large (100 water molecules) for the anionic solute.

■ CONCLUSIONS
In this work, the solute−solvent interaction energy components of a series of ionic solutes in water have been investigated under our proposed QM/MM-EDA scheme in order to understand how the setup of the QM/MM calculation affects the convergence of such energy components with the QM region size.  By means of a classical MD trajectory, a series of conformations have been obtained for each solute in water solution, selecting one for the convergence study with QM region size at the M062X/cc-pVDZ level. A range of 10−240 QM water molecules has been included and a strong improvement has been found in the overall trends when the MM embedding is explicitly introduced in the EDA scheme (QM/MM-EDA). A reduction from −59.2 to −3.4 kcal/mol has been found in the interaction energy difference between a QM region containing 10 and 240 water molecules when going from a fully QM to a QM/MM-EDA scheme in the case of the formate anion.
The convergence is strongly dependent on the electrostatic energy component and, thus, a significantly large number of explicit water molecules must be treated quantum-mechanically to converge all components at chemical accuracy. Indeed, as many as 160 and 70 water molecules must be included in the QM region to reach convergence for all components at the ±1 kcal/mol range for ammonium and glycine, respectively. As numerically shown above, no convergence has been found with the chosen criteria for formate anion. Hence, it has been shown that, for the studied solute−solvent interactions, including a nonnegligible amount of solvent molecules in the QM region becomes necessary, and thus defining the QM region with just the solute and the first solvation sphere is a rough approximation in QM/MM calculations.
Using a larger cc-pVTZ basis set does improve the convergence of the total interaction energy but not that of different interaction energy components. The augmentation of the basis set yields a general increase of the electrostatic energy, whereas the polarization energy decreases and Pauli slightly changes. This result is in agreement with the importance of using large basis sets to accurately represent the molecular electric polarizability. In addition, the use of a When introducing induced atomic dipoles in the MM region by means of the POL3 FF, no clear trend has been found by comparing the normal distributions of each energy component with those obtained with the electrostatic embedding based on atomic charges, as displacements toward higher and lower values appear depending on the nature of the solute. However, a decrease in induction for the ammonium cation is observed, improving previous results from the TIP3P atomic charges. The width of the obtained distributions supports the need for configurational sampling, especially for the short-range Pauli component. Regarding convergence, no significant improvement has been found when introducing induced atomic dipoles in the electrostatic embedding.

■ ASSOCIATED CONTENT Data Availability Statement
The tleap and antechamber packages from the Amber-Tools20 50 toolkit were used to generate the topology and the coordinate files for the MD simulations. The PMEMD.-CUDA 54,55 and SANDER modules of the AMBER20 50 (https://ambermd.org/) software were used to perform the classical MD simulations. The MoBioTools toolkit (https:// github.com/mobiochem/MoBioTools) was used to automatically generate the QM/MM input files. QM/MM computations themselves were performed with the Gaussian16 61 (https://gaussian.com/) package. The EDA-NCI program (https://github.com/marcos-mandado/EDA-NCI) was used to perform the EDA. Trajectories were visualized with the Visual Molecular Dynamics (VMD, https://www.ks.uiuc.edu/ Research/vmd/).
Figures S1 and S2, corresponding to the full scale representations of Figures 1 and 2