Replica-Exchange and Standard State Binding Free Energies with Grand Canonical Monte Carlo

The ability of grand canonical Monte Carlo (GCMC) to create and annihilate molecules in a given region greatly aids the identification of water sites and water binding free energies in protein cavities. However, acceptance rates without the application of biased moves can be low, resulting in large variations in the observed water occupancies. Here, we show that replica-exchange of the chemical potential significantly reduces the variance of the GCMC data. This improvement comes at a negligible increase in computational expense when simulations comprise of runs at different chemical potentials. Replica-exchange GCMC is also found to substantially increase the precision of water binding free energies as calculated with grand canonical integration, which has allowed us to address a missing standard state correction.


Replica Exchange with grand canonical Monte Carlo
In the grand canonical ensemble, the equilibrium probability density for a system composed of N identical molecules with configuration r at a chemical potential µ, volume V , and inverse temperature β, is given by 1 π(r, N |µ, V, β) = 1 Ξ(µ, V, β) where Ξ(µ, V, β) is the grand canonical partition function (the normalization constant), Λ is the thermal wavelength of the molecule, and U (r) is the potential energy of the system.
In this work, we consider a super ensemble composed of M independent replica systems with the same volume and temperature but each with a different chemical potential, for which the probability density for a given microstate has the form π(r 1 , N 1 , µ 1 ; ...; We seek an unbiased Monte Carlo scheme for this expanded ensemble that allows for chemical potentials between replicas to be swapped with the aim of enhancing the sampling in GCMC titration simulations. In the Metropolis-Hastings algorithm, the probability to move from state a to b, with equilibrium probabilities π a and π b respectively, is accepted according to the following probability: where p(a → b) is the transition probability for going from state a to state b. By attempting moves that have transition probabilities equal to the reverse transition probability (i.e. p(a → b) = p(b → a)), we need only consider the ratio of equilibrium densities (π a /π b ) to find the acceptance probability. Attempting exchanges between uniformly selected replica pairs fulfills this requirement. If, in the super ensemble specified by equation S.2, the ith and jth chemical potentials were swapped and all others kept constant, the ratio of equilibrium densities can be shown, after minimal algebra, to be The simple relation on the right-hand side is the only function that needs to be evaluated in the acceptance test for a GCMC replica exchange swap. Notably, the right-hand side of the above does not depend on U (r) or V as the proposal does not involve a change of configuration or volume for any of the replicas.

Standard state binding free energies
The aim of this section is to derive an expression for the Gibbs binding free energy for N water molecules to a subvolume of a system of interest, within which water has been sampled with GCMC. In particular, the goal is to derive an expression that can be used with grand canonical integration (GCI). The binding free energy is given by where ∆G sys (N i → N f ) is the Gibbs free energy to change the number of water molecules from N i to N f in the system of interest and ∆G sol (N i → N f ) is the Gibbs free energy to change the number of water molecules in bulk water similarly. Although an expression for the binding free energy of water that exploited GCI was previously used by Ross et al., 2 the expression did not evaluate standard state binding free energies. The following analysis derives a standard state binding free energy equation for GCI.
As shown previously in S41, 2 the change in free energy of the solvent is given by where µ sol is the chemical potential of bulk water. See, for instance, McQuarry, 1 for more details on the relationship between chemical potential and Gibbs free energy. A general expression for µ sol is where µ sol is the excess chemical potential, k B T is Boltzmann's constant multiplied by temperature, ρ sol is the number density of bulk water, and Λ is the de Broglie thermal wavelength of a water molecule. 3 The system contribution to the binding free energy, G sys (N i → N f ), will be evaluated using the grand canonical integration (GCI) equation. The GCI equation gives the difference in Helmholtz free energy to transfer water molecules from ideal gas to the system of interest, which is denoted ∆F trans (N i → N f ) and is defined as is the Helmholtz free energy to change the number of molecules in ideal gas from N i to N f . The above free energies refer to changing the number of molecules in the same fixed volume, denoted V sys . As is common, the approximation ∆G sys (N i → N f ) ≈ ∆F sys (N i → N f ) will be used. This approximation is often invoked due to the small contribution changes in pressure have on differences of Gibbs free energies under physiological conditions. 3 With this approximation and equations S.6 and S.8, we have Owing to the explicit inclusion of the ideal gas free energy, the above differs from the expression for the binding free energy considered in equation 5 (and S44 in the Supplementary That assumption is not made in the following analysis. As described previously, 2 ∆F trans (N i → N f ) can be calculated by sampling water at a range of different chemical potentials, or, equivalently, Adams values and evaluating where B k is the Adams value in which an average of N k waters are present in V sys . As the average number of water molecules changes with the applied Adams value, N appears as a function of B in the integral on the right-hand side.
The Helmholtz free energy for N ideal gas particles in a volume V has the analytical expression Using the expression for µ sol (equation S.7), ∆F trans (N i → N f ) (equation S.10), and S.5 where β = 1/k B T has been included for notational simplicity. In contrast to the expression previously presented, 2 the above expression lacks the N factorial terms, and has the extra term ln(ρ sol V sys ). When the solvent is in the standard state with density ρ o , the standard state volume of water is defined as V o = 1/ρ o sol , 4 so that the standard Gibbs binding free energy is given by (S.14)

Equilibrium in grand canonical Monte Carlo
This section derives the condition for thermodynamic equilibrium for water binding to the system of interest using the above framework. When discussing thermodynamic equilibrium, we are obliged to consider water-protein binding in the thermodynamic limit, which occurs when N → ∞ and V → ∞. Although this limit must be taken by necessity, it will allow for some simplifying approximations. The starting point for this derivation is the expression for the Gibbs binding free energy given in equation S.9. Thermodynamic equilibrium is established when the binding free energy for N waters is at a minimum. 5 Denoting ∆G bind (0 → N ) as ∆G bind (N ), we seek an expression that satisfies A useful expression shown for this purpose-disscused by Ross et al. 2 -is that in the thermodynamic limit, the Helmotz free energy to transfer water N molecules from ideal gas to the system of interest is approximately given by Recognising N/V as the number density of the system of interest, denoted ρ sys , we arrive at The protocols for GCMC and decoupling simulations are the same as the BPTI system, however with the reduced system sampling described above. A replica exchange frequency of every 100,000 moves was chosen. For the L3 complex, water E has a lower binding free energy than water D, so water E was included for the GCMC simulations of water D. For the L1 complex, the individual-box GCMC simulations were repeated both with and without the other water molecule.

Results
The  decoupling. In contrast, the previous, non-standard state formulation of the GCI binding free energy produces energies that are distinct from the decoupling results, with a small but statistically significant volume dependence.
The binding free energy of each two water network has also been calculated in a single step, using a GCMC volume region covering both sites (see Figure S

Bovine pancreatic trypsin inhibitor
Additional methodological details GCMC subvolumes  Additional results

Replica exchange improves the consistency of titration data
To quantify the consistency of the BPTI titration data with and without replica exchange, The median centered average water occupancy at each B value for the RE and all of the non-RE data was compared using the Kolmogorov-Smirnov test, which found the distribution of the values to be significantly different with a p-value of 1.3%. The distributions between the data for different RE frequencies were found to have p-values >5%, suggesting the water occupancies at each B value are drawn from the same distribution, irrespective of the RE frequency.
Acceptance rates and replica exchange sampling The acceptance rates of the water insertion and deletion moves are 0.002% and 0.003% respectively, on average over all B values and for all RE frequencies.
S.16 Only seven replicas out of twenty-four, which were initially equally spaced in B before equilibration, have been shown for clarity. The water occupancy for B values below -24.8 was 0 for all repeats. S.17