GridSolvate: A Web Server for the Prediction of Biomolecular Hydration Properties

We present a novel web server, named gridSolvate, dedicated to the prediction of biomolecular hydration properties. Given a solute in atomic representation, such as a protein or protein–ligand complex, the server determines positions and excess chemical potential of buried and first hydration shell water molecules. Calculations are based on our semiexplicit hydration model that provides computational efficiency close to implicit solvent approaches, yet captures a number of physical effects unique to explicit solvent representation. The model was introduced and validated before in the context of bulk hydration of drug-like solutes and determination of protein hydration sites. Current methodological developments merge those two avenues into a single, easily accessible tool. Here, we focus on the server’s ability to predict water distribution and affinity within protein–ligand interfaces. We demonstrate that with possibly minimal user intervention the server correctly predicts the locations of 77% of interface water molecules in an external set of test structures. The server is freely available at https://gsolvate.biomod.cent.uw.edu.pl.

• Excess water chemical potential is evaluated, by considering an effective Hamiltonian of a rotatable solvent probe: where {n} denotes an instantaneous distribution of occupied and empty lattice points, H U V (r, θ) is a solute-solvent interaction energy combining electrostatic and Lennard-Jones (LJ) contributions with SPC/E model 4 representing the solvent probe, H V V (r, θ, {n}) is a mean-field solventsolvent term, k B T , and β are the Boltzmann constant times temperature (T = 300 K), and its inverse respectively, and the summation extends over a set of 12 discrete probe orientations, θ. Note, that due to specific geometry of the SPC/E water model (OH distance of 1Å, and HOH angle of 109 o ), the probe can be placed on the BCC lattice such that its oxygen atom occupies point r, and two hydrogen atoms are located at two of its 8 nearest neighbours, with 12 possible spatial orientations.
• Given a unique solute placement on the grid, solute-solvent interaction potentials (electrostatic and Lennard-Jones) are calculated for all lattice points, giving rise to the H U V (r, θ) term.
• The mean-field solvent solvent term, H V V (r, θ, {n}) = k HB , is evaluated based on the assumed energy of water hydrogen bond, HB (one of six adjustable model parameters), and the number k ∈ {0..4} of occupied lattice points, two of which are located 3 BCC grid spacings away from the central point r occupied by the oxygen atom in the direction of hydrogen atoms (the resulting 3Å distance is well within the length limit for water-water hydrogen bond), and two other that are at the same distance at the direction of oxygen lone pairs, i.e. reproducing the tetrahedral arrangement of water hydrogen-bonding.
• The calculations start with a uniform water distribution, ρ = 1 at each lattice point. H eff is then evaluated for each lattice point, and points with H eff > η are vacated, with η being a "drying threshold" -an adjustable model parameter. 2,3 This procedure is repeated until stationary solvent distribution {n} is reached.
• The resulting map of excess chemical potential is mapped on a finer, 0.5Å Cartesian grid. The calculations are then repeated for next random BCC grid orientation with respect to the solute (by default N = 500 independent runs are performed).
• An excess chemical potential map accumulated on the Cartesian grid during N independent runs is then partitioned into individual hydration sites. Water binding free energy at each site i, ∆F i is obtained as a difference between the estimate of water excess chemical potential, µ(r i ), and the reference, bulk water excess chemical potential, µ b : where the summation over r includes all points belonging to hydration site volume V i , H N eff (r) is an effective Hamiltonian obtained at run N , and V W CA (r i ) corresponds to a repulsive, Weeks-Chandler-Andersen potential accumulated at r i during solvent partitioning. The latter was parametrised to obtain spacing between surface water molecules in agreement with explicit solvent simulations. 3 • Structures submitted for calculations are parametrised using with non-bonded force field parameters: Amber99 5 for proteins or GAFF 6 for ligands. In this latter case AM1-BCC atomic partial charges 7 are assigned with the use of antechamber program. 8 • The model has six adjustable parameters: 2,3 1) the energy of water-water hydrogen bond, 2) bulk water excess chemical potential compatible with grid representation, 3) the drying threshold, 4) an energy scaling factor, 5) an effective BCC grid solvent dielectric constant (used only in hydration free energy calculations), 6) an effective number density of water on the cartesian grid. Parameters 1-5 were adjusted to reproduce experimental hydration free energies of drug-like solutes (700 neutral and charged compounds, with hydration free energies ranging from +3.4 to 536 kcal/mol.), 2 and the parameter 6 was adjusted to provide agreement of predicted water binding free energies to protein cavities in agreement with reference, explicit solvent free energy calculations 3

PQRS file format
The GridSolvate server uses a slightly modified PQR file format 9 to store all necessary information about solvated system. A sample listing is shown in Fig. 1. In addition to data found in a standard PDB file up to columns containing X, Y, Z atomic positions, a PRQS file includes 3 additional records: • atomic partial charge in e units (float), • atomic radius, R, inÅ (float), • well depth of atomic Lennard Jones potential, ε in kcal/mol (float).

Server output files
The following files are generated upon successful job completion: • water_added.pdb containing all added water molecules. In addition to Cartesian coordinates, two extra columns in this file contain an estimate of local excess water chemical potential in kcal/mol (more negative values mean stronger affinity to a given site), and a rough estimate of volume inÅ 3 accessible to the centre of water molecule at a given site • water_buried.pdb, water_5.pdb, and water_10.pdb containing subsets of all added water molecules that are either completely buried within macromolecular structure, or have 5, or 10 % of surface area exposed to bulk solvent, respectively. All those files contain the same additional columns as the water_added.pdb.
• grid_s.pdb containing all sites with measurable excess chemical potential, including those with positive (i.e. unfavourable) values in addition to hydrated sites contained in water_added.pdb file.
• grid_c.pdb containing a dense Cartesian grid representing a spatial map of excess chemical potential. This file can be used to visualise and analyse hydration propensity of a protein surface.
• system_hydrated.pdb containing the submitted macromolecular structure/complex together with all added water molecules.
• system.pqrs containing the complete macromolecular structure/complex converted to PQRS format.
• $job_id.log a log file with detailed information concerning job execution.
• config.gsl a configuration file generated by the server.

Test set structures
A list of protein-ligand complexes used for server tests is provided in the accompanying text file. Each line contains information about a single complex in the following format: PDB id, ligand residue name, PDB chain id, ligand residue number, followed by a list of interface water molecules. If a given water molecule was identified (within 1.4Å spatial accuracy), an estimated binding free energy (kcal/mol) is provided, otherwise "X" is given. For complexes that were automatically not processed, SQM denotes a failure ligand partial charges calculations, and PQR denotes a failure in protein parametrisation.
Interface water: solvent accessible surface distribution Solvent accessible surface area (SASA) of interface water molecules was calculated with the VMD program, 10 using its default SASA method with solvent probe of 1.4Å radius. The distribution of SASA fraction (i.e. the actual SASA divided by SASA of a free water molecule) among crystallographic and predicted water molecules is presented in Fig. 2.