EspalomaCharge: Machine Learning-Enabled Ultrafast Partial Charge Assignment

Atomic partial charges are crucial parameters in molecular dynamics simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on semiempirical quantum chemical surrogates of ab initio methods, such as AM1-BCC, which are expensive for large systems or large numbers of molecules. We propose a hybrid physical/graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to the differences between AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach. The network predicts molecule-specific atomic electronegativity and hardness parameters, from which optimal charge-equilibrated partial charges that preserve total molecular charge are determined analytically. This hybrid approach scales linearly with the number of atoms, enabling for the first time the use of fully consistent charge models for small molecules and biopolymers in the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package EspalomaCharge, this approach provides drop-in replacements for both the AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma-charge.

Traditionally, partial charges have been derived from expensive ab initio or semi-empirical quantum chemical approaches
In the early stages of development of molecular mechanics (MM) force fields, ab initio methods were used to generate electrostatic potentials on molecular surfaces, from which restrained electrostatic potential (RESP) charges were fit [2]. This process proved to be expensive, especially for large molecules or large numbers of molecules (e.g., in virtual screening, where datasets now approach 10^9 molecules [18]). This led to the development of the AM1-BCC charge scheme [25,26], a method for approximating RESP fits at the HF/6-31G* level of theory. It first calculates population charges using the much less expensive AM1 semiempirical level of theory and subsequently corrects these charges via bond charge corrections (BCC). As a result, this approach has been widely adopted by the molecular mechanics community, for example with force fields such as GAFF [44] and the Open Force Field force fields [38].
Despite this progress, AM1-BCC still has several drawbacks. First, the computation depends on the generation of one or more conformers, which contributes to discrepancies among the results of different cheminformatics toolkits. While conformer ensemble selection methods such as ELF10¹ attempt to minimize these geometry-dependent effects, they do not fully eliminate them, and significant discrepancies between toolkits can remain.
Second, speed remains a bottleneck (especially for virtual screening of large libraries), since parameterization still requires a quantum chemical calculation. Moreover, the runtime of AM1-BCC scales as $\mathcal{O}(N^2)$ in the number of atoms $N$. This poor scaling necessitates a different charge model for biopolymers (such as proteins and nucleic acids). Extending these polymeric force fields to accommodate post-translational modifications, nonstandard residues, covalent ligands, and other chemical modifications is therefore complex and likely to require a third charging strategy within the same simulation.
Machine learning approaches to charge assignment have recently been proposed but face challenges in balancing generalization with the ability to preserve total molecular charge
The rising popularity of machine learning has led to a desire to exploit new approaches to rapidly predict partial atomic charges. For example, recent work from Bleiziffer et al. [5] employed a random forest approach to assign charges based on atomic features. Because predictions are made on a per-atom basis, they faced the issue of preserving total molecular charge, and they distribute the difference between the predicted and reference total charge evenly among atoms. Similarly, Metcalf et al. [33] preserve the total charge by allowing only charge transfer in message-passing form, resulting in zero net change in charge. A more classical approach by Gilson et al. [17] tackles the charge constraint problem in a clever manner. Instead of directly predicting charges, it predicts atomic electronegativity and electronic hardness, so that a simple constrained optimization problem inspired by physical charge equilibration [39] can be solved analytically to yield partial charges that satisfy total molecular charge constraints. Despite its experimental success, its ability to reproduce quantum chemistry-based charges depends heavily on the discrete atom typing scheme used to classify and group atoms by their chemical environments.

¹ ELF10 denotes that the ELF ("electrostatically least-interacting functional groups") conformer selection process was used to generate 10 diverse conformations from the lowest-energy 2% of conformers. Electrostatic energies are assessed by computing the sum of all Coulomb interactions in vacuum using the absolute values of MMFF charges assigned to each atom [21]. AM1-BCC charges are generated for each conformer and then averaged.
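To make the total-charge problem concrete, the even-redistribution correction described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (the function name and the example predictions are hypothetical), not the implementation used by Bleiziffer et al. [5].

```python
import numpy as np

def redistribute_evenly(q_pred, total_charge):
    """Shift per-atom predictions so that they sum exactly to total_charge.

    The residual (sum of predictions minus the target total charge) is spread
    uniformly, so every atom receives the same correction regardless of its
    chemical environment.
    """
    q_pred = np.asarray(q_pred, dtype=float)
    residual = q_pred.sum() - total_charge
    return q_pred - residual / q_pred.size

# Hypothetical raw per-atom predictions for a neutral 4-atom molecule.
q_raw = np.array([0.42, -0.31, -0.05, 0.02])
q_fixed = redistribute_evenly(q_raw, total_charge=0.0)
print(q_fixed, q_fixed.sum())
```

The correction guarantees the constraint but is chemically blind: a polar atom and a nonpolar atom absorb the same share of the residual. The charge equilibration formulation discussed below instead weights each atom's adjustment by a learned, atom-specific hardness.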
Recently, Wang et al. [48] designed a graph neural network-based atom typing scheme, termed Espaloma (extensible surrogate potential optimized by message-passing algorithms), to replace human expert-derived, discrete atom types with continuous atom embeddings (Figure 1). This allows the model to distinguish atoms with subtle differences in chemical environment without the need to painstakingly specify heuristics.
EspalomaCharge generates AM1-BCC ELF10 quality charges in an ultra-fast manner using machine learning
In this paper, we use the continuous atom embedding scheme from Espaloma in conjunction with analytical constrained charge assignment inspired by charge equilibration to construct an ultra-fast machine learning surrogate for partial charge assignment (EspalomaCharge). We train EspalomaCharge on an expanded set of protonation states and tautomers of representative biomolecules and drug-like molecules (the SPICE dataset [12]) to assign high-quality AM1-BCC ELF10 charges [26]. The resulting EspalomaCharge model reproduces AM1-BCC ELF10 charges with an error well within the discrepancy between the AmberTools sqm and OpenEye oequacpac implementations. It is on average 2,000 times faster than AmberTools on the SPICE dataset, can run on either CPU or GPU, and scales as $\mathcal{O}(N)$ in the number of atoms $N$, allowing even entire proteins to be assigned AM1-BCC-equivalent charges. We implement this approach in the Python package espaloma_charge, which is distributed open source under the MIT license and is pip-installable (Listing 1).

Theory: Espaloma graph neural networks for chemical environment perception, charge equilibration (QEq), and EspalomaCharge
Espaloma uses graph neural networks to perceive atomic chemical environments
Espaloma [48] uses graph neural networks (GNNs) [1,16,23,29,47,52] to assign continuous latent representations of chemical environments to atoms, replacing human expert-derived discrete atom types. These continuous atom representations are subsequently used to assign symmetry-preserving parameters for atomic, bond, angle, torsion, and improper force terms. When GNNs are employed in chemical modeling, atoms are abstracted as nodes ($v \in \mathcal{V}$) and bonds as edges ($e \in \mathcal{E}$) of a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. The initial features $h_v^{(0)}$ associated with node $v$ are determined from resonance-independent atomic chemical features provided by a cheminformatics toolkit (see Section 4). Following the framework of Battaglia et al. [1], Gilmer et al. [16], and Xu et al. [52], consider a node $v$ with neighbors $u \in \mathcal{N}(v)$ in a graph $\mathcal{G}$. Let $h_v^{(k)}$ denote the feature of node $v$ at the $k$-th layer (or $k$-th round of message passing), with $h_v^{(0)}$ the initial node feature in the embedding space. The $k$-th message-passing step of a GNN can then be written as three steps. First comes an edge update, in which the feature embeddings of two connected nodes $u$ and $v$ update their edge feature embedding $h_{uv}$:
$$h_{uv}^{(k+1)} = \phi^{e}\left(h_u^{(k)}, h_v^{(k)}, h_{uv}^{(k)}\right), \tag{1}$$
followed by neighborhood aggregation, in which the edges incident to a node pool their embeddings to form the aggregated neighbor embedding $a_v$:
$$a_v^{(k+1)} = \rho^{e \rightarrow v}\left(\left\{ h_{uv}^{(k+1)} : u \in \mathcal{N}(v) \right\}\right), \tag{2}$$
and finally a node update:
$$h_v^{(k+1)} = \phi^{v}\left(a_v^{(k+1)}, h_v^{(k)}\right), \tag{3}$$
where $\mathcal{N}(\cdot)$ denotes the operation returning the multiset of neighbors of a node, and $\phi^{e}$ and $\phi^{v}$ are implemented as feed-forward neural networks. Since the neighborhood aggregation functions $\rho^{e \rightarrow v}$ are always chosen to be indexing-invariant functions, namely the SUM or MEAN operator, Equation 3, and thereby the entire scheme, is permutationally invariant. In practice, choices such as the dimensionality of node and edge vectors, number of layers, layer width, activation function, aggregation operators, and initial conditions for training are treated as hyperparameters. They are optimized during training to produce robust, near-optimal models on a held-out validation set separate from the test set.
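The three-step update (edge update, neighborhood aggregation, node update) described above can be sketched in NumPy. This is a minimal illustration under our own assumptions, not Espaloma's PyTorch implementation. Random single-layer networks stand in for the trained $\phi^{e}$ and $\phi^{v}$, the embedding width `D` is hypothetical, each bond is stored as two directed edges, and SUM is the aggregator. The final check confirms that relabeling the atoms only relabels the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hypothetical embedding width

def make_mlp(in_dim, out_dim):
    """Single-layer feed-forward net with random weights (stand-in for a trained phi)."""
    W = rng.normal(scale=0.3, size=(in_dim, out_dim))
    b = rng.normal(scale=0.1, size=out_dim)
    return lambda x: np.tanh(x @ W + b)

phi_e = make_mlp(3 * D, D)  # edge update: (h_u, h_v, h_uv) -> new h_uv
phi_v = make_mlp(2 * D, D)  # node update: (a_v, h_v) -> new h_v

def message_passing_step(h_nodes, h_edges, edges):
    """One round of edge update, SUM aggregation, and node update.

    `edges` is an (E, 2) integer array of directed (u, v) pairs; each bond is
    listed in both directions so that every atom hears from all its neighbors.
    """
    src, dst = edges[:, 0], edges[:, 1]
    h_edges_new = phi_e(np.concatenate([h_nodes[src], h_nodes[dst], h_edges], axis=1))
    agg = np.zeros_like(h_nodes)
    np.add.at(agg, dst, h_edges_new)  # SUM over incident edges: O(E) work
    h_nodes_new = phi_v(np.concatenate([agg, h_nodes], axis=1))
    return h_nodes_new, h_edges_new

# Toy 4-atom graph: atom 1 is bonded to atoms 0, 2, and 3.
bonds = np.array([[0, 1], [1, 2], [1, 3]])
edges = np.concatenate([bonds, bonds[:, ::-1]])
h0 = rng.normal(size=(4, D))
e0 = np.zeros((len(edges), D))
h1, _ = message_passing_step(h0, e0, edges)

# Relabel the atoms: new atom k is old atom perm[k].
perm = np.array([2, 0, 3, 1])
inv = np.argsort(perm)  # inv[old index] = new index
h1_perm, _ = message_passing_step(h0[perm], e0, inv[edges])
print(np.allclose(h1_perm, h1[perm]))  # relabeling atoms only relabels outputs
```

Because `np.add.at` visits each directed edge once, the cost of a round grows with the number of edges. This is the linear scaling argued later for EspalomaCharge.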

Charge equilibration (QEq) is a physically inspired model for computing partial charges while maintaining total molecular charge
This Espaloma framework can be used to predict atomic parameters that can be fed into subsequent neural modules that predict molecular mechanics parameters. For partial charges, however, the predicted partial charges $\hat{q}_i$ must sum to the total charge $Q$ (the sum of all formal charges, or total molecular charge). This constraint would be nontrivial to satisfy if the charges were predicted directly.
We adopt the method proposed by Gilson et al. [17], in which we predict the electronegativity $e_i$ and hardness $s_i$ of each atom $i$. These are defined as the first- and second-order derivatives of the potential energy in charge equilibration approaches [39]:
$$e_i = \frac{\partial U}{\partial q_i}, \tag{4}$$
$$s_i = \frac{\partial^2 U}{\partial q_i^2}. \tag{5}$$
Next, we minimize the second-order Taylor expansion of the charging potential energy contributed by these terms, neglecting interatomic electrostatic interactions:
$$U(\{q_i\}) = \sum_i \left( e_i q_i + \frac{1}{2} s_i q_i^2 \right), \tag{6}$$
subject to the total charge constraint
$$\sum_i q_i = Q. \tag{7}$$
This problem has an analytical solution obtained with the method of Lagrange multipliers:
$$\hat{q}_i = -e_i s_i^{-1} + s_i^{-1} \frac{Q + \sum_j e_j s_j^{-1}}{\sum_j s_j^{-1}}. \tag{8}$$
We thus use the Espaloma framework to predict the unconstrained atomic electronegativity ($e_i$) and hardness ($s_i$) parameters used in Equation 8 to assign partial charges that are guaranteed to sum to the total molecular charge $Q$. By the equivalence analysis proposed in Wang et al. [48], the tabulated atom typing scheme used by Gilson et al. [17] amounts to a model working analogously to a Weisfeiler-Lehman test [50] with a hand-written kernel. Here we replace it with an end-to-end differentiable GNN model to greatly expand its resolution and its ability to be optimized against reference charges.
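The closed-form QEq solution (Equation 8) is short enough to check numerically. The sketch below is a minimal NumPy illustration, not the espaloma_charge source. In EspalomaCharge, `e` and `s` would come from the GNN, whereas here they are hypothetical input arrays. The helper names `qeq_charges` and `qeq_charges_kkt` are our own. The second function solves the full Lagrange (KKT) linear system, $s_i q_i - \lambda = -e_i$ and $\sum_i q_i = Q$, as an independent cross-check.

```python
import numpy as np

def qeq_charges(e, s, total_charge):
    """Closed-form minimizer of sum_i (e_i q_i + 0.5 s_i q_i^2) s.t. sum_i q_i = Q.

    Assumes every hardness s_i > 0, so the objective is strictly convex.
    """
    e = np.asarray(e, dtype=float)
    s_inv = 1.0 / np.asarray(s, dtype=float)
    # Lagrange multiplier fixed by the total-charge constraint
    lam = (total_charge + np.sum(e * s_inv)) / np.sum(s_inv)
    return -e * s_inv + s_inv * lam

def qeq_charges_kkt(e, s, total_charge):
    """Same problem solved as a dense (N+1)x(N+1) linear system, for checking only."""
    n = len(e)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.diag(s)  # stationarity: s_i q_i - lambda = -e_i
    A[:n, n] = -1.0
    A[n, :n] = 1.0          # constraint: sum_i q_i = Q
    b = np.concatenate([-np.asarray(e, dtype=float), [total_charge]])
    return np.linalg.solve(A, b)[:n]

e = np.array([0.5, -0.2, 0.1])  # hypothetical electronegativities
s = np.array([1.0, 2.0, 4.0])   # hypothetical hardnesses
q = qeq_charges(e, s, total_charge=-1.0)
print(q, q.sum())
```

The closed form needs only a few sums over atoms, so it costs $\mathcal{O}(N)$ and never builds or inverts a matrix, unlike the dense check.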
EspalomaCharge has ( ) time complexity in the number of atoms One of the primary advantages of spatial GNNs that pass messages among local neighborhoods is their ( ) complexity, where is the number of edges.In chemical modeling, since the sparsity of the graph is roughly fixed (number of edges is 3 to 4 times that of number of nodes), it is safe to write the runtime complexity as ( ), with being the number of nodes (atoms).The charge equilibration (QEq) step with its linear operator does not alter the complexity, nor is it the bottleneck of EspalomaCharge.Therefore, unlike with ab initio or semi-empirical methods, the runtime complexity of EspalomaCharge is ( ).

Experiments: EspalomaCharge accurately reproduces AM1-BCC charges at a fraction of its cost
In this section, we show that the discrepancy between EspalomaCharge and the OpenEye Toolkit is comparable to or smaller than that between AmberTools [6] and OpenEye. EspalomaCharge is also fast and scalable to larger systems, taking seconds to parameterize a biopolymer with 100 residues on a CPU.

The SPICE dataset covers biochemically and biophysically interesting chemical space
To curate a dataset representing the chemical space of interest for biophysical modeling of biomolecules and drug-like small molecules, we use the SPICE dataset [12], enumerating reasonable protonation and tautomeric states with the OpenEye Toolkit. We generated AM1-BCC ELF10 charges for each of these molecules using the OpenEye Toolkit and trained EspalomaCharge (Figure 1) to reproduce the partial atomic charges with a squared loss function. This model, whose parameters are distributed with the code, is used in all characterization results hereafter.

Table 1. $N_\text{mol}$ denotes the number of molecules in the dataset; avg. atoms denotes the average number of atoms per molecule in the corresponding dataset; the average charge RMSE is the RMS deviation in charges between AM1-BCC implementations, averaged over all molecules in the dataset, with subscripts and superscripts denoting the 95% confidence interval of the mean (computed by bootstrapping over molecules in the dataset with replacement); average wall time denotes the average wall time for the respective toolkit to assign partial charges to a molecule in the dataset. Boldface statistics denote the best (most accurate or fastest) model or models (when confidence intervals are indistinguishable) for each statistic.
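The statistics described in the Table 1 caption combine a per-molecule charge RMS deviation with a bootstrap over molecules. The sketch below shows one way to compute them. The caption does not state the number of bootstrap replicates or the interval construction, so the 1000 replicates and percentile interval used here are assumptions. The charge pairs are hypothetical placeholders, not SPICE data.

```python
import numpy as np

def charge_rmse(q_a, q_b):
    """RMS deviation between two per-atom charge assignments for one molecule."""
    q_a, q_b = np.asarray(q_a, dtype=float), np.asarray(q_b, dtype=float)
    return float(np.sqrt(np.mean((q_a - q_b) ** 2)))

def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Mean and percentile bootstrap CI, resampling molecules with replacement."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)

# Hypothetical (toolkit A, toolkit B) charges for three molecules.
pairs = [
    ([0.1, -0.1], [0.12, -0.12]),
    ([0.3, -0.2, -0.1], [0.28, -0.18, -0.1]),
    ([-0.5, 0.25, 0.25], [-0.45, 0.22, 0.23]),
]
rmses = [charge_rmse(a, b) for a, b in pairs]
mean, lo, hi = bootstrap_mean_ci(rmses)
print(f"{mean:.4f} [{lo:.4f}, {hi:.4f}]")
```

Resampling whole molecules rather than individual atoms keeps atoms from the same molecule together, matching the caption's "bootstrapping over molecules".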
EspalomaCharge is accurate, especially in regions of chemical space where training data is abundant
First, after training on the 80% training split of SPICE, we evaluate on the 10% held-out test split to benchmark the in-distribution (chemically similar species) performance of EspalomaCharge (Table 1, first half). Notably, the discrepancy (measured by charge RMSE) between EspalomaCharge and OpenEye is comparable to or smaller than that between AmberTools [6] and OpenEye, two popular cheminformatics toolkits for assigning AM1-BCC charges to small molecules. Since it is common practice in the community to use these two toolkits interchangeably, we argue that the discrepancy between them can serve as a baseline below which the error is no longer meaningful.
We prepared several out-of-distribution external datasets to test the generalizability of EspalomaCharge to other molecules of significance to chemical and biophysical modeling. These are a filtered list of FDA-approved drugs, a subset of the ZINC [20,24] purchasable chemical space, and the FreeSolv [34] dataset of molecules with experimental and computationally estimated hydration free energies. The discrepancy between EspalomaCharge and OpenEye is lower than, or comparable to, that between AmberTools and OpenEye. The high performance of EspalomaCharge therefore generalizes, at least within chemical spaces frequently used in chemical modeling and drug discovery. To pinpoint the source of the error for EspalomaCharge, we stratify the molecules by number of atoms and by total molecular charge and compute the errors on each subset (Figure 2). Compared to the error baseline, EspalomaCharge is most accurate where the training set contained abundant data. This dependence is especially pronounced when stratifying by net molecular charge, since extrapolation from small systems to larger systems is already encoded in the inductive biases of GNNs. Given the performance in well-sampled charge bins, the poor performance for molecules with more exotic net charges of −4 and −5 will likely be resolved once the dataset is enriched with more examples of these states. The benchmark experiments above were run through the unified application programming interface (API) integrated into the Open Force Field Toolkit (see Listing 3). Additionally, a command-line interface (CLI) is provided for seamless integration of EspalomaCharge into Amber workflows (see Listing 4).

EspalomaCharge is fast, even on large biomolecular systems
Beyond accuracy, the benchmarking experiments also reveal a drastic difference in parameterization speed. For the small molecule datasets in Table 1, EspalomaCharge is 300 to 3000 times faster than AmberTools and 15 to 75 times faster than OpenEye. Figure 3 examines how parameterization time depends on the size of a (biopolymer) system, using the peptide ACE-ALA$_n$-NME with $n = 1, \ldots, 100$. The parameterization wall time for AmberTools and OpenEye increases rapidly with system size (the theoretical runtime complexity of semi-empirical methods is $\mathcal{O}(N^2)$). It exceeds 1000 seconds at $n = 18$ and $n = 30$, respectively. This explains why employing AM1-BCC charges to parameterize large systems is infeasible. EspalomaCharge, on the other hand, has $\mathcal{O}(N)$ complexity and can parameterize peptides of a few hundred residues within seconds. This process can be further accelerated by running calculations on GPU hardware. Batching many molecules into a single charging calculation can also provide significant speedups when parameterizing large virtual libraries, by making maximal use of hardware parallelism. EspalomaCharge achieves these speedups seamlessly when a Sequence of molecules, rather than a single molecule, is provided as the input to the charge function in the API (Listing 5). In this case, the molecular graphs are batched by concatenating their adjacency matrices block-diagonally, processed by the GNN and QEq models, and subsequently unbatched to yield the result. For instance, parameterizing all 100 ACE-ALA$_n$-NME molecules ($n = 1, \ldots, 100$) shown in Figure 3 at once in batch mode takes 7.11 seconds on CPU only. This is marginally longer than the time required to parameterize the largest molecule in the dataset, indicating that hardware resources are not yet saturated at this point.
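The block-diagonal batching described above can be sketched as follows for the QEq stage. This is our own minimal illustration, not the espaloma_charge internals. Every helper name is hypothetical, and the electronegativities and hardnesses are random placeholders where the GNN output would go. Concatenating molecules with offset atom indices is equivalent to a block-diagonal adjacency matrix, and a per-atom molecule index lets Equation 8 be evaluated for all molecules at once with segment sums.

```python
import numpy as np

def batch_graphs(bond_lists, n_atoms):
    """Merge molecules into one disconnected graph (block-diagonal adjacency)."""
    offsets = np.concatenate([[0], np.cumsum(n_atoms)[:-1]])
    bonds = np.concatenate([np.asarray(b) + off for b, off in zip(bond_lists, offsets)])
    mol_index = np.repeat(np.arange(len(n_atoms)), n_atoms)  # molecule id per atom
    return bonds, mol_index

def batched_qeq(e, s, mol_index, total_charges):
    """Closed-form QEq for every molecule at once via per-molecule sums."""
    n_mol = len(total_charges)
    s_inv = 1.0 / s
    sum_s_inv = np.bincount(mol_index, weights=s_inv, minlength=n_mol)
    sum_e_s_inv = np.bincount(mol_index, weights=e * s_inv, minlength=n_mol)
    lam = (np.asarray(total_charges, dtype=float) + sum_e_s_inv) / sum_s_inv
    return -e * s_inv + s_inv * lam[mol_index]

def unbatch(values, n_atoms):
    """Split a per-atom array back into one array per molecule."""
    return np.split(values, np.cumsum(n_atoms)[:-1])

# Two hypothetical molecules: a neutral 3-atom chain and a +1 diatomic.
n_atoms = [3, 2]
bonds, mol_index = batch_graphs([[[0, 1], [1, 2]], [[0, 1]]], n_atoms)
rng = np.random.default_rng(1)
e = rng.normal(size=sum(n_atoms))             # placeholder electronegativities
s = rng.uniform(0.5, 2.0, size=sum(n_atoms))  # placeholder (positive) hardnesses
q = batched_qeq(e, s, mol_index, total_charges=[0.0, 1.0])
per_molecule = unbatch(q, n_atoms)
print(bonds.tolist(), [part.sum() for part in per_molecule])
```

Because molecules share no edges after batching, the GNN and QEq steps cannot mix information between them. Batching therefore changes only how much work each hardware call does, not the result for any individual molecule.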
Error from experiment in explicit solvent hydration free energies is not statistically significantly different between EspalomaCharge, AmberTools, and OpenEye implementations of AM1-BCC
The charge deviations between EspalomaCharge and other toolkit implementations of AM1-BCC are comparable to the deviations between toolkits. It is unclear, however, how these charge deviations translate into deviations of observable condensed-phase properties (such as free energies) from experiment. To assess this, we carried out explicit solvent hydration free energy calculations. These serve as an excellent gauge of the impact of parameter perturbations [35], since the result depends heavily on the small molecule charges. We used each set of charges to calculate hydration free energies for the molecules in FreeSolv [11] (see Detailed Methods in Appendix Section 4), a standard curated dataset of experimental hydration free energies. In Figure 4, we compare the computed explicit solvent hydration free energies with experimental measurements. We quantify the impact of the charge model on both a deviation statistic (RMSE) and a correlation statistic (R²) with respect to experiment. EspalomaCharge provides statistically indistinguishable performance compared to AmberTools [6] and the OpenEye Toolkit on both metrics. This encouraging result suggests that any discrepancy introduced by EspalomaCharge is unlikely to significantly alter the qualitative behavior of MD simulations in terms of ensemble averages or free energies.

Discussion
EspalomaCharge assigns high-quality, conformation-independent AM1-BCC charges using a modern machine learning infrastructure that supports hardware acceleration
EspalomaCharge composes the Espaloma graph neural network framework [48,49] with a conformation-independent charge equilibration (QEq) scheme [17]. The former produces continuous, vectorial representations of the chemical environment of individual atoms. The latter assigns partial atomic charges that satisfy total molecular charge constraints. Together, they provide a robust approach for assigning conformer-agnostic AM1-BCC charges to biomolecular systems. Because EspalomaCharge is built on PyTorch [36], a fast, modern, Python-based machine learning framework, it supports multiple optimized compute backends, including both CPUs and GPUs. Unlike AM1-BCC implementations based on traditional semiempirical quantum chemical codes, EspalomaCharge has $\mathcal{O}(N)$ runtime complexity with respect to the number of atoms (Figure 3). It introduces only small discrepancies relative to high-quality AM1-BCC reference implementations, comparable to the discrepancies among popular AM1-BCC implementations (Table 1).
The ability to assign topology-driven, conformation-independent, self-consistent charges to small molecules and biopolymers prepares the community for next-generation unified force fields
Thanks to its $\mathcal{O}(N)$ runtime complexity, EspalomaCharge can assign charges within seconds to biopolymers with hundreds of residues. These include proteins with exotic post-translational modifications or covalent ligands, nucleic acids, and complex conjugates of multiple kinds. Parameterizing various components of a system has typically required multiple distinct methodologies (e.g., RESP-derived charges for amino acids and AM1-BCC charges for noncovalent ligands). For the first time, it is feasible instead to parameterize small molecules and biopolymers (and more complex covalent modifications of biopolymers) simultaneously with a single high-quality, self-consistent scheme. This would be compatible with next-generation unified force fields for small molecules and biopolymers, such as that of Wang et al. [48].
EspalomaCharge provides a simple API and CLI for facile integration into popular workflows
EspalomaCharge is a pip-installable (Listing 1) open source software package (Appendix Section 4), making it easy to integrate into existing workflows with minimal complexity. Assigning charges to molecules with the EspalomaCharge Python API is simple and streamlined (Listing 2). A GPU can be used automatically, and entire libraries can be rapidly parameterized in batch mode (Listing 5). Both the Python API and the command-line interface (CLI) allow EspalomaCharge to be integrated easily into popular MM and MD workflows such as the Open Force Field Toolkit (Listing 3) and Amber (Listing 4).
One-hot embedding cannot generalize to rare or unseen elements
The architecture uses one-hot element encoding, which leaves the model unable to perceive similarities between elements. This compromises per-atom performance for rare elements and prevents the model from being applied to unseen elements. One possible mitigation is to encode elemental physical properties as node input features.
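A toy comparison illustrates the limitation and the suggested mitigation. This is a hypothetical sketch, not EspalomaCharge's featurization. The element vocabulary is invented for the example, and the property vector uses three standard values (Pauling electronegativity, periodic-table group, and period). A real model would normalize such features and likely use more of them. Under one-hot encoding, every pair of distinct elements is equally far apart and an unseen element (here iodine) cannot be encoded at all. Under the property encoding, iodine lands closest to bromine.

```python
import numpy as np

ELEMENTS = ["C", "N", "O", "Cl", "Br"]  # hypothetical training vocabulary

# (Pauling electronegativity, periodic-table group, period)
PROPERTIES = {
    "C": (2.55, 14, 2),
    "N": (3.04, 15, 2),
    "O": (3.44, 16, 2),
    "Cl": (3.16, 17, 3),
    "Br": (2.96, 17, 4),
    "I": (2.66, 17, 5),
}

def one_hot(symbol):
    """One-hot element vector; raises ValueError for elements outside ELEMENTS."""
    vec = np.zeros(len(ELEMENTS))
    vec[ELEMENTS.index(symbol)] = 1.0
    return vec

def property_features(symbol):
    """Unnormalized physical-property vector for an element."""
    return np.array(PROPERTIES[symbol], dtype=float)

def dist(a, b, featurize):
    return float(np.linalg.norm(featurize(a) - featurize(b)))

print(dist("Cl", "Br", one_hot), dist("Cl", "C", one_hot))  # identical distances
print(dist("Cl", "Br", property_features), dist("Cl", "C", property_features))
nearest_to_iodine = min(ELEMENTS, key=lambda el: dist("I", el, property_features))
print(nearest_to_iodine)
```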

Future expansions of the training set could further mitigate errors
As shown in Figure 2, the generalization error depends heavily on data abundance within the relevant stratum of the training set: bins containing more training data show higher accuracy. Future work could aim to systematically identify underrepresented regions of chemical space and expand training datasets to reduce error for uncommon chemistries and exotic charge states, either with larger static training sets or using active learning techniques.

Multi-objective fitting could enhance generalizability
Though EspalomaCharge produces an accurate surrogate for AM1-BCC charges, small errors in charges can translate into larger deviations in the electrostatic potential (ESP) (Figure 6). Since the function mapping charges (together with conformations) to ESPs is simple and differentiable, one can easily incorporate the ESP as a target in the training process. The target ESPs can be derived either from reference charges or (as in the original RESP [2]) from quantum chemical calculations. A multi-objective strategy that includes multiple targets (such as charges and ESPs), potentially with additional charge regularization terms (as in RESP [2]), could result in more generalizable models with lower ESP discrepancies. Similarly, other observables could be incorporated into the training process to improve the utility of the model for real condensed-phase systems. Examples include condensed-phase properties such as densities or dielectric constants, other quantum chemical properties, or even experimentally measured binding free energies.
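A minimal sketch of such a multi-objective loss follows, under our own assumptions. The point-charge ESP on a grid is linear in the charges, $V = A\,q$ with $A_{kj} = 1/|r_k - r_j|$ in atomic units. The reference ESP here is derived from reference charges, which is one of the two options mentioned above. The weights `w_q` and `w_esp`, the geometry, and the charges are all hypothetical. The analytic gradient is written out to show that the objective is differentiable; in practice an autodiff framework such as PyTorch would supply it.

```python
import numpy as np

def esp_matrix(grid, coords):
    """A[k, j] = 1 / |r_k - r_j|, so the ESP on the grid is A @ q (atomic units)."""
    return 1.0 / np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=-1)

def multi_objective_loss(q, q_ref, A, v_ref, w_q=1.0, w_esp=1.0):
    """Weighted charge MSE plus ESP MSE, returned with its analytic gradient in q."""
    dq = q - q_ref
    dv = A @ q - v_ref
    loss = w_q * np.mean(dq**2) + w_esp * np.mean(dv**2)
    grad = 2.0 * w_q * dq / dq.size + 2.0 * w_esp * (A.T @ dv) / dv.size
    return loss, grad

rng = np.random.default_rng(2)
coords = rng.normal(scale=1.0, size=(3, 3))  # hypothetical atom positions (bohr)
directions = rng.normal(size=(50, 3))
grid = 6.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)  # 6-bohr sphere
A = esp_matrix(grid, coords)
q_ref = np.array([-0.6, 0.3, 0.3])           # hypothetical reference charges
v_ref = A @ q_ref                            # ESP target derived from reference charges
q = q_ref + np.array([0.05, -0.02, -0.03])   # a slightly perturbed prediction
loss, grad = multi_objective_loss(q, q_ref, A, v_ref, w_esp=10.0)
print(loss, grad)
```

Because $V$ is linear in $q$, the ESP term is a simple quadratic penalty, so adding it should not complicate optimization. A RESP-style restraint could be added as a third term in the same way.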

Funding
Research reported in this publication was supported by the National Institute for General Medical Sciences of the National Institutes of Health under award numbers R01GM132386 and R01GM140090. YW acknowledges funding from NIH grant R01GM132386 and the Sloan Kettering Institute. JDC acknowledges funding from NIH grants R01GM132386 and R01GM140090.

Disclaimer
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Disclosures
JDC is a current member of the Scientific Advisory Board of OpenEye Scientific Software, Redesign Science, Ventus Therapeutics, and Interline Therapeutics, and has equity interests in Redesign Science and Interline Therapeutics. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, Bayer, XtalPi, Interline Therapeutics, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding.

Electrostatic potential (ESP) errors
To calculate deviations between electrostatic potentials (ESPs) on a surface, we first generated conformers using the OpenFF Toolkit 0.11.2. Conformer generation followed the Electrostatically Least-interacting Functional groups (ELF) approach. Initially, a maximum of 500 conformers was generated using RDKit with an RMS threshold of 0.05 Å. A cis conformation was enforced for carboxylic acid groups by rotating the protons in trans carboxylic acids by 180° around the C-O bond. The electrostatic energy of each conformer was calculated using MMFF94 charges [22]. The 98% of conformers with the highest electrostatic energy were discarded. From the remaining 2% of conformers, we greedily selected up to 10 conformers that were most distinct from each other by RMS, such that every selected conformer differed from every other by a heavy-atom RMS of at least 0.05 Å.
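The energy filter and greedy selection described above can be sketched as follows. This is our own simplified illustration, not the OpenFF Toolkit code (which exposes ELF selection as `Molecule.apply_elf_conformer_selection`). We assume conformer coordinates, MMFF charges, and a precomputed matrix of pairwise heavy-atom RMS deviations are already available. We also assume the greedy pass starts from the lowest-energy conformer and that the kept fraction is rounded up. The ELF energy follows the footnote definition, a vacuum Coulomb sum over absolute charges, in arbitrary units.

```python
import numpy as np

def elf_energy(coords, charges):
    """Sum over atom pairs of |q_i| |q_j| / r_ij (vacuum, arbitrary units)."""
    q = np.abs(np.asarray(charges, dtype=float))
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(q), k=1)
    return float(np.sum(q[i] * q[j] / r[i, j]))

def select_elf_conformers(energies, rmsd, keep_fraction=0.02, max_confs=10, min_rmsd=0.05):
    """Keep the lowest-energy fraction, then greedily pick mutually distinct conformers."""
    energies = np.asarray(energies, dtype=float)
    n_keep = max(1, int(np.ceil(keep_fraction * len(energies))))
    pool = [int(k) for k in np.argsort(energies)[:n_keep]]
    selected = [pool.pop(0)]  # start from the lowest-energy conformer (assumption)
    while pool and len(selected) < max_confs:
        # distance from each candidate to its nearest already-selected conformer
        d = [min(rmsd[c, s] for s in selected) for c in pool]
        best = int(np.argmax(d))
        if d[best] < min_rmsd:
            break  # every remaining candidate duplicates a selected conformer
        selected.append(pool.pop(best))
    return selected

# Hypothetical energies and pairwise RMS deviations (angstrom) for 6 conformers.
energies = [5.0, 1.0, 3.0, 0.5, 9.0, 2.0]
rmsd = np.full((6, 6), 1.0)
np.fill_diagonal(rmsd, 0.0)
for a, b, value in [(1, 3, 0.02), (3, 5, 0.8), (1, 5, 0.7)]:
    rmsd[a, b] = rmsd[b, a] = value
print(select_elf_conformers(energies, rmsd, keep_fraction=0.5))
```

In this toy case, conformer 1 survives the energy filter but is dropped because it lies within 0.05 Å of the already-selected conformer 3.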
For each conformer, we used OpenFF Recharge 0.4.0 to generate standard Merz-Singh-Kollman grids [41] around the molecule at a density of 1 point per Å². We then calculated the root mean squared error (RMSE) between the ESPs generated by each set of partial charges on the conformer grid. To compare the overall effect of different partial charges on the ESP, we averaged the ESP RMSE over conformers.
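The RMSE averaging described above can be sketched with point-charge ESPs. This is a minimal illustration under stated assumptions: the grid points are taken as given (in the study they came from OpenFF Recharge, whose API is not used here), distances and charges are in atomic units, and the geometry in the usage example is a hypothetical single-atom case chosen so the answer is easy to verify by hand.

```python
import numpy as np

def esp_on_grid(grid, coords, charges):
    """Point-charge ESP, V(r_k) = sum_j q_j / |r_k - r_j| (atomic units)."""
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=-1)
    return (charges[None, :] / r).sum(axis=1)

def mean_esp_rmse(conformers, grids, q_a, q_b):
    """Average over conformers of the ESP RMSE between two charge sets."""
    q_a, q_b = np.asarray(q_a, dtype=float), np.asarray(q_b, dtype=float)
    rmses = []
    for coords, grid in zip(conformers, grids):
        dv = esp_on_grid(grid, coords, q_a) - esp_on_grid(grid, coords, q_b)
        rmses.append(np.sqrt(np.mean(dv**2)))
    return float(np.mean(rmses))

# Hypothetical check: one atom at the origin, charges differing by 0.1 e.
conf = np.zeros((1, 3))
grid2 = 2.0 * np.eye(3)  # three grid points 2 bohr from the atom
grid4 = 4.0 * np.eye(3)  # three grid points 4 bohr from the atom
value = mean_esp_rmse([conf, conf], [grid2, grid4], [0.1], [0.0])
print(value)
```

Since the ESP is linear in the charges, only the charge difference matters. Grid points far from the molecule damp the deviation as $1/r$, which is why the grid construction affects the reported ESP RMSE.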

Induced solvent potential from Poisson-Boltzmann model (ZAP)
As a fast measure of how small differences in partial charges might impact interaction free energies, we computed the induced solvent potential on each atom using a fast Poisson-Boltzmann implicit solvation model implemented in OpenEye ZAP [19].The induced solvent potential reflects the potential induced by the polarization of the solvent, and was computed following recommended standard usage [https://docs.eyesopen.com/toolkits/python/zaptk/thewayofzap.html].

Hydration free energies in explicit solvent ($\Delta G_\text{hyd}$)
To compute hydration free energies for the FreeSolv dataset [34], and thereby quantify the impact of small differences in charges on experimentally measurable free energies, we used a modified version of the protocol described in [35]. Neutral molecules were solvated with TIP3P water [27] in rectangular boxes with 14 Å of padding and assigned GAFF 2.11 parameters [43,45] using openmmforcefields [8]. Hydration free energies were computed with replica-exchange alchemical free energy calculations using a two-stage alchemical protocol. Charges were first annihilated by linear scaling, and Lennard-Jones interactions were then annihilated using the Beutler softcore potential [4,37]. Simulations employed particle mesh Ewald (PME) [14] to treat long-range electrostatics and used mixed precision to ensure accuracy in energies and integration. Integration was performed with the BAOAB Langevin integrator [30][31][32] using hydrogen masses of 3.8 amu, enabling 4 fs timesteps while introducing minimal configuration-space sampling error [15]. Calculations were carried out at 298 K in the gas phase, and at 298 K and 1 atm in solvent, using OpenMM 8 [13] and openmmtools 0.21.5 [7]. Free energies were estimated with the multistate Bennett acceptance ratio (MBAR) [40] after automatic equilibration detection [9] and decorrelation. Simulations were run for 1 ns/replica in each phase. Code for reproducing these calculations can be found at https://github.com/choderalab/espaloma_charge/tree/main/scripts/hydration-free-energies.
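The hydrogen mass setting above is commonly achieved by hydrogen mass repartitioning: mass is moved from each heavy atom to its bonded hydrogens, so that total mass is conserved while the fastest bond vibrations slow down. The sketch below illustrates the idea on methanol. It is an assumption-laden illustration (hypothetical helper, atom ordering, and standard atomic masses), not necessarily how the protocol in [35] set masses. OpenMM's `ForceField.createSystem` offers a `hydrogenMass` argument for this purpose.

```python
import numpy as np

def repartition_hydrogen_mass(masses, bonds, is_hydrogen, h_mass=3.8):
    """Set each bonded H to h_mass (amu), taking the extra mass from its heavy-atom partner."""
    new = np.array(masses, dtype=float)
    for i, j in bonds:
        for h, heavy in ((i, j), (j, i)):
            if is_hydrogen[h] and not is_hydrogen[heavy]:
                delta = h_mass - new[h]
                new[h] += delta
                new[heavy] -= delta
    return new

# Methanol (CH3OH): atom 0 = C, atom 1 = O, atoms 2-5 = H.
masses = [12.011, 15.999, 1.008, 1.008, 1.008, 1.008]
is_h = [False, False, True, True, True, True]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
new_masses = repartition_hydrogen_mass(masses, bonds, is_h)
print(new_masses, new_masses.sum())
```

The total mass, and therefore the equilibrium configurational distribution, is unchanged. Only the dynamics are altered, which is why the text notes minimal configuration-space sampling error.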

Figure 1. Schematic overview of EspalomaCharge: a hybrid physical/GNN model for fast charge assignment. First, the graph node representation $h_i$ assigned by a GNN is used to compute the unconstrained electronegativity $e_i$ and hardness $s_i$ of each atom $i$. Second, the charge potential energy is minimized analytically to yield predicted partial charges $\hat{q}_i$ that satisfy the total molecular charge constraint $Q$.

Figure 2. EspalomaCharge shows smaller average charge RMSE than AmberTools in well-represented regions of chemical space. SPICE test set performance stratified by total charge (left panel) and molecule size (right panel). To illustrate the effect of limited training data on stratified performance, the numbers of test (upper number) and training (lower number) molecules falling into each category are annotated, and the test set distribution is plotted as a histogram.

Figure 3. EspalomaCharge is fast, even for large systems. Wall time required to assign charges to ACE-ALA$_n$-NME peptides with different toolkits, shown on a log plot. EspalomaCharge on the CPU or GPU is orders of magnitude faster than semiempirical-based charging methods for larger molecules or biopolymers, and is practical even for assigning charges to proteins of realistic size. Fluctuations in the traces are due to stochasticity in timing trials.

Figure 4. EspalomaCharge introduces little error into explicit solvent hydration free energy prediction. Calculated vs. experimental explicit solvent hydration free energies computed with AM1-BCC charges provided by EspalomaCharge, AmberTools, and the OpenEye Toolkit, respectively. Simulations used the GAFF 2.11 small molecule force field [44] and TIP3P water [27] with particle mesh Ewald electrostatics (see Detailed Methods). The root mean square error (RMSE) and R² score between calculated and experimental values are annotated, with bootstrapped 95% confidence intervals. See also Appendix Figure 7 for a comparison among computed hydration free energies.

Listing 1. Installing EspalomaCharge via the pip Python package manager.
Listing 2. Example illustrating the EspalomaCharge Python API.Here, EspalomaCharge assigns AM1-BCC ELF10 equivalent partial charges to an RDKit Molecule, returning them in a NumPy array.