DL_MG: A Parallel Multigrid Poisson and Poisson-Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution.

The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to $\sim 10^9$ unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.


Introduction
What is the electrostatic potential corresponding to a given charge density? This deceptively simple question is key to modelling the electronic structure of atoms, molecules and materials, where the classical electrostatic potential forms a foundation upon which quantum mechanical many-body effects can be modeled. Consequently, developing efficient techniques for answering this question-by solving the Poisson equation-is a central concern for researchers interested in the electronic structure of matter.
For reasons of practicality, electronic structure calculations have historically tended to be restricted to the study of small systems in vacuum with fully open or fully periodic boundary conditions (BCs). In this case, the Poisson equation can be efficiently solved using analytic solutions (as described in section 2.1). However, in recent years it has become possible-perhaps even routine-to perform electronic structure calculations on systems numbering 100s or 1000s of atoms and to include the effect of the surrounding environment. This has been driven by prodigious growth in the computing power available to researchers and by theoretical developments allowing electronic structure calculations to scale efficiently with respect to system size and number of processors. In particular, progress in this area has been enabled by the development of so-called linear-scaling, or O(N), methods, in which the asymptotic computational cost increases linearly with system size, N. These methods have been implemented in several software packages, including ONETEP, 1 BigDFT, 2 CONQUEST, 3 OpenMX, 4 Quickstep, 5 and SIESTA. 6

When modelling large chemical systems, such as biomolecules and nanoparticles, neglecting the environment can have a substantial effect on the properties of the system. For example, without the screening effect of a solvent, it is possible for systems to develop unphysical surface states and dipole moments. 7 This issue can be resolved by simply including solvent molecules in the electronic structure calculation. However, this explicit approach is very costly, even using linear-scaling methods, because of the significant increase in the number of atoms that must be treated quantum mechanically and the need to statistically average over solvent configurations. In addition, it is generally the case that the electronic structure of the environment is not of interest and may complicate the interpretation of results. An implicit representation of the environment avoids these problems, but requires a Poisson solver that can:

• scale efficiently with problem size, and

• scale efficiently to large numbers of parallel processors.
In addition, if the solver is to provide an implicit representation of the environment, it must also be able to solve the more complicated generalized Poisson and/or Poisson-Boltzmann equations.
Using the multigrid approach, [8][9][10] Poisson solvers which satisfy all of these requirements can be developed. Multigrid methods provide a framework in which relatively simple iterative solvers can be applied on a hierarchy of progressively coarsening grids, yielding rapid convergence at low computational cost (section 2.3). With careful design and selection of components, multigrid solvers can also achieve excellent parallel efficiency (see ch. 6 of Ref. 10).
The use of multigrid solvers for solving the Poisson equation in electronic structure calculations is well-established, with many publications describing their successful application in this context. [11][12][13][14][15][16][17] Multigrid methods have also been applied as efficient solvers for real-space discretizations of the Kohn-Sham eigenvalue equations in density functional theory (DFT). [18][19][20][21] Solvers based on the multigrid approach have proven particularly effective for solving the generalized Poisson equation in implicit solvent models based on Fattebert and Gygi's electrostatic model 12,[14][15][16][17]22,23 (section 2.2). The smoothly varying function used to represent the dielectric permittivity in these models poses no problem for multigrid solvers, requiring only that the operator stencil (appendix A) is modified to incorporate variable coefficients.
While multigrid is clearly well-suited for solving the Poisson equation in electronic structure calculations featuring electrostatic embedding, it is not the only approach in use. In this paper, we introduce DL_MG, a flexible, scalable and accurate Poisson solver library based upon a high-order defect-corrected multigrid approach. The solver was designed specifically to tackle the challenges inherent in modern large-scale electronic structure calculations. In particular, the library was developed to provide a means of accounting for environmental effects in electronic structure calculations by efficient solution of the generalized Poisson and Poisson-Boltzmann equations.
In the following, we present the theoretical context for the development of DL_MG (section 2) and an overview of the implementation of the library (section 3), focusing particularly on the defect correction component (section 3.1.2). Through careful testing of the solver for chemically relevant model systems (section 4.1) and in large-scale electronic structure calculations (section 4.2) with ONETEP, 1 we demonstrate that the solver is able to scale efficiently to 100s of processors and $\sim 10^9$ grid points and deliver close agreement with known analytic results and established FFT-based Poisson solvers. In addition, since DL_MG is freely available under a permissive open source license, we provide some brief information for developers in appendix B to aid interested readers who may want to test and possibly integrate the library in their own codes.

Poisson and Poisson-Boltzmann equation
The electrostatic potential, $\phi_0(\mathbf{r})$, resulting from a given charge density, $n(\mathbf{r})$, in vacuum can be obtained by solving the standard Poisson equation (SPE), written here in atomic units:

$$\nabla^2 \phi_0(\mathbf{r}) = -4\pi n(\mathbf{r}). \quad (1)$$

The solution of this equation is relevant in the context of electronic structure calculations, where the potentials due to electronic and ionic densities, $n_{\mathrm{elec}}(\mathbf{r})$ and $n_{\mathrm{ionic}}(\mathbf{r})$, are required.
In open boundary conditions (OBCs), where the potential goes to zero as $|\mathbf{r}|$ goes to infinity, the non-periodic potential can be expressed in terms of the corresponding Green's function for the Laplacian, $\nabla^2$ (see ch. 10 of Ref. 26):

$$\phi_0(\mathbf{r}) = -4\pi \int G(\mathbf{r},\mathbf{r}')\, n(\mathbf{r}')\, \mathrm{d}\mathbf{r}', \qquad G(\mathbf{r},\mathbf{r}') = -\frac{1}{4\pi|\mathbf{r}-\mathbf{r}'|}.$$

This yields the well-known form of the Coulomb potential in OBCs:

$$\phi_0(\mathbf{r}) = \int \frac{n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, \mathrm{d}\mathbf{r}', \quad (3)$$

where the integration is over all space.
Under periodic boundary conditions (PBCs), the potential corresponding to a given periodic charge density can be obtained directly by solving the equation in reciprocal space, i.e.
$$\tilde{\phi}_0(\mathbf{G}) = \frac{4\pi\, \tilde{n}(\mathbf{G})}{|\mathbf{G}|^2}, \quad (4)$$

where $\tilde{\phi}_0(\mathbf{G})$ and $\tilde{n}(\mathbf{G})$ are the Fourier transforms of the real-space potential and charge density, respectively. This simple expression is of great utility in electronic structure calculations using periodic basis functions, where FFTs can be employed to efficiently transform quantities between real- and reciprocal-space.
While the form of Eq. 4 is convenient, it also illustrates a particular difficulty encountered when solving the Poisson equation with PBCs, namely that the charge density must be neutral (i.e. $\tilde{n}(\mathbf{0}) = 0$). Non-neutral charge densities result in a singularity in the potential at $\mathbf{G} = \mathbf{0}$. In practice, this issue is typically avoided by introducing a compensating homogeneous background charge which ensures that the overall charge in the periodic unit cell is neutral, equivalent to solving

$$\nabla^2 \phi_0(\mathbf{r}) = -4\pi \left[ n(\mathbf{r}) - \bar{n} \right], \quad (5)$$

where $\bar{n}$ is the average charge density over the volume of the unit cell, $V$, i.e.

$$\bar{n} = \frac{1}{V} \int_V n(\mathbf{r})\, \mathrm{d}\mathbf{r}.$$
The subtraction of $\bar{n}$ from the real-space charge density, $n(\mathbf{r})$, in Eq. 5 is equivalent to setting $\tilde{n}(\mathbf{0}) = 0$, thus avoiding singularities in Eq. 4. While this is a useful method for obtaining a solution from the Poisson equation when dealing with a non-neutral periodic density, it necessarily changes the nature of the problem-the potential obtained corresponds to the periodic density and the artificial neutralizing background charge.
In electronic structure calculations, it is convenient to deal with the interactions of the electronic and ionic components of the overall charge density separately. Since the electronic and ionic densities are independently non-neutral, neutralizing background charges must be used for each component when solving the Poisson equation in PBCs. This has no impact on the energy of a neutral system, since the contributions due to the background charges in each term cancel out.
In a medium with a nonhomogeneous dielectric permittivity, $\varepsilon(\mathbf{r})$, the electrostatic potential satisfies the generalized Poisson equation (GPE):

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] = -4\pi n(\mathbf{r}). \quad (7)$$

When mobile charges (e.g. dissolved ions) are also present, their density contributes an additional source term. This mobile ion density at a given $\mathbf{r}$ may be generally written as

$$n_{\mathrm{mob}}[\phi](\mathbf{r}) = \lambda(\mathbf{r}) \sum_{i=1}^{m} q_i\, c_i[\phi](\mathbf{r}), \quad (9)$$

where $c_i[\phi](\mathbf{r})$ and $q_i$ are, respectively, the local concentration and charge of ionic species $i$; $m$ is the total number of ionic species present; and $\lambda(\mathbf{r})$ is a function which describes the accessibility of $\mathbf{r}$ to the mobile ions. When $c_i[\phi](\mathbf{r})$ takes the form of a Boltzmann distribution,

$$c_i[\phi](\mathbf{r}) = c_i^{\infty}\, e^{-\beta q_i \phi(\mathbf{r})}, \quad (11)$$

with bulk concentration $c_i^{\infty}$, Boltzmann constant $k_B$, and temperature $T$, Eq. 9 becomes the Poisson-Boltzmann equation (P-BE):

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] = -4\pi \left[ n(\mathbf{r}) + \lambda(\mathbf{r}) \sum_{i=1}^{m} q_i\, c_i^{\infty}\, e^{-\beta q_i \phi(\mathbf{r})} \right], \quad (12)$$

where we have used $\beta = 1/(k_B T)$.
For a given charge density $n(\mathbf{r})$, dielectric permittivity $\varepsilon(\mathbf{r})$, accessibility function $\lambda(\mathbf{r})$, and the charges and bulk concentrations of the mobile species, $\{q_i\}$ and $\{c_i^{\infty}\}$, the P-BE (Eq. 12) can be solved to yield an overall electrostatic potential, $\phi(\mathbf{r})$. The equation may therefore be applied in situations where a static charge density is embedded in a dielectric medium and surrounded by mobile charges.
An important application of the P-BE is in classical modelling of the electrostatics of biomolecules in ionic solutions, where the atoms constituting the biomolecule are typically represented by point charges, the solvent as a dielectric medium, and the concentrations of species of mobile ions in solution are represented by a Boltzmann distribution (Eq. 11). The use of Eq. 12 in biomolecular contexts has been reviewed in Refs. 28,29. The P-BE may be similarly applied in electronic structure calculations (section 2.2), but with the quantum mechanical electron charge density represented as a smooth function, rather than a collection of atom-centered point-charges. This allows the effect of a saline solution on the electronic structure of a solute to be modeled implicitly, without the need for atomistic modelling of either the solvent or dissolved ions.
The non-linear P-BE (NLP-BE) may be approximated by a simpler linearized form when the electrostatic potential, $\phi(\mathbf{r})$, is small. In this case, the Boltzmann term in Eq. 12 is approximated by the first two terms of its Maclaurin series:

$$e^{-\beta q_i \phi(\mathbf{r})} \approx 1 - \beta q_i \phi(\mathbf{r}). \quad (13)$$

Inserting this approximation into Eq. 12 yields the linearized Poisson-Boltzmann equation

$$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] - 4\pi \lambda(\mathbf{r}) \beta \sum_{i=1}^{m} q_i^2\, c_i^{\infty}\, \phi(\mathbf{r}) = -4\pi n(\mathbf{r}). \quad (14)$$

Note that for a neutral solution of mobile ions $\sum_i q_i c_i^{\infty} = 0$, and thus this term does not appear in Eq. 14.
In the standard Poisson-Boltzmann model outlined in this section, the mobile charges are point-like particles with a statistical distribution based on the overall electrostatic potential of the system (Eq. 11).

Electronic structure calculations
The classical electrostatic energy of a charge density interacting with itself is given by

$$E[n] = \frac{1}{2} \int n(\mathbf{r})\, \phi_0[n](\mathbf{r})\, \mathrm{d}\mathbf{r}, \quad (16)$$

where the potential, $\phi_0[n](\mathbf{r})$, is the solution of the SPE (Eq. 1). If the charge density represents the total charge of a collection of atoms, then this can be decomposed into contributions from the electrons and ionic cores, i.e.

$$n(\mathbf{r}) = n_{\mathrm{elec}}(\mathbf{r}) + n_{\mathrm{ionic}}(\mathbf{r}). \quad (17)$$
The Hartree energy, $E_{\mathrm{Hartree}}$, and ion-ion energy, $E_{\mathrm{ion\text{-}ion}}$, are defined analogously to Eq. 16 for each density, though in practice they differ in how they address self-interaction. For the ion-ion term, the self-interaction is typically explicitly subtracted within $E_{\mathrm{ion\text{-}ion}}$, while the electronic self-interaction is not considered in the classical electrostatic energy-in electronic structure methods, this is part of the exchange contribution to the total energy. The electron-ion interaction, where no self-interaction correction is necessary, is given by

$$E_{\mathrm{elec\text{-}ion}} = \int n_{\mathrm{elec}}(\mathbf{r})\, \phi_0[n_{\mathrm{ionic}}](\mathbf{r})\, \mathrm{d}\mathbf{r}.$$

The overall classical electrostatic energy typically represents a significant fraction of the total energy computed in an electronic structure calculation.
Electronic structure calculations are generally concerned with the behavior of electrons in the presence of nuclei at a set of fixed positions. In this situation, it is convenient to separate the total charge density into electronic and nuclear components, as in Eq. 17. The electron density, n elec (r), can then be treated as a continuous function, while the nuclear density, n nuc (r), is represented as a sum of point charges, with positions {R I } and charges {Z I }. This allows the potentials corresponding to the two densities to be solved for independently, using methods appropriate to their form.
In self-consistent field (SCF) methods, such as DFT and Hartree-Fock theory, the electron density is constructed as a sum over products of one-electron orbitals, $\psi_i(\mathbf{r})$, weighted by their occupancies, $f_i$, i.e.

$$n_{\mathrm{elec}}(\mathbf{r}) = \sum_i f_i\, |\psi_i(\mathbf{r})|^2.$$
The orbitals, and hence the electron density, are obtained by solving one-electron Schrödinger equations of the general form

$$\left[ -\tfrac{1}{2}\nabla^2 + \hat{V}_{\mathrm{eff}} \right] \psi_i(\mathbf{r}) = \varepsilon_i\, \psi_i(\mathbf{r}),$$

where the electrostatic potentials associated with the nuclei and electrons are components of the effective potential, $\hat{V}_{\mathrm{eff}}$. These equations must be solved self-consistently, since the effective potential is dependent on the orbitals, in part due to the relationship between the electrostatic potential and electron density (Eq. 1).
While the nuclear charge density is generally fixed during an SCF calculation, the orbitals, and hence electron density, are updated as part of the iterative process. As a consequence, the electrostatic potential due to the electron density-the Hartree potential-must be repeatedly solved for during the SCF procedure. Efficient methods for solving the SPE for a given electron density are therefore of great importance in the implementation of SCF approaches.
For SCF calculations in vacuum, the SPE (Eq. 1) can be solved using the analytic solutions described in section 2.1. While there are efficient methods for obtaining the Hartree potential using analytic solutions to the SPE, numerical approaches have some utility under certain circumstances. One such situation is the case where a periodic basis set is used to represent a finite system. To reduce the extent of spurious interactions between periodic images of the finite system, it is typical to use the "supercell" approximation, 34 in which the finite system is surrounded by a large volume of vacuum "padding". The additional padding required for this approach can be computationally expensive for basis sets which grow with cell size (for example, plane waves). Real-space numerical approaches can be employed to efficiently solve for the electrostatic potential while imposing open BCs. This completely eliminates electrostatic interactions between periodic images, while allowing the use of a periodic basis set-see Ref.
It is often useful to study the electronic structure of systems embedded in a medium with a nonhomogeneous dielectric permittivity, $\varepsilon(\mathbf{r})$. In this case, the total electrostatic potential can be obtained by solving the GPE (Eq. 7) and comprises two terms:

$$\phi(\mathbf{r}) = \phi_0(\mathbf{r}) + \phi_r(\mathbf{r}),$$

where $\phi_0(\mathbf{r})$ is the usual electrostatic potential associated with the charge distribution of the system and $\phi_r(\mathbf{r})$ is a reaction potential due to the polarization of the dielectric medium, $\varepsilon(\mathbf{r})$. A key application of this model is in modelling solvent effects on electronic structure, using a polarizable dielectric medium to implicitly represent the solvent environment.
As mentioned in section 2.1, the dearth of widely applicable analytic solutions for the GPE means that the equation is typically tackled using numerical approaches or recast as a nonlinear form of the SPE and solved self-consistently (as described in Refs. 24,25). Some of the most widely used implicit solvent models employ an additional simplifying assumption that the system is separated into two regions in which the dielectric permittivity is homogeneous, i.e.

$$\varepsilon(\mathbf{r}) = \begin{cases} 1, & \mathbf{r}\ \text{inside the solute cavity}, \\ \varepsilon_\infty, & \mathbf{r}\ \text{outside the solute cavity (in the solvent)}. \end{cases}$$
The solute cavity is defined in such a way that it incorporates the solute charge and the boundary between the two regions is discontinuous. In this model, it is possible to reformulate the problem of solving the GPE purely in terms of a polarization charge on the surface of the cavity. This apparent surface charge (ASC) defines the reaction potential, and so solving the GPE becomes a question of determining the ASC over the 2-D surface of the cavity-see Ref. 36 for an overview of ASC-type approaches.
In this work, we are concerned with the numerical solution of the full GPE in 3-D for an arbitrary nonhomogeneous dielectric permittivity. This provides a flexible foundation for the development of implicit solvent models in which the form of the dielectric permittivity is not restricted to the discontinuous piecewise form used in ASC approaches. In particular, solving the full GPE in 3-D allows the dielectric permittivity to smoothly transition between the bulk values within the solute cavity and solvent.
An advantage of solving the full GPE in 3-D with a smooth dielectric function is that this yields a continuous potential, thus evading the difficulties associated with discontinuous gradients which can arise in ASC approaches. 37 This was a motivating factor for Fattebert and Gygi in developing an electrostatic implicit solvent model based upon a smooth dielectric function for use in molecular dynamics, where accurate energy gradients are critical. 12,22 In Fattebert and Gygi's model ε(r) is a functional of the solute electron density, and is defined in terms of the electron density at r, n elec (r), the bulk permittivity of the solvent, ε ∞ , and two parameters: β and n 0 .
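For reference, the permittivity functional of Ref. 22 takes the following form (quoted up to notational differences, with $n_{\mathrm{elec}}$, $\varepsilon_\infty$, $\beta$ and $n_0$ as above):

$$\varepsilon[n_{\mathrm{elec}}](\mathbf{r}) = 1 + \frac{\varepsilon_\infty - 1}{2} \left[ 1 + \frac{1 - \left( n_{\mathrm{elec}}(\mathbf{r})/n_0 \right)^{2\beta}}{1 + \left( n_{\mathrm{elec}}(\mathbf{r})/n_0 \right)^{2\beta}} \right],$$

which tends to $\varepsilon_\infty$ where the electron density is low (in the solvent) and to 1 where the density is high (inside the solute).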
The use of the electron density to construct the solute cavity has the advantage that a good representation of the solute shape can be obtained using very few fitted parameters.
Fattebert and Gygi's cavity is defined by only two fitted parameters-significantly fewer than the number required when employing the widely adopted method of constructing the cavity from atom-centered spheres. This model has since been elaborated and extended in subsequent work. The multigrid approach is a natural fit for solving the resulting generalized Poisson problems, for two reasons. First, it is well-suited to implementation in parallel and exhibits excellent computational scaling with respect to the grid size. 10 Second, the representation of the charge density and electrostatic potential on a regular grid in ONETEP is ideal for use with a multigrid solver, allowing well-established and understood variants of the method to be used, as described in the following sections.

Multigrid
To solve the Poisson equation in situations where exact reciprocal space solutions are not available, real-space numerical approaches can be employed. In numerical approaches, a discretized version of the Poisson equation is required. In the context of electronic structure methods with periodic plane-wave-type basis sets, it is natural to discretize the problem on the regular real-space grid used to represent the electronic charge density, i.e.
$$\hat{A}_h u_h = f_h,$$

where $u_h$ and $f_h$ are, respectively, the potential we are solving for and the source term (the charge density multiplied by a factor of $-4\pi$ for Eqs. 1 and 7), discretized on a grid with spacing $h$. Simple iterative methods, such as Jacobi or Gauss-Seidel relaxation, can be applied directly to equations of this form and rapidly remove high-frequency components of the error. However, the overall convergence of these methods towards the solution is limited by low-frequency components in the error, which are less effectively removed and become increasingly prevalent when using finer grids. 9

Multigrid methods 8-10 simultaneously take advantage of the smoothing property of iterative solvers while addressing their slow rate of convergence. This is achieved by applying a hierarchy of progressively coarsening grids to the problem. Since low-frequency components of the error represented on a given grid appear as higher-frequency components on a relatively coarser grid, iterative methods can be applied on the coarser grid to rapidly attenuate the low-frequency components, avoiding the problematic slow convergence that arises with a single-grid approach.
Consider the general linear equation

$$\hat{A} u = f, \quad (26)$$

with the corresponding defect (or residual)

$$d = f - \hat{A} u', \quad (27)$$

and defect equation

$$\hat{A} e = d, \quad (28)$$

where $f$ is the source term; $u$ and $u'$ are exact and approximate solutions to Eq. 26, respectively; and the error in the approximate solution is $e = u - u'$. We can define three basic operations:

Smoothing Apply an iterative method to remove higher frequency components of the error on a given grid, i.e. approximately solve

$$\hat{A}_h u_h = f_h, \quad (29)$$

starting with some initial guess, to obtain a smoothed result, $\bar{u}_h$, on a grid with spacing $h$.
Restriction Transfer the defect computed on a finer grid to a coarser grid:

$$d_H = \hat{I}_H^h\, d_h, \quad (30)$$

where $\hat{I}_H^h$ is the restriction operator which maps functions on the grid with spacing $h$ to the coarser grid with spacing $H$ (in this example, the spacing is doubled).
Prolongation Transfer the error computed on a coarser grid to a finer grid:

$$e_h = \hat{I}_h^H\, e_H, \quad (31)$$

where $\hat{I}_h^H$ is the prolongation operator which maps functions on the coarse grid with spacing $H$ to the finer grid with spacing $h$ (in this example, the spacing is halved).
When these operations are combined in an appropriate order, the resulting multigrid approach can significantly improve convergence compared to applying an iterative solver on a single grid.
A simple two-grid multigrid iteration, starting with the initial approximation, $u_h^{(m)}$, and producing an improved approximation, $u_h^{(m+1)}$, can be summarized as follows:

1. Apply smoothing to $u_h^{(m)}$ on the fine grid (Eq. 29).

2. Compute the defect on the fine grid (Eq. 27).

3. Restrict the defect to the coarse grid (Eq. 30).

4. Solve the defect equation (Eq. 28) on the coarse grid to obtain the error.

5. Prolong the error to the fine grid (Eq. 31) and use it to correct the current approximation.

6. Apply further smoothing on the fine grid to obtain $u_h^{(m+1)}$.

Steps (1)-(6) can be repeated until a convergence criterion is satisfied. Since the computation of the error in step 4 is of the same form as the linear equation (Eq. 26) we wish to solve, the two-grid cycle can be applied recursively, leading to a multigrid scheme involving a hierarchy of progressively coarser grids.
With multiple levels of coarse grids, the basic smoothing (Eq. 29), restriction (Eq. 30) and prolongation (Eq. 31) steps can be combined to produce a variety of recursive schemes.
One such scheme is the "V-cycle", illustrated in Fig. 1.
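To make the operations above concrete, the following minimal Fortran program (Fortran being the language of DL_MG, though this sketch is purely illustrative and shares no code with the library) performs one two-grid cycle for the 1-D model problem $-u'' = f$ with zero Dirichlet BCs, using weighted-Jacobi smoothing, full-weighting restriction and linear interpolation; for brevity, the coarse-grid "solve" is approximated by extra smoothing:

```fortran
! Sketch: one two-grid cycle for -u'' = f on [0,1], zero Dirichlet BCs.
! Illustrative only -- names and structure are not part of the DL_MG API.
program two_grid_demo
  implicit none
  integer, parameter :: nf = 129          ! fine-grid points (2**k + 1)
  integer, parameter :: nc = (nf + 1)/2   ! coarse-grid points
  real(8) :: u(nf), f(nf), d(nf), e(nf), dc(nc), ec(nc), h

  h = 1.0d0/(nf - 1)
  f = 1.0d0                               ! constant model source term
  u = 0.0d0                               ! initial guess

  call smooth(u, f, nf, h, 3)             ! 1. pre-smooth
  call defect(u, f, d, nf, h)             ! 2. fine-grid defect
  call restrict(d, nf, dc, nc)            ! 3. restrict defect
  ec = 0.0d0                              ! 4. "solve" coarse problem by
  call smooth(ec, dc, nc, 2.0d0*h, 200)   !    heavy smoothing (for brevity)
  call prolong(ec, nc, e, nf)             ! 5. prolong error and correct
  u = u + e
  call smooth(u, f, nf, h, 3)             ! 6. post-smooth

  call defect(u, f, d, nf, h)
  print *, 'max defect after one two-grid cycle:', maxval(abs(d))

contains

  subroutine smooth(v, rhs, n, dx, sweeps)     ! weighted-Jacobi smoother
    integer, intent(in) :: n, sweeps
    real(8), intent(inout) :: v(n)
    real(8), intent(in) :: rhs(n), dx
    real(8), parameter :: w = 2.0d0/3.0d0
    real(8) :: vnew(n)
    integer :: i, s
    do s = 1, sweeps
      vnew = v
      do i = 2, n - 1
        vnew(i) = (1.0d0 - w)*v(i) + w*0.5d0*(v(i-1) + v(i+1) + dx*dx*rhs(i))
      end do
      v = vnew
    end do
  end subroutine smooth

  subroutine defect(v, rhs, r, n, dx)          ! r = rhs - A v, A = -d2/dx2
    integer, intent(in) :: n
    real(8), intent(in) :: v(n), rhs(n), dx
    real(8), intent(out) :: r(n)
    integer :: i
    r = 0.0d0
    do i = 2, n - 1
      r(i) = rhs(i) - (2.0d0*v(i) - v(i-1) - v(i+1))/(dx*dx)
    end do
  end subroutine defect

  subroutine restrict(rf, nfg, rc, ncg)        ! full-weighting restriction
    integer, intent(in) :: nfg, ncg
    real(8), intent(in) :: rf(nfg)
    real(8), intent(out) :: rc(ncg)
    integer :: i
    rc = 0.0d0
    do i = 2, ncg - 1
      rc(i) = 0.25d0*rf(2*i-2) + 0.5d0*rf(2*i-1) + 0.25d0*rf(2*i)
    end do
  end subroutine restrict

  subroutine prolong(ecg, ncg, efg, nfg)       ! linear interpolation
    integer, intent(in) :: ncg, nfg
    real(8), intent(in) :: ecg(ncg)
    real(8), intent(out) :: efg(nfg)
    integer :: i
    efg = 0.0d0
    do i = 1, ncg
      efg(2*i-1) = ecg(i)                      ! coincident points
    end do
    do i = 1, ncg - 1
      efg(2*i) = 0.5d0*(ecg(i) + ecg(i+1))     ! interpolated points
    end do
  end subroutine prolong

end program two_grid_demo
```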

Defect correction
The representation of a continuous problem on a grid results in a "discretization error". In a finite difference method, this discretization error can be expressed as the remainder of a truncated Taylor expansion; for the forward difference approximation to the first derivative, for example,

$$\frac{\mathrm{d}u}{\mathrm{d}x}\bigg|_{x} = \frac{u(x+h) - u(x)}{h} - \frac{h}{2}\frac{\mathrm{d}^2 u}{\mathrm{d}x^2}\bigg|_{x} + O(h^2). \quad (32)$$

For small $h$, the leading term in the error in the discretized derivative in Eq. 32 is a first order polynomial in $h$. More generally, the discretization error is the difference between the exact solution to the continuous problem and the exact solution to the discretized problem, e.g. for the general linear problem of Eqs. 26 and 29, the discretization error is $|u - u_h|$.
The accuracy of a solution to a discretized problem is limited by the discretization error. To reduce this error, high-order finite difference approximations, in which the error asymptotically scales with higher powers of the grid spacing, h, can be employed. Such higher-order approximations have the advantage that relatively coarser grids can be used while maintaining the same level of accuracy compared to lower-order approximations.
Higher-order finite difference approximations are necessarily more complicated than lower-order approximations, generally involving a greater number of terms. This corresponds to larger and/or more densely populated operator stencils (see appendix A), which can be challenging to implement in a manner which is computationally efficient. This is a particular concern in parallel implementations, where larger stencils demand larger halo exchanges between processes.

The high-order defect correction approach 10,43 provides a means by which approximations to high-order solutions can be obtained from a multigrid solver while avoiding the complexities of implementing a high-order multigrid scheme. This is achieved by iteratively correcting the solution obtained using a lower-order multigrid scheme using a higher-order discretization of the operator. The higher-order discretization of the operator is applied only on the fine grid on which the multigrid solver deposits the solution, thus avoiding the difficulties associated with applying large, complicated stencils in parallel on coarse grids.
The high-order defect correction procedure resembles the multigrid cycle described in section 2.3, in that an approximate error, $e$, is obtained by solving the defect equation, and is used to correct the approximate solution, $u'$, i.e.

$$u = u' + e. \quad (34)$$
The multigrid and defect correction procedures differ in how the defect equation is solved.
In a multigrid cycle the defect equation is solved on a coarser grid with a defect computed on a finer grid. In contrast, the high-order defect correction involves solving the defect equation with a lower-order discretization of the operator, using a defect computed using a higher-order discretization of the operator. In both cases, we solve the defect equation approximately (on a coarser grid, or with a lower-order operator discretization) so Eq. 34 yields an improved approximation, rather than the exact result.
Consider the high-order defect for an approximate solution, $u^{(i)}$, obtained via a multigrid scheme using a second-order-accurate operator, $\hat{A}_2$:

$$r_d^{(i)} = f - \hat{A}_d\, u^{(i)}. \quad (35)$$

The subscripts now refer to the order of accuracy of the operator, $d$, rather than grid spacing (in contrast to section 2.3), and the high-order operator $\hat{A}_d$ has $d > 2$. The defect equation may be approximately solved to second order using the same second-order multigrid scheme,

$$\hat{A}_2\, e_{2,d}^{(i)} = r_d^{(i)}, \quad (36)$$

to yield an approximation to the higher-order error, $e_{2,d}^{(i)}$. The approximate error can then be used to correct the original approximation:

$$u^{(i+1)} = u^{(i)} + e_{2,d}^{(i)}. \quad (37)$$

This scheme can be applied iteratively, using the updated approximate solution $u^{(i+1)}$ to construct a new defect (Eq. 35), and repeating the process until a convergence criterion is satisfied-the specific criteria available in DL_MG are described in section 3.1.2.
The iterative defect correction method outlined above is an effective method for reducing the discretization error in the solution obtained from a lower-order method. The scheme converges toward the solution of the higher-order problem, provided that

$$\rho\left( I - \hat{A}_{d'}^{-1} \hat{A}_d \right) < 1$$

is satisfied, where $\rho(M)$ is the spectral radius (i.e. the largest eigenvalue in absolute value) of a matrix $M$, $I$ is the identity matrix, and $\hat{A}_{d'}$ is the lower-order discretization of $\hat{A}$ ($d' < d$). 10 For further details on the high-order defect correction, see Refs. 10 (ch. 5), 43, and 16 (appendix B).
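The loop of Eqs. 35-37 can likewise be sketched in a few lines of Fortran (illustrative only; Gauss-Seidel sweeps stand in for the second-order multigrid solve, and a fourth-order central stencil plays the role of $\hat{A}_d$):

```fortran
! Sketch of the iterative high-order defect correction (Eqs. 35-37) for
! -u'' = f in 1-D with zero Dirichlet BCs. Illustrative only.
program defect_correction_demo
  implicit none
  integer, parameter :: n = 129
  real(8), parameter :: pi = 3.141592653589793d0
  real(8), parameter :: tau = 1.0d-10
  real(8) :: u(n), e(n), f(n), rd(n), h
  integer :: i, iter

  h = 1.0d0/(n - 1)
  do i = 1, n
    f(i) = sin(pi*(i - 1)*h)        ! smooth model source term
  end do

  u = 0.0d0
  call solve2(u, f)                 ! initial second-order solution

  do iter = 1, 20
    call defect4(u, f, rd)          ! high-order defect (Eq. 35)
    if (maxval(abs(rd)) < tau) exit
    e = 0.0d0
    call solve2(e, rd)              ! second-order solve of Eq. 36
    u = u + e                       ! correction step (Eq. 37)
  end do
  print *, 'defect correction finished, iterations:', iter

contains

  subroutine solve2(v, rhs)         ! crude second-order "solver"
    real(8), intent(inout) :: v(n)
    real(8), intent(in) :: rhs(n)
    integer :: i, s
    do s = 1, 10000                 ! fixed Gauss-Seidel sweeps, for brevity
      do i = 2, n - 1
        v(i) = 0.5d0*(v(i-1) + v(i+1) + h*h*rhs(i))
      end do
    end do
  end subroutine solve2

  subroutine defect4(v, rhs, r)     ! r = rhs - A_4 v, A_4 = -(d2/dx2)_4th
    real(8), intent(in) :: v(n), rhs(n)
    real(8), intent(out) :: r(n)
    integer :: i
    r = 0.0d0
    do i = 3, n - 2
      r(i) = rhs(i) + (-v(i-2) + 16.0d0*v(i-1) - 30.0d0*v(i) &
             + 16.0d0*v(i+1) - v(i+2))/(12.0d0*h*h)
    end do
    ! next to the boundary, fall back to the second-order stencil
    r(2)   = rhs(2)   + (v(1) - 2.0d0*v(2) + v(3))/(h*h)
    r(n-1) = rhs(n-1) + (v(n-2) - 2.0d0*v(n-1) + v(n))/(h*h)
  end subroutine defect4

end program defect_correction_demo
```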

Second order solver
The implementation of the second-order solver in DL_MG is based upon the "geometric multigrid" approach, 44 whereby the problem to be solved is expressed on a fixed hierarchy of coarsening grids, as described in section 2.3. This is distinct from the "algebraic multigrid" approach, which works directly with algebraic equations, rather than grids. 45 The algorithms for DL_MG's geometric multigrid solver were selected following the standard recommendations for the Poisson and Poisson-Boltzmann equations (section 2.1) given in Refs. 10,46:

• Grid coarsening is achieved by doubling the grid-point separation in all dimensions at each multigrid level-this corresponds to the use of grids with spacing $2^n h$, with $h$ the spacing of the finest grid ($n \in \mathbb{Z}$ and $n \geq 0$).
• The grid stencils used to apply the differential operator $\nabla \cdot \varepsilon \nabla$ on all grids are 3-D, 7-point, second-order finite difference discretizations (see appendix A).
• Inter-grid transfers are performed with half-weight restriction and bilinear interpolation.
• Smoothing is performed using the Gauss-Seidel red-black (GS-RB) method (see, for example, Ref. 10). Under the GS-RB scheme, the grid is divided into two sets of points (red and black), with the points in each set depending only on the points in the other set. This has the advantage that the smoothing procedure can be applied to all the points in each set simultaneously, making the GS-RB smoother highly parallelizable; a sketch of such a sweep is shown after this list.
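A GS-RB sweep for the 7-point stencil might look like the following sketch (illustrative, not DL_MG source; points are split by the parity of $i + j + k$, and each half-sweep parallelizes trivially with OpenMP):

```fortran
! Sketch: one red-black Gauss-Seidel sweep for -laplacian(u) = f,
! discretized with the 3-D 7-point stencil. Illustrative only.
module gsrb_sketch
  implicit none
contains
  subroutine gsrb_sweep(u, f, n, h)
    integer, intent(in) :: n
    real(8), intent(inout) :: u(n, n, n)
    real(8), intent(in) :: f(n, n, n), h
    integer :: i, j, k, color

    do color = 0, 1              ! 0 = "red" points, 1 = "black" points
      !$omp parallel do collapse(2) private(i, j, k)
      do k = 2, n - 1
        do j = 2, n - 1
          ! first interior i such that i + j + k has parity "color"
          do i = 2 + mod(j + k + color, 2), n - 1, 2
            u(i, j, k) = ( u(i-1, j, k) + u(i+1, j, k)   &
                         + u(i, j-1, k) + u(i, j+1, k)   &
                         + u(i, j, k-1) + u(i, j, k+1)   &
                         + h*h*f(i, j, k) ) / 6.0d0
          end do
        end do
      end do
      !$omp end parallel do
    end do
  end subroutine gsrb_sweep
end module gsrb_sketch
```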
The solver components described above can be used to construct an efficient solver for the Poisson and Poisson-Boltzmann equations with close to optimal computational scaling with respect to grid size, provided that the models used for the permittivity and charge density are smooth and without strong anisotropies. 10

DL_MG was developed for use in large scale electronic structure calculations, the feasibility of which depends on the efficient use of parallel computing resources. The library was therefore designed to ensure good parallel performance on modern hardware, using widely adopted parallel frameworks (MPI and OpenMP) to ensure broad compatibility with existing electronic structure packages. In particular:

• Multigrid iterations are performed using the V-cycle (Fig. 1), as this is generally recommended for parallel computations. 10,46

• The distribution of global grid data among MPI processes is based upon a 3-D Cartesian topology provided to DL_MG as an argument, allowing the solver to adopt the parallel data decomposition of the calling program.
• Since grid coarsening is achieved by removing even index points in all directions, MPI communication is only necessary during grid transfer steps when dealing with points on the boundaries of the grid held on each MPI process.
The use of a sequence of progressively coarsening grids can be challenging for parallel implementations of multigrid. In particular, the number of active MPI ranks at each multigrid level can vary because, as the grids become coarser, there are fewer points to share among parallel processes. Below a certain coarsening level, some MPI ranks may be assigned zero grid points. To deal with this variation in parallel data distribution, a separate MPI communicator, which includes only the active MPI ranks, is used to perform MPI communication at each multigrid level.
The communication of domain halos between MPI ranks-required in smoothing, prolongation and restriction steps-is done using non-blocking MPI sends and receives, allowing communication to be interleaved with useful computation. Since the 3-D differential operator is discretized as a 7-point stencil (appendix A), smoothing steps only require data exchange between MPI processes with local domains that share a face. For the inter-grid transfer steps (prolongation and restriction), data exchange between MPI processes which hold local grids that share edges or corners is also necessary. The edge and corner points are efficiently communicated by means of ordered communication along axes between nearest neighbors 10 -this amounts to extending the size of the halos exchanged between MPI ranks with local domains which share faces so that the required edge and corner points are transferred along with the usual points along the shared face.
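The essential pattern of such a non-blocking face-halo exchange is sketched below for a single axis (illustrative, not DL_MG source; left and right are the neighbor ranks, which may be MPI_PROC_NULL at a non-periodic boundary, in which case the corresponding halo should instead be filled with boundary values):

```fortran
! Sketch: non-blocking exchange of one-point face halos along x.
! Illustrative only -- not DL_MG source code.
subroutine exchange_x_halos(u, nx, ny, nz, left, right, comm)
  use mpi
  implicit none
  integer, intent(in) :: nx, ny, nz, left, right, comm
  real(8), intent(inout) :: u(0:nx+1, ny, nz)   ! local grid + x halos
  real(8) :: sendl(ny*nz), sendr(ny*nz), recvl(ny*nz), recvr(ny*nz)
  integer :: req(4), ierr

  sendl = reshape(u(1, :, :),  [ny*nz])         ! faces to send
  sendr = reshape(u(nx, :, :), [ny*nz])

  call MPI_Irecv(recvl, ny*nz, MPI_DOUBLE_PRECISION, left,  0, comm, req(1), ierr)
  call MPI_Irecv(recvr, ny*nz, MPI_DOUBLE_PRECISION, right, 1, comm, req(2), ierr)
  call MPI_Isend(sendr, ny*nz, MPI_DOUBLE_PRECISION, right, 0, comm, req(3), ierr)
  call MPI_Isend(sendl, ny*nz, MPI_DOUBLE_PRECISION, left,  1, comm, req(4), ierr)

  ! ... interior computation could proceed here, overlapping communication ...

  call MPI_Waitall(4, req, MPI_STATUSES_IGNORE, ierr)
  u(0, :, :)    = reshape(recvl, [ny, nz])      ! fill halo layers
  u(nx+1, :, :) = reshape(recvr, [ny, nz])
end subroutine exchange_x_halos
```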
To take full advantage of modern multicore CPUs, DL_MG employs shared-memory parallelism within each MPI process via OpenMP threads. This is implemented as a single OpenMP parallel region, covering the V-cycle loop and the subroutine which builds the stencil coefficients.
The local grid held on each MPI process is decomposed into thread blocks and distributed to ensure equal work for all threads. The sizes of these blocks can be tuned to optimize cache utilization, and "first touch policy" (see, for example, Ref. 48) is used to ensure optimal memory access by OpenMP threads on NUMA architectures.
Communication between multi-threaded MPI processes is handled by the master thread, i.e. the so-called "funneled" mode. 49 This mode of communication was adopted to ensure portability between MPI implementations with differing support for multi-threaded communication-funneled mode is the simplest hybrid MPI/OpenMP mode which allows overlapping of computation and communication. 50 Data transfers between MPI buffers and halos are parallelized using OpenMP threads, employing "single" directives to assign one thread per local grid side to allow halos along each direction to be copied asynchronously.
Although DL_MG has been designed to take full advantage of hybrid MPI/OpenMP parallelism, support for MPI and OpenMP is not a requirement. When running calculations on a single workstation, it might be desirable to use only shared-memory parallelism. Alternatively, a distributed-memory-only approach might be preferred when DL_MG is called from an application which is designed to spawn one MPI process per CPU core. DL_MG is flexible in this respect-the library can be compiled with or without MPI or OpenMP, and can therefore be applied in contexts where only one type of parallelism is desired (or none at all).
The algorithm used by DL_MG to solve the NLP-BE (Eq. 12) is based on a specialized inexact-Newton method. 51 In short, the linear multigrid solver is used to find an approximate solution of the linearized system of equations corresponding to each Newton iteration. A damping factor for the linear correction is also computed in order to ensure global convergence. See Ref. 51 for a detailed description of this approach (referred to as the "Damped-Inexact-Newton" method).
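Schematically, writing the NLP-BE as $\hat{A} u + N(u) = f$, with $N(u)$ the non-linear Boltzmann term, each iteration of such a method solves a linearized problem for a correction, $\delta$, and applies it with a damping factor, $s$ (a sketch of the idea; see Ref. 51 for the precise algorithm):

$$\left[ \hat{A} + N'(u^{(k)}) \right] \delta = f - \hat{A} u^{(k)} - N(u^{(k)}), \qquad u^{(k+1)} = u^{(k)} + s\,\delta, \quad s \in (0, 1].$$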
For the SPE, GPE and P-BE, DL_MG uses the same general convergence test, based upon the norm of the residual:

$$|r^{(i)}| \leq \max\left( \tau_{\mathrm{abs}},\ \tau_{\mathrm{rel}}\, |f| \right), \quad (40)$$

where $r^{(i)}$ is the residual at iteration $i$, $f$ is the source term, and $\tau_{\mathrm{abs}}$ and $\tau_{\mathrm{rel}}$ are user-configurable absolute and relative convergence thresholds, respectively. The definition of the residual depends on the equation being solved-for linear equations (SPE, GPE, linearized Poisson-Boltzmann), the general form is Eq. 27, while for non-linear equations an extra non-linear term appears in the residual. Using the maximum of the absolute and relative thresholds allows flexible control of convergence and can help avoid numerical issues when the source term is small.
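In code, this test is a one-line predicate; for example (illustrative):

```fortran
! Sketch of the residual-based convergence test (Eq. 40).
logical function res_converged(norm_r, norm_f, tau_abs, tau_rel)
  implicit none
  real(8), intent(in) :: norm_r, norm_f, tau_abs, tau_rel
  res_converged = norm_r <= max(tau_abs, tau_rel*norm_f)
end function res_converged
```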
The behavior of the second order solver is largely independent of the type of BCs being used, with the same solver components being employed in all cases. There are a few aspects of the operation of the solver which depend on BCs:

• For calculations with open BCs along one or more Cartesian directions, fixed values for these BCs at the grid boundaries must be provided when calling the solver (via the initial value of the potential, see appendix B).

The software library was developed in Fortran 95, using modules and derived data types for information encapsulation and to maintain a hierarchical structure. See appendix B for an introduction to the application programming interface (API) (or the online documentation 52 for a more detailed account).

Defect correction
As described in section 2.4, the high-order defect correction 10,43 in DL_MG is applied on the fine grid, i.e. the grid on which input data is provided from the calling program. The second-order multigrid solver described in section 3.1.1 is used to approximately solve the defect equation (Eq. 36) for the residual computed using a high-order discretization of the differential operator. In practice, this is implemented as a loop, with the second-order solver repeatedly called to approximately solve the defect equation. In each iteration, the approximate potential is corrected using the second-order solution to the defect equation (Eq. 37). The defect correction procedure is considered to have converged when the following criteria are satisfied:

$$|r_d^{(i)}| \leq \max\left( \tau_{\mathrm{abs}},\ \tau_{\mathrm{rel}}\, |r_d^{(0)}| \right), \quad (42)$$

$$|u^{(i)} - u^{(i-1)}| \leq \max\left( \tau_{\mathrm{abs}},\ \tau_{\mathrm{rel}}\, |u^{(i-1)}| \right), \quad (43)$$

where the most-recently updated potential and defect are $u^{(i)}$ and $r_d^{(i)}$, the initial (uncorrected) defect is $r_d^{(0)}$, and the absolute, $\tau_{\mathrm{abs}}$, and relative, $\tau_{\mathrm{rel}}$, convergence thresholds are user configurable. The combination of these two conditions ensures that the iterative process does not stop too early due to temporary satisfaction of either condition-a truly converged solution will be converged with respect to both the residual and the error in the potential.
The use of absolute thresholds, $\tau_{\mathrm{abs}}$, ensures that convergence can be achieved in cases where the relative threshold is problematic, for example where $|u^{(i-1)}|$ or $|r_d^{(0)}|$ is very small.

The high-order discretization of the GPE operator is constructed from 1-D finite difference approximations to the first and second derivatives along each Cartesian direction. The overall operator can be trivially expressed in terms of these "bare" derivative operators by applying the product rule, yielding a high-order defect with the following form:

$$r_d^{(i)}(\mathbf{r}) = \alpha n(\mathbf{r}) - \left[ \varepsilon(\mathbf{r}) \nabla^2 \phi^{(i)}(\mathbf{r}) + \nabla \varepsilon(\mathbf{r}) \cdot \nabla \phi^{(i)}(\mathbf{r}) \right],$$

where $\alpha n(\mathbf{r})$ is the source term, with $\alpha$ a constant which depends on the unit system (in atomic units it is $-4\pi$); $\phi^{(i)}$ is the approximate potential from the $i$-th defect correction iteration; and the derivatives are discretized with high-order 1-D stencils along each Cartesian direction, where $\hat{\mathbf{e}}_i$ is a unit vector along Cartesian direction $i$. The stencils for these 1-D operators were derived automatically, using a computer algebra system 53 to perform the following procedure:

1. For a generic function $f(x)$ sampled at $n + 1$ points $x_i$ with equal spacing $h$, construct the unique $n$-th order interpolating polynomial, $P(x)$.
2. Compute the symbolic $k$-th order derivative of the polynomial, $P^{(k)}(x) = \partial^k P(x)/\partial x^k$.

3. Evaluate $P^{(k)}(x_j)$, where $x_j$ is one of the interpolation points, $\{x_i\}$, and simplify the expression.
This procedure yields general expressions of the form

$$P^{(k)}(x_j) = \frac{1}{h^k} \sum_{i=0}^{n} s_i\, f(x_i).$$

These expressions describe "$n$-th-order" 1-D finite difference stencils for taking the derivative at a given interpolation point, with coefficients, $s_i$, i.e.

$$\frac{\partial^k f}{\partial x^k}\bigg|_{x_0} \approx \frac{1}{h^k} \sum_{j=0}^{n} s_j\, f(x_j), \quad (48)$$

where we have re-numbered the summation to make $x_0$ the point at which the derivative is taken ($j$ takes values between 0 and $n$, yielding $(n+1)$-point stencils). Using this scheme, arbitrarily large stencils can be constructed for taking the derivative at any of the interpolation points used to construct the polynomial. A note on the nomenclature: in this work, where we describe the 1-D stencils used in the defect correction (Eq. 48) as "$n$-th-order", we are referring to the order of the interpolating polynomial used to construct the stencil. We refer to all forward-, backward- and central-difference stencils derived from an interpolating polynomial of order $n$ as "$n$-th-order", regardless of the order of derivative being discretized. This is not the same as the order of the discretization error which we have previously referred to (e.g. Eq. 32), since this depends on the order of the derivative and also on whether a given discretization benefits from the cancellation of terms when expanded in a Taylor series.
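As a concrete example of Eq. 48 (a standard central-difference result, quoted here for illustration), the stencil obtained from a fourth-order interpolating polynomial ($n = 4$) for the second derivative ($k = 2$) at the central interpolation point is

$$\frac{\partial^2 f}{\partial x^2}\bigg|_{x_0} \approx \frac{-f_{-2} + 16 f_{-1} - 30 f_0 + 16 f_1 - f_2}{12 h^2}, \qquad f_j \equiv f(x_0 + j h),$$

a "4th-order" stencil in the nomenclature above, whose discretization error in this symmetric case happens to be $O(h^4)$.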
The high-order 1-D discretizations of differential operators used in the defect correction have large stencils, which increase in size with the order of discretization-an n th -order stencil will, in general, include contributions from n + 1 points. This poses a challenge when using distributed-memory parallelism, since applying these operator stencils at the boundary of the local domain requires the exchange of large halos between MPI processes. Where the local domain is narrow, halos may extend over the local grids on more than one MPI rank, increasing the complexity of communication. Handling these extended halos efficiently is the main difficulty in computing derivatives with higher order discretization over a distributed domain.
To enable efficient exchange of extended halos during the defect correction procedure, DL_MG builds two maps on each MPI process which describe the data which must be sent to and received from other MPI processes. Each of these maps is essentially a list of data blocks, containing the MPI rank and the relevant global index coordinates of the block of halo data to be sent or received. The halo exchanges are done with non-blocking send-receive MPI communication, allowing data to be dynamically copied to halo arrays as it is received.

The correction applied at each defect correction iteration may optionally be damped,

$$u^{(i+1)} = u^{(i)} + s\, e_{2,d}^{(i)},$$

with $s \in (0, 1]$, such that

$$|r_d^{(i+1)}| < |r_d^{(i)}|, \quad (50)$$

i.e. the defect for the corrected potential, $u^{(i+1)}$, is smaller than the defect for the uncorrected potential, $u^{(i)}$.
In practice, the damping factor, $s$, is systematically reduced (starting from $s = 1$) by a fraction, $q < 1$, until Eq. 50 is satisfied-see Algorithm 2. If $s$ becomes smaller than a prescribed value, the entire defect correction process is halted with an error. This procedure can be enabled with an optional argument of the solver subroutine, but should be used only when the undamped defect correction fails to converge.

Algorithm 2: High-order defect correction with error damping ($q < 1$).

For the NLP-BE, the defect equation solved at each defect correction iteration is linearized about the current approximation to the potential,

$$\left[ \hat{A}_2 + N'(u^{(i)}) \right] e = r_d^{(i)}, \quad (51)$$

where $N'(u^{(i)})$ is the first derivative of the non-linear Boltzmann term with respect to the potential, $u$, evaluated at the current approximation to the potential, $u^{(i)}$. Note that this linearization of the Poisson-Boltzmann equation is distinct from the linearization in Eq. 14, which is performed for potentials close to zero, rather than close to the current approximation of the potential.

In the specific case of ONETEP, the size of the fine grid passed to the solver is related to the kinetic energy cutoff used to construct the underlying psinc basis. 59,60 The MPI topology over which this global grid is distributed is 1-D, with the grid divided into "slabs" along one coordinate direction. 39 The total number of MPI processes is also restricted-this cannot exceed the number of atoms used in the calculation.

Accuracy tests

We verified the accuracy of the defect-corrected solutions produced by DL_MG using two test cases with known analytic solutions. The first test represents the type of problem encountered in electronic structure calculations where an implicit solvent is represented using a smoothly varying dielectric function (as in Fattebert and Gygi's electrostatic solvent model and variants, 12,22 described in section 2.2). Since the dielectric function is general and non-homogeneous, this requires the solution of the GPE (Eq. 7). We shall refer to this test case as "erf_eps", as in DL_MG's test suite.

The second test models the interaction of an ionic solution with a charged surface, for example an electrode immersed in an electrolyte. This situation may be studied by solution of the P-BE (Eq. 12), representing the solvent via a homogeneous dielectric permittivity and using Boltzmann distributions to describe the concentrations of mobile ions in solution. This test will be referred to as "pbez", following the name used in DL_MG's test suite.
The erf_eps test is based upon the model system proposed by Fisicaro et al. in Ref. 25 to represent an isolated solute embedded in implicit solvent. In this situation, the overall electrostatic potential due to the solute charge and the polarization of the dielectric medium is obtained by solving the GPE for the solute charge, $n(\mathbf{r})$, and the dielectric permittivity, $\varepsilon(\mathbf{r})$, which switches smoothly between a bulk value $\varepsilon(\mathbf{r}) = \varepsilon_\infty$ far from the solute and the vacuum value $\varepsilon(\mathbf{r}) = 1$ close to the solute.
Defining the electrostatic potential as a normalized Gaussian function and using a dielectric permittivity constructed using an error function, the corresponding charge density can be derived analytically, by applying the rearranged GPE (Eq. 63, appendix C) to the chosen potential and permittivity.
The Gaussian potential, error-function-based permittivity and corresponding density used in the erf_eps model are defined by a set of parameters: the center of the Gaussian potential and dielectric cavity, $\mathbf{R}$; the permittivity in the bulk solvent, $\varepsilon_\infty$; the distance of the center of the transition region of the permittivity (where $\varepsilon(\mathbf{r}) = (\varepsilon_\infty + 1)/2$) from the center of the Gaussian potential, $d_0$; and parameters controlling the widths of the Gaussian potential, $\sigma$, and the transition region of the permittivity, $\Delta$.
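For orientation, a permittivity consistent with this description is (a sketch of the functional form; the exact parameterization used in the test suite may differ):

$$\varepsilon(\mathbf{r}) = 1 + \frac{\varepsilon_\infty - 1}{2} \left[ 1 + \mathrm{erf}\!\left( \frac{|\mathbf{r} - \mathbf{R}| - d_0}{\Delta} \right) \right],$$

which equals $(\varepsilon_\infty + 1)/2$ at $|\mathbf{r} - \mathbf{R}| = d_0$ and tends to 1 close to the solute and to $\varepsilon_\infty$ far from it; the corresponding charge density then follows from Eq. 63 (appendix C).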
We examined the accuracy of the solutions produced by DL_MG for the erf_eps model over a range of grid sizes and finite difference orders (Fig. 2). The difference in convergence behavior observed for different grid sizes at high finite difference orders is interesting; however, the key result illustrated by Fig. 2 is that the application of the high-order defect correction can reduce the maximum error in the solution by several orders of magnitude. For the grids tested here, the maximum error with 12th-order finite differences was at least a factor of $\sim 10^6$ smaller than the maximum error for the second-order multigrid solver alone.
The model system used in the pbez test is a 1:1 salt solution (e.g. NaCl in H₂O) in contact with an infinite planar surface of homogeneous charge. Assuming that the ionic concentrations are described by Boltzmann distributions (Eq. 11), the electrostatic potential for the system can be found by solving the P-BE. The problem considered in the pbez test is further simplified by assuming a homogeneous dielectric permittivity, singly charged ionic species, and that the accessibility function $\lambda(\mathbf{r}) = 1$ everywhere. In this case, the electrostatic potential can be found by solving a simplified P-BE in 1-D,

$$\varepsilon_\infty \frac{\mathrm{d}^2 \phi}{\mathrm{d}z^2} = 8\pi c_0 \sinh\left( \beta \phi(z) \right),$$

where $c_0$ is the bulk concentration of the salt, $\varepsilon_\infty$ is the homogeneous permittivity of the solvent, $\beta = 1/(k_B T)$, and the $z$ coordinate direction is normal to the plane of the charged surface. The potential due to the charged surface enters into the equation via boundary conditions, i.e.
$$\phi(z = 0) = \phi_{\mathrm{surf}}, \qquad \phi(z \to \infty) = 0,$$

where $\phi_{\mathrm{surf}}$ is the value of the potential at the planar surface.
This simplified problem has a well-known analytic (Gouy-Chapman) solution, with the inverse Debye length for singly-charged 1:1 ionic solutions $\kappa = \sqrt{8\pi \beta c_0 / \varepsilon_\infty}$. The maximum error in the potential computed by DL_MG, for each order of finite differences used in the defect correction, is plotted in Fig. 3.
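For reference, the Gouy-Chapman solution of this 1-D problem is the standard result

$$\phi(z) = \frac{2}{\beta} \ln\!\left( \frac{1 + \gamma\, e^{-\kappa z}}{1 - \gamma\, e^{-\kappa z}} \right), \qquad \gamma = \tanh\!\left( \frac{\beta \phi_{\mathrm{surf}}}{4} \right),$$

which reduces to the Debye-Hückel form $\phi(z) \approx \phi_{\mathrm{surf}}\, e^{-\kappa z}$ for small potentials.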
The general trend of rapid reduction in the maximum error for increasing orders of finite differences seen for the erf_eps test (Fig. 2) is reproduced in Fig. 3. Similarly, as observed for the erf_eps test, the absolute magnitude of the error for a given order of finite differences decreases as the number of grid points is increased. This effect is more consistent for pbez than erf_eps, with smaller grids yielding larger errors for all orders of finite difference. Again, this can be attributed to the number of defect correction iterations required to achieve convergence-for erf_eps this was different for $209^3$ versus the other grid sizes, while for pbez this is the same for all grid sizes. Unlike for erf_eps, the norms of the final defect, $|r_d^{(i)}|$, and error, $|u^{(i)} - u^{(i-1)}|$, at each order of finite differences consistently decrease for increasing grid sizes.
As noted for the erf_eps test, the key result of interest is the overall reduction in the maximum error achieved by applying the defect correction. The results for the pbez test indicate that, as with erf_eps, the error in the solution may be very significantly decreased by application of the defect correction. For the grid sizes used in Fig. 3, the maximum error with 12th-order finite differences is at least a factor of $\sim 10^5$ smaller than for the second-order multigrid solver without defect correction.

Performance tests
DL_MG was originally conceived for use in large-scale electronic structure calculations on massively parallel computers. For the solver to fulfill this purpose, it must be able to scale efficiently with problem size and number of parallel processors. To examine the scaling of computational cost in these two circumstances for problems of the type which would be encountered in electronic structure calculations, we used the erf_eps and pbez test cases described in section 4.1.1. Using these synthetic test cases, the number of processors and size of the problem could be varied systematically and the performance of DL_MG studied.

For erf_eps (Fig. 4), the total computational cost and the cost attributed to the multigrid solver and high-order derivative computation increase linearly with respect to the number of grid points, $N_{\mathrm{grid}}$, for the grid sizes used. In addition, for each of the components of the total cost plotted (second-order multigrid solver, computing high-order derivatives and the communication of high-order derivative halo data), the cost is seen to increase linearly with respect to grid size. The cost of the second-order multigrid solver dominates the overall computational cost, suggesting that future work to optimize the performance of DL_MG should focus upon the core multigrid solver, rather than the high-order defect correction. A detailed theoretical convergence analysis of the algorithms employed in DL_MG is beyond the scope of this work. Nevertheless, it is clear from Fig. 4 that the cost to obtain a defect-corrected solution to the GPE for the erf_eps test scales linearly with respect to grid size, within the range of grid sizes tested. Given that the erf_eps test is designed to mimic the situation of an isolated molecule in implicit solvent, and the grid sizes used in electronic structure calculations are typically in the range of grids tested here, it is likely that $O(N_{\mathrm{grid}})$ scaling would also apply in practical implicit solvent calculations.

Figure 5: Execution time to reach solution for the pbez test for increasing problem size. The quantities plotted are as in Fig. 4, with the addition of a least-squares linear fit to the total execution times for the five smallest grid sizes (dashed gray line). The functional forms and parameters used to construct the pbez model for these calculations are as for Fig. 3 (see section 4.1.1). The computational details of these calculations are described in section 4.1.2.
In Fig. 5, the overall execution time for the pbez test is ∼ 5 to 6 times larger than for erf eps for a given grid size. 68 In this case close-to-linear scaling of computational cost with respect to N grid is observed, though the overall scaling is less clear than for erf eps.
As described in section 3.1.2, DL_MG obtains defect-corrected solutions to the NLP-BE by linearizing the defect equation for the NLP-BE at the current approximation to the potential (Eq. 51). In this scheme, the initial second-order solution to the NLP-BE is obtained by the inexact-Newton method outlined in section 3.1.1. Consequently, there are three iterative procedures to consider in the pbez test-the second-order multigrid solution of linearized versions of the P-BE, the inexact-Newton procedure, and the high-order defect correction.
As for erf_eps, the number of defect correction iterations required to satisfy the convergence tests in pbez (Eqs. 42 and 43) is independent of grid size-this is 1 iteration for all grid sizes tested. Similarly, the number of iterations required to converge the inexact-Newton procedure was 6 for all grid sizes. Interestingly, the number of V-cycle iterations required to obtain a second-order multigrid solution increased with grid size. For the initial second-order solution (within the inexact-Newton method), 4 iterations are required for the 4 smallest grids, but for larger grids this increases with grid size, rising to 8 for the largest grid. Similarly, for the approximate second-order solution of the high-order defect equation, 7 iterations are required for all grids except the two largest, which require 8 and 10 V-cycle iterations. This explains why the total execution times for the largest grids in Fig. 5 are somewhat greater than would be expected from a linear fit to the first 5 points, as illustrated by the dashed fit line in Fig. 5.

The parallel speedup data presented in Figs. 6 and 7 indicates that significant speedups can be achieved by increasing the number of processes. For erf_eps (Fig. 6), the speedup with respect to the number of MPI processes is near-linear for all $N_{\mathrm{MPI}}$, $N_{\mathrm{OMP}}$ combinations (where $N_{\mathrm{MPI}}$ and $N_{\mathrm{OMP}}$ are the total number of MPI processes and the number of OpenMP threads per process, respectively). The prefactor for the scaling is less than one, which implies that, in this regime, the addition of each MPI process offers a constant, but less-than-ideal, speedup. The difference in speedup obtained using different numbers of OpenMP threads per MPI process is small, though at higher core counts it appears that 2 OpenMP threads offer the best speedup per additional MPI process.
The strong-scaling behavior of the pbez test is more complicated than for erf eps. Fig. 7 shows that the speedup achieved for a given number of MPI processes, S(N MPI ), is strongly dependent on the number of OpenMP threads per process. With 2 and 4 threads per process, the scaling behavior is very good. Near-ideal speedup is observed for 2 and 4 OpenMP threads per process for up to 125 MPI processes. The overall trend in this case is for a slow decrease in the performance improvement offered per additional parallel process (i.e. decreasing parallel efficiency, S(N MPI )/N MPI ), in line with Amdahl's law. 69,70 To illustrate this, Fig. 8 presents a least-squares fit to Amdahl's law, for 2 OpenMP threads per MPI process, under the assumption that the fraction of the execution time amenable to parallelization, p, experiences ideal speedup, S ideal . This fit yielded a value of p = 0.98367.
While the parallel speedup for the pbez test with 2 and 4 OpenMP threads per MPI process follows the expected trend, with 1 OpenMP thread per MPI process the behavior is more erratic, with the speedup for 64 and 125 MPI processes substantially lower than would be expected. For 64 MPI processes with 1 OpenMP thread per process, the speedup is actually lower than for 27 MPI processes. This appears to be a consequence of contention for hardware resources.
For the pbez test, the number of compute nodes allocated for each run was determined by the total number of cores used ($N_{\mathrm{MPI}} \times N_{\mathrm{OMP}}$); runs with 1 OpenMP thread per process therefore pack more MPI processes onto each node, and node-level resources are more contested for operations which occur on a per-process (not per-thread) basis (e.g. MPI communication).
To verify that the unusual speedup behavior for 1 OpenMP thread was due to more contested resources, we repeated the calculations presented in Fig. 7, but artificially allocated identical numbers of compute nodes for tests with 1, 2 and 4 OpenMP threads per process.
In this case, the poor speedup for 1 OpenMP thread per process vanished, yielding instead the expected trend resembling the 2 and 4 OpenMP thread lines plotted in Fig. 7.
Overall, Figs. 6 and 7 demonstrate that DL_MG efficiently scales from 10s to 100s of processor cores, yielding significant performance improvements at typical core counts used in parallel electronic structure calculations. 2 OpenMP threads per process offers the best speedup in these particular tests, and use of > 1 OpenMP thread per MPI process is recommended to avoid issues with contention for hardware resources, as seen in Fig. 7.

Figure 9: Absolute error in the Hartree energy for a 448 atom graphene sheet computed using DL_MG to solve the SPE in ONETEP. The total error and error per atom are plotted as a function of the order of finite differences used in the defect correction procedure. The error is calculated with respect to the Hartree energy obtained from a reciprocal space solution to the SPE (Eq. 4).

Electronic structure calculations
In this section, we consider the numerical accuracy and computational performance of DL_MG when used as a Poisson solver in ONETEP, 1 an electronic structure package designed to perform calculations with a cost that scales linearly with the number of atoms, N.
All DFT results presented in this section were computed using the PBE exchange-correlation functional. 71

To evaluate the numerical accuracy of the results produced by ONETEP with DL_MG, the 2615 atom protein-ligand complex was first studied in vacuum with periodic BCs. This allowed the results obtained with DL_MG to be directly compared to the results obtained when solving the SPE using the standard reciprocal space approach (Eq. 4) employed in ONETEP. Table 1 shows the results of these calculations, which were run on the EPSRC MMM Hub "Thomas" machine. The table includes the percentage of the total time spent solving the SPE and the absolute and relative differences in the energies and execution times for the two SPE solution methods. All MG calculations were performed using 12th-order finite differences in the defect correction, and timing data was taken from the repetition with minimum total time, $t_{\mathrm{total}}$, of three identical repetitions of each calculation. The excellent numerical agreement between energies computed using DL_MG and the reciprocal space approach to solve the SPE, seen in Fig. 9, is also evident in Table 1.

The total execution time for the free energy of solvation calculation reported in Table 2 is $\sim 3\times$ the time taken for the vacuum PBC calculation on the same system (MG column in Table 1). Overall, the execution times presented in Table 2 indicate that this is a small increase in total cost when it is considered that computing the free energy of solvation requires single-point calculations in both vacuum and solvent and that computation with open BCs involves significant additional computational work (see Table 2).
To assess the performance of DL_MG against an alternative solver, we examined the

A Stencils
When considering the discrete representation of differential operators, such as those featuring in the standard and generalized Poisson equations (Eqs. 1 and 7), it is often convenient to think in terms of stencils. This concept and associated notation is clearly defined in Ref. 10.
We provide a brief summary here for the convenience of interested readers.
The stencil for an operator discretized on a regular grid describes the set of grid points in the locality of a point of interest which are involved in the application of the operator.
It is common to refer to a stencil in terms of the number of points involved. For example, the forward difference approximation to the derivative in Eq. 32 corresponds to a two-point stencil on a 1-D grid, with the point of interest x and adjacent point x + h. For multidimensional grids, and higher-order finite difference approximations, larger numbers of grid points are involved.
The utility of the stencil concept is in the compact expression of the "shape" of a discretized operator on a grid. In particular, the geometric arrangement of the points involved in a discretized operator can easily be discerned using "stencil notation". As an example, consider the SPE discretized on a 2-D grid, with discretized Laplacian $\hat{L}_h$, potential $\phi_h(x, y)$, and density $n(x, y)$. A 5-point stencil (with discretization error $O(h^2)$) has the following form:

$$\hat{L}_h \phi_h(x,y) = \frac{1}{h^2}\left[ \phi_h(x,y+h) + \phi_h(x-h,y) - 4\phi_h(x,y) + \phi_h(x+h,y) + \phi_h(x,y-h) \right] = -4\pi n(x,y)$$

$$\frac{1}{h^2}\begin{bmatrix} & 1 & \\ 1 & -4 & 1 \\ & 1 & \end{bmatrix}_h \phi_h(x,y) = -4\pi n(x,y).$$

The second line of Eq. 63 uses the compact stencil notation described in Ref. 10, where the geometric relationship between the grid points involved in the stencil is clearly illustrated.
A general expression of the action of an operator, discretized on a 2-D grid, on a function on that grid is, in stencil notation:

$$\begin{bmatrix} s_{-1,1} & s_{0,1} & s_{1,1} \\ s_{-1,0} & s_{0,0} & s_{1,0} \\ s_{-1,-1} & s_{0,-1} & s_{1,-1} \end{bmatrix}_h u_h(x,y) = \sum_{i,j} s_{i,j}\, u_h(x + ih,\, y + jh).$$

This is trivially extended to 3-D grids by combining layers of 2-D stencils. For example, a 3-D 7-point stencil (with discretization error $O(h^2)$) for the Laplacian can be written as three layers at $z - h$, $z$ and $z + h$:

$$\frac{1}{h^2}\left( \begin{bmatrix} & & \\ & 1 & \\ & & \end{bmatrix}_{z-h} \begin{bmatrix} & 1 & \\ 1 & -6 & 1 \\ & 1 & \end{bmatrix}_{z} \begin{bmatrix} & & \\ & 1 & \\ & & \end{bmatrix}_{z+h} \right).$$

B Information for developers
The DL_MG library has been designed to be simple to interface with existing electronic structure packages.
The current version of the library (v2.0) is written in Fortran 95 and packaged with a GNU Makefile which automatically compiles the source code into a single static library. The library has no substantial external dependencies and can be compiled with modern Fortran compilers from Cray, Intel and GNU. Compilation with MPI and OpenMP is typically as simple as using an MPI compiler wrapper (e.g. mpif90) and adding the vendor-specific flag to compile with OpenMP support (e.g. -fopenmp for gfortran).
The typical procedure for calling DL_MG from within an existing electronic structure code is very simple:

• Initialize the solver using dl_mg_init.

• Call dl_mg_solver with appropriate arguments.
The arguments that must be passed to the initialization and solver routines depend on the nature of the problem being solved (e.g. equation type), the type of parallelism employed (if any), and whether default parameters (e.g. convergence tolerances) are being overridden.
For a typical use case, where DL_MG is used to solve the GPE across several MPI processes and the default convergence tolerances are used, the calls to dl_mg_init and dl_mg_solver might take the forms sketched at the end of this appendix. In these subroutine calls, the global grid has dimensions (nx, ny, nz) and grid point spacings (dx, dy, dz) along x, y and z. The boundary conditions along each direction are determined by the integer array bc-for full Dirichlet BCs, all elements of bc should be DL_MG_BC_DIRICHLET.
The MPI processes which will be used to solve the GPE and their Cartesian topology are described by the MPI communicator mg_comm. For each MPI process, the start and end points of the locally held grid within the global grid are given by the integer vectors gstart and gend.
DL_MG outputs detailed information to a log file while running, which is useful when debugging issues or tuning parameters. The log file name and its associated Fortran IO unit are passed as report_file and report_unit.
The type of equation to solve is inferred from the arguments provided when calling the overloaded dl_mg_solver routine. For the GPE (Eq. 7), we need to provide the dielectric permittivity, eps, and charge density, rho, as input, and the corresponding electrostatic potential, pot, for output, all with the dimensions of the local grid held on the rank (i.e. gend(:) - gstart(:) + 1). In addition, we require the values of the dielectric permittivity at the points located halfway between the points of the global grid in each Cartesian direction, eps_mid.
The order of the finite difference stencil (Eq. 48) used in the high-order defect correction is determined by fd_order (2, 4, 6, 8, 10 or 12, with 2 corresponding to no high-order correction), and alpha is a multiplicative constant defined by the unit system (in the atomic units used throughout this paper, alpha is $-4\pi$). Finally, DL_MG may return integer-valued error codes through ierror.
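Putting these pieces together, a call sequence might look like the following sketch; the argument names follow the descriptions above, but the exact ordering and full set of arguments should be checked against the DL_MG documentation. 52

```fortran
! Sketch of a typical DL_MG call sequence for the GPE. Illustrative:
! argument names are those described in the text, but the ordering
! shown here is not authoritative -- consult the library documentation.
call dl_mg_init(nx, ny, nz, dx, dy, dz, bc, mg_comm, gstart, gend, &
                report_file, report_unit, ierror)

! eps, eps_mid, rho and pot are arrays covering the local grid, of
! shape gend(:) - gstart(:) + 1 (eps_mid holds the permittivity at
! the mid-points between grid points along each direction).
call dl_mg_solver(eps, eps_mid, alpha, rho, pot, fd_order, ierror)
```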
This interface is designed to be simple and clean, but also offers a large number of configuration options behind optional arguments. For example, it is possible to finely tune the absolute and relative convergence parameters for the multigrid V-cycle, inexact-Newton method and high-order defect correction (Eqs. 40, 42 and 43).

C Derivation of the model charge density

We start by rearranging the GPE to obtain an expression in terms of the charge density and expanding the divergence in terms of the product rule, i.e.

$$n(\mathbf{r}) = -\frac{1}{4\pi}\left[ \varepsilon(\mathbf{r}) \nabla^2 \phi(\mathbf{r}) + \left( \nabla \varepsilon(\mathbf{r}) \right) \cdot \left( \nabla \phi(\mathbf{r}) \right) \right]. \quad (63)$$
For completeness, we note that the definition of the Gaussian potential used in DL_MG's test suite differs from Eq. 53 by a factor of $\pi^{3/2}$, i.e. $\phi_{\mathrm{DL\_MG}}(\mathbf{r}) = \pi^{3/2}\, \phi(\mathbf{r})$.