Predicting the One-Particle Density Matrix with Machine Learning

Two of the most widely used electronic-structure theory methods, namely, Hartree–Fock and Kohn–Sham density functional theory, require the iterative solution of a set of Schrödinger-like equations. The speed of convergence of such a process depends on the complexity of the system under investigation, the self-consistent-field algorithm employed, and the initial guess for the density matrix. An initial density matrix close to the ground-state matrix will effectively allow one to cut out many of the self-consistent steps necessary to achieve convergence. Here, we predict the density matrix of Kohn–Sham density functional theory by constructing a neural network that uses only the atomic positions as information. Such a neural network provides an initial guess for the density matrix far superior to that of any other recipes available. Furthermore, the quality of such a neural-network density matrix is good enough for the evaluation of interatomic forces. This allows us to run accelerated ab initio molecular dynamics with little to no self-consistent steps.

Two of the most widely used electronic-structure theory methods, namely Hartree-Fock and Kohn-Sham density functional theory, both require the iterative solution of a set of Schrödinger-like equations.The speed of convergence of such process depends on the complexity of the system under investigation, the self-consistent-field algorithm employed and on the initial guess for the density matrix.An initial density matrix close to the ground-state one will effectively allow one to cut out many of the self-consistent steps necessary to achieve convergence.Here, we predict the density matrix of Kohn-Sham density functional theory by constructing a neural network, which uses the atomic positions as only information.Such neural network provides an initial guess for the density matrix far superior to any other recipes available.Furthermore, the quality of such neural-network density matrix is good enough for the evaluation of interatomic forces.This allows us to run accelerated ab-initio molecular dynamics with little to no self-consistent steps.

I. INTRODUCTION
Density functional theory (DFT) [1-3] has been playing a central rôle in the electronic structure calculations of molecules and solids for more than six decades.The DFT success boils down to the rigorous theoretical framework [1], to the availability of well-controlled approximations of the exchange-correlation functional [4], to the multitude of numerical implementations [5][6][7][8][9][10][11], and to the community rigor in benchmarking results [12,13].In principle, for a given functional one can find the groundstate density and energy by functional minimization with respect to the electron density, a procedure denoted as orbital-free DFT [14][15][16].However, the lack of a universal and accurate approximation to a functional form of the non-interacting kinetic energy, a shortfall hardly mitigated by machine learning (ML) [17], makes the widespread use of orbital-free DFT impractical.The problem can be circumvented by the Kohn-Sham (KS) construct [2], in which the minimization of the functional is performed by solving an associated system of singleparticle Schrödinger-like equations.The one-particle potential entering the KS equations does, in turn, depend on the electron density.Hence, the solution is iterative and requires a multi-step cycle, where the electron density and the KS potential are continuously updated until converge is reached.Such self-consistent field (SCF) process is common to other electronic structure methods, for instance the Hartree-Fock scheme, where one updates the coefficients of the molecular orbitals [18].
In general the number of iterations required by the SCF process to achieve the desired accuracy depends on the system's complexity, the particular numerical DFT implementation, the SCF algorithm used and the initial trial electron density.Systems presenting a band gap are typically easier to converge than metals, since oscillations in the electron density during the iterative convergence are largely suppressed.Such oscillations can be damped by selecting appropriate ways to update the charge density from an iteration to the next, a process that in gen-eral largely determines the rate at which convergence is achieved.Note that, depending on the specific DFT numerical implementation, one may decide to mix the KS Hamiltonian instead of the charge density.In any case, regardless of the quantity chosen for the SCF algorithm, typically the output of several previous iterations are combined to determine the new input.The direct inversion of the iterative subspace (DIIS) method, proposed by Pulay [19,20], is probably the most widely used SCF-solver in modern local-orbital DFT codes, where one mixes the Fock matrices.However, there exists a multitude of alternative schemes and refinements, whose performance depends on the specific problem at hand [21][22][23][24][25][26][27][28][29].
Regardless of the mixing method used, an initial guess for the charge density close to the final self-consistent solution will speed up convergence, by typically reducing the number of iterations to perform.Such an initial density is usually defined either over a real-space or a k-space grid, in DFT codes based on plane waves, or in the form of a one-particle density matrix (DM), for local-orbital codes [30].There are multiple strategies to generate the initial charge density (or the DM), which all reduce to solve an associated non-self-consistent problem of some kind.As several of such schemes will be explored here, a detailed description will be provided in our method section.
The main aim of our work is to construct ML models, namely neural networks, to learn the ground-state one-particle DM of a DFT calculation.The models are based on structural and atomic information only, namely the chemical nature and positions of the atoms forming a molecule.The so-constructed DM can then be used either as a starting point for a SCF cycle or, if the accuracy is good enough, to perform a non-self-consistent evaluation of the various observables, for instance energy and forces.
Note that ML schemes to generate the charge density in either real space [31][32][33][34] or over an atom-center basis [35] have been already proposed.One can then, in principle, take one of such models and try to construct the DM from the computed charge density.This strategy, however, requires projection across different incomplete basis sets, a process that inevitably introduces additional errors.These are likely to be large enough to preclude the use of the ML DM, namely it will be unlikely competitive with other initial-density generation approaches.Furthermore, learning the DM directly enables the straightforward evaluation of the expectation values of all one-particle operators.These include nonlocal potentials, so that the same scheme can be used with Hartree-Fock calculations.Note that a mapping between the external potential and the one-particle density matrix, constructed over a kernel ridge regression, has been recently proposed [36].This is complementary to our work, since it requires smaller training sets for similar-sized molecules, but the inference is significantly more demanding.Alternatively, there is a body of literature looking at the construction of the Hamiltonian, most typically the Kohn-Sham one, from an equivariant description of the molecule's structure.This is typically achieved either through an equivariant network or data augmentation.In both cases, the result is that of having rather large and deep models, which need to be trained over extensive data sets [37][38][39].
The paper is organised as follows.In the next section we discuss the main methodological aspects of the models construction and the DFT implementation used to generate the data and benchmark the results.Then, we proceed by presenting the results.In particular, we first evaluate the quality of our DM as starting point for a SCF cycle, and then we analyse the error on the predicted energy and forces.With these results at hand we perform non-self-consistent ab initio molecular dynamics, whose results are compared with fully converged one.Finally, we conclude and suggest possible future directions.

II. METHODS
As discussed in the introduction, our task is to predict the converged DM of a DFT calculation by using a ML model, which utilises only chemical and structural information of a molecule.In particular, we consider fullyconnected dense neural networks together with global structural descriptors, and we predict all the independent matrix elements of the DM.The models are defined by the neural-network architecture, the descriptor types and the content of the training and test sets, all aspects that will be described here in detail.
All the electronic structure theory calculations performed in this work to generate the training dataset and to benchmark the results have been produced with the open source Python package, PySCF [11,40].PySCF implements all-electron DFT and a number of quantumchemistry methods, such as Hartree-Fock, over a Gaussian basis set.For all our calculations we have used the cc-pVDZ basis [41], formed by double-zeta polarised orbitals for the valence electrons.Thus, for the atomic species relevant for our tests we have 5 basis functions for H, 14 for O, 18 for S and 43 for Fe.In fact, three different molecule have been considered in this work, namely H 2 O, S 2 O and [Fe(H 2 O) 6 ] 2+ , see Fig. 1.
We start our analysis with H 2 O (C 2v symmetry), since this presents a simple electronic structure and typically no convergence problems.Its charge density, however, does not differ much from a superposition of atomic densities, so that a more stringent test is provided by S 2 O (also C 2v symmetry).Importantly, both H 2 O and S 2 O can be described by only three structural features, so that the more complex [Fe(H 2 O) 6 ] 2+ (O h symmetry) is investigated last.The [Fe(H 2 O) 6 ] 2+ molecule also gives us the opportunity to test our method for a metal bond.All electronic structure calculations are performed with the BLYP functional, which combines the generalised gradient approximation for the exchange energy of Becke [42] with the Lee-Yang-Parr correlation energy [43], as implemented in the libxc library [44].When creating the training, validation and test sets, the SCF cycle is converged with the DIIS scheme [19,20] for H 2 O and S 2 O, while for [Fe(H 2 O) 6 ] 2+ we employ a second-order solver (SOS) [28,29], as the convergence appears more problematic.In contrast, when analyzing the convergence history of different initial DM guesses, we will consider both the DIIS and SOS SCF algorithm.
Since our ML DM will be compared with that generated by conventional initial guesses, it is worth spending some time in describing these.Possibly, the simplest choice constructs the charge density as a superposition of atomic densities, while the DM is obtained from the orbitals that digonalize the Fock matrix associated to such spin-restricted guess density.This is the 'minao' PySCF default option [45,46].Alternatively, one can use the eigenstates of the non-interacting problem, namely those orbitals that diagonalize an Hamiltonian comprising only the kinetic energy and the nuclear potential.This is the one-electron DM, '1e' option in PySCF, which usually represents a poor starting point for molecules [30].Then, there are options based on spin-restricted atomic Hartree-Fock calculations, employing different recipes for the construction of the guess orbitals used to compute the DM.In PySCF these are called 'atom' [47] and 'huckel' [30].Finally, one can construct the DM with the orbitals obtained from the solution of a superposition of tabulated atomic potentials [47].This is the 'vsap' option [30].
Here we predict the initial guess DM with dense neural networks, where the input features are the independent Cartesian coordinates of the molecules.A summary of the structure of the different neural networks, together with the training-set errors, are reported in Table I.The use of the Cartesian coordinates together with a dense neural network effectively forces an equivariant quantity, namely the DM, to be described by an invariant model.This issue is here resolved by manually removing the rotational and translational degrees of freedom of the molecule, a procedure that makes the entire problem invariant.Of course, such solution is not general Table I: Table summarising the structure and performance of the final neural networks trained for the three molecules.Here we report the number of features defining the input, N in , the dimension of the DM, D DM , the number of the network hidden layers, N hi , the total number of neurons forming each hidden layer, N nu , and the total number of weights, N w .Then, we report the mean absolute error (MAE), the largest error on the matrix elements, δρ max , the root-mean square error (RMSE), and the R 2 coefficient of the DMs.All errors refer to the test sets and they are in atomic units (a.u).and a more elegant way to tackle the problem would be to use a fully equivariant representation of the molecular structure [48].This, however, adds a significant new layer of complexity, which we wish to avoid for our early study.Thus, we remove the translational degrees of freedom by fixing a given atom at the origin.In particular, we use oxygen, the central sulphur and the Fe .This returns us six independent coordinates (see Fig. 1).The structure of the neural network has been optimized by varying the number of hidden layers and their size, by minimizing the mean absolute error (MAE).The optimal configuration for each molecule is reported in Table I.Note that in all cases we employ the exponential linear unit activation function.The datasets used to construct the model are formed by 9,000 configurations for training, 800 for validation and 1,000 for testing.In the case of H 2 O and S 2 O such configurations are extracted from ab-initio Born-Oppenheimer molecular dynamics trajectories at 150 K, also performed with PySCF.In particular, we run for 117 and 130 picoseconds, respectively for H 2 O and S 2 O, by using the Nose-Hover thermostat through the pyLammps API as implemented in the LAMMPS package [49].In contrast, for the case of [Fe(H 2 O) 6 ] 2+ we consider random rigid displacement of the H 2 O molecules, such that the Fe-O bond length is varied within 10% from its equilibrium value (2.0525 Å).These return us training-set mean absolute errors (MAEs) of the order of 10 −3 atomic units (a.u.).Note that typically, the largest DM matrix elements are found along the diagonal and they can reach values close to unity, while a significant fraction of the off-diagonal matrix elements remain small.For example, for the H 2 O molecule we find 8.15% of the matrix elements, ρ, having values 0.1 < |ρ| < 1, 43.75% in the range 10 −3 < |ρ| < 0.1 and 48.1% being |ρ| < 10 −3 .The parity plots associated to our neural networks, together with the typical matrix elements distributions can be found in the Appendix for all the three molecules.Note also that our numerical construction of the DM does not guarantee idempotency to be satisfied, and in fact, we find that this is numerically violated.Since idempotency is a non-linear condition it is difficult to implement it as a constrain in the neural networks.Such drawback is here compensated by the numerical accuracy achieved, as we will demonstrate in the following.
Finally, we perform tests on how the computed DM can drive molecular dynamics, namely we perform abinitio molecular dynamics using our ML DM and not the one resulting from a SCF cycle.In this case special care must be taken, since the neural networks return DM only for molecules having the specific spatial orientations described before.For this reason, we operate the following workflow.A molecule at an arbitrary position is translated back to the origin and rotated so to have the orientation required by the neural networks.Then, the DM is evaluated with the network and used as a starting guess for a static DFT calculation (nonself-consistent). Energy and forces are thus evaluated using PySCF.The molecule is then rotated/translated back to the original position and the same rotation is applied to the forces.Such force field is input into the molecular dynamics package, which updates the atomic coordinates.Then the process is repeated.The molecular dynamics steps are implemented in the LAMMPS package [49].Although more cumbersome than standard MD, the strategy adopted here allows us to use our simple structural descriptors for a problem, the construction of the DM, which is intrinsically translational invariant and rotational covariant.Translational invariance can be achieved by using local structural descriptors, while rotations can be accounted for with a covariant model.Here we have preferred to keep our model as simple as possible so to concentrate fully on demonstrating how a DM can be constructed with ML.

III. RESULTS AND DISCUSSIONS
In order to validate our entire approach we perform three different tests.Firstly, we evaluate the efficacy of the ML DM as a starting point for a SCF cycle.Then, we quantify the accuracy of the DM in determining energy and forces.Finally, we compare the molecular-dynamics trajectories driven by the forces associated to the ML DM with those of fully self-consistent ab-initio molecular dynamics.

A. Machine-learning DM as initial guess
The first test consists in evaluating how accurate is the DM generated by the neural networks as starting point for the DFT SCF cycle.The test is performed over 1000 new configurations for each molecule and in Fig. 2 we report the average number of SCF iterations performed to achieve convergence and their associated variance.In this case converge is defined by having an energy difference between subsequent iterations lower than 10 −9 Ha, a value that sets a rather tight convergence criterion.For this test we perform two sets of calculations, where the SCF cycles are driven respectively by the DIIS or the SOS mixing scheme, with convergence parameters set by the PySCF default.
In general we find the SOS DM-update strategy to be significantly more performing than the simpler DIIS, with the total number of iterations reducing by approximately a factor three regardless of the molecule or the initial DM.Note that this advantage is partially compensated by the SOS scheme being numerically heavier than DIIS, namely a single iteration takes longer.We have also found that sometimes for [Fe(H 2 O) 6 ] 2+ and the DIIS solver, 50 iterations are not enough to achieve convergence.This is somehow expected considering the electronic structure of the [Fe(H is a spin-crossover molecule presenting a temperatureinduced low-spin to high-spin transition, driven by a distortion of the octahedral coordination shell of the Fe 2+ ion.This is only partially described by DFT [50], and a multi-determinant theory appears more appropriate [51].For this reason it is not surprising that for some highly distorted configurations our non-spin-polarised DFT calculations struggle to converge.Note that this is not so crucial here, since we are not seeking to compute the exact ground state of [Fe(H 2 O) 6 ] 2+ , but simply to present a test example for 'difficult' convergence.In any case, when convergence is not achieved, the SCF cycle is stopped after 50 iterations.Despite these differences, the convergence-speed trends with respect to the initial DM are rather similar across the two mixing schemes, so that in our discussion we refer first to data obtained with the SOS algorithm.As expected the H  ] 2+ requires only two.This gives us a speedup in the computation of the SCF cycle comprised between a factor 3 and a factor 5. Note that the speedup is less pronounced when the DIIS mixing scheme is used, in particular for the covalently bonded molecules, where the advantage over the other schemes is only fractional.This difference seems to boil down to the inefficiency of the mixing scheme, which brings the iteration count to 6-7 even when the calculation is initiated with the ML DM.While it is not straightforward to establish why the SOS algorithm offers a better convergence speedup to MLgenerated initial DMs', we note here that, in general, DIIS algorithms may not honour well the initial guess, meaning that the optimisation procedure may lead the electron density anywhere in the variational space.It is then expected that such drawback penalises more DM's close to the fully converged one than more inaccurate ones.

B. Convergence analysis
We now look in more detail at how convergence is achieved for different starting DMs and the two different mixing schemes.We begin by considering H 2 O and then move to [Fe(H 2 O) 6 ] 2+ .The results for S 2 O are somehow in between these two cases and are presented in the Supplementary Information (SI).In figure 3, we show the total energy (with respect to the ground-state energy) as a function of the iteration number, n, for H 2 O computed with the DIIS [panel (a)] and SOS [panel (b)] mixing scheme.Furthermore, in panel (c) we present the norm of the difference between the ground state (converged) DM, ρ GS , and that at the n-th iteration, ρ n , also along the DIIS-driven SFC cycle.This last quantity is computed as ∆ρ = ij |ρ GS ij − ρ n ij |, with ρ ij being the i, j DM matrix element.Although the detail of each SCF cycle may differ depending on the molecule geometry, the figure shows a typical case.
In general all initial DMs are somewhat different from the final ground-state one, with the largest variations found, as expected, for the '1e' initialization (the total energy difference at n = 0 is in excess of 8 Ha and it is not displayed here).The convergence is then monotonic, when the SOS solver is used, while it may present oscillations for DIIS.This explains the largest number of iterations typically taken by DIIS.Strikingly the ML-generated DM appears extremely close to the final ground-state one, so that the convergence is monotonic in all cases.In fact, the n = 0 computed total energies for H 2 O are on average within 10 −4 Ha from their ground-state value and the percentage variation of the DM at the first iteration is only 0.196%.This suggests that, by large, the ML-DM already provides an excellent estimation of the ground-state density matrix.As a comparison the second best initial DM appears to be that generated with a restricted Hartree-Fock calculation ('huckel' option), with an initial total energy error of about 10 −2 Ha.All the others DM-generating schemes have an initial error larger than 0.1 Ha.
The path to convergence becomes significantly more oscillatory when one looks at the DIIS SCF cycle for [Fe(H 2 O) 6 ] 2+ (see Fig. 4).This time, the energy and DM fluctuations are significantly more pronounced, with the appearance of 'spikes' in correspondence to some self-consistent steps when using the DIIS solver.These originate from fluctuations in atomic orbital occupation across different SCF iterations.Such large fluctuations are suppressed by the SOS mixing scheme, which reinstates a monotonic approach to the ground-state solution.Most importantly, also for [Fe(H 2 O) 6 ] 2+ the MLconstructed DM provides a much more accurate starting point.In fact, it is sufficiently, accurate that the oscillations are suppressed, regardless of the mixing scheme.
Furthermore, already at the first iteration energy and DM are extremely accurate.This time, the average energy is about 3 × 10 −5 Ha away from the converged one (this corresponds to an error 1.7 × 10 −6 %), while the percentage variation of the DM at the first iteration with respect to the ground state is 0.77%.

C. Non-self-consistent forces
The previous section has shown that the ML DM requires an extremely limited number of SCF iterations to achieve the desired convergence and that, even without any iteration, it can already provide an accurate estimate of the DFT total energy.Here we explore further this second aspect and investigate the accuracy of our ML DM in predicting a second observable, namely the atomic forces.For this section we consider the S 2 O molecule in particular, but the results for H 2 O and [Fe(H 2 O) 6 ] 2+ are qualitatively rather similar and presented in the SI..
In figure 5 we present the parity plot diagram for the x [upper panel] and y [lower panel] component of the atomic forces acting on the atom lying in the x-y plane.These are computed for a set of 1000 distorted molecules obtained from the molecular dynamics trajectory used to generate the training set, but never used in the construction of the neural network.Since, the molecule are, by construction, always aligned in the x-y plane, there are no forces along z.The parity plot compares the fully converged DFT forces (y axis) with those predicted from the ML DM without operating any SCF iteration (x axis).Points on the parity line are predicted exactly.The graphs also show histograms of the distributions of the atomic forces.Note that along our molecular-dynamics trajectory the forces can be as large as 2 eV/Å, but typically they are concentrated within ±0.25 eV/Å.
Clearly, there is an extremely good mapping between the ML-DM-predicted forces and the exact ones, with the vast majority of the points staying on the parity line.This is reflected in the almost identical force distributions.Quite interestingly, there is no biased distribution of errors across the range of force magnitude explored, in contrast to what usually found for ML force fields, where the largest error is encountered for small forces.The mean absolute error (MAE) is calculated at to the ground-state energy) as a function of the iteration number, n, for convergence driven by the DIIS and SOS mixing scheme, respectively.In panel (c) we present the norm of the difference between the ground-state converged DM, ρ GS , and that computed at the n-th iteration, ρ n .In this case we follow the DIIS-driven SCF cycle.For ease of visualization in all plots the y axis is on a logarithmic scale.In panel (c) we present the norm of the difference between the ground-state converged DM, ρ GS , and that computed at the n-th iteration, ρ n .In this case, we follow the SH: DIIS-driven SCF cycle.For ease of visualization in all plots the y axis is on a logarithmic scale.
126 meV/Å and 62 meV/Å, respectively for the x and y component.Such error can be further reduced by noticing that the ML-generated DM does not necessarily describe an integer number of electrons.This feature can be corrected by re-scaling the ML-generated DM of a factor N e /N ML e , where N e is the total number of electrons and N ML e that computed using the as-generated ML DM, N ML e = Tr ρ ML • S , with S being the overlap matrix.For S 2 O we find that typically N ML e is less than 0.1% different from N e , but the correction is enough to bring down the MAE to 62 meV/Å and 22 meV/Å, respectively for the x and y component (the parity plots of Fig. 5 have been obtained with the forces computed after such DM re-scaling).This error is significantly lower than what typically found for state-of-the-art force fields [52][53][54].Although a thorough comparison is not simple, since the analysis needs to be carried out with the same molecules, same training set size, etc., this result clearly demonstrates that predicting the DM to be used in non-self-consistent DFT, can be a valid alternative to the construction of a force field.Namely, the forces obtained from the ML-predicted DM can be used as driver for molecular dynamics.This aspects is explored last in the next section.

D. Non-self-consistent molecular dynamics
As a final test, we now perform molecular dynamics simulations by using the ML-predicted DM.In particular, we use the forces obtained after rescaling the DM by N e /N ML e and after a single SCF step.Such step is needed, since the re-scaled DM seems to have a total energy marginally lower than that of the ground state.The molecular dynamics simulations are then performed at 150K for S 2 O and H 2 O for total of 0.14 and 0.12 nanosec- onds, respectively.In both case we use the rotation procedure described in the method section to avoid the need of an equivariant model.The trajectories obtained are then compared with the fully converged ab-initio ones, and with those computed by performing only a single DIIS self-consistent step, starting from the PySCF default 'minao' charge density.The different molecular dynamics trajectories are monitored and compared by looking at the bond length and bond angle thermal distributions, which are presented in Fig. 6 for the two molecules.In the case of H 2 O [panels (c) and (d)] there is no significant difference between the various methods, with rather similar distributions.This The S 2 O case is, instead, quite different.From panels (a) and (b) of Fig. 6 one can appreciate that the ML DM provides an excellent estimate of the fully DFT-converged one, so that the thermal distributions of bond lengths and angle are rather similar to those obtained with fully abinitio molecular dynamics.In this case there are three bond lengths corresponding to the S-O bond, the S-S one and the second S-O distance between the two most peripheral atoms.The centers of the distributions are close to the reported experimental values of 1.4650 Å (S-O), 1.8834 Å (S-S) and 3.2505 Å (S-S) [55], and so is the bond angle, 117.876 • , with the remaining differences being attributed to the choice of exchange and correlation energy.This is not the case when the molecular dynamics is performed with a single SCF step starting from the PySCF 'minao' initialization.In fact, the low accuracy in the determination of the forces results in an average structure presenting a significantly compressed S-S bond and a drastic reduction in the bond angle.
Finally, in Fig. 7 we present the decomposition of the total energy over the one-electron, H one , Coulomb (Hartree), H C , exchange-correlation, H XC , and nucleusnucleus, H NN , components.As expected, since the average structures are erroneously predicted, the minaoinitialized molecular dynamics provides distributions pretty far from those obtained with self-consistent DFT.This contrasts the results obtained from our ML DM, which not only describes well the structure, but also all energy components.

IV. CONCLUSION
We have shown that neural networks can be trained to predict the one-particle density matrix required by electronic-structure theories constructed over atomicorbital basis sets.These ML DMs are of sufficient quality to be used as initial guess in Kohn-Sham DFT, demonstrating a reduction in the number of the self-consistent steps needed to achieve convergence over all other common initial choices.Equally important, such DMs return rather accurate energy and forces even without selfconsistency, a fact that enables one to run inexpensive molecular dynamics simulations, whose computational cost is similar to that needed by machine-learning force fields.We have shown here results for three molecules, H 2 O, S 2 O and [Fe(H 2 O) 6 ] 2+ , and density functional theory, but the method is agnostic to the system to describe and the choice of electronic structure theory, as long as this is based on the density matrix.For instance, it can be employed together with wave-function-based quantum-chemistry methods such as Hartree-Fock.It is also important to note that, although here the molecule structure is represented by simple Cartesian coordinates, our proposed method can be implemented with more sophisticated structural descriptors.These can be constructed equivariant [56] and can be descriptive enough to avoid the need of using deep-learning machine-learning models [54].
The drawback of our scheme is that the number of matrix elements to predict scales quadratically with the number of basis functions used in the calculation, so that it becomes increasingly expensive as the system size grows.In practice, many of the matrix elements remain rather small and they can be safely neglected when evaluating the DM for accurate non-self-consistent electronic structure or molecular dynamics.Furthermore, efficiency can be achieved by constructing the ML density matrix over a small basis set and then using it for calculations employing larger ones [30].
It is important to note, however, that whether or not a ML strategy for the construction of the DM is favourable against other solutions ultimately depends on many considerations.These are characteristic of the system to investigate and the workflow adopted.In particular, one may consider three determining factors: 1) the size of the training set needed, namely how many DFT calculations one needs to perform for constructing the model); 2) the size of the optimal neural network as the dimension of the DM grows (heavier networks may be required as the complexity increases); 3) the workflow in which the method is deployed, namely how many calculations one has to perform once the ML model has been constructed.All these factors together determine the 'computational economy' of any ML approach, and a careful assessment must be carried out before a specific computational strategy is selected.
In any case, our scheme will become progressively more convenient as the scaling of the overarching electronic structure method with the system size becomes more prohibitive.In this case, the quadratic scaling of the DM construction is overshadowed by the computational cost to run long self-consistent cycles, and significant savings in computational overheads can be achieved.This can be the case, for instance, of non-local exchange-correlation functionals.and the value of the MAE is reported in all cases.As we can see there is an excellent agreement between the ML-computed DM and the fully converged DFT one, with most of the points lying on the parity line.This is true for matrix elements larger than ∼ 10 −2 , namely for those that have a dominant behaviour on all the observable of relevance (e.g. total energy and forces).Larger relative errors are found for smaller elements, for which the DFT data are already noisy and the ML model can hardly train.This distribution of errors reflects the rather low MAE reported for all cases (see also Table I).The only exception is for a few points in the training set of [Fe(H 2 O) 6 ] 2+ , for which some more pronounced deviations are reported.These, however, correspond to situations where the DFT calculations struggle to converge (see main test) and the DM is the one obtained at the maximum number of self-consistent iterations allowed (hence, not the converged one).These training points are thus discarded.The test set, instead, contains only examples where full convergence is obtained.

VI. SUPPORTING INFORMATION
A Supporting Information file is associated to this paper.This includes the convergence analysis for S 2 O and the non-self-consistent energy and forces for H 2 O and [Fe(H 2 O) 6 ] 2+ .This information is available free of charge via the Internet at http://pubs.acs.org
2+ cation, respectively for H 2 O, S 2 O and [Fe(H 2 O) 6 ] 2+ .Then the rotational degrees of freedom are imposed by selecting an appropriate rotation.For the triatomic molecules, H 2 O and S 2 O, we constrain one atom on the negative x-axis and the second one in the x-y plane, so that the molecules are described by only three coordinates.In contrast, for [Fe(H 2 O) 6 ] 2+ we force the O atoms of the H 2 O ligands to be on the three Cartesian axes and we consider only variation in the Fe-O bond length (the water molecules are taken as rigid)

Figure 2 :
Figure 2: Total average number of SCF steps taken to achieve convergence for different starting DMs and mixing schemes.Here the convergence criterion is on the total energy between two consecutive SCF steps that should be lower than 10 −9 Ha.The black bars around the mean indicate the variance.Variances lower than one iteration are not displayed.

Figure 3 :
Figure 3: Analysis of the SCF cycle for H 2 O.In panels (a) and (b) we show the total energy (measured with respectto the ground-state energy) as a function of the iteration number, n, for convergence driven by the DIIS and SOS mixing scheme, respectively.In panel (c) we present the norm of the difference between the ground-state converged DM, ρ GS , and that computed at the n-th iteration, ρ n .In this case we follow the DIIS-driven SCF cycle.For ease of visualization in all plots the y axis is on a logarithmic scale.

Figure 4 :
Figure 4: Analysis of the SCF cycle for [Fe(H 2 O) 6 ] 2+ .In panels (a) and (b) we show the total energy (measured with respect to the ground-state energy) as a function of the iteration number, n, for convergence driven by the DIIS and SOS mixing scheme, respectively.In panel (c) we present the norm of the difference between the ground-state converged DM, ρ GS , and that computed at the n-th iteration, ρ n .In this case, we follow the SH: DIIS-driven SCF cycle.For ease of visualization in all plots the y axis is on a logarithmic scale.

Figure 5 :
Figure 5: Parity plot for the α = x, y component of the atomic forces computed by using the ML DM, F ML α , with one SCF cycle, against the fully converged DFT ones, F DFT α .Data are here presented for a set of 1000 S 2 O molecules extracted from the same molecular dynamics trajectory used to generate the training set.The upper panel is for the forces x component, while the lower panel is for the y component.The histograms on the side describes the frequency of the forces in the test set.

Figure 6 :
Figure 6: Histograms of the bond lengths, d, and bond angles, θ, along the molecular dynamics trajectories for S 2 O [panels (a) and (b)] and H 2 O [panels (c) and (d)].There are two distinct bond lengths for H 2 O, namely O-H and H-H, while there are three for S 2 O, namely two S-O and one S-S.

Figure 7 :
Figure 7: Histogram of the various energy components along the different molecular dynamics trajectories for the S 2 O molecule.Here we separate the total energy into one-electron, H one , Coulomb (Hartree), H C , exchange-correlation, H XC , and nucleus-nucleus, H NN , components.

Figure 8 :
Figure 8: Parity plots for H 2 O [panels (a) and (d)], S 2 O [panels (b) and (e)], and [Fe(H 2 O) 6 ] 2+ [panels (c) and (f)].The upper panels are for the training set, and the lower ones for the test one.Each graph reports also the MAE achieved.Note that all the parity plots are in logarithmic scale (we plot |ρ ij |) and that deviations are only found for the smaller matrix elements.The colour code describes the density of given DM matrix-element values.
2 O and S 2 O molecules converge significantly faster than [Fe(H 2 O) 6 ] 2+ , as their simple covalent bonding structure would suggest.Also as expected, the simple '1e' default provides the worst starting DM and convergence is achieved in five iterations for H 2 O and S 2 O and about 16 for [Fe(H 2 O) 6 ] 2+ .The other conventionally constructed starting DMs appear to perform rather similarly to each other with the two covalently bonded molecules converging in about 3-4 SCF steps, and [Fe(H 2 O) 6 ] 2+ in about 10.Most importantly, our DM significantly outperforms any other methods, with a single SCF iteration being necessary for H 2 O and S 2 O, while [Fe(H 2 O) 6