Exploiting the Hessian for a Better Convergence of the SCF-RDMFT Procedure

One-body reduced density matrix functional theory (RDMFT) provides an alternative to density functional theory that is able to treat static correlation while keeping a relatively low computational scaling. Its disadvantageous cost comes mainly from the slow convergence of the self-consistent energy optimization. To alleviate this problem, we propose in this work the use of the Hessian of the energy, including the occupation-orbital coupling term. We show that using the exact Hessian is very effective at reducing the number of iterations. However, since the exact Hessian is too expensive to use in practice, we propose an approximation based on an inexpensive exact part and BFGS updates.


Introduction
The large cost of the self-consistent field (SCF) minimisation of the ground-state energy in RDMFT (SCF-RDMFT) remains, nevertheless, an obstacle to its acceptance by the community. 17,26,27 This cost is mostly incurred by the exceptionally large number of iterations. The primary focus of this work is therefore to reduce the number of SCF iterations, to bring RDMFT closer to being a practical alternative for cases where DFT fails. DFT relies on the Hohenberg-Kohn theorem, 28 which proves that the total energy can be written as a functional of the density for v-representable densities. A decade later, Gilbert used the same approach to show that the total energy can also be expressed as a functional of the one-body reduced density matrix (1-RDM) γ, for v-representable 1-RDMs. 29 The advantage is that more general perturbations can be considered than only local potentials, e.g. magnetic fields, and that the kinetic energy is a simple linear functional of the 1-RDM. The total energy functional then reads

E[γ] = ∫ dx lim_{x'→x} (−∆_r/2) γ(x, x') + ∫∫ dx dx' v_ext(x', x) γ(x, x') + W[γ],

where x_i = r_i σ_i denotes a combined space-spin coordinate, ∆_r the Laplacian with respect to r, v_ext the (non-)local external potential and W the interaction energy functional, which Gilbert defined as

W[γ] = ⟨Ψ[γ]| Ŵ |Ψ[γ]⟩,

with Ψ an N_e-particle wave function and Ŵ an electron-electron interaction operator, which is the Coulomb interaction in practice. Note the similarity with the Hohenberg-Kohn functional in DFT, but also the striking difference that the kinetic energy is not part of the universal functional, so 'only' the interaction energy, for which we do not have an exact explicit expression in terms of the 1-RDM, needs to be approximated. However, Gilbert's approach suffers from the same problem as the Hohenberg-Kohn theorem: in order to minimise the functional one needs to remain in the domain on which the functional is defined (the domain of v-representable 1-RDMs 30,31), which is unknown in general. This problem was partially solved by Levy, 32 who extended the functional
to the domain of pure-state N-representable 1-RDMs, i.e. 1-RDMs which can be generated by an N_e-particle wave function via

γ(x, x') = N_e ∫ dx_2 ⋯ dx_{N_e} Ψ(x, x_2, …, x_{N_e}) Ψ*(x', x_2, …, x_{N_e}).

The universal functional is then defined via the constrained-search construction as

W_L[γ] := min_{Ψ→γ} ⟨Ψ| Ŵ |Ψ⟩.

The pure-state N-representability conditions are, however, difficult to impose in practice. 34,35,36 Valone therefore generalised the Levy constrained search to the domain of ensemble N-representable 1-RDMs, 37

W_V[γ] := min_{{w_P, Ψ_P}→γ} Σ_P w_P ⟨Ψ_P| Ŵ |Ψ_P⟩. (5)

The prescription {w_P, Ψ_P} → γ means that the ensemble composed of states {Ψ_P} with respective weights {w_P} generates the 1-RDM via 38

γ(x, x') = N_e Σ_P w_P ∫ dx_2 ⋯ dx_{N_e} Ψ_P(x, x_2, …, x_{N_e}) Ψ_P*(x', x_2, …, x_{N_e}),

where w_P ≥ 0 and Σ_P w_P = 1. The functional W_V[γ] also has the nice property of being convex. 39,40 To ensure that a 1-RDM is ensemble N-representable, it is sufficient to impose that Tr{γ} = N_e and that γ and I − γ are positive semidefinite (I denoting the identity matrix). 38 In order to impose these constraints, most SCF procedures in RDMFT work directly in the basis which diagonalises the 1-RDM, 26,41,42,43

γ(x, x') = Σ_i n_i ϕ_i(x) ϕ_i*(x'),

where the eigenvalues n_i are called the (natural) occupation numbers (ONs) and the eigenfunctions ϕ_i(x) the natural orbitals (NOs). 44-49 There are several ways to enforce the inequality constraints on the ONs arising from ensemble N-representability, and in particular parametrization by some suitable function of a variable x seems to be a popular choice. One of the earliest parametrizations is the cos(x)^2 function, 29,50,51,52 though the Fermi-Dirac function 1/(e^{−βx} + 1) 27,43 and the error function erf(x) 53 are perhaps more attractive choices.
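The ensemble N-representability conditions and the ON parametrizations listed above are easy to state in code. The following sketch (function and variable names are ours, not from the paper) checks the conditions for a spin-orbital 1-RDM, for which the occupation bounds are 0 ≤ n_i ≤ 1:

```python
import numpy as np
from scipy.special import erf

def is_ensemble_n_representable(gamma, n_elec, tol=1e-10):
    """Check Tr{gamma} = N_e and that gamma and I - gamma are positive
    semidefinite, i.e. all eigenvalues (occupations) lie in [0, 1]."""
    occ = np.linalg.eigvalsh(gamma)
    return (abs(np.trace(gamma) - n_elec) < tol
            and occ.min() > -tol and occ.max() < 1.0 + tol)

# Three parametrizations mapping an unconstrained x to occupations in [0, 1]:
cos2  = lambda x: np.cos(x) ** 2
fermi = lambda x, beta=1.0: 1.0 / (np.exp(-beta * x) + 1.0)
erf01 = lambda x: 0.5 * (erf(x) + 1.0)
```

By construction, all three maps land in [0, 1] for any real x, so only the trace constraint needs separate handling.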
Other authors opt for a more direct optimization with Powell's method 41 or a Lagrange multiplier method, in particular also for the constraint on the trace. 19,26,54,55 One also has to impose the orthonormality of the NOs. The most popular way to do so is to optimise the NOs using an iterative diagonalisation of a generalised Fock matrix. 19,26,27,42,43 Another common way is to write the NOs as a unitary transformation U of an orthonormal basis. The most standard form is U = exp(X), taking X as a real antisymmetric matrix. 17,56,57 Some authors also choose to preserve orthonormality only approximately to calculate an optimization step and reorthonormalize the NOs afterwards, 41,50,52,58 or to brute-force a Lagrange multiplier method. 59 Most approaches described here are based on a two-step scheme in which the ONs and NOs are each optimized in turn. The main disadvantage is that no information on the coupling between the NOs and ONs is used, so typically a large number of macro-iterations (i.e. one pass over an ON optimisation followed by an NO optimisation) is needed to obtain consistency between the NOs and ONs. A notable exception investigating a simultaneous optimisation of NOs and ONs with a preconditioned conjugate-gradient method was, however, published during the review process of the present work, 60 though that work still does not include the ON-NO coupling within a step.
A second-order single-step method which optimizes the NOs and ONs simultaneously therefore seems a sensible direction. The coupling between the NOs and ONs is naturally included via the off-diagonal ON-NO blocks of the hessian in a (quasi-)Newton scheme. An additional reason to use the hessian is that the unitary parametrization of the NOs, U(X), yields a nonconvex energy functional in terms of the parameters X (even if the functional is convex in γ, e.g. the exact RDMFT functional). The energy landscape becomes much more rugged in X parameter space, and the hessian provides very useful information to guide the optimization algorithm through curved valleys. Since it is not straightforward to include second-order information in the iterative diagonalisation approach, we focus in this work on the orthonormalisation of the NOs via a parametrisation by a unitary matrix. The aim of this work is to reduce the total number of iterations required for the SCF procedure to converge to arbitrary precision. To achieve this goal, we proceed as follows. After giving some details on the implementation in section 2 and deriving the expression of the exact hessian in section 3, we look at the waste of optimisation steps that can arise when optimising the ONs and NOs separately in section 4. We then try to extract as much information as possible from the exact hessian in section 5. Next, we introduce an intermediate set of variables and combine it with the previous point to obtain an approximation of the hessian in section 6. We finally conclude in section 7.
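To make concrete why curvature information matters on a nonconvex landscape, here is a toy sketch of a second-order step that floors the magnitude of the hessian eigenvalues, so the step remains well defined and points away from saddle points even when the hessian is indefinite. This is only an illustrative eigenvalue-modification scheme, not the trust-region algorithm used in this work:

```python
import numpy as np

def modified_newton_step(grad, hess, min_curv=1e-3):
    """Newton step using |eigenvalues| of the hessian (floored at min_curv),
    so negative-curvature directions are followed downhill rather than
    toward a saddle point."""
    w, V = np.linalg.eigh(hess)
    w = np.maximum(np.abs(w), min_curv)
    return -V @ ((V.T @ grad) / w)

# Toy nonconvex function f(x0, x1) = x0^4 - 2*x0^2 + x1^2:
# saddle point at the origin, minima at (+-1, 0).
grad = lambda x: np.array([4.0 * x[0]**3 - 4.0 * x[0], 2.0 * x[1]])
hess = lambda x: np.diag([12.0 * x[0]**2 - 4.0, 2.0])

x = np.array([0.1, 1.0])          # start near the saddle
for _ in range(50):
    x = x + modified_newton_step(grad(x), hess(x))
# x is now close to one of the minima
```

A plain Newton step at the starting point would be attracted to the saddle at the origin; the modified step escapes it.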

Numerical details
The purpose of this research being to improve the SCF convergence, we want to avoid as much as possible additional difficulties coming from the functionals themselves. To do so, we have decided to test our methods exclusively on the Müller functional, also known as the Buijse-Baerends (BB) functional, 61,62 known for its convexity w.r.t. the 1-RDM. 63 For the Müller functional the interaction energy reads

W[γ] = (1/2) Σ_{ij} n_i n_j [ii|jj] − (1/2) Σ_{ij} √(n_i n_j) [ij|ij],

where the orbitals {ϕ_i} can be taken real, as we work in a non-relativistic setting, and we restrict ourselves to the spin-summed case, where 0 ≤ n_i ≤ 2, from now on. As mentioned in the introduction, though the Müller functional is convex w.r.t. the 1-RDM, the unitary parametrization U(X) yields a functional which is not convex w.r.t. X. Thus the hessian is generally indefinite, even for this simple functional. It is therefore preferable to turn to methods able to handle indefinite hessians, such as trust-region methods. 64,65 We thus work with a trust-region (quasi-)Newton algorithm. As the cost of the method will be dominated by the number of evaluations of the functional and its derivatives, we report all iterations, including the steps rejected by the trust-region algorithm, which are responsible for the 'jumps' in the energy convergence in the figures. Our implementation is available at https://github.com/NGCartier/SCF-RDMFT_Hess_investigation and uses the PySCF software 66 and the C++ implementation of the fides package, 67 relying on an algorithm proposed by Coleman and Li. 68,69 This method is expensive because of the use of the eigenvalue decomposition of the hessian. However, a trust-region conjugate-gradient method, which is much more affordable, could be used as a more economical alternative. Throughout this work, we use the Explicit By Implicit (EBI) method 53 to impose the N-representability conditions on the ONs. The EBI approach consists in parameterising the ONs as n_i = (1/2)(erf(x_i + µ) + 1) ∀i, where the x_i are the variables to optimise and µ is computed to satisfy the constraint on the trace of the 1-RDM. Since many functionals (including the Müller functional) depend on the square root of the ONs, in order to avoid diverging derivatives for n_i going to 0, 70 we adapt the EBI parametrisation accordingly. The NO orthonormality is imposed by a unitary parametrisation via an exponential (U = exp(X)), as explained in the introduction. We test our algorithm on alkanes, primary alcohols, and the H_2O, HF and N_2 molecules. As an example of a strongly correlated system, we also add a stretched version of the N_2 molecule, at a bond length R equal to eight times the equilibrium bond length R_e, denoted N_2(R = 8R_e).
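The EBI construction described above (the erf map plus the trace-fixing shift µ) can be sketched as follows; the bracketing root-finder and the bracket width are our choices for illustration, not the paper's:

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import brentq

def ebi_occupations(x, n_target):
    """EBI map n_i = (erf(x_i + mu) + 1)/2 with mu fixed by the trace
    constraint sum_i n_i = n_target. Occupations land in [0, 1]; a
    spin-summed variant would simply scale the map to [0, 2]."""
    trace_err = lambda mu: (0.5 * (erf(x + mu) + 1.0)).sum() - n_target
    # the trace is monotonically increasing in mu, so bracketing is safe
    mu = brentq(trace_err, -50.0, 50.0)
    return 0.5 * (erf(x + mu) + 1.0)
```

Because erf is monotone, the trace is a monotone function of µ and a one-dimensional bracketing solve is enough to enforce Tr{γ} = N_e at every step.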
The results are obtained in a cc-pVDZ basis for H_2O, the alkanes and the primary alcohols, and cc-pVTZ for HF, N_2 and N_2(R = 8R_e), starting from Hartree-Fock orbitals and a Fermi-Dirac-like distribution of the ONs. For the micro-iterations (i.e. a single update of the ONs or NOs) we used the 2-norm of the gradient, ‖g^(k)‖ < 10^-6, as termination criterion, and for the macro-iterations we used the energy difference w.r.t. the previous macro-iteration, |∆E| < 10^-8, for both our implementation of the 2-step and the 1-step algorithms, as depicted in Fig. 1. To evaluate the convergence of the energy we report at each micro-iteration the energy difference with the best energy E_ref we could obtain for each system.
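The Müller (BB) interaction energy used in these tests replaces the Hartree-Fock exchange factor n_i n_j by √(n_i n_j). Given spin-summed ONs and two-electron integrals [ij|kl] in chemists' notation in the NO basis (real orbitals), it can be evaluated as below; the function name and the assumption that the full ERI tensor fits in memory are ours:

```python
import numpy as np

def mueller_interaction(n, eri):
    """Mueller/BB interaction energy, spin-summed (0 <= n_i <= 2), real NOs.

    Hartree term :  1/2 sum_ij n_i n_j       [ii|jj]
    Exchange term: -1/2 sum_ij sqrt(n_i n_j) [ij|ij]
    eri holds [ij|kl] (chemists' notation) in the NO basis.
    """
    rn = np.sqrt(n)
    hartree = np.einsum('i,j,iijj->', n, n, eri)
    exchange = np.einsum('i,j,ijij->', rn, rn, eri)
    return 0.5 * (hartree - exchange)
```

For an idempotent 1-RDM (n_i ∈ {0, 2}) the expression reduces to the closed-shell Hartree-Fock two-electron energy, a convenient sanity check.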

The exact hessian
In the following we use Greek indices for the atomic orbital basis and Latin ones for the NO basis. Expanding the NOs in a given orbital basis {|χ_µ⟩} naively via a global unitary exp(Y) (where Y is an antisymmetric matrix) leads to a very complicated form of the gradient and the hessian, since the derivatives of the unitary matrix U at general Y are very complicated. However, at Y = 0 the derivatives become quite simple, 57 so instead the expansion is only made about the current iterate, the new unitary being accumulated as a product, since Y^(k+1) and Y^(k) do not commute in general. Expanding the derivatives around the current iterate is exact for the exact energy hessian H_exa, but in the case of an approximate hessian more care is needed, as it is based on the difference between gradients at different iterates (see equation (45) in the supplementary material). In the {|χ_µ⟩} basis the 1-RDM then attains a simple matrix form. To derive the gradient and exact hessian, U(X) can be expanded in a Taylor series for common parameterizations (Cayley, 71 exponential). For simplicity and consistency with our implementation, we take the case

U(X) = exp(X) = Σ_k X^k / k!.

The first and second derivatives at X = 0 then follow term by term; inserting them into the derivative of (13) yields the gradient. 47,48,52 The energy functional then takes the form of an expansion in X, from which we can derive explicit expressions for the entries of H_exa (see Appendix A). In the following we will show that the exact hessian provides good convergence. Unfortunately, the excessive O(N^5) cost of computing H_exa (with N the number of orbitals) makes its use for practical calculations not a viable option. Therefore, we will aim for an efficient approximation that retains as much of H_exa as possible in sections 5 and 6.
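The unitary parametrisation just described is straightforward to realise numerically; a minimal sketch (helper name ours) mapping the N(N−1)/2 free parameters to an orthogonal U = exp(X):

```python
import numpy as np
from scipy.linalg import expm

def unitary_from_params(x_up, n):
    """Fill the strictly upper triangle of a real antisymmetric X with the
    N(N-1)/2 parameters x_up and return U = exp(X). The exponential of an
    antisymmetric matrix is orthogonal, so NO orthonormality is preserved
    exactly at every step."""
    X = np.zeros((n, n))
    X[np.triu_indices(n, k=1)] = x_up
    X = X - X.T
    return expm(X)
```

Rotating the 1-RDM by such a U changes the NOs but leaves the occupation spectrum untouched, which is exactly the separation of variables the 1-step scheme exploits.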

Consistency between NOs and ONs
It is common in RDMFT to optimise the NOs and ONs successively in two separate steps. However, if not done carefully, a 2-step procedure may lead to a waste of computational time when the implementation keeps optimising the NOs to a high precision, even though the change of the ONs in the next step will make such high precision irrelevant (the reverse is also true, but the associated wasted cost is negligible). As a demonstration, we show the energy convergence at each consecutive micro-iteration of a 2-step optimization in Fig. 2.
The wasteful NO optimization steps are reflected by the plateaus in the energy convergence. It is possible to limit this unwanted behavior by choosing, for example, a gradually tighter termination criterion for the optimisation of the ONs and NOs, but the choice of such a procedure can be rather difficult (see supplementary material Sec. 4 for details).
The problem of wasteful NO optimization steps is readily circumvented by optimizing both the ONs and NOs simultaneously.An additional advantage is that the coupling between the ONs and NOs can readily be taken into account by including their cross derivative in the hessian.
The results for this 1-step procedure based on the full exact hessian are shown in Fig. 3. The advantage of using the 1-step method is strikingly clear by comparing these results to the 2-step results from Fig. 2. Indeed, we obtain a significant reduction of the total number of iterations, the energy converging to an error of 5 • 10^-8 Hartree for all molecules of the set within 70 iterations. The 2-step procedure, on the other hand, needs a few hundred iterations for the tested molecules. Moreover, the only additional cost per iteration for the 1-step procedure comes from the coupling between NOs and ONs, which is of order O(N^3) only (we have N variables for the ONs). In contrast, the number of entries in the hessian dedicated to the NOs is of order O(N^4), since the matrix parametrising the NOs, X, counts N(N − 1)/2 degrees of freedom for a basis of size N, which means that the cost of the coupling block is asymptotically negligible compared to that of the NO block of the hessian. For a fair comparison, we have tried to keep the implementations as similar as possible, so we have retained the macro-condition of the 2-step algorithm in the 1-step one. In principle, this condition should not play any role in the 1-step case, as it acts as a double check by restarting from the converged point, and it indeed does not trigger when using the exact hessian. We have nonetheless observed that this restart of the 1-step optimisation can regularly save the convergence, as it resets the approximation of the hessian, which may have significantly diverged from the exact one.
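The benefit of the coupling block can be seen already on a two-variable toy quadratic: alternating exact minimisation over each variable (a caricature of the 2-step scheme) contracts only geometrically, while a single Newton step with the full coupled hessian lands on the minimum at once. This is purely an illustration, not the RDMFT problem itself:

```python
import numpy as np

# Coupled quadratic f(a, b) = (a^2 + b^2)/2 + c*a*b, minimum at (0, 0)
c = 0.9
H = np.array([[1.0, c], [c, 1.0]])     # full hessian, including the coupling
grad = lambda v: H @ v

# "2-step": minimize exactly over a with b fixed, then over b with a fixed
v = np.array([1.0, 1.0])
for _ in range(30):
    v[0] = -c * v[1]
    v[1] = -c * v[0]
two_step_err = np.abs(v).max()         # shrinks only like c^2 per sweep

# "1-step": Newton with the coupled hessian, exact in a single iteration
v = np.array([1.0, 1.0])
v = v - np.linalg.solve(H, grad(v))
one_step_err = np.abs(v).max()
```

The stronger the coupling c, the slower the alternating scheme, mirroring the many macro-iterations needed to reach ON-NO consistency in the 2-step SCF.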
To verify that the 1-step algorithm converged to a minimum, we have computed the eigenvalues of H_exa and checked that they are all nonnegative at the end of the convergence. We plot in Fig. 4 the number of negative eigenvalues of the ON and NO blocks of the hessian, as well as the difference between the number of negative eigenvalues of the total hessian and of the two aforementioned blocks, which we can interpret as coming from the ON-NO coupling block. Indeed, the number of negative eigenvalues goes down to zero for all tested molecules. Note that the hessian can initially contain a large number of negative eigenvalues, underlining the nonconvexity of our problem even for the Müller functional. Moreover, most of the negative eigenvalues come from the NO block of the hessian (middle panel), indicating that the hardest part to handle numerically is the optimisation of the NOs. One may wonder how important the coupling block of the hessian, which we included in the 1-step procedure, actually is, especially as the number of negative eigenvalues (Fig. 4) indicates that the most difficult part is the NO block.
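The block-wise count of negative eigenvalues reported in Fig. 4 amounts to the following bookkeeping, where the remainder is attributed to the coupling block as in the text (function name ours):

```python
import numpy as np

def negative_counts(H, n_on):
    """Count negative eigenvalues of the full hessian and of its ON and NO
    diagonal blocks; the difference is attributed to the ON-NO coupling.
    The first n_on variables are the ON parameters."""
    neg = lambda A: int((np.linalg.eigvalsh(A) < 0).sum())
    n_tot = neg(H)
    n_on_blk = neg(H[:n_on, :n_on])
    n_no_blk = neg(H[n_on:, n_on:])
    return n_tot, n_on_blk, n_no_blk, n_tot - n_on_blk - n_no_blk
```

Note that eigenvalues of the blocks do not strictly partition those of the full matrix, which is why the coupling contribution is reported as a difference.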
To investigate this point, we have tested the 1-step algorithm with the coupling block of the hessian set to zero. The results are shown in Fig. 5. Comparing Figs. 3 and 5, we observe that the convergence is roughly similar for most molecules, except for N_2. Somewhat surprisingly, N_2 at equilibrium distance is more challenging [...]. We use the technique of ref 72 to reduce the scaling of the contribution to H_exa from the first term of (30), the first two terms of (33) and the first eight lines of (37), from O(N^5) to O(N^4) (see Fig. 6). Indeed, by introducing suitable intermediates, (33) and (37) take an O(N^4) form.

Figure 5: Convergence of the energy (E^(k) − E^(ref) with respect to micro-iteration count) using the exact ON-ON and NO-NO blocks of the hessian and setting the ON-NO block to 0, for different molecules.

We can analogously compute the corresponding terms of (29), (32) and (36) in O(N^4), by computing separately v^J_ij = Σ_p f(n_p)[ij|pp]. Moreover, the one-electron part of H_exa can clearly be obtained in O(N^4) [even in only O(N^3)].
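The intermediate v^J_ij = Σ_p f(n_p)[ij|pp] can be formed once per iteration with a single contraction; a sketch assuming the ERI tensor is held in memory:

```python
import numpy as np

def vj_intermediate(eri, f_n):
    """v^J_ij = sum_p f(n_p) [ij|pp]: forming this once removes the p-sum
    from the hessian assembly, which is what brings naive O(N^5) terms
    down to O(N^4) overall."""
    return np.einsum('ijpp,p->ij', eri, f_n)
```

The contraction itself touches only the diagonal [ij|pp] slab, so it is cheap compared to the hessian assembly it accelerates.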
For the following expressions, we use uppercase for indices going from N + 1 to N(N + 1)/2, which can more conveniently be mapped to two indices (the first going from 1 to N and the second strictly lower than the first). We define H_exp through its ON, coupling and NO blocks, and set H_cheap = H_exa − H_exp. The scaling to obtain H_cheap is then O(N^4) for a separable functional (that is, the same as to obtain the gradient), and we only need to approximate H_exp.
To do so, we can first neglect H_exp entirely and take B = H_cheap as a hessian approximation.
(We use B to denote an approximation to the hessian.) We tried this for our set of molecules in Fig. 7 and observed good convergence for the first few tens of iterations. This approximation can even accidentally outperform the exact hessian (see the H_2O molecule in Fig. 8). However, this is a very crude approximation, and the algorithm fails to converge for half of the molecules. This seems to indicate that H_cheap provides a good approximation of H_exa for the first few iterations, which is sufficient to make H_2O, CH_4 and HF converge to 10^-8 Hartree, but only to 10^-2 for N_2 and 10^-4 for C_2H_6.
To assess this assumption, we look in Fig. 9 at the error made by taking B = H_cheap, for the different blocks of the hessian. This shows that the ON block of the hessian is decently approximated by H_cheap, while the off-diagonal elements of the NO block and the coupling block are less accurate. The pattern of the upper panels of Fig. 9 shows that the error in the NO block remains low when two indices of X_ij and X_pq are equal, that is, when H_cheap,ijpq ≠ 0, supporting the use of H_cheap. To avoid convergence issues for most systems, we thus need a suitable approximation of H_exp, which we propose in section 6.

Approximate hessians
The most common numerical approximation of the hessian in the literature is the BFGS approximation. 73 The BFGS update is obtained by demanding that the new approximate hessian B be close to the previous one, B^(k), under the constraints that the new approximation is symmetric and that B satisfies the secant equation. 73 That is, the BFGS update solves

min_B ‖B − B^(k)‖  s.t.  B = B^T,  B s^(k) = y^(k), (21)

where s^(k) and y^(k) are, respectively, the step and the difference of the gradient between the (k+1)th and kth iterations. The constraint B s^(k) = y^(k) is the secant equation, which simply means that the hessian corresponds to a quadratic model through the last two iterates. Using the Frobenius norm, we obtain the standard BFGS update (see section 6 of the supplementary material for details). Unfortunately, the bare BFGS approximation does not provide a decent approximation of the exact hessian, leading to poor convergence of the energy (see section 2 of the supplementary material). The BFGS approximation is intended to approximate the hessian for convex problems. However, due to the parametrization, we do the energy minimisation w.r.t. x = (x, X_up), where X_up is the vector of the strictly upper-triangular entries of X. We will refer to the space of these variables as the x-space. On the other hand, we will call ν-space ('NU-space') the space where the energy derivatives can be directly calculated, i.e.
ν = (√n, U), where U comprises all entries of the full N × N matrix. We can readily transform the relevant derivatives from the ν- to the x-space using the Jacobian matrix J_pq = ∂ν_p/∂x_q and its derivative. The hessian tends to have a nicer expression in ν-space (it is often quartic in U and the square roots of the ONs), so we can hope that BFGS provides a better approximation in this space. Also, the fact that the unitary parametrization tends to make the hessian indefinite indicates that the BFGS update on the hessian in ν-space might work better. Since we can readily calculate the cheap part of the hessian in O(N^4) (see Sec. 5), we only need to approximate the remaining part in ν-space as B^exp_ν ≈ H^exp_ν. Denoting ∇_ν the gradient in ν-space and ∇²_x the hessian operator in x-space, the expensive energy hessians H^exp in x- and ν-space are related by the chain rule, and the total hessian follows accordingly. Now we have several choices to define closeness to the previous step. One sensible option (another is presented in Sec. 3.3 of the supplementary material) is to make the hessians close in ν-space, whilst still demanding the secant condition in x-space. We thus only have to replace B → B^exp_ν, s → s̄_ν and y → ȳ_ν − ξ_ν in (22) to obtain the update. This approximation (Fig. 10) unfortunately cannot match the good behaviour of the B_x = H^cheap_x approximation (Fig. 7) for the first iterations. In section 5 we attributed the initial good convergence of the B_x = H^cheap_x approximation to its being a sufficient approximation of the hessian. To take full advantage of the good convergence of B^exp_x = 0, we therefore decided to turn on the BFGS approximation of H^exp only when the algorithm starts to converge with B_x = H^cheap_x, to correct the convergence if it was not heading to an actual minimum. Since H^cheap_x approximates the ON block of the hessian particularly well, it seems reasonable to look primarily at the ON step. In practice, we multiply the update by a prefactor A. We report the result in Fig. 11. By comparing Fig. 10 and Fig.
11 we see that the prefactor A significantly improves the convergence, even beyond the first few iterations. Although the convergence is not as good as for the exact hessian (Fig. 3), it is able to reach an error of 2 • 10^-8 Hartree within 280 iterations for all molecules. To reach 10^-8 Hartree, we would need to use a stronger termination condition than our default settings (see Fig. 1). It may be informative to look at the evolution of A to understand the plateaus in the convergence of the energy for some molecules when using B^exp_ν. We plot A and the energy error as functions of the iteration in Fig. 12. Only the result for the CH_3OH molecule is reported here, but the results are qualitatively similar for the other molecules that do not converge with the B^exp = 0 approximation. We observe that A does go from 0 to 1, but oscillates, inducing the plateaus in Fig. 11. This indicates that the choice of A is not optimal and that the algorithm can be improved by building a better prefactor. In order to compare the above approximation with existing results, we tested our molecules with the Müller functional implemented in the Natural Orbital Functional (NOF) Theory module 74 of the molgw software. 75 This module is based on the widely used RDMFT program DoNOF.
52 DoNOF was developed for the recent Piris NOF functionals 24,47,48 and inherits the partitioning of the orbitals into seniority-zero subspaces, meaning that the 'Müller' functional in this module has to comply with additional constraints, but it is believed to be decently close to the actual Müller functional. More precisely, DoNOF employs a 2-step procedure, optimising the ONs with a limited-memory BFGS approximation and the NOs by using an iterative diagonalisation approach (with a maximum of 30 NO iterations per macro-iteration). The latter corresponds to a steepest descent and thus struggles to converge to high precision. For that reason, we decided to report the number of iterations needed to reach an error of 10^-3 Hartree (instead of 10^-8 up to now) in Table 1. The DoNOF implementation performs the 4-index transformation explicitly (exactly or with resolution of the identity); it needs to be computed at each NO iteration but only at the first ON iteration of a macro-iteration, meaning that the total number of expensive iterations equals the number of NO iterations plus the number of macro-iterations. If it instead used the idea of ref 72 to avoid explicit 4-index transformations, the cost of NO and ON iterations would be of the same order and the total number of expensive iterations would be the sum of the NO and ON iterations. We thus report in Table 1 the number of macro-, NO and ON iterations. To compare with our code, we focus on the number of expensive iterations as implemented in DoNOF (i.e. macro- plus NO iterations). We find that DoNOF needs between 3000 and 7000 iterations, while our algorithm always reaches the required error within 30 to 90 iterations, an improvement of two orders of magnitude.

Conclusion
In this work, we used an optimisation based on the hessian to reduce the number of iterations of the SCF-RDMFT procedure. We first added the cross terms to the hessian, providing an efficient 1-step optimisation method, in contrast to the more common 2-step one. As a baseline, we have shown that a trust-region optimization with the exact hessian provides a very effective algorithm. However, using the exact hessian makes the iterations too expensive to be useful in practice. Thus we have derived an accurate approximation of the hessian. We first extracted an affordable part of the exact hessian. Using only this cheap part, the number of iterations is actually of similar order as for the exact hessian when it converges, but half of our test set did not converge. A rather robust and effective approximation was obtained by using the cheap part of the hessian exactly and a BFGS-like update to approximate only the expensive part, in the more suitable ν-space. This approximate hessian resulted in an algorithm which converged for all eight molecules within a few hundred iterations to an error of less than 10^-8 Hartree. Although the numerical results show a considerable improvement of the convergence compared to a naive algorithm, there is still some room for improvement compared to the exact hessian. To enhance the convergence even further, a more accurate approximation of the hessian seems to be required. A first step in this direction would

Table 1: Number of iterations for the molgw implementation 74,75 of the DoNOF code 52 and our last approximation (Fig. 11) to converge to an energy error of 10^-3 Hartree for different molecules (the reference energy for the DoNOF code is the energy obtained after a large number, 200 to 300, of iterations, and for our code the energy obtained with the exact hessian in Fig. 3). For DoNOF we report the number of macro-iterations (left column), the total number of NO iterations (middle column) and the total number of ON iterations (right column).

be to build a consistent secant equation in x-space.
The plateaus in the energy-versus-iteration curves probably indicate that the algorithm is struggling with relatively flat areas in the energy landscape, i.e. ONs close to 0 or 2, which may become especially problematic for large basis sets. This might also explain why, even when using the exact hessian, convergence is not as fast as one would expect. So another direction to improve the algorithm is to avoid these flat areas by modifying the parametrisation.
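The flatness can be quantified directly: through the EBI map n(x) = (erf(x) + 1)/2, the chain-rule factor dn/dx = e^{−x²}/√π decays like a Gaussian, so x-space gradients vanish once an occupation pins near its bounds. A small numerical check (the map is the one from the numerical-details section, shown here without the shift µ):

```python
import numpy as np

# dn/dx for the EBI map n(x) = (erf(x) + 1)/2 equals exp(-x^2)/sqrt(pi):
# the x-space energy gradient carries this factor, so it is largest at
# half filling and essentially zero once an ON saturates at a bound.
dn_dx = lambda x: np.exp(-np.asarray(x) ** 2) / np.sqrt(np.pi)
```

At x = 0 the slope is 1/√π ≈ 0.56, while at x = 5 it is already below 10^-11, which is why an optimizer stalls on saturated occupations.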
We have only used the Müller functional in our tests. Although the approximate hessian presented here can be applied to non-separable functionals, the computation of the energy for these functionals is already O(N^5), and using H_exa would therefore be preferable. Moreover, we can use the resolution of the identity to obtain a scaling of O(N^4) for both the computation of the energy and H_exa. However, non-separable RDMFT functionals are not convex, and some even display discontinuities with respect to the occupations, which may hinder the convergence. The efficiency of our approach for non-separable functionals has thus yet to be assessed, which will be the subject of future work.
Inserting the derivatives of the 1-RDM in (16), we obtain the NO block of the hessian, the coupling block (provided F(n_i, n_j) = F(n_j, n_i), which holds for all functionals we are aware of) and the ON block.

Figure 16: Convergence of the energy (E^(k) − E^(ref) with respect to micro-iteration count) using equation (42) in solid lines and (22) in dashed lines, for different molecules.

Figure 18: Convergence of the energy (E^(k) − E^(ref) with respect to micro-iteration count) using (47) in solid lines and (42) in dashed lines, for the H_2O and CH_4 molecules.
NO iterations cannot be too small, otherwise the update does not sufficiently change the energy and the algorithm stops (grey line). We can also notice that changing the maximum number of iterations can significantly impact the final number of iterations (the purple line needs around 200 iterations to converge to 10^-8 Hartree, while the red line needs more than 400), and that a small number of NO iterations per ON iteration is more efficient. This indeed corresponds to limiting the amount of inefficient NO iterations.
5 Other separable functionals

5.1 The Power functional with α = 0.7

Our approach requires additional work to be adapted to non-separable functionals, but can readily be employed for separable functionals other than Müller. Unfortunately, there are few of them, all taking the form of the Müller functional with the exchange factor √(n_i n_j) replaced by (n_i n_j)^α, where α ∈ (0, 1]. 3,4,5 As an example, here we take α = 0.7 and test our algorithm with the exact hessian in Fig. 22, and with approximations (44) and (27) (with H_cheap treated exactly and the damping factor as in section 6 of the article) in Fig. 23 and Fig. 24, respectively.
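The Power functional differs from Müller only in the exchange factor; a sketch in the same setting as before (spin-summed ONs, real NOs, chemists'-notation ERIs held in memory; function name ours):

```python
import numpy as np

def power_interaction(n, eri, alpha=0.7):
    """Power-functional interaction energy: Mueller's sqrt(n_i n_j)
    exchange factor is generalised to (n_i n_j)^alpha, alpha in (0, 1]."""
    na = n ** alpha
    hartree = 0.5 * np.einsum('i,j,iijj->', n, n, eri)
    exchange = 0.5 * np.einsum('i,j,ijij->', na, na, eri)
    return hartree - exchange
```

Setting α = 1/2 recovers the Müller functional, which provides an easy consistency check between the two implementations.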
With the exact hessian, the number of iterations to reach a precision of 10^-8 Hartree is similar to the case of the Müller functional (see Fig. 3 of the article) for most molecules, but the convergence displays more unsuccessful steps (energy jumps) for the diatomic molecules and C_3H_8, and fails to converge beyond an error of 10^-7 (purple and light blue lines of Fig. 22). The algorithm moreover struggles to converge for the strongly correlated system (dark blue line), and only reaches the convergence criterion after 240 iterations. The approximation (44) seems to work worse with the Power functional: it fails to converge for a number of molecules, including CH_4, which had not proven challenging up to this point (see Fig. 23). Fortunately, the update (27) gives convergence comparable to the case of the Müller functional (Fig. 24 compared to Fig. 17), and most molecules reach an error of 10^-7 to 10^-8 Hartree (as with the exact hessian, Fig. 22) within 200 iterations. A notable exception is the stretched N_2 molecule, which did not converge, but this system was already challenging for the exact hessian.

The Hartree-Fock functional
Although not practically relevant, it is possible to define a functional in 1-RDMFT corresponding (i.e. giving the same ground-state 1-RDM and energy) to the Hartree-Fock method. In contrast to the Müller functional, the Hartree-Fock functional is concave in the space of 1-RDMs, 6,7

Figure 21: Convergence of the energy (E^(k) − E^(ref) with respect to the sum of ON and NO micro-iteration counts) using different limits on the number of iterations for the optimisation of the NOs and ONs at each macro-iteration, for the H_2O molecule and the exact hessian. The first number indicates the maximum number of ON iterations per macro-iteration and the second the maximum number of NO iterations per macro-iteration.
Figure 22: Convergence of the energy (E (k) − E (ref) given by the Power functional (α = 0.7) with respect to micro-iteration count) using the exact hessian.
which makes it a relevant test for our method.We thus report the energy convergence with the Hartree-Fock functional when using the exact hessian in Fig. 25.While the algorithm converges quiet well to an error of 10 −6 Hartree for all the tested molecules, it fails to improve beyond that precision.The reason is that the error function in the EBI parametrisation tends to be very flat when the ONs tend to 0 or 2, which makes the gradient and hessian vanish, preventing the algorithm from converging further.A way around it is to use an other parametrisation associated with a projection of the ONs as proposed by Cancès and Pernal, 8 where x i ∈ R and ν ∈ R + s.t.Tr{γ} ≤ N e .We have tested our algorithm while replacing only the way we handle constraints on the ONs i.e.EBI by (49) and report the results in Fig. 26.Aside from some instabilities due to the immaturity of the code, the approach of equation (49) seems to indeed solve the problem and allow the algorithm to converge decently fast to 10 −8 Hartree.It has also been possible to test the approximation of section 6 of the article with the Hartree-Fock functional and using (49), which give a convergence comparable to the one of the exact hessian (up to different jumps due to the instabilities), as shown in Fig. 27.ref) given by the Power functional (α = 0.7) with respect to micro-iteration count) using H cheap x exactly and equation (44) to approximate H exp with the prefactor A(sn, 10 −3 ) (see text in the article).
Figure 24: Convergence of the energy (E (k) − E (ref) given by the Power functional (α = 0.7) with respect to micro-iteration count) using H cheap x exactly and equation (27) to approximate H exp with the prefactor A(sn, 10 −3 ) (see text in the article).
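The flattening of the gradient near the ON boundaries can be illustrated with a toy bounded parametrisation. The sketch below assumes the illustrative map n(x) = 1 − cos(x) (not necessarily the exact EBI form used in the article), whose derivative vanishes exactly where n reaches 0 or 2, so any gradient with respect to x is damped by the chain rule near the boundaries.

```python
import math

def n_of_x(x):
    # Illustrative bounded parametrisation: n(x) stays in [0, 2] for all real x.
    return 1.0 - math.cos(x)

def dn_dx(x):
    # Analytic derivative: vanishes where n hits 0 (x = 0) or 2 (x = pi),
    # which is what makes the gradient flat at the boundaries.
    return math.sin(x)

# Approaching the boundary n -> 0, the derivative collapses.
for x in (1.0, 0.1, 0.01):
    print(f"n = {n_of_x(x):.6f}, dn/dx = {dn_dx(x):.6f}")
```

A projection-based treatment of the bounds, as in (49), avoids this damping because the ONs are no longer squeezed through a map with vanishing slope.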

Derivation of the BFGS approximation
The idea of (quasi-)Newton methods is to take the derivative of a second-order model m of the function f to optimise, which for the k-th iteration writes m(k)(p) = f(k) + ∇f(k)T p + (1/2) pT B(k) p, where B is the (approximate) hessian. We then impose that the derivative of m coincides with the derivative of f at the last two iterations. For the last iteration this is automatically verified by taking p = 0; for the second-to-last iteration we have to take p = −s(k−1), that is, the opposite of the last step, and we get ∇m(k)(−s(k−1)) = ∇f(k) − B(k)s(k−1) = ∇f(k−1).
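The secant condition above can be checked numerically. The sketch below uses a generic toy quadratic f (a hypothetical example, not one of the article's energy functionals): for a quadratic, the exact Hessian satisfies the secant equation, so the model gradient at p = −s reproduces the previous gradient exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T + 4 * np.eye(4)        # SPD Hessian of a toy quadratic f
b = rng.standard_normal(4)

grad = lambda x: A @ x + b          # gradient of f(x) = 0.5 x^T A x + b^T x

x_prev = rng.standard_normal(4)
x_curr = rng.standard_normal(4)
s = x_curr - x_prev                 # last step s^(k-1)
y = grad(x_curr) - grad(x_prev)     # gradient difference y^(k-1)

B = A                               # the exact Hessian satisfies B s = y
# Model gradient at p = -s must equal the previous gradient:
grad_model = grad(x_curr) - B @ s
assert np.allclose(grad_model, grad(x_prev))
```

For a non-quadratic f the secant equation B(k) s(k−1) = y(k−1) only holds approximately, which is precisely what quasi-Newton updates enforce by construction.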
We follow the proof given by Hauser. 9 To ensure that B is symmetric, we write it as B = LL^T, L ∈ R^(N×N). Taking g = L^T s(k), the secant equation becomes Lg = y(k).

Figure 25: Convergence of the energy (E(k) − E(ref), given by the Hartree-Fock functional, with respect to micro-iteration count) using the exact hessian.
Figure 26: Convergence of the energy (E (k) − E (ref) given by the Hartree-Fock functional with respect to micro-iteration count) using the exact hessian.Here the boundary constraints on the ONs were handle using (49) instead of EBI.Note that the energy jumps for HF and C 3 H 8 around 10 iterations are due to an instability in the code, which sometimes fails to impose the trace constraint Tr{γ} = N .
Taking the scalar product ⟨A|B⟩ = Tr{AB^T}, (53) is equivalent to minimising ∥L − L(k)∥ s.t. ⟨L|e_i g^T⟩ = y(k)_i ∀i, (55) with e_i the i-th column of the identity matrix; that is, the e_i g^T are the normal vectors defining the affine subspace P_g in which L has to be optimised. The minimiser L* is the closest point to L(k) in P_g. Thus, L* − L(k) is orthogonal to P_g and so a linear combination of the e_i g^T. (53) is then equivalent to finding L* ∈ R^(N×N) and σ ∈ R^N such that L* − L(k) = σg^T (56a) and L*g = y(k) (56b). Multiplying (56a) from the right by g and utilizing (56b) gives σ = (y(k) − L(k)g)/(g^T g), which can be inserted back to yield L* = L(k) + (y(k) − L(k)g)g^T/(g^T g).

Figure 27: Convergence of the energy (E(k) − E(ref), given by the Hartree-Fock functional, with respect to micro-iteration count) using H cheap x exactly and (27) to approximate H exp with the prefactor A(s_n, 10⁻³) (see text in the article). Here the boundary constraints on the ONs were handled using (49) instead of EBI. The result for CH4 is not reported as it failed to impose the trace constraint Tr{γ} = N.
Inserting the minimizer L* in the definition of g and rearranging gives g(1 − (y(k) − L(k)g)^T s(k)/(g^T g)) = L(k)T s(k), meaning that g ∝ L(k)T s(k). We take g = βL(k)T s(k) and, by inserting it into the previous equation, we get β² = y(k)T s(k) / (s(k)T B(k) s(k)). We now have an explicit expression for L*, and simplifying B(k+1) ≡ B* ≡ L*L*^T we get the BFGS approximation (22).
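The resulting update is the standard BFGS formula B(k+1) = B − (B s s^T B)/(s^T B s) + (y y^T)/(y^T s), which this derivation recovers (presumably equation (22) of the article). A minimal sketch, with made-up s and y vectors, verifying that the update satisfies the secant equation and preserves symmetry:

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of an approximate Hessian B.

    B : current symmetric approximation
    s = x_k - x_{k-1}, y = grad_k - grad_{k-1}
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

rng = np.random.default_rng(1)
B = np.eye(5)                           # common starting guess
s = rng.standard_normal(5)
y = 2.0 * s + 0.1 * rng.standard_normal(5)  # ensures y^T s > 0 (curvature)

B_new = bfgs_update(B, s, y)
assert np.allclose(B_new @ s, y)        # secant equation holds
assert np.allclose(B_new, B_new.T)      # symmetry is preserved
```

Positive definiteness is maintained whenever the curvature condition y^T s > 0 holds, which is why line searches enforcing the Wolfe conditions are commonly paired with BFGS.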

Figure 1: Algorithm of the 2-step procedure in red, modifications to obtain the 1-step version in blue, and common parts in black.

Figure 2: Convergence of the energy (E(k) − E(ref), with respect to the sum of NO and ON iterations) using the 2-step procedure (see text) for different molecules.

Figure 3: Convergence of the energy (E(k) − E(ref), with respect to the sum of NO and ON iterations) using H exa for different molecules.

Figure 6: Computation time to compute the exact hessian H exa (red dots) and the cheap part of the hessian H cheap (blue dots) w.r.t. the size of different molecules. Asymptotic behaviours are fitted (dashed lines); the formal scaling is O(N^5) for H exa and O(N^4) for H cheap.

Figure 7: Convergence of the energy (E(k) − E(ref), with respect to micro-iteration count) using B = H cheap for different molecules.

Figure 8: Comparison of the convergence of the energy (E(k) − E(ref), with respect to micro-iteration count) using B = H cheap, denoted by •, and B = H exa, denoted by ×, for different molecules.

Figure 9: Relative error of H cheap in the first iteration, H cheap(0)_ij − H exa(0)_ij divided by H exa(0)_ij.

Figure 10: Convergence of the energy (E(k) − E(ref), with respect to micro-iteration count) using H cheap x

Figure 11: Convergence of the energy (E(k) − E(ref), with respect to micro-iteration count) using H cheap x

Figure 12: Absolute energy error (orange line) and the prefactor A(s_n, 10⁻³) used on B exp in ν-space (grey line) w.r.t. the iteration during the SCF procedure, using equation (27), for the CH3OH molecule.

Figure 15: Error of the BFGS approximation ∥B BFGS − H exa∥ (dotted line) for the H2O molecule (top) and CH3OH molecule (bottom). The error coming from the ON block is plotted as dotted red lines, from the NO block as green lines, and from the coupling blocks as purple lines.

Figure 17: Convergence of the energy (E(k) − E(ref), with respect to micro-iteration count) using H cheap x exactly and (44) to approximate H exp x with the prefactor A(s_n, 10⁻³) (see text in the article).

Figure 19: Convergence of the energy (E(k) − E(ref), with respect to the sum of ON and NO micro-iteration counts) using different ϵ termination criteria for the optimisation of the NO at each macro-iteration, for the H2O molecule, and the exact hessian. For ϵ_dyn, we start at ϵ_dyn(0) = 10⁻² and take ϵ_dyn(k+1) = (1/α) ϵ_dyn(k) after each macro-iteration until ϵ_dyn(k) reaches the same precision as the global convergence criterion.

Figure 20: Convergence of the energy (E(k) − E(ref), with respect to the sum of ON and NO micro-iteration counts) using different ϵ termination criteria for the optimisation of the NO at each macro-iteration, for the H2O molecule, and BFGS as hessian approximation. For ϵ_dyn, we start at ϵ_dyn(0) = 10⁻² and take ϵ_dyn(k+1) = (1/α) ϵ_dyn(k) after each macro-iteration until ϵ_dyn(k) reaches the same precision as the global convergence criterion.
