Optimal Transport Distances to Characterize Electronic Excitations

Understanding the character of electronic excitations is important in computational and reaction mechanistic studies, but their classification from simulations remains an open problem. Distances based on optimal transport have proven very useful in a plethora of classification problems and, therefore, seem a natural tool to try to tackle this challenge. We propose and investigate a new diagnostic Θ based on the Sinkhorn divergence from optimal transport. We evaluate a k-NN classification algorithm on Θ, the popular Λ diagnostic, and their combination, and assess their performance in labeling excitations, finding that (i) the combination only slightly improves the classification, (ii) Rydberg excitations are not separated well in any setting, and (iii) Θ breaks down for charge transfer in small molecules. We then define a length-scale-normalized version of Θ and show that the result correlates closely with Λ for results obtained with Gaussian basis functions. Finally, we discuss the orbital dependence of our approach and explore an orbital-independent version. Using an optimized combination of the optimal transport and overlap diagnostics together with a different metric is in our opinion the most promising for future classification studies.


Introduction
Electronic excitations drive various processes, for example in light harvesting or solar cells, which are important in the exploration of alternative energy sources. 1,2There is consequently significant theoretical interest in calculating excited states and their energies, which could then be used to drive non-adiabatic dynamics and design novel materials.However, excited state calculations are a computationally expensive problem, and various approximations have been introduced to obtain solutions at lower costs.Similarly to the ground state problem, methods based on density functional theory (DFT) have become highly popular as they come at a relatively low computational cost but have a solid theoretical foundation, except for approximations that have to be made to the density functional. 3Despite decades of effort, there is no 'one-fits-all' density functional. 4In particular for electronic excitations in time-dependent DFT (TDDFT), 5 where transition energies are calculated within the linear response setting, it has become clear that a functional's performance can depend heavily on the character of the excitation. 6is is one of the main motivations behind exploring diagnostics that split TDDFT calculated excitations into different types, 7,8 of which there are three.First, Rydberg excitations to energetically high lying and diffuse Rydberg states.Secondly, charge transfer (CT) excitations from a donor to an acceptor which is spatially separated from the donor, either within the same molecule (intramolecular) or in different molecules (intermolecular).CT excitation can carry especially large errors in TDDFT calculations. 6The third group are local excitations, which encompass excitations that fit neither of the other two categories.With a classifier one can decide a posteriori whether the chosen density functional was suitable and if not, repeat the calculation with a higher-level functional.However, such a classification might prove useful beyond picking the right functional.Assigning a label to a new excitation can be a very tedious process, during which one relies on chemical intuition or has to inspect wavefunctions and/or localised orbitals visually.With a quantitative method to decide on the character of the excitation, this step could be extremely simplified. 9 this end, Peach et al. took advantage of the decomposition of an electronic excitation into single orbital transitions within the TDDFT framework and studied a weighted average of the overlap between initial and final orbitals. 7While they did not establish a classification of excitation types, they showed a correlation between their overlap diagnostic and the associated error in excitation energy.1][12] In a similar spirit Guido et al. studied the difference between the electron's centroid in the initial and final orbitals within a single orbital transition. 13The centroid difference is better able to differentiate between CT and valence excitations than the overlap diagnostic, although they stress that the best results were obtained when using them in combination.
An alternative interpretation of electronic excitations is via the formation of an electronhole pair (the exciton).The resulting exciton formalism opens the door for a multitude of other descriptors like the distance between electron and hole, d he , the electron/hole sizes, σ e /σ h respectively, and the size of the exciton d exc .All have been used to characterise excited states 8,14 and the overlap diagnostic has been combined with d he and d exc for classification. 9ing back to a density picture in the TDDFT framework, Moore et al. consider the modulus of the change in electron density between the initial and final orbitals, which they argue is better suited for a classification. 15However, this diagnostic suffers from a similar problem as the overlap: If the sets of points where the initial and final orbitals are nonzero, which will also be referred to as their 'supports', become disjoint, there is a plateau value for both of them and they are agnostic to any variations in the orbitals beyond that point.
In this work, we seek to improve the classification of excitations by considering optimal transport properties of electron densities, which promise to avoid the plateau problem.The article is structured as follows.After laying out the necessary theory on TDDFT and optimal transport in Sec. 2, we cover the computational details in Sec. 3. In Sec. 4, we present the results, both confirming previous studies on the overlap diagnostic and new results using descriptors from optimal transport.Finally, we summarise our findings in Sec. 5 and propose future directions.

TDDFT and the Λ diagnostic
The Kohn-Sham (KS) ground state for an N -electron system is a Slater determinant where A anti-symmetrises and normalises the wave function, x i is the space-spin coordinate of the i-th electron and φ i is the i-th KS orbital. 3The orbitals are solutions to the (timeindependent) KS equations, 16 The ground state is composed of the KS orbitals φ i with the lowest energies ε i (the occupied orbitals).Excited states within linear-response TDDFT are linear combinations of Slater determinants with single orbital excitations into previously unoccupied (or virtual) orbitals φ a . 5The excitation energy and the contribution of a single orbital transition can be determined in the linear response picture by solving the Casida equations, 17 The eigenvalues Ω are the excitation energies and the eigenvectors X, Y contain the excitation and de-excitation amplitudes, respectively.Although equation (3) looks simple, the entries of A and B depend on the exact ground state of the system as well as on the exact form of the exchange correlation functional.Neither of these is currently attainable, but there are various approximations to the exchange-correlation functional with which excitation energies can be obtained. 18However, TDDFT excitation energies can vary greatly in accuracy between different excitation characters and between different functionals. 6 is generally difficult to assign one of the labels "CT", "local" or "Rydberg" to a given electronic excitation.Peach et al. investigated energy errors in electronic excitations based on the overlap of the single orbital transitions, 7 where overlap of the (modulus of the) orbitals.They studied a set of eleven molecules with the PBE, B3LYP and CAM-B3LYP functionals and evaluated Λ for a total of 59 excitations.
While Λ gives some insight into the expected error magnitude for different DFT functionals, it is not able to distinguish between the different excitation types without prior knowledge: Though Λ values of local and Rydberg excitations are well separated, CT excitations fall across almost the whole range of Λ.Hence, one cannot make conclusions about the excitation type based on Λ alone.Optimal transport might be able to help.

An optimal transport diagnostic
At the heart of optimal transport lies the problem of finding a plan to transport a source probability density into a target probability density. 19Imagine a pile of soil next to a hole in the ground.With a wheelbarrow, one can -with some physical effort -carry the soil over and fill the hole.When the hole is entirely filled, the total work that was necessary is the invested 'cost'.For this process, the optimal transport problem would be to plan the wheelbarrow transport such as to minimise the work.As we will be looking at electronic orbitals φ i,a , it is only natural to use the probability densities ρ i,a = |φ i,a | 2 .The (entropically regularised) optimal transport problem is 20 where c is called a cost function, Π(ρ i , ρ a ) is the set of joint probability densities π with marginals ρ i and ρ a , and we use the regularised optimal transport problem problem with the Kullback-Leibler divergence KL (π|ρ i ρ a ) for computational efficiency. 20Because of the entropic regularisation, Eq. ( 5) can unituitively be non-zero even if the orbitals φ i and φ a are identical, which can be fixed by defining the Sinkhorn divergence, 21 which will therefore be used from here on.The Sinkhorn divergence S has the same units as the cost function c. Depending on the application, different cost functions are appropriate.
Here, we will always use the squared Euclidean cost, c(r, r ′ ) = ∥r−r ′ ∥ 2 , for which S has units of a length squared.Note that π has units of length −6 , since it is a joint probability density of two position vectors.ε also needs to have the same units as c, since the Kullback-Leibler divergence is dimensionless.
Fig. 1 motivates the use of the Sinkhorn divergence to study electronic excitations: Consider the case where the target density (ρ a in Eq. ( 5)) is a translation of the source density ρ i .
This could be a crude model of CT with an increasing number of linker fragments between donor and acceptor, for which Mewes and Dreuw have already found a linear dependence of exciton-based diagnostics. 14As soon as the supports of source and target density become disjoint, the overlap is zero and stays zero for any larger translations.It is in that sense blind to translations above some threshold, and Λ will tend towards zero.The optimal transport derived quantities are expected to behave very differently.If the regularisation parameter ε goes to 0, the squared Wasserstein distance W 2 2,0 would be quadratic in the displacement.us to propose a new diagnostic, with the single orbital contribution c ia as in Eq. ( 4).

Computational details
4][25][26][27] The SCF convergence threshold was 10 −7 and the multiple grid m3 was used for the DFT calculation. 28At the end of a converged TDDFT calculation, Turbomole prints a list of single orbital contributions c ia (for c ia > 0.05) and the involved orbitals, which can be obtained on a grid.The grid data is then used to calculate Λ by numerical quadrature and Θ with the geomloss package (version 0.2.5). 29In order to calculate Sinkhorn divergences, the orbital grids have to be chosen large enough to contain all electronic density (in practice, we chose the grids big enough to contain at least 99% of the density).We use an equidistant grid with spacing of 0.8 Bohr for all molecules.
In order to gain some quantitative understanding of the degree of separation into excita-

Optimal transport results
The regularisation parameter ε in Eq. ( 5) is determined based on the maximum value of the cost function d max , with integer σ.In order to determine a suitable σ, we calculated Θ for the dipeptide and N 2 molecules at different σ, the results of which are included in the supplementary information (Tab.S1).These molecules have drastically different grid sizes and are therefore good edge cases to test σ.For σ = 4, Θ is captured to within 2% of its true value, which is sufficient for the purpose of this article and was therefore used in all further calculations.
To start, we compare both Θ and Λ with the error in the electronic excitation energy (Fig. 2), analogously to Fig. 2 in Ref. 7. The two diagnostics exhibit opposite trends: Θ increases while Λ decreases with the excitation error.Further, Θ is generally large for Rydberg excitations and small for local excitations.The inverse is true for Λ, which we can reason as follows: In Rydberg excitations, the electron is excited to a very diffuse and highly delocalised orbital.On the one hand, it will only have very small density within the support of the initial orbital and Λ is small.On the other hand, in order to satisfy the marginals, the transport plan π will have non-zero entries far away from its diagonal, resulting in a large Θ.
In contrast, local excitations have high overlap between the initial and final orbials, which results in a large Λ.Simultaneously, little overall density has to be transported over a small distance and therefore Θ will be small.
Turning our attention to CT excitations, we note that, contrary to our expectations, the CT Θ values lie between Rydberg and local values, similarly to the Λ values.This makes both Λ and Θ unlikely to be suitable classifiers, which is also apparent from the results of a k-nearest neighbours classifier trained using either Θ or Λ (Fig. 3, first two panels): The Λ-classifier (Fig. 3a) performs well for both local and Rydberg excitations, but labels almost all CT excitations wrong.The Θ-classifier (Fig. 3b) performs better for CT excitations but worse for the other excitation types.It should be noted that we are trying to classify imbalanced data, as our data set contains significantly more local excitations than CT or Rydberg ones, which might incur additional errors.However, it is clear from Fig. 2 that even imbalanced learning would not produce a very good classifier: The CT excitations are simply too spread out to allow for this.To understand better why, let us consider the regions of overlap of CT excitations with local and Rydberg excitations, respectively.Rydberg and CT excitations might be difficult to distinguish with the diagnostics used here because there is a property that is completely ignored: diffusivity.We do not have the ability to discern an excitation into a diffuse orbital from one into a translated orbital.Additional metrics that can identify Rydberg orbitals, such as the average distance of the electron to the center of the molecule or higher order moments, might provide a better classification in combination with Θ.In the same spirit, Hirose et al. previously used a combination of Λ and the electron-hole distance relative to the exciton size to classify excitations. 9 One good example of coinciding CT and local excitations are the CT excitations in the HCl molecule.Since the excitations only take place across one hydrogen-chloride bond, the associated Sinkhorn divergences are very small (Θ on the order of 4 Bohr 2 ) and similar to typical Θ values for local excitations in the same molecule.This is in fact not the only molecule where CT excitations are associated with surprisingly low values of Θ but rather a common situation which occurs in all molecules with CT excitation considered here.
We have now pointed out multiple times that there are opposite trends in Θ and Λ.For two Gaussian densities with the same variance, there is even an explicit relation between with a constant C, which motivates us to define a new optimal transport diagnostic where the Sinkhorn divergence is normalised by the variance of the electron position, with and we have assumed that the center of the molecule lies at the origin.Note that Θ ′ is now dimensionless.Plotting the new Θ ′ against log Λ (Fig. 4) reveals a close correlation between the two, which confirms that they capture the same information.Note that Turbomole uses Gaussian basis sets, which may have an influence on the striking correlation in Fig. 4 and we reserve the study of other basis functions to future work.
Fig. 5 shows the confusion matrices for k-NN classifiers trained on the Θ ′ −log Λ data set.
While the log Λ classifier (Fig. 5a) does not improve significantly on the Λ classifier (Fig. 3a), we see a much better performance of the Θ ′ classifier (Fig. 5b) on Θ (Fig. 3b).Again, the combination does not seem to offer much additional improvement, but the current data set is too small to allow for quantitative conclusions.For an explanation for the confusion matrices, see caption of Fig. 3 .

Orbital-dependence
Λ, Θ and Θ ′ are calculated from the Kohn-Sham orbitals obtained in the DFT calculation.
These orbitals are not always useful to assess electronic excitations, as they can be significantly delocalised over the system.Up to now, we followed the same strategy used for the Λ diagnostic in order to have a straight-forward comparison.However, modern developments have focused on natural transition orbitals, which give a physically more reasonable picture of the excitation on an orbital level, 30 on densities and density matrices, where orbitaldependence is avoided al together, [31][32][33] and on the exciton formalism. 9,32,34,35The difference between orbital-and density-based descriptors has also been discussed elsewhere. 36In particular, Savarese et al. conclude that natural transition orbitals are much better suited for CT diagnostics than molecular orbitals. 36We will next briefly discuss some of these newer developments.
For the ground and excited state densities, ρ 0 and ρ X , respectively, Le Bahers et al.
consider the density variation associated with a specific excitation, 31 Positive values of ∆ρ correspond to an increment in density upon excitation, negative values to a depletion.They define a measurement for the CT length, D CT , as the difference between the barycenters of the positive and negative part of ∆ρ.Further, they define the spread of the regions associated with positive and negative changes in the density variation and combine them with D CT to find a diagnostic for the breakdown of lower-level TDDFT functionals in CT excitations.
Alternatively, we can start from the density matrices ρ0 and ρX of ground and excited state.We follow Etienne et al. in defining the difference matrix, 33 If the excited state consists of singly excited Slater determinants, the difference matrix is block-diagonal with an occupied-occupied and a virtual-virtual block. 37The detachment density matrix Γ corresponds to the occupied-occupied block and the attachment density matrix Λ to the virtual-virtual block. 37From these density matrices we can define the attachment and detachment densities, where τ = Γ, Λ. Etienne et al. define the overlap of the attachment and detachment densities, where ϑ = 1 2 R 3 τ ∈{Λ,Γ} ρ τ (r)dr, and show that they can make similar conclusions as for the Λ diagnostic, but with the benefit of orbital-independence. 33 Since the attachment and detachment densities hold the same mass, it is straight-forward to apply our optimal transport formalism.We simply evaluate the Sinkhorn divergence for the attachment and detachment densities, for which our diagnostic can be written as Note that, if the attachment and detachment densities have the same overall shape, the Sinkhorn divergence would reduce to the distance between the centroids of the densities (for small enough ε) and would be closely related to D CT .In Fig. 4, we compared Θ ′ to the overlap-based Λ.In a similar spirit, we compare Θ ′ to ϕ S in the context of densities (see Fig. 6).Compared to Fig. 4, the CT and Rydberg excitations are clustered very differently, showing the orbital dependence of our previous results.At the same overlap value, the Sinkhorn divergence is larger for CT excitations than for Rydberg excitations, leading to a diagonal separating plane, whereas in Fig. 4 the CT excitations were squeezed between the other two regions.The improved separation between the excitation types is also reflected in the associated confusion matrix (Fig. 7), where we reach a near perfect classification.
In future work, Θ ′ should therefore be applied either in the manner of Eq. ( 18) or in the manner of Eq. ( 12), but with natural transition orbitals.Both of these strategies mitigate the extent of orbital dependence.
In Fig. 6, there are still data points that lie close to the boundary between excitation types.In particular in these boundary cases, it is sensible to think of these diagnostics as measuring the extent of an excitation character rather than assigning a definite label.In such cases it may be useful to introduce a fractional decomposition into excitation characters.We can now also question the labels in our data set based on the results in Fig. 6.For example, the 1 B 1 excitation in the formaldehyde (Θ ′ = 1.6, log ϕ S = −1.0) is labelled as a local excitation in our data set, but clearly lies the CT labelled region, an indication that the excitation carries significant CT character.Finally, let us touch on diagnostics derived from the exciton formalism.The electron and hole sizes from the exciton formalism are useful to analyse Rydberg states. 34Further, the exciton size d exc has been shown to be large for CT and Rydberg excitations. 32d exc also works well for centrosymmetric molecules, 35 which have proven challenging for other diagnostics like the electron-hole distance d eh and the CT length D CT . 9,31It can be expected that the optimal transport diagnostics would also struggle with centrosymmetric molecules.

Conclusion
There is currently no reliable way to classify electronic excitations into CT, Rydberg and local excitations.In this report we studied two diagnostics, the overlap-based Λ, 7 the optimal transport-based Θ and their combination to classify an electronic excitation via its k-nearest neighbours.There are strengths and issues in both Λ and Θ and their combination gives a compromise, but it still does not give a viable classifier.We argued that CT excitations can be similar to local excitations in Θ and Λ if the donor and acceptor are close together, especially in small molecules.We further argued that Θ and Λ cannot distinguish between CT and Rydberg excitations because neither of them captures the diffusivity of the final orbital in a Rydberg excitation.
We then showed that there is a relation between log Λ and a modified diagnostic Θ ′ , which is a length scale-corrected version of Θ.The combination of Θ ′ and log Λ gives the best classifier explored in this work which is purely based on KS orbitals, but more studies are necessary into whether Θ ′ is also a sensible diagnostic in programs that use non-Gaussian orbitals.We also discuss the orbital-dependence in Θ ′ and propose a density matrix-based version, which compares attachment and detachment densities.We compare this modification to the overlap between the two densities, 33 and observe a very different clustering of the data points which leads to a significant improvement in the classification.
This suggests that the orbital choice has a significant impact.In future work, we also want to compare Θ ′ diagnostics with other modern CT diagnostics, [8][9][10][13][14][15]31,32,34,35 to learn more about its strengths and weaknesses. Modern diagnostic will additionally help in the differentiation between CT and Rydberg excitations.
While this work was being finalised, a study that shares similar ideas has appeared. 38The study focuses on the density difference defined in Eq. ( 14) and looks at the optimal transport between the regions of density depletion (∆ρ < 0) and density enhancement (∆ρ > 0).We reach similar conclusions, but we have also key differences, like the use of the entropic regularisation to accelerate calculations and a different look at possible classifications.

Figure 1 :
Figure 1: (Qualitative) demonstration of overlap measurements and the Sinkhorn divergence when studying translations, here for Gaussian source and target densities.
calculated Λ and Θ for the same set of molecules as studied by Peach et al., because their data set includes many examples of the three different excitation types. 7The set includes the four small molecules N 2 , CO, HCl and formaldehyde, five larger molecules (dipeptide, β-dipeptide, tripeptide, N-phenylpyrrole (PP) and 4-(N,N-dimethylamino)benzonitrile (DMABN)) and conjugated polymer systems: acenes (1-5 monomers) and polyacetylene (PA) oligomers (2-5 monomers), all of which are shown in Fig. S1.Calculating Λ allows us tion types, we will use a k-nearest neighbor classifier.The data points are first separated into a training and a test set.For each point in the test set, we consult the excitation type of its k nearest neighbors in the training set.The data point is then labelled with the excitation type that the most neighbors belong to.Here, the k = 13 nearest neighbors are consulted, which corresponds approximately to the square root of the training set size.Contrary to other learning algorithms, there is no iterative training procedure, since the training set does not change.4 Results and Discussion 4.1 The dataset The calculated excitation energies, Λ and Θ values are available in the supplementary information.The energies and Λs agree very well with previously reported values except for the PBE excitation in the third PA oligomer, where our Λs seem swapped, and the CAM-B3LYP D 1 ∆ and I 1 Σ − excitations in CO, where our Λs are much smaller in the Turbomole results.In the other cases any deviations are likely due to differences between different quantum chemistry programs, different grids or different convergence thresholds for the SCF iterations.

Figure 2 :
Figure 2: Comparison of Θ (left) and Λ (right) with the error in excitation energy.From top to bottom, the TDDFT functionals are PBE, B3LYP and CAM-B3LYP.

Figure 3 :
Figure 3: Confusion matrix of a k-NN classification using Λ only (a), Θ only (b) and both (c).A cell in the ith row and jth column gives the total number of points in group i that were classified as group j.Hence, the diagonal elements are all correctly identified data points.

Figure 4 :
Figure 4: Combined plot of Θ ′ and log Λ for the present set of excitations.The background colours correspond to the boundaries from a k-NN classification.

Figure 5 :
Figure 5: Confusion matrix of a k-NN classification using log Λ only (a), Θ ′ only (b) and both (c).For an explanation for the confusion matrices, see caption of Fig.3

Figure 6 :
Figure 6: As Fig. 4, but for Θ ′ and ϕ S calculated for attachment-and detachment densities.

Figure 7 :
Figure7: Confusion matrix of a k-NN classification using log ϕ S and Θ ′ for attachment and detachment densities.For an explanation of the confusion matrices, see caption of Fig.3