Magnetstein: An Open-Source Tool for Quantitative NMR Mixture Analysis Robust to Low Resolution, Distorted Lineshapes, and Peak Shifts

1H NMR spectroscopy is a powerful tool for analyzing mixtures, including determining the concentrations of individual components. When signals from multiple compounds overlap, this task requires computational solutions. These are typically based on peak picking and the comparison of the obtained peak lists with libraries of individual components, which can fail if peaks are not sufficiently resolved or when peak positions differ between the library and the mixture. In this paper, we present Magnetstein, a quantification algorithm rooted in optimal transport theory, which makes it robust to unexpected frequency shifts and overlapping signals. Thanks to this, Magnetstein can quantitatively analyze difficult spectra with an estimation trueness an order of magnitude higher than that of commercial tools. Furthermore, the method is easier to use than other approaches: it has only two parameters, with default values applicable to a broad range of experiments, and it requires little to no preprocessing of the spectra.

[Figure: overlaid spectra (mixture, benzyl benzoate, m-anisaldehyde) plotted over 5.5-10.5 f1 (ppm)]
Figure S9: Spectra of library components and mixture from Experiment no. 7. For details see Table S1.
Figure S11: Spectra of library components and mixture from Experiment no. 9. For details see Table S1.

The Wasserstein distance
To introduce the concept of the Wasserstein distance for spectra comparison, we first need to represent a spectrum as a probability measure. This approach is somewhat different from the standard vector representation. The main difference between the two approaches is that the vector representation implicitly assumes the chemical shift positions for consecutive intensity values, and thus makes it impossible to compare or add two spectra with different points on the chemical shift axis. In contrast, the probability measure representation does not require a common chemical shift axis for all the spectra, because it encodes the spectrum explicitly as a set of (chemical shift, intensity) tuples. The difference is visualized in Figure S13.

Below, we formally introduce the notion of the Wasserstein distance. We describe only the one-dimensional case here, as it is sufficient for the purposes of this paper.

Definition 1. Let $X \subset \mathbb{R}$ and let $\mathcal{P}(X)$ be the space of probability measures defined on $X$. We define $\mathcal{P}_1(X)$ as the subset of measures with a finite first moment:
$$\mathcal{P}_1(X) = \Big\{ \mu \in \mathcal{P}(X) : \int_X |x| \, d\mu(x) < +\infty \Big\}.$$

Definition 2. A measure $\gamma \in \mathcal{P}(X \times X)$ whose first and second marginals are $\mu$ and $\nu$, respectively, is called a transport plan between $\mu$ and $\nu$. We denote the set of all transport plans between $\mu$ and $\nu$ by $\Gamma = \Gamma(\mu, \nu)$.

Definition 3. Let $\mu, \nu \in \mathcal{P}_1(X)$. We define the Wasserstein distance (the Wasserstein metric) between $\mu$ and $\nu$ as
$$W(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} |x - y| \, d\gamma(x, y).$$
In a more general formulation, the absolute value in the above definition can be replaced by some other distance function $\rho : X \times X \to [0, +\infty)$. Then we have:
$$W_{\rho}(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} \rho(x, y) \, d\gamma(x, y).$$
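For one-dimensional measures, the infimum in Definition 3 can be computed directly from cumulative distribution functions (this closed form is recalled as Theorem 2 in the next section). The sketch below is a minimal illustration of this representation and computation; the function and variable names are our own and are not part of the Magnetstein API.

```python
import numpy as np

def spectrum_to_measure(shifts, intensities):
    """Turn a peak list (chemical shift, intensity) into a sorted probability measure."""
    shifts = np.asarray(shifts, dtype=float)
    weights = np.asarray(intensities, dtype=float)
    weights = weights / weights.sum()          # normalize total intensity to 1
    order = np.argsort(shifts)
    return shifts[order], weights[order]

def wasserstein_1d(mu, nu):
    """W(mu, nu) = integral of |M(s) - N(s)| ds over the merged support grid."""
    (x, wx), (y, wy) = mu, nu
    s = np.sort(np.concatenate([x, y]))        # all distinct ppm values
    # cumulative distribution functions evaluated at every grid point
    M = np.concatenate([[0.0], np.cumsum(wx)])[np.searchsorted(x, s, side="right")]
    N = np.concatenate([[0.0], np.cumsum(wy)])[np.searchsorted(y, s, side="right")]
    return np.sum(np.abs(M[:-1] - N[:-1]) * np.diff(s))

mu = spectrum_to_measure([1.0, 2.0], [1, 1])
nu = spectrum_to_measure([1.5, 2.5], [1, 1])
print(wasserstein_1d(mu, nu))  # 0.5: each half-unit of mass moves by 0.5 ppm
```

Because the distance depends only on the merged list of shift positions, the two spectra need not share a common chemical shift axis, which is exactly the advantage of the measure representation described above.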

Main theoretical result with proof
The main goal of estimation is finding $p^*$ such that
$$p^* = \operatorname*{argmin}_{p} \; W_{\rho}\big( (1 - p'_0)\mu + p'_0 \xi, \; \nu_p + p_0 \omega \big),$$
where $\mu$ is the spectrum of a mixture, $\nu_p = \sum_{j=1}^{k} p_j \nu_j$ is a linear combination of the reference spectra $\nu_1, \ldots, \nu_k$, and $\omega, \xi$ are auxiliary points corresponding to noise in the spectrum of the mixture and in the reference spectra of the components, respectively. For all $p = (p'_0, p_0, p_1, \ldots, p_k)$ we have
$$p'_0, p_0, p_1, \ldots, p_k \geq 0.$$
Moreover, we assume that the mixture's spectrum and the reference spectra are normalized, i.e.:
$$\mu(\mathbb{R}) = 1 \quad \text{and} \quad \nu_j(\mathbb{R}) = 1 \;\; \text{for } j = 1, \ldots, k.$$
Additionally, we need to assume that
$$p_0 + p'_0 \leq 1.$$
This assumption guarantees that the amount of signal removed to $\omega$ is not greater than the amount of signal present in the rescaled mixture's spectrum (i.e. $1 - p'_0 \geq p_0$), and that the amount of signal removed to $\xi$ is not greater than the amount of signal present in the reference spectra (i.e. $1 - p_0 \geq p'_0$). For simplicity of notation, let us denote $\kappa_{\mathrm{mixture}}$ (the penalty for removing noise signal from the spectrum $\mu$) as $\kappa$, and $\kappa_{\mathrm{components}}$ (the penalty for removing signal from the spectrum $\nu_p$) as $\kappa'$.
Below we present our main theoretical result:

Theorem 1. Let $S = \{s_1, s_2, \ldots, s_n\}$ be an ordered list of all distinct ppm values present in all the spectra. The problem of finding $p^*$ with the distance function
$$\rho_2(x, y) = |x - y| \;\; \text{for } x, y \in \mathbb{R}, \qquad \rho_2(x, \omega) = \kappa, \qquad \rho_2(\xi, y) = \kappa', \qquad \rho_2(\xi, \omega) = \kappa + \kappa'$$
is equivalent to a linear program over the grid $S$.

Let us denote by $M(s)$ and $N(s) = \sum_{j=1}^{k} p_j N_j(s)$ the cumulative distribution functions of $\mu$ and $\nu_p$, respectively. We are going to prove our main theoretical result, but first, let us recall a useful theorem proved in [2]:

Theorem 2. Let $\mu$ and $\nu$ be two probability measures on the real line $\mathbb{R}$, and let $M$ and $N$ be the cumulative distribution functions of $\mu$ and $\nu$, respectively. Then
$$W(\mu, \nu) = \int_{\mathbb{R}} |M(s) - N(s)| \, ds.$$

Now, we are going to prove the theorem on equivalence by splitting it into Theorem 3 and Theorem 4. Theorem 4 provides a computationally convenient way of finding the optimal proportions.

Theorem 3. Let $\gamma \in \Gamma$ be a transport plan, $g(x) = \gamma(x, \omega)$, $g'(y) = \gamma(\xi, y)$, and let $G(s)$ and $G'(s)$ be the cumulative distribution functions of $g$ and $g'$, respectively. The problem of finding $p^*$ with the distance function $\rho_2$ is equivalent to the problem of finding
$$\operatorname*{argmin}_{p} \left( \min_{g, g'} \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0 \right).$$

Proof. Using the definition of the distance function $\rho_2$, we can reformulate the Wasserstein distance between $(1 - p'_0)\mu + p'_0 \xi$ and $\nu_p + p_0 \omega$ as:
$$W_{\rho_2} = \min_{\gamma \in \Gamma} \left( \int_{\mathbb{R}^2} |x - y| \, d\gamma(x, y) + \kappa \int_{\mathbb{R}} g(x) \, dx + \kappa' \int_{\mathbb{R}} g'(y) \, dy + (\kappa + \kappa') \, \gamma(\xi, \omega) \right).$$
To use Theorem 2, we need to have a probability measure defined on $\mathbb{R}^2$, but this is not the case yet. Recall that $g(x) = \gamma(x, \omega)$, $g'(y) = \gamma(\xi, y)$, and that $G(s)$ and $G'(s)$ are the cumulative distribution functions of $g$ and $g'$, respectively. Note that for all $x \in \mathbb{R}$
$$\int_{y \in \mathbb{R}} \gamma(x, y) \, dy = (1 - p'_0) \mu(x) - g(x), \tag{3}$$
and, analogously, for all $y \in \mathbb{R}$,
$$\int_{x \in \mathbb{R}} \gamma(x, y) \, dx = \nu_p(y) - g'(y). \tag{4}$$
Moreover, from the assumption $1 - p_0 - p'_0 \geq 0$, we infer that $1 - p_0 - p'_0 + \gamma(\xi, \omega) \geq 0$.
We will refer to the last inequality as (5). Now, we are going to show that $\gamma_{|\mathbb{R}^2}(x, y) / \big(1 - p_0 - p'_0 + \gamma(\xi, \omega)\big)$ is a probability measure. Indeed, we have:
$$\int_{\mathbb{R}^2} \gamma(x, y) \, dx \, dy = \int_{\mathbb{R}} \big( (1 - p'_0) \mu(x) - g(x) \big) \, dx = (1 - p'_0) - \big( p_0 - \gamma(\xi, \omega) \big) = 1 - p_0 - p'_0 + \gamma(\xi, \omega),$$
where the first equality follows from (3), and the second follows from (4), the marginal conditions of the transport plan, and the fact that $\mu$ is a probability measure. We can also find the marginal distributions of $\gamma_{|\mathbb{R}^2}(x, y) / \big(1 - p_0 - p'_0 + \gamma(\xi, \omega)\big)$, as we are going to need them in the subsequent part of the proof:
$$\frac{(1 - p'_0) \mu(x) - g(x)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)}, \tag{6}$$
$$\frac{\nu_p(y) - g'(y)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)}. \tag{7}$$
Knowing that $M(s)$, $N(s)$, $G(s)$ and $G'(s)$ are the cumulative distribution functions of $\mu$, $\nu_p$, $g$ and $g'$, respectively, we can compute the cumulative distribution functions of the marginals (6)-(7):
$$\frac{(1 - p'_0) M(s) - G(s)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)}, \tag{8}$$
$$\frac{N(s) - G'(s)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)}. \tag{9}$$
Now, we can apply Theorem 2 to our problem. We split the minimization over all possible transport plans $\gamma$ into minimization over $\gamma_{|\mathbb{R}^2}$ and over $g$, $g'$, $\gamma(\xi, \omega)$:
$$\begin{aligned}
W_{\rho_2} &= \min_{\gamma \in \Gamma} \left( \int_{\mathbb{R}^2} |x - y| \, d\gamma(x, y) \right) + \kappa p_0 + \kappa' p'_0 \\
&= \min_{g, g', \gamma(\xi, \omega)} \; \min_{\gamma_{|\mathbb{R}^2}} \left( \int_{\mathbb{R}^2} |x - y| \, d\gamma_{|\mathbb{R}^2}(x, y) \right) + \kappa p_0 + \kappa' p'_0 \\
&= \min_{g, g', \gamma(\xi, \omega)} \big( 1 - p_0 - p'_0 + \gamma(\xi, \omega) \big) \int_{\mathbb{R}} \left| \frac{(1 - p'_0) M(s) - G(s)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)} - \frac{N(s) - G'(s)}{1 - p_0 - p'_0 + \gamma(\xi, \omega)} \right| ds + \kappa p_0 + \kappa' p'_0 \\
&= \min_{g, g', \gamma(\xi, \omega)} \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0 \\
&= \min_{g, g'} \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0,
\end{aligned}$$
where the third equality is a consequence of Theorem 2, the fact that $\gamma_{|\mathbb{R}^2}(x, y) / \big(1 - p_0 - p'_0 + \gamma(\xi, \omega)\big)$ is a probability measure, and equations (8)-(9). In the fourth equality we use fact (5). The last equality holds because the minimized expression does not depend on the choice of $\gamma(\xi, \omega)$.
Finally, we obtain:
$$W_{\rho_2}\big( (1 - p'_0)\mu + p'_0 \xi, \; \nu_p + p_0 \omega \big) = \min_{g, g'} \left( \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0 \right).$$
The term $\kappa p_0 + \kappa' p'_0$ depends on neither $g$ nor $g'$, so we get:
$$W_{\rho_2}\big( (1 - p'_0)\mu + p'_0 \xi, \; \nu_p + p_0 \omega \big) = \min_{g, g'} \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0.$$
In the discrete case this is exactly our thesis.
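The discrete form of this objective can be evaluated directly as a weighted sum of absolute differences of cumulative sums. Below is a simplified Python sketch that omits the noise terms ($g$, $g'$ and the penalties $\kappa$, $\kappa'$); all names are our own illustration, not the Magnetstein API.

```python
import numpy as np

def discrete_objective(s, M, N, p):
    """Simplified discrete objective: sum_i |M(s_i) - sum_j p_j N_j(s_i)| * (s_{i+1} - s_i).
    s: sorted ppm grid (n,); M: mixture CDF on the grid (n,);
    N: component CDFs on the grid (n, k); p: proportions (k,).
    The noise terms g, g' and the penalties kappa, kappa' are omitted here."""
    eps = M[:-1] - N[:-1] @ p                  # discrepancy of cumulative sums
    return float(np.sum(np.abs(eps) * np.diff(s)))

s = np.array([0.0, 1.0, 2.0, 3.0])
# two single-peak components: CDF columns step at s = 1 and s = 2, respectively
N = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
M = N @ np.array([0.3, 0.7])                   # mixture CDF for true proportions (0.3, 0.7)
print(discrete_objective(s, M, N, np.array([0.3, 0.7])))  # 0.0 at the true proportions
print(discrete_objective(s, M, N, np.array([0.5, 0.5])))  # 0.2 for wrong proportions
```

The objective vanishes exactly at the true proportions and grows as the candidate proportions move away from them, which is what the linear program in the next section exploits.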
So, as we see from Theorem 3, instead of the original formulation we can now consider:
$$\min_{p} \left( \min_{g, g'} \int_{\mathbb{R}} \big| (1 - p'_0) M(s) - G(s) - N(s) + G'(s) \big| \, ds + \kappa p_0 + \kappa' p'_0 \right). \tag{10}$$
To solve this problem, we use linear programming.

Theorem 4. Problem (10) is equivalent to the linear program
$$\min \; \sum_{i=1}^{n-1} |\epsilon_i| \, (s_{i+1} - s_i) + \kappa p_0 + \kappa' p'_0$$
subject to the constraints (11)-(15) below, where
$$\epsilon_i = (1 - p'_0) M(s_i) - G(s_i) - \sum_{j=1}^{k} p_j N_j(s_i) + G'(s_i)$$
and $S = \{s_1, s_2, \ldots, s_n\}$ is an ordered list of all distinct ppm values present in all the spectra.
Proof. Let us denote $N_{ij} := N_j(s_i)$, $g_i := g(s_i)$, $g'_i := g'(s_i)$, $M_i := M(s_i)$, $G_i := \sum_{l \leq i} g_l$ and $G'_i := \sum_{l \leq i} g'_l$. Using this notation, we can rewrite our minimization problem as:
$$\min \; \sum_{i=1}^{n-1} |\epsilon_i| \, (s_{i+1} - s_i) + \kappa p_0 + \kappa' p'_0$$
subject to
$$\epsilon_i = (1 - p'_0) M_i - G_i - \sum_{j=1}^{k} p_j N_{ij} + G'_i, \quad i = 1, \ldots, n-1, \tag{11}$$
$$\sum_{i=1}^{n} g_i \leq p_0, \tag{12}$$
$$\sum_{i=1}^{n} g'_i \leq p'_0, \tag{13}$$
$$p_0 + p'_0 \leq 1, \tag{14}$$
$$\sum_{j=1}^{k} p_j \leq 1. \tag{15}$$
Constraint (11) is a consequence of the definition of $\epsilon_i$; (12) and (13) follow from the fact that the total amount of signal removed from the experimental and theoretical spectra cannot be greater than the proportions $p_0$ and $p'_0$, respectively. We do not have equality here, because we leave open the possibility that some amount of signal, $\gamma(\xi, \omega)$, is transported from $\xi$ to $\omega$. However, this situation lacks a clear interpretation and, as we show later, is never optimal; therefore, it does not happen in practice. Constraint (14) guarantees that the amount of signal removed to $\omega$ is not greater than the amount of signal present in the rescaled experimental spectrum (i.e. $1 - p'_0 \geq p_0$) and that the amount of signal removed to $\xi$ is not greater than the amount of signal present in the reference spectra (i.e. $1 - p_0 \geq p'_0$). Finally, (15) states that the sum of proportions must be less than 1.
To obtain a linear program in standard form, we need a linear objective function and non-negative variables. We split $\epsilon_i$ and $|\epsilon_i|$ using positive and negative parts:
$$\epsilon_i = \epsilon_i^{+} - \epsilon_i^{-}, \qquad |\epsilon_i| = \epsilon_i^{+} + \epsilon_i^{-}, \qquad \epsilon_i^{+}, \epsilon_i^{-} \geq 0.$$
We can also get rid of $p_0$ by substituting
$$p_0 = 1 - \sum_{j=1}^{k} p_j$$

and then we obtain a linear program. (Note that we skip the constant $+\kappa$ in the objective function, since minimizing an expression plus a constant is equivalent to minimizing the expression itself.) Now, to simplify the computations, let us find its dual formulation. In order to use matrix notation, let us define $U \in \mathbb{R}^{n \times n}$ as the lower-triangular matrix of ones, $V \in \mathbb{R}^{n}$ as the vector of the mixture's intensities at $s_1, \ldots, s_n$, and $W \in \mathbb{R}^{n \times k}$ as the matrix of the components' intensities. Let us also split the vector of dual variables $x$ into two vectors. Note that $M = UV$ and $N = UW$. Using the matrices $V$ and $W$ in the computations is much more efficient than using $M$ and $N$, as they are sparse. Rewriting the dual program in terms of $V$ and $W$, and further changing variables to $z$, for which in particular $x_i = z_i - z_{i+1}$ for $i = 1, 2, \ldots, n-2$, we can rewrite our optimization problem as a linear program in $z$, which is the formulation we solve in practice.
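The primal reduction above (absolute values split into positive and negative parts, proportions constrained to the simplex) can be sketched with an off-the-shelf LP solver. This is a simplified illustration assuming SciPy is available: it omits the noise variables $g$, $g'$, the penalties $\kappa$, $\kappa'$, and the dual/sparse reformulation; all names are our own, not the Magnetstein API.

```python
import numpy as np
from scipy.optimize import linprog

def estimate_proportions(s, M, N):
    """Minimize sum_i |eps_i| * (s_{i+1} - s_i) over proportions p >= 0 with sum(p) = 1,
    where eps_i = M(s_i) - sum_j p_j N_j(s_i). The absolute values are linearized by
    splitting eps into nonnegative parts: eps = eps_plus - eps_minus."""
    n, k = N.shape
    m = n - 1
    ds = np.diff(s)                                   # interval lengths (s_{i+1} - s_i)
    # variable vector: [p (k), eps_plus (m), eps_minus (m)], all nonnegative
    c = np.concatenate([np.zeros(k), ds, ds])
    # CDF matching: N p + eps_plus - eps_minus = M at each grid point
    A_eq = np.hstack([N[:m], np.eye(m), -np.eye(m)])
    b_eq = M[:m]
    # component proportions sum to one
    A_eq = np.vstack([A_eq, np.concatenate([np.ones(k), np.zeros(2 * m)])])
    b_eq = np.append(b_eq, 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x[:k]

s = np.array([0.0, 1.0, 2.0, 3.0])
N = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]])  # two single-peak components
M = N @ np.array([0.3, 0.7])                          # synthetic mixture CDF
print(estimate_proportions(s, M, N).round(3))         # approximately [0.3, 0.7]
```

On this noiseless synthetic example the LP recovers the true proportions exactly, because the objective attains zero only at the generating mixture.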

Figure S1: Comparison of the components' spectra added in proportions estimated by Magnetstein and the mixture's spectrum for experiments 1-9, 10.1-10.4 and 11 (Supplementary Fig. 1a-1n). For the sake of visibility, only parts of the spectra are shown.

Figure S4: Spectra of library components and mixture from Experiment no. 2. For details see Table S1.

Figure S6: Spectra of library components and mixture from Experiment no. 4. For details see Table S1.

Figure S7: Spectra of library components and mixture from Experiment no. 5. For details see Table S1.

Figure S10: Spectra of library components and mixture from Experiment no. 8. For details see Table S1.

Figure S12: Spectra of library components and mixtures from Experiments 10 and 11. For details, see the Experimental Section in the main text.

Figure S13: Illustration of two approaches to representing spectra: as a vector (left) and as a probability measure (right).

Table S1: Results of estimation for all the experiments (some of them are not mentioned in the main text), obtained from Magnetstein with optimized parameters, Magnetstein with default parameters, ACD/TP and MANIQ.