Improved Automated Quantification Algorithm (AQuA) and Its Application to NMR-Based Metabolomics of EDTA-Containing Plasma

We have recently presented an Automated Quantification Algorithm (AQuA) and demonstrated its utility for rapid and accurate absolute metabolite quantification in ¹H NMR spectra in which the positions and line widths of signals were predicted from a constant metabolite spectral library. AQuA quantifies each metabolite from one preselected signal and uses library spectra to model interferences from other metabolite signals. However, in some types of spectra, interspectral deviations in signal positions and line widths can be pronounced, so interferences cannot be modeled with a constant spectral library. Here we address this issue and present an improved AQuA that handles interspectral deviations. The improved AQuA monitors and characterizes specific signals in each spectrum and automatically adjusts the spectral library to model their interferences accordingly. The performance of the improved AQuA was tested on a large data set of plasma samples collected using ethylenediaminetetraacetic acid (EDTA) as an anticoagulant (n = 772). These spectra provided a suitable test system, since EDTA signals (i) vary in intensity, position, and line width between spectra and (ii) interfere with many signals from the plasma metabolites targeted for quantification (n = 54). Without the improvement, ca. 20 of the 54 metabolites would have been overestimated, including acetylcarnitine and ornithine, which are considered particularly difficult to quantify by ¹H NMR in EDTA-containing plasma. Furthermore, the improved AQuA was fast (<10 s for all spectra). We believe that the improved AQuA provides a basis for automated quantification in other data sets where specific signals show interspectral deviations.
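A minimal Python sketch of the quantification idea, under stated assumptions. Following the interference matrix definitions in this supporting information, each target height $y_i$ is modelled as the reporter $x_i$ plus interferences $\sum_{j \neq i} m_{i,j} x_j$, where $m_{i,j}$ is the library intensity of compound $j$ at position $\delta_i$ normalised to its intensity at its own target position; hence $y = Mx$, and the reporters follow from solving this linear system. The improvement is represented here only as swapping in a spectrum-specific library spectrum and target position for one compound (free EDTA) before building $M$. The function names, the toy six-bin library, and the use of Gaussian elimination are illustrative assumptions, not the authors' MATLAB implementation.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def interference_matrix(library: Sequence[Sequence[float]],
                        targets: Sequence[int]) -> Matrix:
    """m[i][j]: intensity of compound j's library spectrum at compound i's
    target bin, normalised by compound j's intensity at its own target bin
    (hence m[i][i] == 1)."""
    n = len(targets)
    return [[library[j][targets[i]] / library[j][targets[j]] for j in range(n)]
            for i in range(n)]


def solve(m: Matrix, y: Sequence[float]) -> List[float]:
    """Solve m x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    a = [list(row) + [y[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def quantify(library, targets, y, swap_index: Optional[int] = None,
             swap_spectrum=None, swap_target: Optional[int] = None) -> List[float]:
    """Return reporter heights x. If swap_index is given, that compound's fixed
    library spectrum and target bin are replaced by spectrum-specific ones
    (hypothetical stand-in for the per-spectrum free EDTA adjustment)."""
    lib = [list(s) for s in library]
    tgt = list(targets)
    if swap_index is not None:
        lib[swap_index] = list(swap_spectrum)
        tgt[swap_index] = swap_target
    return solve(interference_matrix(lib, tgt), y)


# Toy data (hypothetical): compounds A, B and free EDTA on six bins.
LIBRARY = [
    [0.0, 1.0, 0.2, 0.0, 0.0, 0.0],  # A, target bin 1
    [0.0, 0.5, 2.0, 0.0, 0.0, 0.0],  # B, target bin 2
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # fixed EDTA spectrum, target bin 4
]
TARGETS = [1, 2, 4]
EDTA_N = [0.0, 0.4, 0.0, 1.0, 0.0, 0.0]  # EDTA in spectrum n, overlapping A's bin
Y = [1.7, 2.2, 0.5]  # measured target heights (true x = 1.0, 2.0, 0.5)

if __name__ == "__main__":
    print("fixed library:   ", quantify(LIBRARY, TARGETS, Y))
    print("adjusted library:", quantify(LIBRARY, TARGETS, Y, 2, EDTA_N, 3))
```

With the same measured heights, the fixed library leaves the EDTA overlap at A's target bin unmodelled, so A comes out at about 1.21 instead of 1.0, whereas the adjusted library recovers the toy reporters.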


Table of contents
Explanation of inputs and outputs for the improved AQuA implementation
Table S4: Additional information on data used for the improved AQuA implementation

Step 1: Determine the height ($y$, i.u.) by searching for the local maximum within the preselected spectral region ($\delta_{start}$ to $\delta_{end}$, ppm).
Step 3.2: Estimate the line width contribution (ppm) on the left side of the signal, $\delta_{|0.5y-y|}$: the distance between the position of the maximum, $\delta_y$, and the position to its left whose intensity is closest to $0.5y$.
Step 3.3: Estimate the line width contribution (ppm) on the right side of the signal, $\delta_{|y-0.5y|}$: the distance between $\delta_y$ and the position to its right whose intensity is closest to $0.5y$.
Step 3.4: Calculate the line width (FWHM, ppm) as the sum of the contributions, $\mathrm{FWHM} = \delta_{|0.5y-y|} + \delta_{|y-0.5y|}$. For details on how to implement the code in MATLAB, see Table S2. Abbreviations: FWHM, full width at half-maximum; i.u., intensity units; ppm, parts per million.
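The monitoring steps above can be sketched in Python as follows. This is a minimal illustration, not the MATLAB code referenced in Table S2. It assumes that the spectrum is given as two equally long lists (chemical shift in ppm and intensity, ordered left to right), that the half-height searches of Steps 3.2 and 3.3 are restricted to the preselected region, and that the maximum does not sit on the region edge. Steps 2 and 3.1 are not described here and are not reproduced; the position $\delta_y$ is returned because Steps 3.2 and 3.3 use it. The function names monitor_signal and ppm_to_hz are hypothetical.

```python
from typing import Sequence, Tuple


def monitor_signal(ppm: Sequence[float], intensity: Sequence[float],
                   start: int, end: int) -> Tuple[float, float, float]:
    """Return (height y, position delta_y in ppm, FWHM in ppm) for the signal
    inside the preselected bin range start..end (inclusive)."""
    # Step 1: height y = local maximum within the preselected region.
    peak = max(range(start, end + 1), key=lambda k: intensity[k])
    y = intensity[peak]
    half = 0.5 * y
    # Step 3.2: bin left of the maximum whose intensity is closest to 0.5*y
    # (assumes peak > start, otherwise this range is empty).
    left = min(range(start, peak), key=lambda k: abs(intensity[k] - half))
    # Step 3.3: the same search to the right of the maximum.
    right = min(range(peak + 1, end + 1), key=lambda k: abs(intensity[k] - half))
    left_width = abs(ppm[peak] - ppm[left])
    right_width = abs(ppm[right] - ppm[peak])
    # Step 3.4: FWHM is the sum of both contributions.
    return y, ppm[peak], left_width + right_width


def ppm_to_hz(width_ppm: float, spectrometer_mhz: float = 600.0) -> float:
    """Convert a line width from ppm to Hz by multiplying by the spectrometer
    frequency in MHz."""
    return width_ppm * spectrometer_mhz
```

Because the half-height positions are taken from the nearest bin rather than interpolated, the estimated FWHM is only as fine as the bin spacing.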
lib_window(1:57, 1:2): Windows for automated peak-picking used in data reduction of the compound library (the values are presented in Table S4)
Lorentzian.m
Outcome from signal monitoring of the first free EDTA signal (target signal) in experimental spectrum n: 1, height; 2, position (bin); 7, line width (ppm). To convert to hertz, multiply by the spectrometer frequency in MHz (600).
S2(n, 1:6): Outcome from signal monitoring of the second free EDTA signal in experimental spectrum n: 1, height; 2, position (bin); 6, line width (ppm). To convert to hertz, multiply by the spectrometer frequency in MHz (600).
lib_HEDTA(1:42000, n): Normalised library spectrum n for free EDTA, matched to the conditions (positions and line widths) in experimental spectrum n. The ratio between the two signals in the library was set to a fixed value in all 772 library spectra (i.e., the slope of the regression line between the intensities of the two experimental free EDTA signals across the entire dataset; n = 772; k = 0.5181; r² = 0.9976).
normalised_lib_spectra_n(1:42000, 1:57): Normalised library n, in which the fixed (normalised) spectrum for free EDTA has been replaced with normalised library spectrum n for free EDTA
target_position_n(1:57, 1): Same as target_position, except that the fixed target position for free EDTA has been replaced with the actual position in experimental spectrum n
mn(1:57, 1:57): The interference matrix, in which the contribution from free EDTA has been optimised for the conditions in experimental spectrum n
xn(1:57, n): The reporter signal heights of all compounds in experimental spectrum n
c_sample_uM(1:57, n): The concentrations (μM) of all compounds in NMR sample n. Note that the three EDTA compounds were not interpreted quantitatively.
library_xn(1:42000, 1:57): The library spectra that correspond to the reporter signals in experimental spectrum n (for proof-of-concept figures only; red)
library_xn_sum(1:42000, n): Sum of all library spectra that correspond to the reporter signals in experimental spectrum n (for proof-of-concept figures only; blue)
a Note that the input data required for the non-improved and the improved AQuA do not differ.
b The process used to create the (fixed) compound library, perform spectral binning in the Chenomx software, and import a data file with the binning results into MATLAB has been described previously in Röhnisch, H. E.; Eriksson, J.; Müllner, E.; Agback, P.; Sandström, C.; Moazzami, A. A. Analytical Chemistry 2018, 90, 2095-2102.
c Datasets required to test this implementation (e.g., exp_spectra) can be provided upon reasonable request.

(A) Results for an experimental spectrum with narrow, high-intensity free EDTA signals (n=1). (B) Results for an experimental spectrum with broader, low-intensity free EDTA signals, slightly shifted up-field (n≠1). Data from the improved AQuA: black dots, $y_n$; red dots, $x_n$. Proof-of-concept spectral lines: exp_spectra, black; library_xn, red; library_xn_sum, blue.

Interference matrix element $m_{i,j}$: obtained after normalisation of the intensity value extracted from the metabolite library at position $\delta_i$ from compound $j$ ($m_{i,j} \geq 0$; if $i = j$, then $m_{i,j} = 1$)
Target height $y_i$: the height of the experimental signal selected for quantification of compound $i$, modelled as the sum of intensity contributions at position $\delta_i$ from compound $i$ (reporter) and from compounds $j \neq i$ (interferences), i.e., $y_i = x_i + \Delta_i$
Reporter $x_i$: the estimated height of the pure signal from compound $i$ at position $\delta_i$
Interference $\Delta_i$: the sum of relative height contributions to $y_i$ from compounds $j \neq i$, i.e., $\Delta_i = \sum_{j \neq i} m_{i,j} x_j$
b In this study, the total interference was separated into two sources:

A) From non-metabolites (EDTA)
B) From metabolites
Abbreviations: EDTA, ethylenediaminetetraacetic acid; NA, not applicable (since the reporter is not an interference); Nr, number.

A subset of 30 experimental spectra (binned data normalised to the TSP signal area), which had already been quantified with the improved AQuA, was randomly selected from the entire dataset for quantification with ASICS.

Simulated a (N=30+30)
Simulated spectra representing known mixtures of metabolite concentrations (μM) were generated by summing normalised library spectra weighted by their respective reporter contributions from the improved AQuA computations. Simulated spectrum n used the elements of the reporter vector $x_n$ resulting from the AQuA computation on experimental spectrum n. Two subsets of simulated spectra were generated for quantification with ASICS: one subset contained 30 spectra including the 54 metabolites and the 3 EDTA compounds, and the other contained 30 spectra with only the metabolites.
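A minimal Python sketch of how such simulated spectra can be assembled. It assumes each library spectrum is a list of intensities on a common bin axis and is scaled to unit intensity at its own target bin before being weighted by the reporter height $x_j$; this normalisation convention is an assumption. The optional include argument of the hypothetical simulate_spectrum function drops compounds, e.g., the three EDTA compounds for the metabolite-only subset.

```python
from typing import Iterable, List, Optional, Sequence


def simulate_spectrum(library: Sequence[Sequence[float]],
                      targets: Sequence[int],
                      reporters: Sequence[float],
                      include: Optional[Iterable[int]] = None) -> List[float]:
    """Sum library spectra, each scaled so that its value at its own target
    bin equals the reporter height; compounds not in `include` are skipped."""
    keep = set(range(len(library))) if include is None else set(include)
    out = [0.0] * len(library[0])
    for j, (spec, t, x) in enumerate(zip(library, targets, reporters)):
        if j not in keep:
            continue
        scale = x / spec[t]  # unit height at the target bin, then weight by x_j
        for k, v in enumerate(spec):
            out[k] += scale * v
    return out
```

For a toy library, reading the full simulated spectrum at the target bins gives back the target heights implied by $y = Mx$ for the chosen reporters, which is what makes the simulated concentrations known.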

Libraries used in ASICS
Library spectra (N=56+30+1)
ASICS allows the user to choose between different approaches for reference library alignment and metabolite quantification (R, B, G, and Y; see below) and to import reference libraries. To facilitate a straightforward comparison between the improved AQuA and ASICS, it was desirable to use the same compound library in both algorithms (i.e., the library used in the improved AQuA). For all compounds except free EDTA, the fixed library spectra used in the improved AQuA could also be used in ASICS (N=56). Where applicable, the free EDTA library spectrum generated in the nth improved AQuA computation was used for the nth ASICS computation (n=1:30); this was possible for only one of the ASICS approaches described below (R**). In the other ASICS approaches, the average free EDTA library spectrum was used (N=1).

Simulated/Experimental
Before the experimental and simulated datasets were imported into RStudio, a synthetic signal (Lorentzian: position, 0 ppm; intensity, 1; line width, 1.15 Hz) was added to each spectrum. By using this synthetic signal and the PepsNMR peak method included in the ASICS package, the mandatory normalisation step in ASICS could be performed without changing the spectra. Data were scaled by a factor of $10^6$ to minimise the number of zero-valued ASICS quantifications. No alignment was done prior to quantification.
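The synthetic reference signal can be sketched in Python as a Lorentzian line with a full width at half maximum of 1.15 Hz, converted to ppm by dividing by the spectrometer frequency in MHz (600 MHz is assumed, as elsewhere in this document), so 1.15 Hz corresponds to about 0.00192 ppm. Treating "intensity, 1" as the peak height, and the function names lorentzian and add_reference_signal, are illustrative assumptions; this is not the code used in RStudio with ASICS.

```python
from typing import List, Sequence


def lorentzian(ppm_axis: Sequence[float], center_ppm: float, height: float,
               fwhm_hz: float, spectrometer_mhz: float = 600.0) -> List[float]:
    """Lorentzian line with peak height `height` and full width at half
    maximum `fwhm_hz` (Hz), evaluated on a ppm axis."""
    hwhm_ppm = 0.5 * fwhm_hz / spectrometer_mhz  # half width, Hz -> ppm
    return [height / (1.0 + ((p - center_ppm) / hwhm_ppm) ** 2) for p in ppm_axis]


def add_reference_signal(spectrum: Sequence[float],
                         ppm_axis: Sequence[float]) -> List[float]:
    """Add the synthetic reference (0 ppm, intensity 1, 1.15 Hz) to a spectrum."""
    ref = lorentzian(ppm_axis, center_ppm=0.0, height=1.0, fwhm_hz=1.15)
    return [s + r for s, r in zip(spectrum, ref)]
```

Because the added line has a known area and position, a peak-based normalisation to it leaves the rest of each spectrum unchanged, which is the purpose described above.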

Libraries
The library spectra were normalised to a constant sum. The number of protons (nb.protons) for each compound was set according to its molecular formula, excluding amino and hydroxyl protons, and the threshold was set to 0.1 to obtain a smooth transition between signals and baseline.
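A minimal Python sketch of the two library settings described above, with hypothetical function names: constant-sum normalisation scales each library spectrum so that its intensities sum to a fixed total (1 here, an assumed value), and the proton count subtracts amino and hydroxyl protons from the hydrogen count of the molecular formula. The exchangeable-proton counts must be supplied from the structure, since the formula alone does not reveal them. Ornithine (C5H12N2O2) is used as an example, assuming that its carboxyl OH counts as a hydroxyl proton.

```python
from typing import List, Sequence


def constant_sum(spectrum: Sequence[float], total: float = 1.0) -> List[float]:
    """Scale a library spectrum so that its intensities sum to `total`."""
    s = sum(spectrum)
    if s == 0:
        raise ValueError("cannot normalise an all-zero spectrum")
    return [total * v / s for v in spectrum]


def nb_protons(h_in_formula: int, amino_h: int, hydroxyl_h: int) -> int:
    """Protons counted for a compound: formula H minus amino and hydroxyl H."""
    return h_in_formula - amino_h - hydroxyl_h


# Ornithine, C5H12N2O2: two NH2 groups (4 H) and the carboxyl OH (1 H, assumed
# to count as a hydroxyl proton) leave 7 carbon-bound protons.
ORNITHINE_NB_PROTONS = nb_protons(12, amino_h=4, hydroxyl_h=1)
```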

ASICS approaches tested
Red (R/R**): Independent library alignment and independent quantification with FWER-controlled compound selection
Blue (B): Joint library alignment, independent quantification, and FWER-controlled compound selection
Green (G): Joint library alignment and independent quantification, followed by FWER-controlled compound selection for final joint quantification
Yellow (Y): Joint library alignment and joint quantification

ASICS parameters used

Max.shift
Library signals were allowed to shift by up to 0.01 ppm (max.shift = 0.01; 50 bins), which exceeds the positional deviations typically observed in the entire dataset. Other parameters were kept at their default values.
a The known metabolite concentrations were simulated to reflect the experimental dynamic range, and free EDTA signals were simulated to reflect the experimentally observed inter-spectral variations in position and line width.
b ASICS package (Version: 2.6.1) git_url: https://git.bioconductor.org/packages/ASICS

Estimates from the independent ASICS approach (R) compared with the joint ASICS approaches (B, G, and Y) in experimental spectra with EDTA (N=30). ASICS approaches: "R", independent library alignment and independent quantification with FWER-controlled compound selection; "B", joint library alignment, independent quantification, and FWER-controlled compound selection; "G", joint library alignment and independent quantification followed by FWER-controlled compound selection for joint quantification; "Y", joint library alignment and joint quantification. *ASICS zero values (%) are shown in black inside the colored fields of the figure, and no correlation coefficient is displayed if more than 90% of the data are missing. The independent approach (R**) was performed with the nth library spectrum for free EDTA (generated in the improved AQuA) for the nth ASICS computation. In all other approaches, the average free EDTA library spectrum was used.
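The rule for displayed correlation coefficients can be sketched in Python as below. The sketch assumes that ASICS zero values are treated as missing and excluded pairwise, that the coefficient is a Pearson correlation (the caption does not name the coefficient), and that no value is reported when more than 90% of the data are missing; the function name masked_pearson is hypothetical.

```python
import math
from typing import Optional, Sequence


def masked_pearson(reference: Sequence[float], asics: Sequence[float],
                   max_missing: float = 0.9) -> Optional[float]:
    """Pearson r between two estimate vectors, ignoring pairs where the ASICS
    estimate is zero; returns None if more than `max_missing` of the pairs
    are missing or fewer than two pairs remain."""
    if len(reference) != len(asics) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = [(r, a) for r, a in zip(reference, asics) if a != 0.0]
    missing = 1.0 - len(pairs) / len(reference)
    if missing > max_missing or len(pairs) < 2:
        return None
    xs = [r for r, _ in pairs]
    ys = [a for _, a in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for constant data
    return sxy / math.sqrt(sxx * syy)
```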
Table S9: General observations and interpretations of comparisons with ASICS

• (A) General observation from Figure S5: The concentration estimates derived with all four ASICS approaches (R, B, G, and Y) were typically highly correlated with the known concentrations in simulations without EDTA.
  Interpretation: The employed ASICS processing workflow (e.g., normalisation, scaling, and compound library) performed adequately, as the results correlated well with the known values.
• (A-B) General observation: Correlations between ASICS estimates and known concentrations were typically higher in simulations without EDTA signals than in simulations with EDTA.
  Interpretation: In the presence of EDTA signals, quantification with ASICS can be more difficult.
• (B) General observation: In simulations with EDTA, the independent procedure that accounted for inter-spectral line width variations of free EDTA (R**) showed somewhat higher correlations with the known concentrations than the independent procedure that did not adjust for such variations (R).
  Interpretation: By using library data generated by the improved AQuA in the independent ASICS procedure (R**), it becomes possible to account for inter-spectral line width variation of free EDTA signals and possibly improve the outcome. (We were not able to find a straightforward way to account for such line width variations with the joint procedures (B, G, and Y), since these procedures use one library for an entire dataset.)
• (C) General observation: In experimental spectra with EDTA, correlations between the improved AQuA and ASICS were typically higher for the joint procedures (B, G, Y) than for the independent procedures (R, R**).
  Interpretation: Overall, the improved AQuA yielded results more similar to those of the joint ASICS approaches.
• (D) General observation: In experimental spectra with EDTA, some metabolites showed low correlations between the independent (R) and joint (B, G, Y) ASICS approaches.
  Interpretation: When the different ASICS approaches were applied to the same experimental dataset, with the same library and parameter settings, the outcome differed for some metabolites (although the joint procedures G and Y yielded extremely similar outcomes).