AutoTuner: High fidelity, robust, and rapid parameter selection for metabolomics data processing

Untargeted metabolomics experiments provide a snapshot of cellular metabolism, but remain challenging to interpret due to the computational complexity involved in data processing and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, as erroneous parameters can result in noise inflation. While several algorithms exist to automate parameter selection, each depends on gradient descent optimization functions. In contrast, our new parameter optimization algorithm, AutoTuner, obtains parameter estimates from raw data in a single step as opposed to many iterations. Here, we tested the accuracy and the run time of AutoTuner in comparison to isotopologue parameter optimization (IPO), the most commonly-used parameter selection tool, and compared the resulting parameters’ influence on the quality of feature tables after processing. We performed a Monte Carlo experiment to test the robustness of AutoTuner parameter selection, and found that AutoTuner generated similar parameter estimates from random subsets of samples. We conclude that AutoTuner is a desirable alternative to existing tools, because it is scalable, highly robust, and very fast (∼100-1000X speed improvement from other algorithms going from days to minutes). AutoTuner is freely available as an R package through BioConductor.


Abstract 23
Untargeted metabolomics experiments provide a snapshot of cellular metabolism, but 24 remain challenging to interpret due to the computational complexity involved in data processing 25 and analysis. Prior to any interpretation, raw data must be processed to remove noise and to align 26 mass-spectral peaks across samples. This step requires selection of dataset-specific parameters, 27 as erroneous parameters can result in noise inflation. While several algorithms exist to automate 28 parameter selection, each depends on gradient descent optimization functions. In contrast, our 29 new parameter optimization algorithm, AutoTuner, obtains parameter estimates from raw data in 30 a single step as opposed to many iterations. Here, we tested the accuracy and the run time of 31 AutoTuner in comparison to isotopologue parameter optimization (IPO), the most commonly-32 used parameter selection tool, and compared the resulting parameters' influence on the quality of 33 feature tables after processing. We performed a Monte Carlo experiment to test the robustness of 34 Introduction 41 Metabolomics is the study of all the compounds present within a cell, organism, or tissue. 42 Such investigations provide a holistic snapshot of the activity within a biological matrix, and 43 have led to a myriad of discoveries ranging from the elucidation of novel biochemical pathways, 44 to the recognition of molecular adaptive responses to stress and the clarification of mechanisms 45 driving cell-cell interactions. [1][2][3] Advances in mass spectrometry fostered these discoveries, 46 specifically improvements in instrument sensitivity, accuracy, and data collection capacity. 1,4,5 47 Parallel advances in computational tools have historically followed to fulfill the potential of 48 analytical improvements. 6 49 Prior to data analysis, raw data from untargeted metabolomics experiments must be 50 processed to generate a features table. Features are defined as peaks with unique mass to charge 51 (m/z) and retention time values, with relative abundances determined by their height or area. 52 Processing is critical to extract chemical signals from electrical noise and to correct for retention 53 time drift across samples. 7 A variety of untargeted data processing methods exist, [8][9][10][11] including 54 two commonly used tools: MZmine2 12 and XCMS. 13 Although these tools reliably extract true 55 features from complex data, their performance depends on the selection of algorithmic 56 parameters that can mitigate analytical caveats such as matrix effects and differences in 57 analytical platforms. [14][15][16] No universal set of parameters exists for all datasets, hence parameter 58 optimization must occur prior to analysis to avoid noise inflation within the feature table. 17-19 59 Tuning parameters manually is prohibitively time consuming due to the high number of 60 possible numerical combinations. To overcome this challenge, several methods exist to identify 61 optimal dataset-specific parameters. 20 (2) 144 and x is the set of all observations, is the ith observation, is the number of observations, and 145 is a measure of smoothness for the empirical distribution. The function between ppm and 146 absolute error is not surjective, meaning two identical absolute mass error values can have 147 distinct ppm values. Thus, we hypothesize that the ppm value of noise peaks should be scattered 148 widely, while ppm values of real features should be within a narrow range. 14 Hence, we expect 149 that by using a user-provided mass difference threshold larger than an instrument-defined 150 threshold, the KDE will have a long-tail and a high narrow peak representing the instrument-151 dependent ppm of real features and a shorter smaller down-stream peak representing the ppm 152 from erroneously binned m/z values ( Figure S1). where represents the largest cluster of error values and is the ith observation similar to (1). 169 To identify this cluster, AutoTuner uses k-means clustering, a data partitioning technique used to 170 separate a set of observations into k-many groups. Either the gap statistic or a user-provided 171 variance-explained threshold is used to determine the appropriate number of clusters. 31 Using 172 ensures that the density of each calculated ppm value is normalized by the density of the true 173 error values ( Figure S1). 174 The ppm estimate is calculated by the following: 175 where is any calculated ppm value with outlier scores above 1, and is the standard 177 deviation of all values. An outlier score value above 1 indicates that the density of that 178 particular x is at least as great as the expected value of the density of all elements within . The  Table S1 for a complete list of standards. 219 We prepared stock solutions of each metabolite standard in water or a 1:1 mix of 220 methanol and water at 1000 mg mL -1 , unless constrained by solubility. Some standards required 221 the addition of ammonium hydroxide or formic acid for dissolution. We stored stock solutions in 222 the dark at -20°C. We created a standard metabolite mix (10 mg ml -1 ) from the stock solutions 223 and diluted with Milli-Q water to obtain four solutions where the concentration for each standard 224 was 500 ng mL -1 . We obtained standards at the highest grade available through Sigma Aldrich 225 resolution tribrid mass spectrometer (Orbitrap Fusion Lumos (Thermo Scientific)). We 232 performed chromatographic separation with a Waters Acquity HSS T3 column (2.1 × 100 mm, 233 1.8 μm) equipped with a Vanguard pre-column, both maintained at 40°C. We used mobile 234 phases of (A) 0.1% formic acid in water and (B) 0.1% formic acid in acetonitrile at a flow rate of 235 0.5 mL min -1 to elute the column. The gradient started at 1% B for 1 min, ramped to 15% B from 236 1-3 min, ramped to 50% from 3-6 min, ramped to 95% B from 6-9 min, held until 10 min, 237 ramped to 1% from 10-10.2 min, and finally held at 1% B (total gradient time 12 min). We made 238 separate positive and negative ion mode autosampler injections of 5 μL. We set electrospray 239 voltage to 3600 V (positive) and 2600 V (negative), and source gases to 55 (sheath) and 20 240 (auxiliary). We set the heated capillary temperature to 375°C and the vaporizer temperature to 241 400°C. We acquired full scan MS data in the Orbitrap analyzer (mass resolution 120,000 FWHM 242 at m/z 200), with an automatic gain control (AGC) target of 4e5, a maximum injection time of 50 243 sec, and a scan range of 100-1000 m/z. We set the AGC target value for fragmentation spectra at 244 5e4, and the intensity threshold at 2e4. We collected all data in profile mode. 245 246 Validation Data. We used two published datasets to validate AutoTuner's performance on 247 experimental data: (1) a bacterial culture experiment 32 , MetaboLights 33 identifier MTBLS157, 248 and (2) a rat fecal microbiome, by direct contact with authors (Table 2) (Table 2). 257 We used XCMS and centWave to generate feature tables for each dataset, 13,36 and 258 CAMERA for isotopologue and adduct detection. 37 Table S2 contains parameters used for  259 processing. For the standards, we searched for the most abundant parent ion within EICs. We 260 confirmed the presence of a metabolite standard if a feature had an intensity above 1e4, was 261 within an exact mass error of 5 ppm of the parent ion, and had a retention time error of 5 seconds 262 from the EIC peak. For the culture experiment, the data was subjected to quality control as 263 described previously. 38 Briefly, we removed features detected in process blanks, features 264 detected within only one replicate, and features representing isotopologues and adducts. 265 Additionally, we removed features with coefficient of variation values above 0.4 across six 266 pooled samples. We defined overlapping features in AutoTuner-and IPO-parameterized feature 267 tables to be those with ppm error below 5 and retention time differences less than 20 seconds. 268 The culture experiment allowed a higher retention time correction because it relied on data 269 collected with HPLC compared to the standards which were analyzed with a UPLC system. We also performed linear regressions on these values to find trends between estimate variability 281 and sample numbers. We randomly selected 3 to 9 samples from each of these subsets 55 times. 282 In total, there were 385 estimates for each group of 3 to 9 samples, resulting in a total of 2695 283 separate runs of AutoTuner per dataset. We performed a sensitivity analysis to determine how 284 distinct values of mzDiff impact downstream data processing, the only continuous valued 285 centWave parameter not optimized by AutoTuner To accomplish this, we counted the number of 286 unique features between pairs of feature tables generated with mzDiff parameters varying by a 287 value of 0.001. 288 289

Results 290
AutoTuner Accuracy and Comparison to IPO. At this time, the only publicly-available method 291 for automated selection of peak-picking parameters for XCMS is isotopologue parameter 292 optimization (IPO). 22 IPO uses a gradient descent algorithm that requires users to iteratively run 293 centWave with different combinations of parameters until the set that maximizes a scoring 294 function is identified. We used 5 distinct metrics to compare the accuracy, ease of use, and 295 downstream data quality of IPO-and AutoTuner-derived parameters. The metrics include the 296 accuracy, number of features, the peak areas and shapes of EIC peaks detected using parameters 297 from only one of the two methods, and MS 2 count. 298 We searched for 85 known chemical standards (a total of 101 possible ions) within 299 feature tables generated with IPO-and AutoTuner-derived parameters to test the influence of 300 each parameter selection method on data processing accuracy ( Table 2, Table S1). We detected 301 82 and 100 standards within the feature table generated with IPO-and AutoTuner-derived 302 parameters, respectively. Figure S2 provides an example of compounds that were only detected 303 with AutoTuner and were absent when the IPO-derived parameters were used. 304 We first compared the number of features from culture data generated with parameters 305 from each algorithm to understand the influence of parameter selection on downstream data 306 quality (Figure 2A, Figure S4A). Each feature table contained a distinct number of total features 307 following processing and quality control (Table S3). In positive ion mode, AutoTuner-derived 308 parameters detected fewer unique features (203) compared to 2606 unique features detected with 309 IPO-derived parameters (Figure 2A), while sharing 1022 features between them. A similar 310 situation was observed in negative ion mode where AutoTuner detected 540 unique features 311 compared to 3420 unique features found with IPO-derived parameters, while sharing 904 312 features ( Figure S4A). 313 We then compared the structural differences between features exclusively detected using 314 IPO-and AutoTuner-derived parameters. We created an empirical cumulative distribution 315 function (CDF) to compare the distribution of peak areas ( Figure S3, Figure S4B) and maximum 316 observed continuous wavelet transform (CWT) coefficients ( Figure 2B, Figure S4C) of all EIC 317 peaks belonging to features outside of the intersect. The maximum observed CWT coefficient 318 increases with peak steepness, hence provides a measure of a peak's chromatographic resolution 319 ( Figure 2: see inset). The CDF of each metric was significantly different in positive (KS-Test, 320 Area: p < 10 -6 ; CWT: p < 10 -4 , n = 203) and negative (KS-Test, area: p < 10 -14 , CWT: p < 10 -8 , n 321 = 540) ionization mode data. Applying the same test on unbalanced comparisons (e.g., negative 322 ion mode: 3420 IPO-vs. 540 AutoTuner-unique features) was more highly significant than using 323 equivalent numbers of features obtained through subsampling. 324 Next, we used the abundance of MS 2 spectra within each unique dataset to compare the 325 abundance of real features (i.e., not noise) across feature tables. In total, there were more features 326 with MS 2 spectra from the feature table generated with IPO-derived parameters compared to 327 AutoTuner-derived parameters (positive: 448 vs. 115; negative: 686 vs. 121, both for IPO vs. 328 AutoTuner, respectively). However, when the features with MS 2 spectra were normalized by the 329 total number of features, AutoTuner-specific features contained more MS 2 spectra than IPO-330 specific features (Figure 3). Unlike AutoTuner specific features, IPO specific features were 331 significantly depleted for MS 2 spectra relative to the intersect region (Hypergeometric Test, 332 Negative ion mode: p < 10 -10 , Positive ion mode: p < 10 -10 ). 333 Finally, using all three test datasets, we compared the time required to run AutoTuner and 334 IPO (Table 3) AutoTuner is a robust, rapid, and high-fidelity estimator of untargeted mass spectrometry 350 data processing parameters. Its unique design improves upon previous methods by providing a 351 scalable framework to handle large datasets, reducing runtime, and generating high-accuracy 352 parameters that retain known features. AutoTuner's ease of use make it an ideal candidate to 353 include within existing data processing pipelines. 39-42 AutoTuner is available through 354 BioConductor, and additional contributions are possible and encouraged. 355 AutoTuner's high accuracy indicates that its parameter selection is based on true data 356 features. One possible explanation for the lower accuracy of IPO is that the peak-width of the 357 missing standards was below the Minimum Peak-width parameter selected by IPO (Table S2  358 and Figure S2). 359 AutoTuner parameter estimates were robust across all datasets and ionization modes. 360

Some parameters like ppm, Noise, S/N Threshold, Prefilter intensity, and Scan count reflect 361
systematic properties inherent to the platform chromatography, mass analyzer, and/or sample 362 matrix. 42 Other parameters like Maximum peak-width are more specific to each sample; hence 363 increasing the number of samples used to estimate parameters always strengthened their 364 robustness. The low CV values for parameter estimates suggests that using a subset of samples to 365 generate estimates returns a set representative for all samples. Based on our results, we 366 recommend the use of 9 to 12 samples to generate estimates in culture and community datasets, 367 respectively. For most of the parameters estimated here, 9 samples proved sufficient to obtain 368 estimates with CV values less than 0.1. The 12-parameter recommendation originated from 369 extrapolating the linear fits of these data to obtain 0.1 CV values for remaining parameters that 370 failed this criterion ( Figure S5, S7, S9, and S11). We were unable to check the robustness of the 371 Group difference parameter estimate, as this parameter is estimated through a non-automatable 372 cross sample comparison during the TIC peak detection step of the algorithm. 373 Although other algorithms return more parameter estimates than AutoTuner, the 374 parameters calculated by AutoTuner represent continuous valued ones with the greatest possible 375 number of choices. Performing a parameter sweeping optimization like previous approaches to 376 estimate the remaining parameters reduces the total combinations of available centWave 377 parameters from at least 2 4 *5 8 possible choices of parameters to 40. This is because the 378 centWave algorithm, used by both XCMS and MZmine2 data processing tools, requires tuning 379 of 11 distinct parameters. Of these, 8 are continuously valued, meaning that they can be any real 380 number. The remainder are either boolean values or can be one of a few discrete choices (Table  381 S4). The reduction of the total number of combinations is achieved by optimizing 7 of the 8 382 continuous valued parameters. In regards to the last continuous parameter not optimized by 383 AutoTuner, mzDiff, we performed a sensitivity analysis to show that distinct values had minimal 384 effect on the returned feature table (Table S5). Future contributions towards AutoTuner's design 385 can help the estimation of additional parameters. 386 AutoTuner's low runtime indicates that the algorithm is scalable (Table 3). As more data 387 is generated due to increases in analytical throughput and dataset size, AutoTuner will remain a 388 tractable option to generate estimates of metabolomics data processing parameters. 4,44 Because 389 AutoTuner estimates parameters much faster than IPO, and IPO was shown to perform at a faster 390 rate than software preceding it, we surmise that AutoTuner is the fastest parameter selection calculated metric. The inset shows three randomly selected EIC peaks that fall on distinct regions 579 of the maximum CWT empirical cumulative distribution function to demonstrate how this metric 580 influences peak quality. The curves were significantly different (KS-test, p < 10 -4 , n = 203). 581 Results for positive ion mode data area CDF and negative ion mode data were similar and are 582 found in figure S3 and S4, respectively.  Table 1. Parameters estimated through the AutoTuner algorithm. We chose to optimize these 604 parameters due to their influence on the number and quality of features returned following 605 XCMS data processing. 20,28 Table S4 gives more information on these parameters. 606 Parameter Description XCMS parameter name

Group difference
Expected retention time deviation of an mz/rt feature between samples bw Grouping XCMS ppm Parts per million (ppm) error threshold used to bin consecutive mass intensities across adjacent scans into a single peak ppm centWave (peak-picking)

S/N Threshold
The minimum ratio between peak and average noise intensity required to retain a feature snthresh centWave (peak-picking)

XCMS & MZmine2
Scan count Minimum number of scans required to retain a peak Prefilter scan centWave (peak-picking)

Noise
Numerical threshold used to filter out noise from true masses noise centWave (peak-picking)

Prefilter intensity
Minimum integrated intensity to required retain a peak   Table 3. Run-times for AutoTuner and IPO required to run 6 common samples collected in 616 positive (+) and negative (-) ionization modes. All system time measurements were done on an 8 617 CPUs and 10Gb of memory Ubuntu Xenial 16.04 Google Cloud instance. IPO ran on 8 CPUs, 618 while AutoTuner ran on 1 CPU. The ratio accounts for the total computing power used to run 619 both algorithms. 620