Improving Visualization and Interpretation of Metabolome-Wide Association Studies: An Application in a Population-Based Cohort Using Untargeted 1H NMR Metabolic Profiling

1H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well-established, and the use of univariate metabolome wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. The MWAS rely on the accurate estimation of a metabolome wide significance level (MWSL) to be applied to control the family wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in-turn, call for efficient prioritization of spectral variables of interest. Using human serum 1H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies.


* S Supporting Information
ABSTRACT: 1 H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well-established, and the use of univariate metabolome wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. The MWAS rely on the accurate estimation of a metabolome wide significance level (MWSL) to be applied to control the family wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in-turn, call for efficient prioritization of spectral variables of interest. Using human serum 1 H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies. KEYWORDS: full resolution 1 H NMR, metabolome wide association study, multiple testing correction, significance level, cohort studies, molecular epidemiology, MESA, results visualization and prioritization, high-throughput analysis, metabolic profiling

■ INTRODUCTION
Over the past 15 years, improvements in high-throughput technologies have accelerated the simultaneous measurement of large numbers of metabolites in a single sample using NMR and mass spectrometry (MS), which are both widely used to characterize a biofluid using either untargeted or targeted profiling approaches. 1,2 Both technologies have emerged as efficient tools for identifying biomarkers of exposures as well as early disease manifestations, hence informing on molecular mechanisms involved in pathogenesis (e.g., in cancer, diabetes, cardiovascular, and neurological diseases). 3−6 Metabolic phenotyping uses robust and reliable analytical methods 7,8 that are ideally suited for untargeted profiling since prior knowledge about compounds present in a sample is not required. Such an agnostic discovery approach can inform complementary targeted methods by identifying novel chemical compounds that may contribute to the molecular pathways involved in complex phenotypes. 8−10 These screening exercises call upon the efficient analyses of highdimensional data whose exploration poses complex methodological and interpretation problems. 11 Typically, untargeted NMR experiments in a large population study generate tens of thousands of data points for thousands of individuals. By adopting univariate approaches, the concept of the metabolome wide association study (MWAS) 12 has been proposed to analyze these data, and various statistical approaches to perform such analyses were established and have been explored and reviewed. 13−17 The complex correlation structures existing in metabolic profiles result in partially redundant signals across the metabolic spectra. These need to be accounted for while correcting for multiple testing, for instance by using a permutation-based procedure to derive the metabolome-wide significance level (MWSL) controlling the family wise error rate (FWER). 13 However, the comparative applicability of these approaches, as well as the visualization of the results they produce, has not been comprehensively explored. Here we provide, using a real-life example from the COMBI-BIO project, an extension of the MWAS methodology we initially developed using simulated data sets in a class discrimination context 13 and further extended to accommodate continuous outcomes. The current extension includes the computation of a stable and reproducible statistical significance threshold and proposes ways to estimate consistently across different types of NMR spectra as well as an internal validation procedure to assess the robustness of candidate associations. Our data set comprises two versions of 1 H NMR nontargeted metabolic profiles in 3948 participants from the Multiethnic Study of Atherosclerosis (MESA) cohort. 18 As a proof-of-principle example, we focus on identifying the NMR spectral features associated with fasting blood serum glucose, which was measured by an independent technique (glucose oxidase method). By examining the full NMR spectrum, our analysis was designed to identify spectral regions corresponding to other metabolites beyond glucose itself that also vary with fasting serum glucose ( 1 H NMR glucose associated peaks).

Study Population and Sample Selection
The MESA cohort has been described elsewhere 18 and includes 6814 participants (53% females, 47% males) aged 44−84 years (mean = 62 years) from four different ethnic groups: Chinese-American (n = 803), African-American (n = 1893), Hispanic (n = 1496), and Caucasian (n = 2622), all recruited between 2000 and 2002 at clinical centers in the United States. Participants were free of symptomatic cardiovascular disease at baseline, and demographic, medical history, anthropometric, lifestyle data, and serum samples were collected during the first examination (July 2000−August 2002), together with information on lipid or blood pressure treatment, and diabetes, and measures of systolic blood pressure. Serum samples were stored at −80°C after collection. At enrolment, high density lipoprotein (HDL-C) was measured in EDTA plasma on the Roche/Hitachi 911 Automatic Analyzer (Roche Diagnostics Corporation, Indianapolis, IN), and low density lipoprotein (LDL-C) was calculated using the Friedewald equation, 19 together with fasting serum glucose using the glucose oxidase method on the Vitros analyzer (Johnson and Johnson Clinical Diagnostics). Ethical approval was obtained by local ethical review boards, and subsequent analysis was conducted in full accordance with the ethical approval obtained.

Samples Preparation and 1 H NMR Spectroscopic Acquisition
The full sample preparation and quality control procedure have been extensively described elsewhere. 20 Briefly, serum samples were thawed on the day prior to analysis, and 300 μL of each sample was mixed with 300 μL of phosphate buffer (NaHPO4, 0.075M, pH = 7.4). Samples were processed in two phases (each corresponding to a separate analytical batch). Eppendorfs were used for phase 1, while 96-well plates were used for phase 2. After centrifugation (12 000g at 4°C for 5 min), 550 μL of each sample-buffer mixture was manually transferred into SampleJet 5 mm diameter NMR tubes and kept at 4°C until analysis. Different types of quality control (QC) samples were used for each phase as described elsewhere. 20 All QC pools were aliquoted in 350 μL and stored at −80°C prior to analysis. 1 H NMR spectra were acquired using a Bruker DRX600 spectrometer (Bruker Biospin, Rheinstetten, Germany) operating at 600 MHz. A standard water suppressed one-dimensional spectrum (usually termed NOESY) and a Carr−Purcell− Meiboom−Gill (CPMG) spectrum were obtained for each sample. 20

H NMR Metabolite Profiling
The metabolite profiling workflow has been detailed elsewhere. 20 For each biosample, two NMR profiles were generated: (i) a 1-D spectrum (NOESY) showing resonances from all protoncontaining molecules in the sample, including broad, largely undefined bands from serum proteins, sharper and well-defined bands from serum lipoproteins (with some classification into their main groups), and sharp peaks from a range of small molecule metabolites such as amino acids, simple carbohydrates, organic acids, organic bases, and a number of osmolytes, and (ii) a Carr−Purcell−Meiboom−Gill (CPMG) spectrum that attenuates the peaks from the macromolecules and allows better definition of the small molecules.
For both CPMG and NOESY NMR data, in-house written MATLAB (Mathworks Inc., USA) routines were utilized for phasing and baseline correction. Prior to spectral peak alignment, the region δ 4.400−5.100 corresponding to the H 2 O resonance was removed. Spectral peak alignment was performed by the Recursive Segment-wise Peak Alignment (RSPA) algorithm. 21 Regions where the peaks of different suspected contaminations (i.e., methanol) occurred were removed from the whole spectra (δ 1.180−1.240, δ 2.244−2.261 and δ 3.660−3.710). The remaining spectral regions were normalized by probabilistic quotient normalization using the median spectrum as the reference. 22 The normalized high resolution spectra contained 30 590 data points (variables) for both CPMG and NOESY data sets. To ensure comparability across batches (i.e., across studies and phases), each variable was mean-centered 23 (we will hereafter refer to this as mean corrected intensities). Score plots

Journal of Proteome Research
Article of the first few principal components for the resulting data set were finally used to identify and remove potential outlier samples (N = 3 for the CPMG and N = 4 for the NOESY data) from further analyses. The MESA study population was further examined for potential outliers using score plots from the first two principal components ( Figure S-1), which showed strong homogeneity in the study participants.

Metabolite Assignments
Selected samples were analyzed using a range of 2-D NMR experiments such as total correlation spectroscopy (TOCSY) or heteronuclear single quantum coherence (HSQC) to aid molecular identification. We used approaches such as statistical total correlation spectroscopy (STOCSY) and STORM to help constrain possible molecular structures. 24,25 Peak identification in the 1 H NMR data was supported by a semiautomatic clustering of the full resolution 1 H NMR spectra (30 590 data points) using statistical recoupling of variables, 26 where the algorithm defines a cluster as containing 10 or more variables. Each cluster is subsequently checked by NMR experts to improve the data point grouping and identify peak overlaps.
From our data, 136 and 159 clusters were identified in 1 H NMR NOESY and CPMG data, respectively, each of them corresponding to a single or a group of peaks. Resulting spectral information was also compared to the literature 27,28 and to existing databases such as the Human Metabolome Database 29 (HMDB, http://www.hmdb.ca/) and the Biological Magnetic Resonance Data Bank 30 (BMRB, http://www.bmrb.wisc.edu). Resulting sets of NMR features were ultimately confirmed through spiking experiments using commercial standards. The level of assignment (LoA) we used was adapted from Sumner et al. 31 NMR features were also characterized by their peak multiplicity: singlet (s), doublet (d), triplet (t), quartet (q), doublet of doublet (dd), multiplet (m), broad peak (b), and noise (n). Ninety-two clusters were assigned to 44 unique metabolites for the NOESY data set, and 91 clusters were assigned to 48 unique metabolites for the CPMG data set. Figure 1 presents our analytical workflow. We adopted a MWAS approach using univariate linear regression models to systematically

Journal of Proteome Research
Article screen the 30 590 data points assayed in both NOESY and CPMG profiles.
For a given data point, the linear model can be formulated as where X i represents the normalized NMR intensity, Y i is the outcome variable; here the log 10 transformed blood glucose concentration and ε i is the error term for an observation i. α is the intercept of the model, and ß 1 measures how strongly each data point influences the outcome variable. FE i is a vector of fixed effect observations for individual i and the vector B 2 compiles the regression coefficients for each adjustment covariate; for model 1, age, gender, phase, and ethnicity, and for model 2, we additionally correct for LDL and HDL cholesterol, lipids and blood pressure treatment, systolic blood pressure, smoking status, and diabetes. Most of the recently published MWAs adopted a Bonferroni correction for multiple testing, 32−34 which ignores correlation across variables and therefore does not account for the (partial) redundancy across the statistical test performed. This may lead to an overly conservative multiple testing correction. To take into account the high degree of correlation in spectral data and prevent overcorrection for multiple testing, one computationally efficient method relies on the estimation of the effective number of tests (ENT), as defined by the virtual number of independent tests that are performed across the actual p tests performed. ENT measures the level of correlation within the spectral data and can be estimated through spectral decomposition of the correlation matrix of predictors. 35 From this estimate, the per-test significance level ensuring a FWER control can be defined as the Bonferroni-corrected threshold corresponding to that number of independent tests. However, this approach remains of limited use in real-life metabolomic data sets, notably because, for numerical reasons, the ENT is upper-bounded by the number of observations. 36 As a scalable alternative, we used a permutation-based method to estimate the metabolome wide significance level (MWSL or α′) 13 in which the outcome (glucose levels) is randomly shuffled across observations. Each permutation mimics the null hypothesis of no association, and we performed for each permuted data set a MWAS using the linear regression model described above. In that setting, and for a given permuted data set, the minimum p-value across all variables (denoted q) represents the largest significance level to be considered to ensure no false positive findings, and the MWSL α′ controlling the FWER at a level α can be derived from the distribution of q across the N (set here to 10 000) permutations.
The effective number of tests (ENT) is then defined as the number of independent tests that would be required to obtain the same significance level using a Bonferroni correction: ENT= α/α′ and measures the level of correlation across the p tests performed.
To assess the robustness of the ENT estimates and to circumvent the strong assumptions from the generalized linear model on data structure (independence of each data points, distribution of the residuals, variance structure, and linear relationship between response and predictors), we ran our permutation-based procedure for both data sets (NOESY and CPMG) for both models (1 and 2) and used different transformations of the glucose distribution: raw concentrations, truncated concentrations (129 outlying observations with more than 2 standard deviations away from the mean glucose level were discarded), and log10-transformed glucose levels ( Figure S-2).
To formally assess the sensitivity of our MWSL estimates to distributional features of the outcome, we also simulated continuous responses from gamma and several Gaussian distributions and compared resulting MWSL estimates to those obtained using measured glucose levels.

Sensitivity, Stability Analyses, and Results Prioritization
We performed further sensitivity analyses to assess the stability of the candidate associations we identified. These included a crossvalidation procedure based on the independent subsampling (N = 100 times) of discovery (containing 80% of the observations) and replication (comprising the remaining 20% observation) data sets.
For each 80:20 discovery and replication split, we performed a MWAS and used the MWSL to identify the candidate associations in the discovery set. For each discovery set (80% of the observations), the number of independent signals among the candidate associations was approximated by the number of principal components needed to explain more than 99% of their variance. That number of PCs was then used to compute the Bonferroni corrected MWSL used in the replication set. 37 Our strategy could be summarized as follows for a given split: (1) MWAS: identify (N 0 ) candidate associations in the discovery set with discovery p-value < MWSL. (2) Estimate the number of independent signals these N 0 correspond to run a PCA on the X N0, the matrix combining the N 0 data points declared significant in the discovery set (a random 80% subsample or the full study population), and identify N PC , the number of PC's needed to explain more than 99% of the variance in X N0 . (3) Replication: identify from the candidate signals (step 1) those replicating in the 20% replication set at a Bonferronicorrected significance level accounting for N PC tests, α/ N PC , setting α = 5%.
Steps 1−3 are repeated across the 100 independent splits and the proportion a given signal is identified in the discovery set, and replicated in the validation set is reported.

■ RESULTS AND DISCUSSION
A total of 3948 individuals from the MESA cohort were included in the analysis, and for each individual, 30 590 serum NMR features were measured for both NOESY and CPMG spectra. The characteristics of the study population are summarized in Table 1.
We first explored the sensitivity of the MWSL estimates to (i) the parametrization of the statistical model, (ii) the type of NMR data under investigation, and (iii) distributional features of the outcome of interest. As summarized in Table 2, we then ran the permutation procedure for two different models (Models 1 and 2), for both types of NMR data sets (NOESY and CPMG) and 3 versions of glucose blood concentrations: raw, log10 transformed and truncated (removing 129 outlying observations which were outside the 2 standard deviation range from the mean glucose level). Across all models investigated, MWSL estimates for CPMG are more stringent than for NOESY spectra and, correspondingly, ENT estimates for CPMG are much greater than those for NOESY data (ranging from 17 610 to 122 460 and from 3680 to 17 010 for CPMG and NOESY, respectively). This suggests stronger correlations within NOESY data, which is plausible given that NOESY NMR spectra contain stronger broad peaks from proteins and lipoproteins than CPMG spectra, such that data point intensities are highly correlated.

Article
As summarized in Table 2, irrespective of the type of spectrum, MWSL estimates (and corresponding ENT) seem to be only marginally affected by the number of confounders included in the model (i.e., comparing models 1 and 2). However, a systematic but moderate increase in the ENT is observed for the fully adjusted model (Model 2). This can be explained by the fact that the fully adjusted model removes spectral signals relating to the components of the Framingham risk scores (such as the conventionally measured lipoprotein levels), and these confounders were highly likely to be driving correlations across some data points.
For CPMG data, estimates of the MWSL using raw glucose concentrations led to an estimated ENT greater than the actual number of tests (ratio around 4.00). This might be attributed to the parametric assumption in generalized linear regression model (e.g., normality of the outcome and equal variance) underlying our permutation procedure being violated. This is supported by the distribution of the blood levels of glucose, which is right skewed, with several outlying observations ( Figure S-2A).
To account for this asymmetrical distribution, we first applied a log 10 transformation to glucose levels ( Figure S-2B). Although the transformed distribution remains right skewed, corresponding estimates of the MWSL were less stringent, and the effective to actual number of tests ratio dropped to around 0.75 for CPMG (and to 0.15 for NOESY). To remove the influence of outlying observations, we truncated the glucose distribution and removed observations more than 2 standard deviations away from the mean glucose level (Figure S-2C). While only a small number of observations were discarded (N = 129), this removal strongly impacted the MWSL estimates for CPMG data, and the effective to actual number of test ratio dropped to less than 60% for both models.
These results suggest that MWSL estimates are sensitive to the shape of the distribution of the continuous outcome under investigation, and are specifically affected by both the relative weight of its tail, and by the presence of outlying observations. To formally assess the sensitivity of our MWSL estimates to the parametric form of the response variable, we ran a series of sensitivity analyses where we randomly sampled for each participant (and for each permutation) the glucose levels from (i) a Gamma distribution fitted on the measured glucose levels (shape = 15.90, scale = 6.18), and (ii) several Gaussian distributions (mean = 0, sd = 1; mean = 0, sd = 10; mean = 0, sd = 100, mean = mean(glucose), sd = sd(glucose). By construction, the Gamma-distributed response did not include outlying observations, but featured an inflated right tail, which provided less stringent MWSL for both NOESY. Results from the normally distributed outcomes showed consistent MWSL estimates and did not seem to be strongly affected by the parameters choices defining the Gaussian distribution (Table S-1, Figure S-3). Since the numbers of associated variables were only marginally affected by the way the ENT was computed in our example, we took forward the MWSL estimated from the Gaussian simulated outcome (mean = 0, sd = 1), which also seemed to provide the  In all scenario considered, the largest proportion of associated

Journal of Proteome Research
Article variables was always observed when using a BH-FDR procedure, and the smallest proportion of associated variables was always observed when using a Bonferroni procedure ( Figure S-5).

Metabolome Wide Association Study of Glucose: Visualization and Prioritization
We performed the MWAS of blood levels of glucose on both CPMG and NOESY spectra using both models and setting the MWSL to that estimated using the Gaussian simulated outcome (mean = 0, sd = 1, Table S-1). As expected, irrespective of the model and of the type of spectrum, a very large number of spectral features were found associated with blood concentration of glucose (Table 3): for NOESY data, 72% and 44% of the spectral variables were found significantly associated with glucose level for models 1 and 2, respectively. These proportions were 40% and 16% for models 1 and 2, respectively, in CPMG data. Analyses by classes of metabolite for model 1 revealed that nearly all variables assigned to drug derivative (100%), proteins (98.7%), and carbohydrates (92.1%) were associated with glucose in NOESY. Similarly, all variables assigned to proteins (100%), carbohydrates (96.5%), and others (71.9%, which include choline and glycerol) were associated with glucose in CPMG. The proportion of unassigned associated variables was higher for NOESY (69.4%) compared to CPMG (33.9%).
The utility of the MWAS approach in metabolic phenotyping relies on explicit visualization of the results. One primary output to help identify relevant spectral regions borrows from the field of genome-wide association studies and reports for each spectral variable the−log10 p-value multiplied by the sign of the corresponding regression coefficient. The resulting signed Manhattan plots (Figure 2 and Figure S-6 for CPMG and NOESY, respectively) offer a global view of the spectral regions associated with the outcome of interest, and can be further informed by the annotation of spectral features. Assigned metabolites found associated with conventional serum glucose measurements are reported in Tables S-2 and S-3 for CPMG and NOESY, respectively. For CPMG, 47 unique metabolites were associated with glucose for model 1, of which 34 were still associated after controlling for the FRS variable (44 and 41 for NOESY).
High-resolution visualization is also key to enable peak validation and subsequent annotation through the inspection of the shape and multiplicity of the spectral features in the neighborhood of the associated regions. Such visualization could be provided by regional plots as exemplified in Figure 3, which focuses on the glutamine region ([2.4535−2.5045] ppm). The upper panel represents the high resolution (unsigned) Manhattan plot where the −log 10 p-value (left Y axis) measuring the strength of association each between peak height and the log 10 transformed blood glucose level is represented by a triangle (down pointing for negative associations). For the region under investigation, we define the reference feature as the strongest association (here 2.47175 ppm, p-value <2.10 −16 , represented in gray) and color-code the pairwise correlation of each spectral variable with this reference. The mean-corrected intensity (i.e., residuals removing the effect of possible confounders: phase and cohort, see Methods) is also represented (right Y axis) Figure 3. CPMG-model 2 regional association plots with log 10 (glucose) for the glutamine. In the upper plot, the −log10 p-value for the features at two regions are shown on each plot. Features are colored based on their correlation with the gray hit that has the smallest p-value in the region. The lines show the mean corrected intensity (i.e., residuals removing the linear effect of the phase and the cohort) in the 5% of samples with high residual glucose in green and 5% of the samples with low residual glucose in blue. The bottom plot shows the mean spectral intensity in MESA phase 1 (plain line) and in MESA phase 2 (dashed line). Green circles indicate the proportion of replication after results prioritization.

Journal of Proteome Research
Article for the samples in the fifth (blue line) and 95th (green line) quantiles of the residuals log 10 glucose levels after controlling for the FRS variables.
From this plot, it is clear that there are strong, and exclusively positive correlation coefficients among 1 H NMR data points throughout the region as indicated by the color-coded pairwise correlation coefficients (75% are >0.77). As expected, because of the local correlation structures in NMR spectra (caused by the finite peak width and the resonance being split by the J couplings), data points neighboring the reference signal exhibit higher correlation levels. More distant spectral variables may also show strong correlation with the reference peak (e.g., 2.5045, Spearman correlation = 0.26; 2.4535, Spearman correlation = 0.30), and irrespective of their location in the spectrum, intensities that are the most correlated to the reference peak exhibit the strongest associations with glucose levels. This arises because a given metabolite often has several NMR peaks and this procedure offers a way of finding such linked resonances as an aid to molecular annotation. For these variables (dark orange and red triangles), the difference in the mean corrected intensity in participants with the highest and lowest glucose levels are larger compared to the spectral variables less correlated to the reference peak (yellow triangles).
To get further insights into the nature (shape and multiplicity) of the variables found associated with glucose we represent the mean spectrum in the bottom panel (left Y axis). Mean spectra are plotted for each of the analytical phases to assess both potential technically induced bias across experimental batches, and possible population heterogeneity across samples assayed in each phase. While the mean spectrum from phase 2 appears to have more marked variables (with higher modes), both data sets yield consistent subspectra in terms of alignment, multiplicity and overall peak shape.
To accommodate the expected large number of associations with glucose levels, efficient signal prioritization strategies are key. One established way to identify robust and replicable associations is to seek external replication in an independent data set. However, owing to the specificity of experimental protocols and the nature of the measurements (which are relative and not absolute concentrations), external validation is challenging in metabolic phenotyping, but could however be sought for using either standardized or annotated data. As a workable alternative and complementary approach prioritizing relevant metabolic features, we propose to seek for internal validation and randomly split the study population in a discovery (80% of the full population) and a validation (the remaining 20%) set. The robustness of an association identified in the full population is then quantified by the number of times discovered and replicated over the (N = 100) independent splits. This proportion is represented on the regional plot (green dots on the bottom panel of Figure 2). Owing to the strong signals we identify in our glucose MWAS, we report very high replication proportions in glucose-associated regions. As a first approach, prioritized associations were defined as those discovered and validated in at least one split (i.e., with a proportion of replication >0). As illustrated in Table 3 and in Figures 4 and S-7 for CPMG and NOESY, this prioritization strategy reduced the number of significant associations for model 2 by almost 50 and 30%, respectively. As illustrated in Figure 4, the associated spectral variables are strongly clustered and define clear regions that are associated with the blood levels of glucose. The −log10(p-value) is signed by the direction of the effect size estimate and is plotted against the chemical shift. The horizontal dashed line indicates the per-test significance level controlling the FWER at a 5% level. Variables found from the analyses in MESA are presented in black, those discovered and replicated at least once across the 100 splits are presented in green and those discovered and replicated in 50% of the split are presented in red.

Article
Our prioritization strategy identifies overall the same regions along the spectra but selects preferentially the strongest associated variables within each region and regions that have been assigned. This is even more apparent when more stringent prioritization strategies, for instance, in Figure 4 we plot the signed Manhattan plot for the features discovered and replicated in a least half of the splits (in red).

■ CONCLUSION
The identification and interpretation of metabolic features that contribute to a physiological and/or disease-induced outcome from full resolution NMR data is challenging for many reasons related to the dimensionality and complexity of metabolic profiles. The main aim of the present work is to address some of these issues through the development of a robust multiple testing strategy, intuitive visualizations and objective ways to prioritize results. The MWAS approach requires accurate correction for the large number of tests performed, and therefore needs to appropriately account for the strong and complex correlation structures within the NMR spectra. All methods for performing multiple testing correction assume a valid statistical model that captures dependencies in the data. While approaches controlling the FDR usually provide less stringent multiple testing correction, these have been reported to misperform in cases of high correlations among predictors. 40 As an extension of the MWAS approach to accommodate continuous variables in a regression context, we proposed an estimation of the metabolome wide significance level adopting the same permutation strategy and investigated the sensitivity of the estimates to the data structure and to the model parametrization. Our results suggest that in the case of highly correlated variables (spectral variables) that are strongly associated with an outcome (here glucose levels), permutations do not succeed in destroying the predictor-outcome relationship, hence yielding ENT estimates greater than the actual number of tests performed. Sensitivity analyses removing extreme values, and/or log transforming glucose levels showed that outlying observations are driving this unexpected estimate of the ENT. In the presence of strong correlation (i) between blood glucose and most of the assayed metabolites, (ii) among the metabolites, and (iii) between metabolites and adjustment variables, assumptions (e.g., observations exchangeability under the null) on which permutation inferences are based may be violated, and especially in the presence of strong outlying observations. From our data, MWSL estimates appeared robust to model parametrization (i.e., marginally affected by the set of confounding variables considered), but clearly depended on the correlation structure in the data, which are population and platform specific. This suggests that the MWSL should be tailored and re-estimated for each data set. MWSL estimates were also found to be sensitive to the distribution of the outcome and especially in the case of heavy tailed distributions due to the presence of outlying observations. While numerical transformations and truncation could be a way forward, one more general and conservative option could be to calculate the MWSL once using a virtual predictor sampled from a Gaussian distribution ( Figure S-4).
Using this MWSL approach, we propose extensions of existing visualization tools for the MWAS. This includes full resolution signed Manhattan plots included functional annotation and higher resolution regional plots displaying the per-variable strength of association as well as spectral summary features including correlation patterns across spectral variables. In our proof-of-principle example, we chose glucose as the outcome of interest which defined a challenging context in terms of results interpretability as a very large number of strong associations were identified and the assessment of their relevance went beyond the observed strong positive correlation for the expected glucose peaks along the 1 H NMR spectra. This called for the definition of strong result prioritization strategy, which, in less extreme situations, is also critical to identify the most relevant associations that are worth dedicating resources for molecular assignment. Our approach relies on subsampling strategy where discovery (80%)-replication (20%) splits are used to identify associations that internally replicate. This strategy was found to be efficient to prioritize the most robust associations, which were not only those with the strongest p-values. We performed our internal validation at the metabolome-wide level, which was computationally intensive. To scale this approach in real-life studies, one alternative would consist in restricting the discovery-replication split strategy for the MWAS candidates that have been identified in the full population. The overall analytical strategy presented here provides a general framework for the analysis of cohort studies where large number of samples are profiled using untargeted technologies (e.g., high-resolution NMR, mass spectrometry). Overall, we believe the strategy and approach we present are generalizable and scalable and may therefore be relevant to aid the MWAS approach, particularly improving the interpretation of results.

* S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00344.
Plots of first two PCA scores for CPMG and NOESY data in MESA; distributions of three different transformations of glucose used to investigate MWSL; percentage of effective/actual number of tests and 95% confidence intervals for CPMG and NOESY; percentage of associated variables for CPMG and NOESY derived from each simulated continuous response; percentage of associated variables for CPMG and NOESY derived from different multiple testing correction strategies; metabolome wide study of glucose from analysis of 30 590 NOESY features; comparison of results from analysis in MESA to those from 80:20 split strategy from NOESY metabolome wide association study of glucose using model 2; significance threshold a′ and ENT based on Bonferonni correction; CPMG and NOESY metabolic features associated with log 10 (glucose) in MESA at metabolome-wide significance level for models 1 and 2 (PDF)