  • Open Access
Article

Breaking Down Structural Diversity for Comprehensive Prediction of Ion-Neutral Collision Cross Sections

  • Dylan H. Ross - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
  • Jang Ho Cho - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
  • Libin Xu* - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
    *Email: [email protected]. Tel: (206) 543-1080. Fax: (206) 685-3252.

Analytical Chemistry

Cite this: Anal. Chem. 2020, 92, 6, 4548–4557
https://doi.org/10.1021/acs.analchem.9b05772
Published February 25, 2020

Copyright © 2020 American Chemical Society. This publication is licensed under these Terms of Use.

Abstract


Identification of unknowns is a bottleneck for large-scale untargeted analyses like metabolomics or drug metabolite identification. Ion mobility-mass spectrometry (IM-MS) provides rapid two-dimensional separation of ions based on their mobility through a neutral buffer gas. The mobility of an ion is related to its collision cross section (CCS) with the buffer gas, a physical property that is determined by the size and shape of the ion. This structural dependency makes CCS a promising characteristic for compound identification, but this utility is limited by the availability of high-quality reference CCS values. CCS prediction using machine learning (ML) has recently shown promise in the field, but accurate and broadly applicable models are still lacking. Here we present a novel ML approach that employs a comprehensive collection of CCS values covering a wide range of chemical space. Using this diverse database, we identified the structural characteristics, represented by molecular quantum numbers (MQNs), that contribute to variance in CCS and assessed the performance of a variety of ML algorithms in predicting CCS. We found that by breaking down the chemical structural diversity using unsupervised clustering based on the MQNs, specific and accurate prediction models for each cluster can be trained, which showed performance superior to that of a single model trained with all data. Using this approach, we have robustly trained and characterized a CCS prediction model with high accuracy on diverse chemical structures. An all-in-one web interface (https://CCSbase.net) was built for querying the CCS database and accessing the predictive model to support unknown compound identifications.

Identification of unknown analytes is challenging in LC-MS-based metabolomics or drug metabolite identification experiments because identification often depends on the availability of reference MS/MS spectra; even with MS/MS spectral matching, the structural information garnered is often inconclusive because of uncharacteristic MS/MS fragmentation. Definitive compound identifications can be achieved using nuclear magnetic resonance (NMR) spectroscopy, but the large amount of material required makes this method costly and time-consuming. Therefore, there is a demand for increased confidence in identification of unknowns without sacrificing analytical throughput. (1,2)
Ion mobility spectrometry (IMS) is a rapid gas-phase separation technique based on the size and shape of analyte ions. (3−7) In IM separation, ions are driven through a neutral buffer gas under the influence of either a static electric field (as in drift tube IM, DTIM) or a dynamic electric field (as in traveling wave IM, TWIM), (8,9) where they are differentially slowed because of their interactions with the drift gas molecules. The mobility of an ion in IMS is related to its collision cross section (CCS), a unique physical property determined by the ion’s size, shape, and degree of interactions with the drift gas in the gas phase. (10) When IMS is coupled with mass spectrometry (IM-MS), a powerful two-dimensional separation is achieved on the basis of CCS-to-charge and mass-to-charge ratio (m/z). In some cases, such separation is sufficient to justify the shortening or omission of chromatographic separation (e.g., liquid chromatography) entirely, which dramatically increases the analytical throughput while simultaneously providing additional structural information. In addition to its inherent relationship to gas-phase structure, CCS has the benefit of being highly reproducible across different measurements and instruments. A recent study comparing the reproducibility of CCS values measured on DTIM and TWIM instruments found that 93% of the compounds tested showed absolute errors ≤2% for protonated species, and 87% showed absolute errors ≤2% for sodiated adducts; however, a few entries displayed errors >3% (<4% of total comparable entries). (11) This reproducibility becomes even greater when considering CCS values measured on a single type of instrument. (12)
The structural dependency of CCS and the high reproducibility of its measurement make CCS an excellent property to be used for compound identification. This utility, however, is limited by the availability of reference CCS values to compare against. Many large collections of CCS values have been produced that cover various chemical classes, but because of the breadth of small molecule chemical space and limitations in available standards, it is infeasible, if not impossible, to cover all of the unknowns encountered in metabolomics or related studies. CCS prediction can be used to address this limitation by providing CCS values for compounds that do not have reference values. Current methods of CCS prediction fall into two categories: theory-driven (10,13−21) and data-driven approaches, (22−27) each of which has unique benefits and drawbacks. Theory-driven approaches generally use molecular modeling to produce an approximation of the 3D and electronic structure of a molecule and then compute CCS by simulating the interactions between the drift gas and analyte ion (both at varying levels of detail). These methods have the benefit of being rooted in solid theoretical principles, but depending on the level of detail, the computations can become very time-consuming and laborious to set up. Furthermore, depending on the assumptions made by the theory of a chosen approach, systematic errors may also be introduced and can be difficult to correct without introducing bias. Recently, a higher-throughput computational workflow, ISiCLE, was reported, but this method still requires large amounts of computational resources and multiple steps of computational setup (conformer generation and optimization and CCS calculation for each conformer), and thus, the throughput is still not ideal. (21) Moreover, the resulting accuracy is good but not outstanding (average error = 3.2% and R2 = 0.94–0.96 in correlation with experimental values). In contrast, data-driven approaches leverage trends within collections of reference values and generate predictions for similar compounds using their common characteristics, typically employing machine learning (ML) to do so. Data-driven approaches have the benefit that once a predictive model has been trained, predictions can be carried out almost instantaneously. These models can produce high-quality predictions on unseen data, with the caveat that the quality of predicted CCS values is tied directly to the quality and relevance of the training data.
CCS prediction using ML has gained traction in recent years, (22−27) and a range of approaches have been adopted by different groups. These approaches share a similar general workflow: compilation and/or generation of a suitable set of training data, selection and optimization of a numerical representation for compounds (featurization), partitioning of data into training and testing/validation sets, selection of an ML model, training and optimizing the selected model, and finally testing and validating the trained model’s performance. The majority of published predictive models are trained on relatively specialized, single collections of compounds like small molecule metabolites, (22) lipids, (23,25) pesticides, (24) and drug-like compounds, (26) and therefore, accurate prediction is limited to the types of molecules used for training. A recent study used data sets compiled from existing collections (mostly DTIM values, but some TWIM values) in order to train predictive models that can handle a wider variety of chemical features, (27) but there is limited coverage of drug and drug-like compounds (i.e., small molecules). Most existing models use some form of molecular descriptors (MDs) computed from the chemical structures as the features for performing ML. (22−26) These MDs often correspond to a summary property of a compound, such as polarity, LogP, and number of heavy atoms, and the combination of a variety of MDs provides a fingerprint of a compound’s chemical characteristics, but these MDs do not always reflect a compound’s structural features. SMILES structures have also been used directly as input in a recent work employing a convolutional neural network (CNN). (27) A variety of schemes have been used to partition data sets for training and validation, but generally, all examples include some form of performance validation using data not seen by the model during training. Another limitation of existing models is that they use either support vector machines (SVMs) (22,23) or artificial neural networks (ANNs), (24,26,27) which operate as “black boxes” and thus offer no interpretation of results beyond the predictions themselves.
To overcome the current limitations in CCS prediction, we first rigorously curated a large CCS database with nearly 7700 entries covering a diverse range of chemical space and identified structural characteristics, represented by molecular quantum numbers (MQNs), contributing to the variance in mass-CCS space. We then developed a novel, high-performance approach to building CCS prediction models by breaking down the chemical structural diversity using unsupervised clustering based on structural features, followed by training of individual prediction models on each cluster using ML. Using this approach, we robustly trained and characterized a comprehensive CCS prediction model with high accuracy on diverse chemical classes. Lastly, we built an all-in-one web interface (https://CCSbase.net) for querying the CCS database and accessing the predictive model to support unknown compound identifications.

Experimental Section


Assembly of a Comprehensive CCS Database

A comprehensive CCS database was assembled from a variety of individual collections of CCS values available in the literature, (22−24,26,28−40) representing broad coverage of current measurement techniques and chemical classes. For lipid CCS values measured on TWIM instruments, only those calibrated with lipids were included in the database. (41) The source data sets were each manually examined for any errors, and relevant data (i.e., CCS, m/z, mass, SMILES if present, etc.) from each entry was converted into a JSON format, yielding consistently formatted cleaned data with separate files for each individual data set. The combined CCS database was constructed from the individual cleaned data sets using a series of build scripts developed in-house in order to be able to reproducibly rebuild the database when new data sets are added or when database organization is changed. A SQLite3 relational database was initialized with a table to hold relevant CCS measurement data (including MS adduct, m/z, and charge state) and metadata [including source data set and CCS measurement platforms and methods (DTIM: single field or stepped field vs TWIM: calibrants)]. Data from each source data set were then added to the database, and missing SMILES structures were filled in, first by searching the PubChem or LIPID MAPS databases by compound name; if a structure was not found in those databases, the SMILES was obtained from a manual search or from a hand-drawn structure in ChemDraw, in combination with the online SMILES translator (https://cactus.nci.nih.gov/translate/). Next, a table was added to the database containing columns for each of the 42 molecular quantum numbers (MQNs) used as part of the features for machine learning (see below). Finally, rough chemical classifications (carbohydrates, lipids, peptides, small molecules) were assigned to each entry of the database on the basis of compound name and data source.
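As a rough illustration of this build process, the sketch below loads one cleaned JSON data set into a SQLite3 table. The file name, table name, and column names here are assumptions for illustration only and do not reproduce the schema of the actual c3sdb build scripts (https://github.com/dylanhross/c3sdb).

```python
# Minimal sketch: load one cleaned JSON data set into a SQLite3 table.
# File, table, and column names are illustrative placeholders.
import json
import sqlite3

con = sqlite3.connect("ccs_database.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS measurements (
        name TEXT, adduct TEXT, mz REAL, z INTEGER, ccs REAL,
        smiles TEXT, src_tag TEXT, ccs_type TEXT, ccs_method TEXT
    )
""")

with open("cleaned_dataset.json", "r") as f:
    entries = json.load(f)  # list of dicts, one per CCS measurement

rows = [
    (e["name"], e["adduct"], e["mz"], e.get("z", 1), e["ccs"],
     e.get("smiles"), e["src_tag"], e["ccs_type"], e["ccs_method"])
    for e in entries
]
con.executemany("INSERT INTO measurements VALUES (?,?,?,?,?,?,?,?,?)", rows)
con.commit()
con.close()
```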

Feature Set for Machine Learning

The full set of features used for prediction of CCS (n = 50) includes the m/z of the observed MS adduct, the one-hot encoded MS adduct (OHEA, n = 7), and a set of molecular descriptors that capture information about the size, composition, and topology of each chemical structure (42 MQNs). The m/z and MS adduct were already present in the source data sets, but the MS adduct had to be encoded into a numeric form in order to be used for CCS prediction. One-hot encoding was used to convert the MS adduct into a binary representation, with unique labels for each adduct type that had ≥100 examples in the database and the remaining adduct types grouped under “other adducts” (total of 7 features, Table S1). The 42 MQNs were computed for all database entries containing SMILES structures using the RDKit library (https://www.rdkit.org) and stored in the database. This feature set reflects a variety of compound characteristics, ranging from size and composition to structural topology (see complete list in Supporting Information, Table S2). Specifically, MQNs are molecular descriptors obtained by analyzing compounds as molecular graphs, i.e., collections of nodes (atoms) and edges (bonds). (42) The descriptors are properties of these graphs, consisting primarily of counts of various atom types, bond orders, connectivity, etc. A benefit of using graph properties as molecular descriptors is that they are invariant with respect to the software used to compute them (unlike many empirical properties, such as cLogP), facilitating the broad application of predictive models and promoting reproducibility.
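A minimal sketch of how such a 50-dimensional feature vector could be assembled is shown below. The specific adduct list is an assumption for illustration (the actual encodings are given in Table S1); the 42 MQNs are computed with RDKit's rdMolDescriptors.MQNs_ function.

```python
# Sketch: assemble the m/z + one-hot adduct (7 bins) + 42 MQN feature vector.
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# illustrative adduct labels; the published encodings are listed in Table S1
ADDUCTS = ["[M+H]+", "[M+Na]+", "[M-H]-", "[M+K]+", "[M+NH4]+", "[M+H-H2O]+"]

def featurize(smiles: str, mz: float, adduct: str) -> np.ndarray:
    """Return the m/z + one-hot adduct + MQN feature vector for one entry."""
    # one-hot encode the adduct, with a final "other adducts" bin for rare types
    ohea = np.zeros(len(ADDUCTS) + 1)
    ohea[ADDUCTS.index(adduct) if adduct in ADDUCTS else -1] = 1.0
    # 42 molecular quantum numbers computed from the molecular graph
    mqns = np.array(rdMolDescriptors.MQNs_(Chem.MolFromSmiles(smiles)), dtype=float)
    return np.concatenate([[mz], ohea, mqns])

x = featurize("CCO", 47.049, "[M+H]+")  # ethanol, protonated, for illustration
print(x.shape)  # (50,)
```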

Analysis of Structural Features Contributing to Variation in the Mass-CCS Space

Dimensionality reduction analyses were used to explore the important sources of variance within the combined CCS database and to determine which chemical characteristics contribute most strongly to variance in CCS values. Specifically, principal component analysis (PCA) and partial least-squares regression analysis (PLS-RA) were used to analyze the MQNs of all entries in the combined CCS database in an untargeted and targeted fashion, respectively. Both analyses work by finding multidimensional axes in the input data that explain as much variance as possible; subsequent orthogonal axes are then chosen that explain as much of the remaining variance as possible. PLS-RA differs from PCA in that the first axis chosen is the one that explains the maximal variance in a target variable, in this case CCS, rather than in the input data, making it a targeted analysis. Both analyses are implemented in Scikit-Learn, (43) a free and open-source machine learning library for Python (sklearn.decomposition.PCA and sklearn.cross_decomposition.PLSRegression).
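The sketch below illustrates how these two analyses can be run in Scikit-Learn on a feature matrix X and reference CCS values; the randomly generated placeholder data are for illustration only and stand in for the featurized database entries.

```python
# Sketch: untargeted (PCA) and CCS-targeted (PLS) dimensionality reduction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(420)
X = rng.normal(size=(500, 50))         # placeholder feature matrix (n samples x 50 features)
ccs = rng.normal(200.0, 40.0, 500)     # placeholder reference CCS values (Å^2)

pca = PCA(n_components=3)
pca_proj = pca.fit_transform(X)                # untargeted projections onto PC1-PC3
print(pca.explained_variance_ratio_)           # fraction of variance captured by each PC

pls = PLSRegression(n_components=2)
pls.fit(X, ccs)
pls_scores = pls.x_scores_                     # scores[0], scores[1] projections
pc1_loadings = pls.x_loadings_[:, 0]           # feature contributions to the first PLS axis
```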

CCS Prediction Using Machine Learning

It can be difficult to determine a priori what type of ML algorithm will have optimal performance characteristics for a given task, so a variety of ML models representing a range of algorithmic complexity were examined for use in predicting CCS. All ML models used in this work are implemented in Scikit-Learn. Mean squared error of predictions vs reference values (MSE) was used as the optimization target for model training. The ML models tested range from simple to complex, including multivariate linear regression (sklearn.linear_model.LinearRegression), a linear regression model employing L1 regularization (lasso, sklearn.linear_model.Lasso), a support vector regression model with a radial basis function kernel (svr, sklearn.svm.SVR), and a stochastically assembled ensemble of decision tree models (random forest, sklearn.ensemble.RandomForestRegressor). Descriptions of the chosen models are provided in the SI.
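For reference, a sketch of how the four candidate regressors might be instantiated is given below; the hyperparameter values shown are placeholders, not the optimized settings used in this work.

```python
# Sketch: the four candidate regressors, from simplest to most complex.
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

models = {
    "linear": LinearRegression(),                    # multivariate linear regression
    "lasso": Lasso(alpha=0.1),                       # L1-regularized linear regression
    "svr": SVR(kernel="rbf", C=100.0, gamma=0.01),   # RBF-kernel support vector regression
    "forest": RandomForestRegressor(n_estimators=100, random_state=420),  # decision tree ensemble
}
```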
Before model training, the data set was processed in a stepwise fashion (Scheme S1). First, the data set was split into training and test sets (at proportions of 80% and 20%, respectively), and the test set was put aside until after model training. This data splitting was performed in a stochastic fashion (seeded for deterministic results), with rough stratification on the basis of CCS to ensure a somewhat similar distribution of CCS values between the training and test sets. The training data were then centered and scaled such that each feature would have a mean of 0 and a standard deviation of 1. Such normalization is necessary to ensure numerical stability in training certain predictive models and to avoid arbitrarily overemphasizing features on the basis of their magnitudes. When necessitated by the predictive model being tested (i.e., lasso, svr, random forest), hyperparameter optimization was performed using a grid search with 5-fold cross validation (sklearn.model_selection.GridSearchCV) on the training data. Using the optimal hyperparameters, a predictive model was then trained on the training data, and performance metrics (see below) were computed from the training data. Finally, performance metrics were computed using the trained model to make predictions from the test set data. The entire process was repeated multiple times using different pRNG (pseudorandom number generator) seeds, and the metrics (for both training and test data) were averaged across trials to estimate the average predictive model performance under a given set of conditions. Repeated trials were used only for bulk performance comparisons between models or feature sets, not to produce individual CCS predictions.
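A condensed sketch of this workflow (split, scale, grid search with 5-fold cross-validation, evaluate) is shown below for the svr model, reusing X and ccs from the earlier sketch. The hyperparameter grid, random seed, and omission of the CCS-based stratification are simplifications for illustration.

```python
# Sketch: train/test split, centering/scaling, grid-search CV, and evaluation.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# X, ccs as defined in the earlier sketch (feature matrix, reference CCS values)
X_train, X_test, y_train, y_test = train_test_split(
    X, ccs, test_size=0.2, random_state=420
)

scaler = StandardScaler().fit(X_train)          # mean 0, std 1 per feature (fit on training data only)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [10, 100, 1000], "gamma": [0.001, 0.01, 0.1]},  # illustrative grid
    cv=5, scoring="neg_mean_squared_error",
)
grid.fit(X_train_s, y_train)
best = grid.best_estimator_

print("train MSE:", mean_squared_error(y_train, best.predict(X_train_s)))
print("test MSE:", mean_squared_error(y_test, best.predict(X_test_s)))
```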

K-Means Clustering for Untargeted Classification of Chemical Structures

K-Means clustering is a multivariate technique in which data are partitioned into clusters such that the similarity between samples that are partitioned together is maximized. Briefly, in K-Means clustering, centroids with the same dimensions as the input data are chosen for each cluster, and each sample is assigned to the cluster with the nearest centroid. The centroid positions for the clusters are adjusted (and data partitioning repeated) such that the inertia (sum of squares of Euclidean distances of each sample from their assigned cluster centroid) is minimized. This is an unsupervised classification technique because the class assignment (i.e., partitioning into clusters) is done on the basis of similarity between subgroups within a data set, without using predetermined labels, and therefore offers a means of classifying chemical compounds without the bias inherent to traditional manual assignment. K-Means clustering is implemented in Scikit-Learn (sklearn.cluster.KMeans).
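A minimal sketch of this step with Scikit-Learn, using the k = 4 clusters adopted later in this work, might look like the following; X_train_s is the scaled training feature matrix from the previous sketch.

```python
# Sketch: partition the scaled feature vectors into k clusters with K-Means.
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=420, n_init=10)
cluster_labels = kmeans.fit_predict(X_train_s)   # cluster assignment for each training sample
print("inertia:", kmeans.inertia_)               # sum of squared distances to assigned centroids
```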

CCS Prediction Performance Metrics

The performance of predictive ML models trained on the combined CCS data set was assessed using an array of metrics, intended to offer a complete representation of model performance. Specifically, R2, mean and median absolute error (MAE and MDAE, respectively), root mean-squared error (RMSE), and cumulative distribution of prediction errors below 1, 3, 5, and 10% (CE135A) were used. Each metric was computed from predicted and reference CCS values (y′ and y, respectively) as follows. R2 is calculated by comparing the residual sum of squares with the variance in the reference values. MAE and MDAE are computed as the mean and median, respectively, of the absolute errors of model predictions. RMSE is computed as the square root of the mean-squared error. CE135A is computed as the proportions of predictions with relative absolute errors below 1, 3, 5, and 10%.
Although all metrics reflect the accuracy of the regression model in predicting CCS value, each metric specifically highlights a different aspect of model performance. R2 is a good metric for assessing goodness of fit for regression models, but it is only suitable for comparing between models with the same number of parameters. MAE gives a good indication of the average magnitude of errors made by the predictive model, but MDAE may be a better indicator of this magnitude when the distribution of prediction errors is significantly skewed from normal. A drawback of MAE and MDAE is that they can contribute to overfitting by allowing for small numbers of large errors to be easily balanced out by large numbers of small errors. In contrast, RMSE gives an indication of the magnitude of errors but does so in a way that more heavily penalizes large errors (due to the squaring of errors), favoring generalizable models with similar prediction errors across all samples. MAE, MDAE, and RMSE all share the benefit of being computed in the units of the target variable (in the case of CCS prediction, Å2), which can aid in their interpretation with respect to real world performance.
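The sketch below shows one way these metrics could be computed from reference and predicted CCS values; it follows the verbal definitions above rather than any published implementation.

```python
# Sketch: compute R2, MAE, MDAE, RMSE, and CE135A from reference (y) and
# predicted (y_pred) CCS values in Å^2.
import numpy as np
from sklearn.metrics import r2_score

def ccs_metrics(y, y_pred):
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_pred - y)
    rel_err = 100.0 * abs_err / y                       # percent relative error
    return {
        "R2": r2_score(y, y_pred),
        "MAE": abs_err.mean(),                          # mean absolute error (Å^2)
        "MDAE": np.median(abs_err),                     # median absolute error (Å^2)
        "RMSE": np.sqrt(np.mean((y_pred - y) ** 2)),    # root mean-squared error (Å^2)
        "CE135A": [float(np.mean(rel_err <= t)) for t in (1, 3, 5, 10)],  # cumulative error dist.
    }
```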

Results and Discussion


Selection of Data Sets for Combined CCS Database

Individual collections of CCS values measured in nitrogen gas included in the combined CCS database were chosen on the basis of size, diversity, and quality of measurements. The combination of multiple data sets, measured in different laboratories and on different instruments, into a single database for training CCS prediction models is desirable from the standpoint of generalizability because any systematic errors/biases present in a single CCS data set can be averaged out by the presence of similar data from other data sets. CCS values measured on both DTIM and TWIM instruments were included in the database in order to maximize the size, breadth, and diversity of the CCS collection. Lipid CCS values measured on TWIM instruments were included only if the CCS was calibrated with lipid standards, which was found to give CCS values predominantly within 2% of DTIM values. (28−30,41) Comparing the CCS values of metabolites and drug and drug-like compounds measured on DTIM and TWIM, only a small percentage of the entries (4 out of 45 in Hines et al. (32) and 7 out of 194 in Hinnenkamp et al. (11)) display errors that are >3%, which justifies the inclusion of values for such compounds measured on both platforms. (22,24,26,32−38) Furthermore, during assembly of the database, we analyzed overlapping values to assess their degree of agreement. Figure 1 shows the number and proportions of overlapping values from the combined CCS database that fall within 1, 3, and 5% difference of one another. In the distribution of all overlapping values (all, n = 695), 46.0, 84.5, and 94.5% fall within 1, 3, and 5% of one another, respectively. This indicates generally good agreement between overlapping values in the database, regardless of the instrument types. Looking specifically at the variation between overlapping DTIM CCS values from different data sets (DT, n = 409), 43.3, 84.4, and 94.1% fall within 1, 3, and 5%, respectively, similar to that of the database as a whole. The variation between overlapping TWIM CCS values (TW, n = 163) is smaller than that of the overall data set, with 65.0, 87.7, and 93.3% falling within 1, 3, and 5%, respectively. Finally, looking specifically at the overlapping values that contain both DTIM and TWIM CCS measurements (DT vs TW, n = 238), 36.6, 75.6, and 93.3% fall within 1, 3, and 5%, respectively, which displays the most variation but is still similar to that of the other groups. These results indicate that using both DTIM and TWIM CCS values is unlikely to add additional uncertainty to CCS predictions made by models trained on this combined database relative to models trained on data sets containing only one type of CCS values. Furthermore, the use of the combined database would greatly increase the generalizability of the prediction model because of the broad coverage of the structural diversity. Overall, a total of 7405 CCS entries with SMILES structures (out of 7678 entries in the database at this time) are used for this study, which covers 3526 small molecules (5012 CCS values), 1041 lipids (2345 values), 91 peptides (112 values), and 84 carbohydrates (200 values), many of which have multiple observations from different MS adducts. (22−24,26,28−40)
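As an illustration of how such an overlap analysis could be carried out, the sketch below groups entries by compound name and MS adduct (an assumed grouping key), computes pairwise percent differences between values from different source data sets, and reports the fractions within 1, 3, and 5%.

```python
# Sketch: agreement between overlapping CCS values from different sources.
from itertools import combinations
from collections import defaultdict
import numpy as np

def overlap_agreement(entries):
    """entries: list of dicts with "name", "adduct", "ccs", and "src_tag" keys."""
    groups = defaultdict(list)
    for e in entries:
        groups[(e["name"], e["adduct"])].append(e)   # assumed grouping key
    diffs = []
    for group in groups.values():
        if len({e["src_tag"] for e in group}) < 2:
            continue                                 # measured in only one source
        for a, b in combinations(group, 2):
            if a["src_tag"] != b["src_tag"]:
                mean_ccs = 0.5 * (a["ccs"] + b["ccs"])
                diffs.append(100.0 * abs(a["ccs"] - b["ccs"]) / mean_ccs)
    diffs = np.array(diffs)
    return {t: float(np.mean(diffs <= t)) for t in (1, 3, 5)}
```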

Figure 1

Figure 1. (A) Counts and (B) comparison of agreement between measurements present in multiple sources. Agreement for all overlapping CCS values in blue. Agreement between DTIM CCS values in red. Agreement between TWIM CCS values in purple. Agreement between DTIM and TWIM CCS values in gold.

Structural Characterization of the Combined CCS Database

A three-component PCA was performed on all the structures in the combined CCS database using the complete feature set (m/z, OHEA, MQNs) in order to examine the primary contributors to the variance in the data set as a whole in an unsupervised fashion (i.e., entirely ignoring CCS for the time being). This feature set contains molecular descriptors that reflect various structural characteristics of the compounds in the database; thus, computing a PCA on these features provides an indication of the breadth of chemical space spanned by this collection, as well as the most important chemical features that differ among them. The three principal axes captured 20.9, 14.4, and 5.7% of the total variance in the data set, respectively. Together, only ∼41% of the total variance was captured in this analysis, indicating that there are many sources of variance in this data set and many of these contributing factors are at least partially orthogonal to one another. Figure 2A–D depicts the projections of the full data set onto all three principal axes, with individual data points colored either by source data set (Figure 2A,B) or rough compound classification (Figure 2C,D). First, examining the projections colored by data set, there is no significant separation between data sets along any of the principal components. This indicates that the source data set is not strongly associated with the primary attributes of the data set that contribute most significantly to the overall variance; that is, there is overlapping structural diversity among the various data sets. Compounds in this collection were then assigned one of the following rough chemical classes: small molecule, lipid, peptide, or carbohydrate, based on the names of the compounds and information provided in the original publications. Examining PCA projections colored by these class labels, there is some separation along PC1 between small molecules and all other classes, the latter of which only separate from one another along PC2 and PC3. The separation between compound classes along the three principal axes coincides with the separation within each source data set, supporting the notion that the variance observed between different data sets is attributable primarily to the different chemical classes represented in each data set.

Figure 2

Figure 2. PCA projections of full CCS database onto principal axes 1, 2, and 3, colored by data set (A,B) or chemical classification (C,D). Correlation of the top three molecular descriptors contributing to separation along PC1 (E–G) and PC2 (H–J). hac = heavy atom count; m/z = mass to charge ratio; ao = acyclic oxygen count; hbd = H-bond donor atoms; ctv = cyclic trivalent nodes; r6 = 6-membered ring count.

We next sought to investigate the general chemical trends driving separation along the principal axes, by examining the individual feature loadings. Figures 2E–G depict the values of the top three features contributing to separation along PC1 vs their PC1 projections. The top three features are heavy atom count (count of non-hydrogen atoms, hac), mass-to-charge ratio (m/z), and acyclic oxygen count (ao), all of which are related to overall compound mass. Figures 2H–J depict the values of the corresponding features for PC2: H-bond donor atoms (hbd), cyclic trivalent node count (ctv, related to branching in the chemical structure involving cyclic systems), and 6-membered ring count (r6). The features driving separation along PC2 can be generally described as being related to the composition and topology of a chemical structure (full list of the features can be found in the Supporting Information, Table S2). The corresponding top three features contributing to separation along PC3 can be found in the SI (Figure S1). Taken together, these results indicate that the most significant source of variance in the data set as a whole is related to compound mass, but composition and topology of compound structures are also important contributors.
We next examined which features contribute most strongly to the distribution of CCS values in the data set as a whole. PLS-RA was performed on all molecules using the complete feature set with CCS as the target variable. Unlike the unsupervised PCA, the first axis in PLS-RA (scores[0]) is chosen such that the most variance in a target variable (CCS) is explained, making it a supervised analysis. The second axis (scores[1]) is orthogonal to the first and explains most of the remaining variance in the data set. Figure 3A,B show the projections of the full data set onto these axes, with points colored by data set source and rough chemical classification, respectively. Just as with the PCA projections, there does not appear to be a high degree of separation between data sets along scores[0] (Figure 3A), indicating that differences between data sets do not contribute strongly to the variance in CCS values. However, there does appear to be a similar pattern of separation between small molecules and all other classes along scores[0] (Figure 3B), indicating that the chemical characteristics that differ between these chemical classes contribute strongly to variance in CCS values. Given the similarity between the overall patterns observed between source data sets and chemical classes in both PCA and PLS-RA, it appears that the most significant sources of variance within the feature set have strong associations with variance in the target variable, i.e., CCS. Indeed, plotting the projections of the full data set along the most significant axes from PLS-RA and PCA (scores[0] vs PC1, Figure 3C) shows a high degree of correlation, supporting this notion.

Figure 3

Figure 3. PLS-RA projections of full CCS database onto axes 1, 2, colored by data set (A) or chemical classification (B). Correlation between PLS-RA projections along axis 1 and PCA projections along PC1 (C). Correlation between molecular descriptors and PLS-RA projections along axis 1 (blue) or CCS (red) for all compounds (D–K). hac = heavy atom count; m/z = mass to charge ratio; hbam = H-bond acceptor sites; c = carbon atom count; ao = acyclic oxygen count; asb = acyclic single bonds; asv = acyclic single valent nodes; adb = acyclic double bonds.

As with PCA, the contribution of individual features to separation along a given axis in PLS-RA can be investigated by examining the feature loadings. Figure 3D–K depicts the values of the top eight features that contribute to separation along the primary axis from PLS-RA vs scores[0] (shown in blue) or CCS (shown in red). The most significant contributors to variance in CCS are hac and m/z (Figure 3D,E, respectively), both of which are mass-related molecular descriptors. The remainder of the top features (Figure 3F–K) that show good correlations with scores[0] and CCS include features that are related to compound size (H-bond acceptor sites, hbam; carbon count, c; ao) as well as structure topology (acyclic single bonds, asb; acyclic monovalent nodes, asv; acyclic double bonds, adb). Much like the results from PCA, PLS-RA indicates that the most important features in the data set that contribute to variance in the CCS are related to compound mass and size, also with significant contributions from features describing molecular topology.

Model Specialization through Unsupervised Classification

Current examples of CCS prediction models were built using collections of measurements made for specific chemical classes, e.g., lipids or small molecules (i.e., for metabolomics). (22−26) Such model specialization is justifiable from the standpoint that the chemical characteristics that contribute to a compound’s CCS are likely to differ between chemical classes, and therefore, models trained to recognize the important characteristics of one chemical class are likely to perform poorly in producing predictions for other classes. Indeed, performing CCS predictions with LipidCCS Predictor on all compounds from the combined database tagged as lipids produced an RMSE of 5.5 Å2, while predictions on compounds tagged as peptides and carbohydrates produced RMSE scores of 46.9 and 56.3 Å2, respectively (Figure S2). Employing such specialized models is an excellent approach for attaining high prediction accuracy, as long as the delineation between chemical classes is clear. However, assigning chemical class labels in an unbiased way can become difficult when considering the diverse small molecule space, which encompasses complex compounds containing substructures with chemical characteristics resembling multiple conventional chemical classes. Furthermore, most ML algorithms act as “black boxes” with respect to the underlying characteristics of the training data that ultimately contribute to predictions. Even for ML algorithms that provide some insight into which features contribute most strongly (e.g., lasso, forest), there is still no way to tell whether and how subgroups within the training data influence this process. To address these limitations, rather than relying on manually assigned chemical classes, we performed unbiased and unsupervised classification using the K-Means clustering ML algorithm on all molecules with SMILES structures in the database to determine the most prominent groupings of compounds with respect to structural characteristics.
Using the full set of MDs (m/z, OHEA, and all MQNs) as features, K-Means clustering was used to group compounds on the basis of their structural similarity. During initial testing, fitting with four clusters was found to be optimal for capturing the major groups within the data set while maintaining sufficient numbers in each cluster for model training (at least 100 molecules per cluster). Figure 4A,B show the PCA projections of the full data set colored by rough chemical classes, and Figure 4C,D show the same PCA projection colored by clusters. The clustering analysis revealed similar groupings to the chemical class labels: most of the small molecules were assigned to clusters 1 and 2 (purple and blue, respectively), lipids to cluster 3 (magenta), and carbohydrates and peptides were assigned to cluster 4 (gold). However, a significant number of molecules in the original rough chemical classification were reassigned to new clusters, suggesting that manual assignment of chemical classes could result in classification error. As seen in the PCA plot (Figure 4C), the two small molecule clusters separate from the lipid and carbohydrate/peptide clusters along PC1, and separation within these groups occurs primarily along PC2. With the separation along PC1 being mostly related to compound mass, it appears that the separation between the small molecule clusters and the lipids and peptides/carbohydrate clusters is primarily mass-driven. Separation along PC2 is related more to topological differences in chemical structure, meaning that other factors such as branching or presence of rings drive the separation between the two small molecule clusters and between the lipid and peptide/carbohydrate clusters. Interestingly, the assigned clusters conform to familiar trends in the IM-MS conformational space, although overlap between clusters still exists (Figure 4E). Broadly speaking, the lipid cluster occupies the high-CCS and high-mass region, while the peptide/carbohydrate cluster occupies the low-CCS high-mass region. The small molecule clusters occupy significant overlapping space in the low-mass region, each covering the broad range of CCS values associated with this region. The most central structures within each cluster (minimum distance to the cluster centers in feature space) are presented in Figure 4F. PE(36:3) is a prototypical lipid, while CMP-N-acetylneuraminic acid has chemical characteristics related to both carbohydrates and peptides (large, containing many heteroatoms). The small molecules etodolac and 3-methoxytyrosine represent the central structures within the two small molecule clusters and, as expected from the PCA, they seem to differ mostly on a topological basis (e.g., rings and branching) rather than by size or composition. Collectively, these results show that use of an unsupervised clustering approach recapitulates many of the classifications attainable through a manual approach, while also providing class separation based on more nuanced chemical characteristics. Importantly, such an approach allows for unbiased assignment of chemical class when considering new molecules, especially those that do not fall cleanly into a single conventional chemical class.
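For illustration, the central structure of each cluster can be located as the sample with the minimum Euclidean distance to its cluster centroid in scaled feature space; in the sketch below, smiles_train is an assumed list holding the SMILES corresponding to the rows of X_train_s, and kmeans is the fitted model from the earlier sketch.

```python
# Sketch: find the most central structure within each fitted cluster.
import numpy as np

labels = kmeans.labels_
for k, center in enumerate(kmeans.cluster_centers_):
    idx = np.where(labels == k)[0]                         # samples assigned to cluster k
    dists = np.linalg.norm(X_train_s[idx] - center, axis=1)  # distances to the centroid
    central = idx[np.argmin(dists)]
    print(f"cluster {k}: central structure {smiles_train[central]}")
```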

Figure 4

Figure 4. PCA projections of full CCS database onto principal axes 1, 2, and 3, colored by chemical class label (A, B), or by cluster (C, D). (E) Plot of CCS vs m/z for full CCS database, colored by cluster. (F) Central structures within each cluster. (G–I) Average predictive performance of models (lasso, forest, svr, respectively) by MDAE and RMSE from five independent trials, trained on the full CCS database (training set = blue, test set = red) or on individual cluster data sets (training set = purple, test set = gold).

Within each cluster, individual ML models were trained using the complete feature set, and the average performance of this ensemble of models was compared to corresponding individual models trained on the complete data set (Figure 4G–I). Feature selection is commonly performed in ML projects as it promotes generalizability of a trained model; however, we did not observe significant differences in performance between using the complete feature set and using selected features following a previous practice (see Figure S3 and associated discussion in SI). (23) This is likely due to the already small number of structure-related MQNs as MDs (compared with the hundreds of MDs typically used in the literature) and the intrinsic feature selection process when training with models like lasso, random forest, and svr. Training individual models specific to each cluster led to a marked performance increase by MDAE and RMSE for the lasso models and to a lesser degree for the svr models (panels G and I, respectively, of Figure 4).
The performance by MDAE did not increase with the addition of clustering for the forest models. However, the RMSE did decrease, indicating that the use of clustering reduced the proportion of higher-magnitude errors for these models (Figure 4H). Across all models (with or without clustering), the average performance by both metrics for the test set predictions was at parity with the training set performance, indicating good generalizability with respect to making predictions on new data. These results demonstrate that the inclusion of untargeted clustering followed by building individual predictive CCS models using each cluster can increase the model performance and that such an approach is well suited for application to large compound collections covering diverse chemical space. Additionally, this approach offers the benefit of interpretability at the classification level: the assignment of compounds to individual clusters provides information on the common chemical characteristics that define chemical classes in an unbiased fashion.

Training and Performance Characteristics of the Final Optimized Prediction Model

On the basis of the insights gained from the above work, we built a final deployable predictive CCS model using K-Means clustering and four individual svr models with radial basis function kernels trained on each of the fitted clusters. Figure 5 describes the final workflow for building and training this model, and Figure 6 summarizes all of the performance characteristics of the model on both the training and test set data. The R2 scores for training and test set data were 0.994 and 0.991, respectively (Figure 6A), indicating excellent generalizability. Judging from additional metrics, this model was able to achieve high performance on the training data, with MAE, MDAE, and RMSE scores of 2.92, 1.70, and 5.48 Å2, respectively (Figure 6B, blue). The generalizability was also excellent based on the corresponding test set scores of 3.83, 2.37, and 6.46 Å2, respectively, which show no significant performance lapse relative to the training data (Figure 6B, red). The cumulative error distributions of CCS predictions give a more detailed indication of the error structure of the CCS predictions, with 56.1 and 86.4% of CCS predictions for the training data falling within 1 and 3% of the reference values, respectively (Figure 6C, blue). The model achieves similar performance on the test set data, with 44.7 and 81.2% of CCS predictions falling within 1 and 3% of the reference values, respectively (Figure 6C, red). Taken together, these performance metrics indicate that this final model is capable of performing CCS prediction with high accuracy on diverse compounds and that this performance is robust when applied to unseen data.
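A compact sketch of this cluster-then-predict scheme is given below: K-Means is fit on the training set, one RBF-kernel SVR is trained per cluster, and each new sample is routed to the SVR of its assigned cluster. The class name and hyperparameters are illustrative and do not reproduce the deployed model's optimized settings.

```python
# Sketch: final cluster-then-predict scheme (K-Means + one SVR per cluster).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

class ClusterSVR:
    def __init__(self, n_clusters=4):
        self.kmeans = KMeans(n_clusters=n_clusters, random_state=420, n_init=10)
        self.models = {}

    def fit(self, X, y):
        labels = self.kmeans.fit_predict(X)
        for k in np.unique(labels):
            m = SVR(kernel="rbf", C=1000.0, gamma=0.001)   # placeholder hyperparameters
            m.fit(X[labels == k], y[labels == k])          # one specialized SVR per cluster
            self.models[k] = m
        return self

    def predict(self, X):
        labels = self.kmeans.predict(X)                    # assign each sample to a cluster
        y_pred = np.empty(len(X))
        for k, m in self.models.items():
            mask = labels == k
            if mask.any():
                y_pred[mask] = m.predict(X[mask])          # predict with that cluster's SVR
        return y_pred

# usage with the scaled data from the earlier sketches
model = ClusterSVR().fit(X_train_s, y_train)
ccs_pred = model.predict(X_test_s)
```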

Figure 5

Figure 5. Workflow describing the process for training and validating the final prediction model. First, MQNs are generated from the compound SMILES structure in addition to the m/z and MS adduct, and this data is stored in a database. The complete data set is randomly partitioned into a training set and test set, preserving the approximate distribution of CCS values between the two sets. The training set is then fit using K-Means clustering to find the dominant groupings within the data set in terms of chemical similarity. The data from each assigned cluster is then used to train an individual predictive model that is specialized for that group of compounds. Finally, the overall CCS prediction performance of this set of models is validated using the test set data by first assigning each sample to one of the fitted clusters then predicting CCS using the corresponding predictive model.

Figure 6

Figure 6. (A–C) Complete performance metrics for final predictive model on training (blue) and test (red) data. (A) R2 (B) mean/median absolute error and root mean squared error (C) proportion of predictions falling within 1, 3, 5, and 10% of reference values. (D–F) Comparison of CCS prediction performance between final model (purple) and DeepCCS (gold) on all data sets used for training DeepCCS. (D) R2 (E) mean/median absolute error and root mean squared error (F) mean/median relative error (MRE and MDRE).

We next compared our final model with DeepCCS, the other comprehensive CCS prediction model using a CNN trained directly on SMILES structures, albeit with limited coverage of small molecules. A comparison data set was assembled from the data sets used in the training and testing of DeepCCS (2298 compounds after selection of valid SMILES and MS adducts). Using this comparison data set, 1960 CCS values were predicted by DeepCCS, and the accuracy of these predictions was compared with our model by a variety of metrics (Figure 6D–F). The final prediction model presented here outperformed DeepCCS by all performance metrics, despite using far fewer parameters to fit the data. This higher CCS prediction performance is likely attributable to two reasons. First, the larger and more diverse collection of data used to train the model enables greater accuracy in predictions by learning more robust and generalizable trends from the training data. Second, the use of untargeted clustering to partition the data on the basis of common structural features allows this model to learn specific trends for different classes of chemicals, thus increasing the overall accuracy of CCS predictions through model specialization.

Building an All-in-One Web Interface for Querying the CCS Database and Accessing the Prediction Model

To increase the accessibility of the database and the prediction model to the field, we have assembled the combined CCS database and the final prediction model into a convenient web interface (https://CCSbase.net). This interface allows users to query the database for reference CCS values with fine filtering control, including compound name, mass (with a specified accuracy window), CCS (with a specified accuracy window), polarity, SMILES structure, adduct type, charge state, and fuzzy search. This easily accessible database offers complementary and broader coverage (7678 entries) in comparison with the existing CCS databases, (35,38,44,45) including the carefully assembled CCS compendium of DTIM CCS values (3833 values). (44) This interface also allows rapid prediction of CCS values (with confident prediction for six different MS adducts) directly from SMILES structures using the final cluster-based prediction model discussed above. Batch query and prediction can be achieved with a simple single CSV file input. All results are directly viewable on the interface in the form of a table and in an interactive CCS-mass plot with the main trend line of the entire database shown as the background. The results are also downloadable as a CSV file. This platform can be easily built into existing metabolomics workflows and thus serve as a useful tool in the identification of unknowns from large-scale untargeted analyses.
The primary utility of the CCS database and predictive model are to support unknown compound identification by CCS. The metabolomics standards initiative (MSI) (46) defines 4 annotation levels reflecting the rigor of compound identification. Level 1 annotations are obtained from matching at least two orthogonal properties, such as m/z, CCS, or MS/MS, against values determined experimentally from authentic standards. Level 2 annotations are the same, with the exception that the reference measured values are taken from a secondary source (like the literature). Level 3 annotations are obtained when reference measurements are not directly available for a compound, but can be inferred based on existing measurements from similar compounds. Level 4 annotations correspond to compounds that are unidentified, but can be differentiated from other signals. Identifications based on measured values from the CCS database would therefore constitute level 2 annotations, i.e., using measured m/z and CCS as orthogonal properties, while those made using CCS values generated using the predictive model would constitute level 3 annotations (i.e., using m/z and predicted CCS values).

Conclusions


The ability to predict high-quality CCS values for unknowns is a key step toward using CCS as a broadly applicable identifier for metabolites. Several major advances in ML-based CCS prediction are achieved in this work. First, by assembling the largest CCS database to date, a broad coverage of chemical structural diversity is achieved. Second, by performing statistical analysis of the structural features of all compounds, we have identified structural features that display high correlation with CCS values (Figure 3). Such correlation has not been systematically examined previously. Third, the use of structure-related MQNs as MDs is more relevant to the structure-dependent CCS than MDs reflecting physical properties that were often used in the literature. Fourth, by breaking down the structural diversity using structure-based unsupervised clustering in combination with individual prediction models, the integrated model displays greatly improved performance and generalizability compared with a single model trained without clustering. Importantly, this model also provides interpretable results based on the cluster that the unknown is assigned to, unlike previous work mostly using “black box” prediction models. Finally, we have built an easily accessible all-in-one web interface for efficient querying of the database and the prediction model. We anticipate continuous growth of this database and improvement of the prediction model as molecules with additional structural features are added.

Supporting Information


The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.9b05772.

  • All code used to assemble the CCS database and train the predictive models is available on GitHub (https://github.com/dylanhross/c3sdb). The Supporting Information includes additional Experimental Section details and database schema; MS adduct encodings; molecular quantum numbers (MQNs); the top three features contributing to separation along PC3; CCS prediction accuracy of LipidCCS on different chemical classes; feature selection trials; and additional Results and Discussion (PDF)


Author Information


  • Corresponding Author
    • Libin Xu - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States; Email: [email protected]
  • Authors
    • Dylan H. Ross - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
    • Jang Ho Cho - Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
  • Author Contributions

    The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

  • Notes
    The authors declare no competing financial interest.

Acknowledgments


This work was supported by the Drug Metabolism Transport and Pharmacogenetics Research Fund of the School of Pharmacy at the University of Washington (UW), UW CoMotion Innovation Gap Fund, and startup funds from the Department of Medicinal Chemistry to L.X.

References



  1. Prakash, C.; Shaffer, C. L.; Nedderman, A. Mass Spectrom. Rev. 2007, 26, 340–369, DOI: 10.1002/mas.20128
  2. Blazenovic, I.; Kind, T.; Ji, J.; Fiehn, O. Metabolites 2018, 8, 31, DOI: 10.3390/metabo8020031
  3. Clemmer, D. E.; Hudgins, R. R.; Jarrold, M. F. J. Am. Chem. Soc. 1995, 117, 10141–10142, DOI: 10.1021/ja00145a037
  4. von Helden, G.; Wyttenbach, T.; Bowers, M. T. Science 1995, 267, 1483–1485, DOI: 10.1126/science.267.5203.1483
  5. McLean, J. A.; Ruotolo, B. T.; Gillig, K. J.; Russell, D. H. Int. J. Mass Spectrom. 2005, 240, 301–315, DOI: 10.1016/j.ijms.2004.10.003
  6. Kanu, B.; Dwivedi, P.; Tam, M.; Matz, L.; Hill, H. H., Jr. J. Mass Spectrom. 2008, 43, 1–22, DOI: 10.1002/jms.1383
  7. Fenn, L. S.; Kliman, M.; Mahsut, A.; Zhao, S. R.; McLean, J. A. Anal. Bioanal. Chem. 2009, 394, 235–244, DOI: 10.1007/s00216-009-2666-3
  8. Pringle, S. D.; Giles, K.; Wildgoose, J. L.; Williams, J. P.; Slade, S. E.; Thalassinos, K.; Bateman, R. H.; Bowers, M. T.; Scrivens, J. H. Int. J. Mass Spectrom. 2007, 261, 1–12, DOI: 10.1016/j.ijms.2006.07.021
  9. May, J. C.; McLean, J. A. Anal. Chem. 2015, 87, 1422–1436, DOI: 10.1021/ac504720m
  10. Mesleh, M. F.; Hunter, J. M.; Shvartsburg, A. A.; Schatz, G. C.; Jarrold, M. F. J. Phys. Chem. 1996, 100, 16082–16086, DOI: 10.1021/jp961623v
  11. Hinnenkamp, V.; Klein, J.; Meckelmann, S. W.; Balsaa, P.; Schmidt, T. C.; Schmitz, O. J. Anal. Chem. 2018, 90, 12042–12050, DOI: 10.1021/acs.analchem.8b02711
  12. Stow, S. M.; Causon, T. J.; Zheng, X.; Kurulugama, R. T.; Mairinger, T.; May, J. C.; Rennie, E. E.; Baker, E. S.; Smith, R. D.; McLean, J. A.; Hann, S.; Fjeldsted, J. C. Anal. Chem. 2017, 89, 9048–9055, DOI: 10.1021/acs.analchem.7b01729
  13. Shvartsburg, A.; Jarrold, M. F. Chem. Phys. Lett. 1996, 261, 86–91, DOI: 10.1016/0009-2614(96)00941-4
  14. Kim, H. I.; Kim, H.; Pang, E. S.; Ryu, E. K.; Beegle, L. W.; Loo, J. A.; Goddard, W. A.; Kanik, I. Anal. Chem. 2009, 81, 8289–8297, DOI: 10.1021/ac900672a
  15. Kim, H.; Kim, H. I.; Johnson, P. V.; Beegle, L. W.; Beauchamp, J. L.; Goddard, W. A.; Kanik, I. Anal. Chem. 2008, 80, 1928, DOI: 10.1021/ac701888e
  16. Campuzano, I.; Bush, M. F.; Robinson, C. V.; Beaumont, C.; Richardson, K.; Kim, H.; Kim, H. I. Anal. Chem. 2012, 84, 1026–1033, DOI: 10.1021/ac202625t
  17. Larriba, C.; Hogan, C. J., Jr. J. Phys. Chem. A 2013, 117, 3887–3901, DOI: 10.1021/jp312432z
  18. Ewing, S. A.; Donor, M. T.; Wilson, J. W.; Prell, J. S. J. Am. Soc. Mass Spectrom. 2017, 28, 587–596, DOI: 10.1007/s13361-017-1594-2
  19. Lee, J. W.; Lee, H. H. L.; Davidson, K. L.; Bush, M. F.; Kim, H. I. Analyst 2018, 143, 1786–1796, DOI: 10.1039/C8AN00270C
  20. Zanotto, L.; Heerdt, G.; Souza, P. C. T.; Araujo, G.; Skaf, M. S. J. Comput. Chem. 2018, 39, 1675–1681, DOI: 10.1002/jcc.25199
  21. Colby, S. M.; Thomas, D. G.; Nunez, J. R.; Baxter, D. J.; Glaesemann, K. R.; Brown, J. M.; Pirrung, M. A.; Govind, N.; Teeguarden, J. G.; Metz, T. O.; Renslow, R. S. Anal. Chem. 2019, 91, 4346–4356, DOI: 10.1021/acs.analchem.8b04567
  22. Zhou, Z.; Shen, X.; Tu, J.; Zhu, Z. J. Anal. Chem. 2016, 88, 11084–11091, DOI: 10.1021/acs.analchem.6b03091
  23. Zhou, Z.; Tu, J.; Xiong, X.; Shen, X.; Zhu, Z. J. Anal. Chem. 2017, 89, 9559–9566, DOI: 10.1021/acs.analchem.7b02625
  24. Bijlsma, L.; Bade, R.; Celma, A.; Mullin, L.; Cleland, G.; Stead, S.; Hernandez, F.; Sancho, J. V. Anal. Chem. 2017, 89, 6583–6589, DOI: 10.1021/acs.analchem.7b00741
  25. Soper-Hopper, M. T.; Petrov, A. S.; Howard, J. N.; Yu, S. S.; Forsythe, J. G.; Grover, M. A.; Fernandez, F. M. Chem. Commun. (Cambridge, U. K.) 2017, 53, 7624–7627, DOI: 10.1039/C7CC04257D
  26. Mollerup, C. B.; Mardal, M.; Dalsgaard, P. W.; Linnet, K.; Barron, L. P. J. Chromatogr. A 2018, 1542, 82–88, DOI: 10.1016/j.chroma.2018.02.025
  27. Plante, P. L.; Francovic-Fontaine, E.; May, J. C.; McLean, J. A.; Baker, E. S.; Laviolette, F.; Marchand, M.; Corbeil, J. Anal. Chem. 2019, 91, 5191–5199, DOI: 10.1021/acs.analchem.8b05821
  28. Hines, K.; Herron, J.; Xu, L. J. Lipid Res. 2017, 58, 809–819, DOI: 10.1194/jlr.D074724
  29. Hines, K. M.; Waalkes, A.; Penewit, K.; Holmes, E. A.; Salipante, S. J.; Werth, B. J.; Xu, L. mSphere 2017, 2, e00492-17, DOI: 10.1128/mSphere.00492-17
  30. Hines, K. M.; Xu, L. Chem. Phys. Lipids 2019, 219, 15–22, DOI: 10.1016/j.chemphyslip.2019.01.007
  31. Groessl, M.; Graf, S.; Knochenmuss, R. Analyst 2015, 140, 6904–6911, DOI: 10.1039/C5AN00838G
  32. Hines, K. M.; Ross, D. H.; Davidson, K. L.; Bush, M. F.; Xu, L. Anal. Chem. 2017, 89, 9023–9030, DOI: 10.1021/acs.analchem.7b01709
  33. May, J. C.; Goodwin, C. R.; Lareau, N. M.; Leaptrot, K. L.; Morris, C. B.; Kurulugama, R. T.; Mordehai, A.; Klein, C.; Barry, W.; Darland, E.; Overney, G.; Imatani, K.; Stafford, G. C.; Fjeldsted, J. C.; McLean, J. A. Anal. Chem. 2014, 86, 2107–2116, DOI: 10.1021/ac4038448
  34. Paglia, G.; Williams, J. P.; Menikarachchi, L.; Thompson, J. W.; Tyldesley-Worster, R.; Halldorsson, S.; Rolfsson, O.; Moseley, A.; Grant, D.; Langridge, J.; Palsson, B. O.; Astarita, G. Anal. Chem. 2014, 86, 3985–3993, DOI: 10.1021/ac500405x
  35. Zheng, X.; Aly, N. A.; Zhou, Y.; Dupuis, K. T.; Bilbao, A.; Paurus, V. L.; Orton, D. J.; Wilson, R.; Payne, S. H.; Smith, R. D.; Baker, E. S. Chem. Sci. 2017, 8, 7724–7736, DOI: 10.1039/C7SC03464D
  36. Zhou, Z.; Xiong, X.; Zhu, Z. J. Bioinformatics 2017, 33, 2235–2237, DOI: 10.1093/bioinformatics/btx140
  37. Nichols, C. M.; May, J. C.; Sherrod, S. D.; McLean, J. A. Analyst 2018, 143, 1556–1559, DOI: 10.1039/C8AN00056E
  38. Righetti, L.; Bergmann, A.; Galaverna, G.; Rolfsson, O.; Paglia, G.; Dall’Asta, C. Anal. Chim. Acta 2018, 1014, 50–57, DOI: 10.1016/j.aca.2018.01.047
  39. Leaptrot, K. L.; May, J. C.; Dodds, J. N.; McLean, J. A. Nat. Commun. 2019, 10, 985, DOI: 10.1038/s41467-019-08897-5
  40. Blaženović, I.; Shen, T.; Mehta, S. S.; Kind, T.; Ji, J.; Piparo, M.; Cacciola, F.; Mondello, L.; Fiehn, O. Anal. Chem. 2018, 90 (18), 10758–10764, DOI: 10.1021/acs.analchem.8b01527
  41. Hines, K. M.; May, J. C.; McLean, J. A.; Xu, L. Anal. Chem. 2016, 88, 7329–7336, DOI: 10.1021/acs.analchem.6b01728
  42. Reymond, J.-L.; Awale, M. ACS Chem. Neurosci. 2012, 3, 649–657, DOI: 10.1021/cn3000422
  43. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. J. Mach. Learn. Res. 2011, 12, 2825–2830
  44. Picache, J. A.; Rose, B. S.; Balinski, A.; Leaptrot, K. L.; Sherrod, S. D.; May, J. C.; McLean, J. A. Chem. Sci. 2019, 10, 983–993, DOI: 10.1039/C8SC04396E
  45. Hernandez-Mesa, M.; Le Bizec, B.; Monteau, F.; Garcia-Campana, A. M.; Dervilly-Pinel, G. Anal. Chem. 2018, 90, 4616–4625, DOI: 10.1021/acs.analchem.7b05117
  46. Sumner, L. W.; Amberg, A.; Barrett, D.; Beale, M. H.; Beger, R.; Daykin, C. A.; Fan, T. W. M.; Fiehn, O.; Goodacre, R.; Griffin, J. L.; Hankemeier, T.; Hardy, N.; Harnly, J.; Higashi, R.; Kopka, J.; Lane, A. N.; Lindon, J. C.; Marriott, P.; Nicholls, A. W.; Reily, M. D.; Thaden, J. J.; Viant, M. R. Metabolomics 2007, 3 (3), 211–221, DOI: 10.1007/s11306-007-0082-2

Cited By


This article is cited by 100 publications.

  1. Kaylie I. Kirkwood-Donelson, Prashant Rai, Lalith Perera, Michael B. Fessler, Alan K. Jarmusch. Bromine-Based Derivatization of Carboxyl-Containing Metabolites for Liquid Chromatography–Trapped Ion Mobility Spectrometry–Mass Spectrometry. Journal of the American Society for Mass Spectrometry 2025, Article ASAP.
  2. Anjana Elapavalore, Dylan H. Ross, Valentin Grouès, Dagny Aurich, Allison M. Krinsky, Sunghwan Kim, Paul A. Thiessen, Jian Zhang, James N. Dodds, Erin S. Baker, Evan E. Bolton, Libin Xu, Emma L. Schymanski. PubChemLite Plus Collision Cross Section (CCS) Values for Enhanced Interpretation of Nontarget Environmental Data. Environmental Science & Technology Letters 2025, 12 (2) , 166-174. https://doi.org/10.1021/acs.estlett.4c01003
  3. Xingliang He, Bin Wu, Xing Guo, Fulong Deng, Hong’en Sun, Zhihao He, Yixiang Duan, Zhongjun Zhao. Twisted Dipole Ion Guide (TDIG) for Flexible Ion Transfer in Atmospheric Pressure Ionization Mass Spectrometry. Analytical Chemistry 2025, 97 (2) , 1070-1077. https://doi.org/10.1021/acs.analchem.4c03255
  4. Thomas O. Metz, Christine H. Chang, Vasuk Gautam, Afia Anjum, Siyang Tian, Fei Wang, Sean M. Colby, Jamie R. Nunez, Madison R. Blumer, Arthur S. Edison, Oliver Fiehn, Dean P. Jones, Shuzhao Li, Edward T. Morgan, Gary J. Patti, Dylan H. Ross, Madelyn R. Shapiro, Antony J. Williams, David S. Wishart. Introducing “Identification Probability” for Automated and Transferable Assessment of Metabolite Identification Confidence in Metabolomics and Related Studies. Analytical Chemistry 2025, 97 (1) , 1-11. https://doi.org/10.1021/acs.analchem.4c04060
  5. Marcelino Varona, Daniel P. Dobson, José G. Napolitano, Rekha Thomas, Jessica L. Ochoa, David J. Russell, Christopher M. Crittenden. High Resolution Ion Mobility Enables the Structural Characterization of Atropisomers of GDC-6036, a KRAS G12C Covalent Inhibitor. Journal of the American Society for Mass Spectrometry 2024, 35 (11) , 2586-2595. https://doi.org/10.1021/jasms.4c00103
  6. Mithony Keng, Kenneth M. Merz, Jr. Eliminating the Deadwood: A Machine Learning Model for CCS Knowledge-Based Conformational Focusing for Lipids. Journal of Chemical Information and Modeling 2024, 64 (20) , 7864-7872. https://doi.org/10.1021/acs.jcim.4c01051
  7. Cheng Wang, Chuang Yuan, Yahui Wang, Yuying Shi, Tao Zhang, Gary J. Patti. Predicting Collision Cross-Section Values for Small Molecules through Chemical Class-Based Multimodal Graph Attention Network. Journal of Chemical Information and Modeling 2024, 64 (16) , 6305-6315. https://doi.org/10.1021/acs.jcim.3c01934
  8. Sara M. de Cripan, Trisha Arora, Adrià Olomí, Núria Canela, Gary Siuzdak, Xavier Domingo-Almenara. Predicting the Predicted: A Comparison of Machine Learning-Based Collision Cross-Section Prediction Models for Small Molecules. Analytical Chemistry 2024, 96 (22) , 9088-9096. https://doi.org/10.1021/acs.analchem.4c00630
  9. Ryan Nguyen, Ryan P. Seguin, Dylan H. Ross, Pengyu Chen, Sean Richardson, Jennifer Liem, Yvonne S. Lin, Libin Xu. Development and Application of a Multidimensional Database for the Detection of Quaternary Ammonium Compounds and Their Phase I Hepatic Metabolites in Humans. Environmental Science & Technology 2024, 58 (14) , 6236-6249. https://doi.org/10.1021/acs.est.3c10845
  10. Pattipong Wisanpitayakorn, Sitanan Sartyoungkul, Alongkorn Kurilung, Yongyut Sirivatanauksorn, Wonnop Visessanguan, Nuankanya Sathirapongsasuti, Sakda Khoomrung. Accurate Prediction of Ion Mobility Collision Cross-Section Using Ion’s Polarizability and Molecular Mass with Limited Data. Journal of Chemical Information and Modeling 2024, 64 (5) , 1533-1542. https://doi.org/10.1021/acs.jcim.3c01491
  11. Jana M. Carpenter, Hannah M. Hynds, Kingsley Bimpeh, Kelly M. Hines. HILIC-IM-MS for Simultaneous Lipid and Metabolite Profiling of Bacteria. ACS Measurement Science Au 2024, 4 (1) , 104-116. https://doi.org/10.1021/acsmeasuresciau.3c00051
  12. Hannah M. Hynds, Kelly M. Hines. MOCCal: A Multiomic CCS Calibrator for Traveling Wave Ion Mobility Mass Spectrometry. Analytical Chemistry 2024, 96 (3) , 1185-1194. https://doi.org/10.1021/acs.analchem.3c04290
  13. Xue-Chao Song, Elena Canellas, Nicola Dreolin, Jeff Goshawk, Meilin Lv, Guangbo Qu, Cristina Nerin, Guibin Jiang. Application of Ion Mobility Spectrometry and the Derived Collision Cross Section in the Analysis of Environmental Organic Micropollutants. Environmental Science & Technology 2023, 57 (51) , 21485-21502. https://doi.org/10.1021/acs.est.3c03686
  14. Haosong Zhang, Mingdu Luo, Hongmiao Wang, Fandong Ren, Yandong Yin, Zheng-Jiang Zhu. AllCCS2: Curation of Ion Mobility Collision Cross-Section Atlas for Small Molecules Using Comprehensive Molecular Representations. Analytical Chemistry 2023, 95 (37) , 13913-13921. https://doi.org/10.1021/acs.analchem.3c02267
  15. Noelle Reimers, Quynh Do, Rutan Zhang, Angela Guo, Ryan Ostrander, Alyson Shoji, Chau Vuong, Libin Xu. Tracking the Metabolic Fate of Exogenous Arachidonic Acid in Ferroptosis Using Dual-Isotope Labeling Lipidomics. Journal of the American Society for Mass Spectrometry 2023, 34 (9) , 2016-2024. https://doi.org/10.1021/jasms.3c00181
  16. Yulemni Morel, Jace W. Jones. Utilization of LC–MS/MS and Drift Tube Ion Mobility for Characterizing Intact Oxidized Arachidonate-Containing Glycerophosphatidylethanolamine. Journal of the American Society for Mass Spectrometry 2023, 34 (8) , 1609-1620. https://doi.org/10.1021/jasms.3c00083
  17. Maria Mar Aparicio-Muriana, Renato Bruni, Francisco J. Lara, Monsalud del Olmo-Iruela, Maykel Hernandez-Mesa, Ana M. García-Campaña, Chiara Dall’Asta, Laura Righetti. Implementing the Use of Collision Cross Section Database for Phycotoxin Screening Analysis. Journal of Agricultural and Food Chemistry 2023, 71 (26) , 10178-10189. https://doi.org/10.1021/acs.jafc.3c01060
  18. Dylan Ross, Aivett Bilbao, Joon-Yong Lee, Xueyun Zheng. mzapy: An Open-Source Python Library Enabling Efficient Extraction and Processing of Ion Mobility Spectrometry-Mass Spectrometry Data in the MZA File Format. Analytical Chemistry 2023, 95 (25) , 9428-9431. https://doi.org/10.1021/acs.analchem.3c01653
  19. Samuel Cajahuaringa, Daniel L. Z. Caetano, Leandro N. Zanotto, Guido Araujo, Munir S. Skaf. MassCCS: A High-Performance Collision Cross-Section Software for Large Macromolecular Assemblies. Journal of Chemical Information and Modeling 2023, 63 (11) , 3557-3566. https://doi.org/10.1021/acs.jcim.3c00405
  20. Carter K. Asef, Markace A. Rainey, Brianna M. Garcia, Goncalo J. Gouveia, Amanda O. Shaver, Franklin E. Leach, III, Alison M. Morse, Arthur S. Edison, Lauren M. McIntyre, Facundo M. Fernández. Unknown Metabolite Identification Using Machine Learning Collision Cross-Section Prediction and Tandem Mass Spectrometry. Analytical Chemistry 2023, 95 (2) , 1047-1056. https://doi.org/10.1021/acs.analchem.2c03749
  21. Longchan Liu, Ziying Wang, Qian Zhang, Yuqi Mei, Linnan Li, Huwei Liu, Zhengtao Wang, Li Yang. Ion Mobility Mass Spectrometry for the Separation and Characterization of Small Molecules. Analytical Chemistry 2023, 95 (1) , 134-151. https://doi.org/10.1021/acs.analchem.2c02866
  22. Alberto Celma, Richard Bade, Juan Vicente Sancho, Félix Hernandez, Melissa Humphries, Lubertus Bijlsma. Prediction of Retention Time and Collision Cross Section (CCSH+, CCSH–, and CCSNa+) of Emerging Contaminants Using Multiple Adaptive Regression Splines. Journal of Chemical Information and Modeling 2022, 62 (22) , 5425-5434. https://doi.org/10.1021/acs.jcim.2c00847
  23. Narumol Jariyasopit, Suphitcha Limjiasahapong, Alongkorn Kurilung, Sitanan Sartyoungkul, Pattipong Wisanpitayakorn, Narong Nuntasaen, Chutima Kuhakarn, Vichai Reutrakul, Prasat Kittakoop, Yongyut Sirivatanauksorn, Sakda Khoomrung. Traveling Wave Ion Mobility-Derived Collision Cross Section Database for Plant Specialized Metabolites: An Application to Ventilago harmandiana Pierre. Journal of Proteome Research 2022, 21 (10) , 2481-2492. https://doi.org/10.1021/acs.jproteome.2c00413
  24. Xue-Chao Song, Elena Canellas, Nicola Dreolin, Jeff Goshawk, Cristina Nerin. Identification of Nonvolatile Migrates from Food Contact Materials Using Ion Mobility–High-Resolution Mass Spectrometry and in Silico Prediction Tools. Journal of Agricultural and Food Chemistry 2022, 70 (30) , 9499-9508. https://doi.org/10.1021/acs.jafc.2c03615
  25. Bailey S. Rose, Jody C. May, Allison R. Reardon, John A. McLean. Collision Cross-Section Calibration Strategy for Lipid Measurements in SLIM-Based High-Resolution Ion Mobility. Journal of the American Society for Mass Spectrometry 2022, 33 (7) , 1229-1237. https://doi.org/10.1021/jasms.2c00067
  26. Xue-Chao Song, Nicola Dreolin, Elena Canellas, Jeff Goshawk, Cristina Nerin. Prediction of Collision Cross-Section Values for Extractables and Leachables from Plastic Products. Environmental Science & Technology 2022, 56 (13) , 9463-9473. https://doi.org/10.1021/acs.est.2c02853
  27. David Izquierdo-Sandoval, David Fabregat-Safont, Leticia Lacalle-Bergeron, Juan V. Sancho, Félix Hernández, Tania Portoles. Benefits of Ion Mobility Separation in GC-APCI-HRMS Screening: From the Construction of a CCS Library to the Application to Real-World Samples. Analytical Chemistry 2022, 94 (25) , 9040-9047. https://doi.org/10.1021/acs.analchem.2c01118
  28. MaKayla Foster, Markace Rainey, Chandler Watson, James N. Dodds, Kaylie I. Kirkwood, Facundo M. Fernández, Erin S. Baker. Uncovering PFAS and Other Xenobiotics in the Dark Metabolome Using Ion Mobility Spectrometry, Mass Defect Analysis, and Machine Learning. Environmental Science & Technology 2022, 56 (12) , 9133-9143. https://doi.org/10.1021/acs.est.2c00201
  29. Dylan H. Ross, Ryan P. Seguin, Allison M. Krinsky, Libin Xu. High-Throughput Measurement and Machine Learning-Based Prediction of Collision Cross Sections for Drugs and Drug Metabolites. Journal of the American Society for Mass Spectrometry 2022, 33 (6) , 1061-1072. https://doi.org/10.1021/jasms.2c00111
  30. Amber D. Rolland, James S. Prell. Approaches to Heterogeneity in Native Mass Spectrometry. Chemical Reviews 2022, 122 (8) , 7909-7951. https://doi.org/10.1021/acs.chemrev.1c00696
  31. Xue-Chao Song, Elena Canellas, Nicola Dreolin, Jeff Goshawk, Cristina Nerin. A Collision Cross Section Database for Extractables and Leachables from Food Contact Materials. Journal of Agricultural and Food Chemistry 2022, 70 (14) , 4457-4466. https://doi.org/10.1021/acs.jafc.2c00724
  32. Xue-Chao Song, Nicola Dreolin, Tito Damiani, Elena Canellas, Cristina Nerin. Prediction of Collision Cross Section Values: Application to Non-Intentionally Added Substance Identification in Food Contact Materials. Journal of Agricultural and Food Chemistry 2022, 70 (4) , 1272-1281. https://doi.org/10.1021/acs.jafc.1c06989
  33. John R. F. B. Connolly, Jordi Munoz-Muriedas, Cris Lapthorn, David Higton, Johannes P. C. Vissers, Alison Webb, Claire Beaumont, Gordon J. Dear. Investigation into Small Molecule Isomeric Glucuronide Metabolite Differentiation Using In Silico and Experimental Collision Cross-Section Values. Journal of the American Society for Mass Spectrometry 2021, 32 (8) , 1976-1986. https://doi.org/10.1021/jasms.0c00427
  34. Leicheng Zhang, Tengfei Xu, Jingtao Zhang, Stephen Choong Chee Wong, Mark Ritchie, Han Wei Hou, Yulan Wang. Single Cell Metabolite Detection Using Inertial Microfluidics-Assisted Ion Mobility Mass Spectrometry. Analytical Chemistry 2021, 93 (30) , 10462-10468. https://doi.org/10.1021/acs.analchem.1c00106
  35. Cameron N. Naylor, Brian H. Clowers. Reevaluating the Role of Polarizability in Ion Mobility Spectrometry. Journal of the American Society for Mass Spectrometry 2021, 32 (3) , 618-627. https://doi.org/10.1021/jasms.0c00338
  36. Corey D. Broeckling, Linxing Yao, Giorgis Isaac, Marisa Gioioso, Valentin Ianchis, Johannes P.C. Vissers. Application of Predicted Collisional Cross Section to Metabolome Databases to Probabilistically Describe the Current and Future Ion Mobility Mass Spectrometry. Journal of the American Society for Mass Spectrometry 2021, 32 (3) , 661-669. https://doi.org/10.1021/jasms.0c00375
  37. Evelyn Rampler, Yasin El Abiead, Harald Schoeny, Mate Rusz, Felina Hildebrand, Veronika Fitz, Gunda Koellensperger. Recurrent Topics in Mass Spectrometry-Based Metabolomics and Lipidomics—Standardization, Coverage, and Throughput. Analytical Chemistry 2021, 93 (1) , 519-545. https://doi.org/10.1021/acs.analchem.0c04698
  38. Daniela Mesa Sanchez, Steve Creger, Veerupaksh Singla, Ruwan T. Kurulugama, John Fjeldsted, Julia Laskin. Ion Mobility-Mass Spectrometry Imaging Workflow. Journal of the American Society for Mass Spectrometry 2020, 31 (12) , 2437-2442. https://doi.org/10.1021/jasms.0c00142
  39. Dylan H. Ross, Jang Ho Cho, Rutan Zhang, Kelly M. Hines, Libin Xu. LiPydomics: A Python Package for Comprehensive Prediction of Lipid Collision Cross Sections and Retention Times and Analysis of Ion Mobility-Mass Spectrometry-Based Lipidomics Data. Analytical Chemistry 2020, 92 (22) , 14967-14975. https://doi.org/10.1021/acs.analchem.0c02560
  40. Laura Righetti, Nicola Dreolin, Alberto Celma, Mike McCullagh, Gitte Barknowitz, Juan V. Sancho, Chiara Dall’Asta. Travelling Wave Ion Mobility-Derived Collision Cross Section for Mycotoxins: Investigating Interlaboratory and Interplatform Reproducibility. Journal of Agricultural and Food Chemistry 2020, 68 (39) , 10937-10943. https://doi.org/10.1021/acs.jafc.0c04498
  41. Jaqueline A. Picache, Jody C. May, John A. McLean. Chemical Class Prediction of Unknown Biomolecules Using Ion Mobility-Mass Spectrometry and Machine Learning: Supervised Inference of Feature Taxonomy from Ensemble Randomization. Analytical Chemistry 2020, 92 (15) , 10759-10767. https://doi.org/10.1021/acs.analchem.0c02137
  42. Manon Meunier, Martina Haack, Dania Awad, Thomas Brück, Khalijah Awang, Marc Litaudon, Frédéric Saubion, Marc Legeay, Dimitri Bréard, David Guilet, Séverine Derbré, Andreas Schinkovitz. Matrix free laser desorption ionization coupled to trapped ion mobility mass spectrometry: an innovative approach for isomer differentiation and molecular network visualization. Talanta 2025, 287 , 127626. https://doi.org/10.1016/j.talanta.2025.127626
  43. Henrik Hupatz, Ida Rahu, Wei-Chieh Wang, Pilleriin Peets, Emma H. Palm, Anneli Kruve. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Analytical and Bioanalytical Chemistry 2025, 417 (3) , 473-493. https://doi.org/10.1007/s00216-024-05471-x
  44. Soumyadeep Sarkar, Xueyun Zheng, Geremy C. Clair, Yu Mi Kwon, Youngki You, Adam C. Swensen, Bobbie-Jo M. Webb-Robertson, Ernesto S. Nakayasu, Wei-Jun Qian, Thomas O. Metz. Exploring new frontiers in type 1 diabetes through advanced mass-spectrometry-based molecular measurements. Trends in Molecular Medicine 2024, 30 (12) , 1137-1151. https://doi.org/10.1016/j.molmed.2024.07.009
  45. Chloe Engler Hart, António José Preto, Shaurya Chanana, David Healey, Tobias Kind, Daniel Domingo-Fernández. Evaluating the generalizability of graph neural networks for predicting collision cross section. Journal of Cheminformatics 2024, 16 (1) https://doi.org/10.1186/s13321-024-00899-w
  46. Dmitriy D. Matyushin, Ivan A. Burov, Anastasia Yu. Sholokhova. Uncertainty Quantification and Flagging of Unreliable Predictions in Predicting Mass Spectrometry-Related Properties of Small Molecules Using Machine Learning. International Journal of Molecular Sciences 2024, 25 (23) , 13077. https://doi.org/10.3390/ijms252313077
  47. Laura Schlüter, Kine Østnes Hansen, Johan Isaksson, Jeanette Hammer Andersen, Espen Holst Hansen, Jörn Kalinowski, Yannik Karl-Heinz Schneider. Discovery of thiazostatin D/E using UPLC-HR-MS2-based metabolomics and σ-factor engineering of Actinoplanes sp. SE50/110. Frontiers in Bioengineering and Biotechnology 2024, 12 https://doi.org/10.3389/fbioe.2024.1497138
  48. Mehmet Atakay, Hacı Mehmet Kayılı, Ülkü Güler, Bekir Salih. Ion Mobility-Mass Spectrometry for Macromolecule Analysis. 2024, 1-35. https://doi.org/10.2174/9789815050059122020003
  49. Jakob Koch, Lukas Neumann, Katharina Lackner, Monica L. Fernández-Quintero, Katrin Watschinger, Markus A. Keller. Trapped ion mobility spectrometry-guided molecular discrimination between plasmalogens and other ether lipids in lipidomics experiments. 2024. https://doi.org/10.1101/2024.10.23.619801
  50. Ting Xie, Qiong Yang, Jinyu Sun, Hailiang Zhang, Yue Wang, Zhimin Zhang, Hongmei Lu. Large-scale prediction of collision cross-section with very deep graph convolutional network for small molecule identification. Chemometrics and Intelligent Laboratory Systems 2024, 252 , 105177. https://doi.org/10.1016/j.chemolab.2024.105177
  51. Thomas O. Metz, Christine H. Chang, Vasuk Gautam, Afia Anjum, Siyang Tian, Fei Wang, Sean M. Colby, Jamie R. Nunez, Madison R. Blumer, Arthur S. Edison, Oliver Fiehn, Dean P. Jones, Shuzhao Li, Edward T. Morgan, Gary J. Patti, Dylan H. Ross, Madelyn R. Shapiro, Antony J. Williams, David S. Wishart. Introducing ‘identification probability’ for automated and transferable assessment of metabolite identification confidence in metabolomics and related studies. 2024. https://doi.org/10.1101/2024.07.30.605945
  52. Robbin Bouwmeester, Keith Richardson, Richard Denny, Ian D. Wilson, Sven Degroeve, Lennart Martens, Johannes P.C. Vissers. Predicting ion mobility collision cross sections and assessing prediction variation by combining conventional and data driven modeling. Talanta 2024, 274 , 125970. https://doi.org/10.1016/j.talanta.2024.125970
  53. Xingliang He, Xing Guo, Fulong Deng, Pengyu Zeng, Bin Wu, Hong'en Sun, Zhongjun Zhao, Yixiang Duan. A study of the transient gas flow affected ion transmission in atmospheric pressure interfaces based on large eddy simulation for electrospray ionization mass spectrometry. Talanta 2024, 274 , 125980. https://doi.org/10.1016/j.talanta.2024.125980
  54. Dylan H. Ross, Harsh Bhotika, Xueyun Zheng, Richard D. Smith, Kristin E. Burnum‐Johnson, Aivett Bilbao. Computational tools and algorithms for ion mobility spectrometry‐mass spectrometry. PROTEOMICS 2024, 24 (12-13) https://doi.org/10.1002/pmic.202200436
  55. Hongda Wang, Lin Zhang, Xiaohang Li, Mengxiao Sun, Meiting Jiang, Xiaojian Shi, Xiaoyan Xu, Mengxiang Ding, Boxue Chen, Heshui Yu, Zheng Li, Dean Guo, Wenzhi Yang. Machine learning prediction for constructing a universal multidimensional information library of Panax saponins (ginsenosides). Food Chemistry 2024, 439 , 138106. https://doi.org/10.1016/j.foodchem.2023.138106
  56. Alongkorn Kurilung, Suphitcha Limjiasahapong, Khwanta Kaewnarin, Pattipong Wisanpitayakorn, Narumol Jariyasopit, Kwanjeera Wanichthanarak, Sitanan Sartyoungkul, Stephen Choong Chee Wong, Nuankanya Sathirapongsasuti, Chagriya Kitiyakara, Yongyut Sirivatanauksorn, Sakda Khoomrung. Measurement of very low-molecular weight metabolites by traveling wave ion mobility and its use in human urine samples. Journal of Pharmaceutical Analysis 2024, 14 (5) , 100921. https://doi.org/10.1016/j.jpha.2023.12.011
  57. Francisco José Díaz-Galiano, María Murcia-Morales, Víctor Cutillas, Amadeo R. Fernández-Alba. Ions on the move: The combination of ion mobility and food metabolomics. Trends in Food Science & Technology 2024, 147 , 104446. https://doi.org/10.1016/j.tifs.2024.104446
  58. Ignacio Pérez-Victoria. Natural Products Dereplication: Databases and Analytical Methods. 2024, 1-56. https://doi.org/10.1007/978-3-031-59567-7_1
  59. Guillermo Ramajo, Constantino García, Alberto Gil, Abraham Otero. Training Deep Learning Neural Networks for Predicting CCS Using the METLIN-CCS Dataset. 2024, 225-236. https://doi.org/10.1007/978-3-031-64636-2_17
  60. Orobola E. Olajide, Kimberly Y. Kartowikromo, Ahmed M. Hamid. Ion Mobility Mass Spectrometry: Instrumentation and Applications. 2023. https://doi.org/10.5772/intechopen.1002767
  61. Mingdu Luo, Yandong Yin, Zhiwei Zhou, Haosong Zhang, Xi Chen, Hongmiao Wang, Zheng-Jiang Zhu. A mass spectrum-oriented computational method for ion mobility-resolved untargeted metabolomics. Nature Communications 2023, 14 (1) https://doi.org/10.1038/s41467-023-37539-0
  62. Renfeng Guo, Youjia Zhang, Yuxuan Liao, Qiong Yang, Ting Xie, Xiaqiong Fan, Zhonglong Lin, Yi Chen, Hongmei Lu, Zhimin Zhang. Highly accurate and large-scale collision cross sections prediction with graph neural networks. Communications Chemistry 2023, 6 (1) https://doi.org/10.1038/s42004-023-00939-w
  63. Juliane Hollender, Emma L. Schymanski, Lutz Ahrens, Nikiforos Alygizakis, Frederic Béen, Lubertus Bijlsma, Andrea M. Brunner, Alberto Celma, Aurelie Fildier, Qiuguo Fu, Pablo Gago-Ferrero, Ruben Gil-Solsona, Peter Haglund, Martin Hansen, Sarit Kaserzon, Anneli Kruve, Marja Lamoree, Christelle Margoum, Jeroen Meijer, Sylvain Merel, Cassandra Rauert, Pawel Rostkowski, Saer Samanipour, Bastian Schulze, Tobias Schulze, Randolph R. Singh, Jaroslav Slobodnik, Teresa Steininger-Mairinger, Nikolaos S. Thomaidis, Anne Togola, Katrin Vorkamp, Emmanuelle Vulliet, Linyan Zhu, Martin Krauss. NORMAN guidance on suspect and non-target screening in environmental monitoring. Environmental Sciences Europe 2023, 35 (1) https://doi.org/10.1186/s12302-023-00779-4
  64. Amy Li, Libin Xu. MALDI-IM-MS Imaging of Brain Sterols and Lipids in a Mouse Model of Smith-Lemli-Opitz Syndrome. 2023. https://doi.org/10.1101/2023.10.02.560415
  65. Rutan Zhang, Nate K. Ashford, Amy Li, Dylan H. Ross, Brian J. Werth, Libin Xu. High-throughput analysis of lipidomic phenotypes of methicillin-resistant Staphylococcus aureus by coupling in situ 96-well cultivation and HILIC-ion mobility-mass spectrometry. Analytical and Bioanalytical Chemistry 2023, 415 (25) , 6191-6199. https://doi.org/10.1007/s00216-023-04890-6
  66. Jia-Hui Wen, An-Qi Guo, Meng-Ning Li, Hua Yang. A structural similarity networking assisted collision cross-section prediction interval filtering strategy for multi-compound identification of complex matrix by ion-mobility mass spectrometry. Analytica Chimica Acta 2023, 1278 , 341720. https://doi.org/10.1016/j.aca.2023.341720
  67. Anne Miller, Elisa M. York, Sylwia A. Stopka, Juan Ramón Martínez-François, Md Amin Hossain, Gerard Baquer, Michael S. Regan, Nathalie Y. R. Agar, Gary Yellen. Spatially resolved metabolomics and isotope tracing reveal dynamic metabolic responses of dentate granule neurons with acute stimulation. Nature Metabolism 2023, 5 (10) , 1820-1835. https://doi.org/10.1038/s42255-023-00890-z
  68. Kimberly Y. Kartowikromo, Orobola E. Olajide, Ahmed M. Hamid. Collision cross section measurement and prediction methods in omics. Journal of Mass Spectrometry 2023, 58 (9) https://doi.org/10.1002/jms.4973
  69. Ian Ramtanon, Alexandra Berlioz-Barbier, Simon Remy, Jean-Hugues Renault, Agnès Le Masle. A combined liquid chromatography – trapped ion mobility – tandem high-resolution mass spectrometry and multivariate analysis approach for the determination of enzymatic reactivity descriptors in biomass hydrolysates. Journal of Chromatography A 2023, 1706 , 464277. https://doi.org/10.1016/j.chroma.2023.464277
  70. Randolph R. Singh, Yann Aminot, Karine Héas-Moisan, Hugues Preud'homme, Catherine Munschy. Cracked and shucked: GC-APCI-IMS-HRMS facilitates identification of unknown halogenated organic chemicals in French marine bivalves. Environment International 2023, 178 , 108094. https://doi.org/10.1016/j.envint.2023.108094
  71. Dylan H. Ross, Aivett Bilbao, Richard D. Smith, Xueyun Zheng. Ion Mobility Spectrometry‐Mass Spectrometry for High‐Throughput Analysis. 2023, 183-213. https://doi.org/10.1002/9781119678496.ch6
  72. Anne Miller, Elisa York, Sylwia Stopka, Juan Martínez-François, Md Amin Hossain, Gerard Baquer, Michael Regan, Nathalie Agar, Gary Yellen. Spatially resolved metabolomics and isotope tracing reveal dynamic metabolic responses of dentate granule neurons with acute stimulation. 2023. https://doi.org/10.21203/rs.3.rs-2276903/v1
  73. He Xingliang, Guo Xing, Wu Mengfan, Deng Fulong, Zeng Pengyu, Zhao Zhongjun, Duan Yixiang. From droplets to ions: a comprehensive and consecutive ion formation modelling in atmosphere pressure interface of electrospray ionization mass spectrometry. The Analyst 2023, 148 (14) , 3174-3178. https://doi.org/10.1039/D3AN00607G
  74. Noelle Reimers, Quynh Do, Rutan Zhang, Angela Guo, Ryan Ostrander, Alyson Shoji, Chau Vuong, Libin Xu. Tracking the Metabolic Fate of Exogenous Arachidonic Acid in Ferroptosis Using Dual-Isotope Labeling Lipidomics. 2023. https://doi.org/10.1101/2023.05.28.542640
  75. Elias Iturrospe, Rani Robeyns, Katyeny Manuela da Silva, Maria van de Lavoir, Joost Boeckmans, Tamara Vanhaecke, Alexander L. N. van Nuijs, Adrian Covaci. Metabolic signature of HepaRG cells exposed to ethanol and tumor necrosis factor alpha to study alcoholic steatohepatitis by LC–MS-based untargeted metabolomics. Archives of Toxicology 2023, 97 (5) , 1335-1353. https://doi.org/10.1007/s00204-023-03470-y
  76. Xiaohang Li, Hongda Wang, Meiting Jiang, Mengxiang Ding, Xiaoyan Xu, Bei Xu, Yadan Zou, Yuetong Yu, Wenzhi Yang. Collision Cross Section Prediction Based on Machine Learning. Molecules 2023, 28 (10) , 4050. https://doi.org/10.3390/molecules28104050
  77. Kaylie I. Kirkwood, Melanie T. Odenkirk, Erin S. Baker. Ion Mobility Spectrometry. 2023, 151-182. https://doi.org/10.1002/9783527836512.ch6
  78. Nikiforos Alygizakis, Francois Lestremau, Pablo Gago-Ferrero, Rubén Gil-Solsona, Katarzyna Arturi, Juliane Hollender, Emma L. Schymanski, Valeria Dulio, Jaroslav Slobodnik, Nikolaos S. Thomaidis. Towards a harmonized identification scoring system in LC-HRMS/MS based non-target screening (NTS) of emerging contaminants. TrAC Trends in Analytical Chemistry 2023, 159 , 116944. https://doi.org/10.1016/j.trac.2023.116944
  79. Marie Lenski, Saïd Maallem, Gianni Zarcone, Guillaume Garçon, Jean-Marc Lo-Guidice, Sébastien Anthérieu, Delphine Allorge. Prediction of a Large-Scale Database of Collision Cross-Section and Retention Time Using Machine Learning to Reduce False Positive Annotations in Untargeted Metabolomics. Metabolites 2023, 13 (2) , 282. https://doi.org/10.3390/metabo13020282
  80. Alberto Celma. The Use of Ion Mobility Separation as an Additional Dimension for the Screening of Organic Micropollutants in Environmental Samples. 2023. https://doi.org/10.1007/698_2023_1055
  81. Katyeny Manuela da Silva, Maria van de Lavoir, Rani Robeyns, Elias Iturrospe, Lisa Verheggen, Adrian Covaci, Alexander L. N. van Nuijs. Guidelines and considerations for building multidimensional libraries for untargeted MS-based metabolomics. Metabolomics 2023, 19 (1) https://doi.org/10.1007/s11306-022-01965-w
  82. Yuping Cai, Zhiwei Zhou, Zheng-Jiang Zhu. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics. TrAC Trends in Analytical Chemistry 2023, 158 , 116903. https://doi.org/10.1016/j.trac.2022.116903
  83. Ting Xie, Qiong Yang, Jinyu Sun, Hailiang Zhang, Yue Wang, Zhimin Zhang, Hongmei Lu. Large-Scale Prediction of Collision Cross-Section with Graph Convolutional Network for Compound Identification. 2023. https://doi.org/10.2139/ssrn.4505380
  84. Robbin Bouwmeester, Keith Richardson, Richard Denny, Ian D. Wilson, Sven Degroeve, Lennart Martens, Johannes PC Vissers. Predicting Ion Mobility Collision Cross Sections and Assessing Prediction Variation by Combining Conventional and Data Driven Modeling. 2023. https://doi.org/10.2139/ssrn.4646924
  85. Frank Menger, Alberto Celma, Emma L. Schymanski, Foon Yin Lai, Lubertus Bijlsma, Karin Wiberg, Félix Hernández, Juan V. Sancho, Lutz Ahrens. Enhancing spectral quality in complex environmental matrices: Supporting suspect and non-target screening in zebra mussels with ion mobility. Environment International 2022, 170 , 107585. https://doi.org/10.1016/j.envint.2022.107585
  86. Hiba Mohammed Taha, Reza Aalizadeh, Nikiforos Alygizakis, Jean-Philippe Antignac, Hans Peter H. Arp, Richard Bade, Nancy Baker, Lidia Belova, Lubertus Bijlsma, Evan E. Bolton, Werner Brack, Alberto Celma, Wen-Ling Chen, Tiejun Cheng, Parviel Chirsir, Ľuboš Čirka, Lisa A. D’Agostino, Yannick Djoumbou Feunang, Valeria Dulio, Stellan Fischer, Pablo Gago-Ferrero, Aikaterini Galani, Birgit Geueke, Natalia Głowacka, Juliane Glüge, Ksenia Groh, Sylvia Grosse, Peter Haglund, Pertti J. Hakkinen, Sarah E. Hale, Felix Hernandez, Elisabeth M.-L. Janssen, Tim Jonkers, Karin Kiefer, Michal Kirchner, Jan Koschorreck, Martin Krauss, Jessy Krier, Marja H. Lamoree, Marion Letzel, Thomas Letzel, Qingliang Li, James Little, Yanna Liu, David M. Lunderberg, Jonathan W. Martin, Andrew D. McEachran, John A. McLean, Christiane Meier, Jeroen Meijer, Frank Menger, Carla Merino, Jane Muncke, Matthias Muschket, Michael Neumann, Vanessa Neveu, Kelsey Ng, Herbert Oberacher, Jake O’Brien, Peter Oswald, Martina Oswaldova, Jaqueline A. Picache, Cristina Postigo, Noelia Ramirez, Thorsten Reemtsma, Justin Renaud, Pawel Rostkowski, Heinz Rüdel, Reza M. Salek, Saer Samanipour, Martin Scheringer, Ivo Schliebner, Wolfgang Schulz, Tobias Schulze, Manfred Sengl, Benjamin A. Shoemaker, Kerry Sims, Heinz Singer, Randolph R. Singh, Mark Sumarah, Paul A. Thiessen, Kevin V. Thomas, Sonia Torres, Xenia Trier, Annemarie P. van Wezel, Roel C. H. Vermeulen, Jelle J. Vlaanderen, Peter C. von der Ohe, Zhanyun Wang, Antony J. Williams, Egon L. Willighagen, David S. Wishart, Jian Zhang, Nikolaos S. Thomaidis, Juliane Hollender, Jaroslav Slobodnik, Emma L. Schymanski. The NORMAN Suspect List Exchange (NORMAN-SLE): facilitating European and worldwide collaboration on suspect screening in high resolution mass spectrometry. Environmental Sciences Europe 2022, 34 (1) https://doi.org/10.1186/s12302-022-00680-6
  87. Lidia Belova, Alberto Celma, Glenn Van Haesendonck, Filip Lemière, Juan Vicente Sancho, Adrian Covaci, Alexander L.N. van Nuijs, Lubertus Bijlsma. Revealing the differences in collision cross section values of small organic molecules acquired by different instrumental designs and prediction models. Analytica Chimica Acta 2022, 1229 , 340361. https://doi.org/10.1016/j.aca.2022.340361
  88. Fan Yang, Denice van Herwerden, Hugues Preud’homme, Saer Samanipour. Collision Cross Section Prediction with Molecular Fingerprint Using Machine Learning. Molecules 2022, 27 (19) , 6424. https://doi.org/10.3390/molecules27196424
  89. Jinmei Xia, Wenhai Xiao, Xihuang Lin, Yiduo Zhou, Peng Qiu, Hongkun Si, Xiaorong Wu, Siwen Niu, Zhuhua Luo, Xianwen Yang. Ion Mobility-Derived Collision Cross-Sections Add Extra Capability in Distinguishing Isomers and Compounds with Similar Retention Times: The Case of Aphidicolanes. Marine Drugs 2022, 20 (9) , 541. https://doi.org/10.3390/md20090541
  90. Carter K. Asef, Markace Rainey, Brianna M. Garcia, Goncalo J. Gouveia, Amanda O. Shaver, Franklin E. Leach, Allison M. Morse, Arthur S. Edison, Lauren M. McIntyre, Facundo M. Fernández. Ion Mobility for Unknown Metabolite Identification: Hope or Hype? 2022. https://doi.org/10.1101/2022.08.26.505158
  91. María Moran‐Garrido, Sandra. M. Camunas‐Alberca, Alberto Gil‐de‐la Fuente, Antonio Mariscal, Ana Gradillas, Coral Barbas, Jorge Sáiz. Recent developments in data acquisition, treatment and analysis with ion mobility‐mass spectrometry for lipidomics. PROTEOMICS 2022, 22 (15-16) https://doi.org/10.1002/pmic.202100328
  92. Esra te Brinke, Ane Arrizabalaga-Larrañaga, Marco H. Blokland. Insights of ion mobility spectrometry and its application on food safety and authenticity: A review. Analytica Chimica Acta 2022, 1222 , 340039. https://doi.org/10.1016/j.aca.2022.340039
  93. Amy Li, Kelly M. Hines, Dylan H. Ross, James W. MacDonald, Libin Xu. Temporal changes in the brain lipidome during neurodevelopment of Smith–Lemli–Opitz syndrome mice. The Analyst 2022, 147 (8) , 1611-1621. https://doi.org/10.1039/D2AN00137C
  94. Yue-xin Qian, Dong-xue Zhao, Hong-da Wang, He Sun, Ying Xiong, Xiao-yan Xu, Wan-di Hu, Mei-yu Liu, Bo-xue Chen, Ying Hu, Xue Li, Mei-ting Jiang, Wen-zhi Yang, Xiu-mei Gao. An ion mobility-enabled and high-efficiency hybrid scan approach in combination with ultra-high performance liquid chromatography enabling the comprehensive characterization of the multicomponents from Carthamus tinctorius. Journal of Chromatography A 2022, 1667 , 462904. https://doi.org/10.1016/j.chroma.2022.462904
  95. Vanessa Hinnenkamp, Peter Balsaa, Torsten C. Schmidt. Target, suspect and non-target screening analysis from wastewater treatment plant effluents to drinking water using collision cross section values as additional identification criterion. Analytical and Bioanalytical Chemistry 2022, 414 (1) , 425-438. https://doi.org/10.1007/s00216-021-03263-1
  96. Frank Menger, Alberto Celma, Emma L. Schymanski, Foon Yin Lai, Lubertus Bijlsma, Karin Wiberg, Félix Hernández, Juan Vicente Sancho, Ahrens Lutz. Enhancing Spectral Quality in Complex Environmental Matrices: Supporting Suspect and Non-Target Screening in Zebra Mussels with Ion Mobility. SSRN Electronic Journal 2022, 165 https://doi.org/10.2139/ssrn.4149383
  97. Tlou T. Mosekiemang, Maria A. Stander, André de Villiers. Ultra-high pressure liquid chromatography coupled to travelling wave ion mobility-time of flight mass spectrometry for the screening of pharmaceutical metabolites in wastewater samples: Application to antiretrovirals. Journal of Chromatography A 2021, 1660 , 462650. https://doi.org/10.1016/j.chroma.2021.462650
  98. Katyeny Manuela da Silva, Elias Iturrospe, Joris Heyrman, Jeremy P. Koelmel, Matthias Cuykx, Tamara Vanhaecke, Adrian Covaci, Alexander L.N. van Nuijs. Optimization of a liquid chromatography-ion mobility-high resolution mass spectrometry platform for untargeted lipidomics and application to HepaRG cell extracts. Talanta 2021, 235 , 122808. https://doi.org/10.1016/j.talanta.2021.122808
  99. Emma L. Schymanski, Todor Kondić, Steffen Neumann, Paul A. Thiessen, Jian Zhang, Evan E. Bolton. Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag. Journal of Cheminformatics 2021, 13 (1) https://doi.org/10.1186/s13321-021-00489-0
  100. Armin Sebastian Guntner, Thomas Bögl, Franz Mlynek, Wolfgang Buchberger. Large-Scale Evaluation of Collision Cross Sections to Investigate Blood-Brain Barrier Permeation of Drugs. Pharmaceutics 2021, 13 (12) , 2141. https://doi.org/10.3390/pharmaceutics13122141


    Figure 1. (A) Counts and (B) comparison of agreement between measurements present in multiple sources. Agreement for all overlapping CCS values in blue. Agreement between DTIM CCS values in red. Agreement between TWIM CCS values in purple. Agreement between DTIM and TWIM CCS values in gold.
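
Figure 1B summarizes how well CCS values reported for the same ion by more than one source agree, overall and broken down by DTIM/TWIM instrument type. Below is a minimal sketch of such a comparison, assuming a pandas table with hypothetical name/adduct/source/ccs columns rather than the actual schema of the compiled database.

```python
from itertools import combinations

import pandas as pd

# hypothetical table of aggregated CCS values; the column names are
# illustrative assumptions, not the schema used in the paper
df = pd.DataFrame({
    "name":   ["caffeine", "caffeine", "glucose", "glucose", "glucose"],
    "adduct": ["[M+H]+",   "[M+H]+",   "[M+Na]+", "[M+Na]+", "[M+Na]+"],
    "source": ["dtim_A",   "twim_B",   "dtim_A",  "dtim_C",  "twim_B"],
    "ccs":    [142.1,      143.0,      162.5,     161.9,     163.8],
})

# pairwise percent difference between sources for each ion measured more than once
rows = []
for (name, adduct), grp in df.groupby(["name", "adduct"]):
    for (_, a), (_, b) in combinations(grp.iterrows(), 2):
        mean_ccs = (a["ccs"] + b["ccs"]) / 2.0
        rows.append({
            "name": name,
            "adduct": adduct,
            "pair": f"{a['source']} vs {b['source']}",
            "pct_diff": 100.0 * abs(a["ccs"] - b["ccs"]) / mean_ccs,
        })

print(pd.DataFrame(rows))
```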

    Figure 2. PCA projections of full CCS database onto principal axes 1, 2, and 3, colored by data set (A,B) or chemical classification (C,D). Correlation of the top three molecular descriptors contributing to separation along PC1 (E–G) and PC2 (H–J). hac = heavy atom count; m/z = mass to charge ratio; ao = acyclic oxygen count; hbd = H-bond donor atoms; ctv = cyclic trivalent nodes; r6 = 6-membered ring count.
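
Figure 2 projects the full database onto principal axes computed from the 42 molecular quantum numbers (MQNs). A minimal sketch of that kind of projection, assuming RDKit's MQN calculator (rdMolDescriptors.MQNs_) and scikit-learn's PCA; the SMILES strings are illustrative placeholders, not entries from the database.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# illustrative SMILES; in practice these come from the compiled CCS database
smiles = [
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",              # caffeine
    "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O",    # glucose
    "CCCCCCCCCCCCCCCC(=O)O",                     # palmitic acid
]

# 42 molecular quantum numbers per molecule
mqns = np.array([rdMolDescriptors.MQNs_(Chem.MolFromSmiles(s)) for s in smiles])

# scale the descriptors, then project onto the first three principal axes
proj = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(mqns))
print(proj.shape)  # (n_compounds, 3)
```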

    Figure 3. PLS-RA projections of full CCS database onto axes 1, 2, colored by data set (A) or chemical classification (B). Correlation between PLS-RA projections along axis 1 and PCA projections along PC1 (C). Correlation between molecular descriptors and PLS-RA projections along axis 1 (blue) or CCS (red) for all compounds (D–K). hac = heavy atom count; m/z = mass to charge ratio; hbam = H-bond acceptor sites; c = carbon atom count; ao = acyclic oxygen count; asb = acyclic single bonds; asv = acyclic single valent nodes; adb = acyclic double bonds.
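
Figure 3 uses partial least-squares regression analysis (PLS-RA) to find combinations of MQNs that covary most strongly with CCS. A minimal sketch with scikit-learn's PLSRegression on placeholder data, where X stands in for the scaled MQN matrix and y for the reference CCS values.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# placeholder data standing in for the real MQN matrix and CCS values
X = rng.normal(size=(200, 42))                                 # 42 MQN features
y = 150.0 + 10.0 * X[:, 0] + rng.normal(scale=2.0, size=200)   # synthetic CCS

X_scaled = StandardScaler().fit_transform(X)
pls = PLSRegression(n_components=2).fit(X_scaled, y)

# projections onto PLS axes 1 and 2 (cf. Figure 3A,B)
scores = pls.transform(X_scaled)

# correlation of each descriptor with the axis-1 projection (cf. Figure 3D-K)
corr_axis1 = [np.corrcoef(X[:, j], scores[:, 0])[0, 1] for j in range(X.shape[1])]
print(scores.shape, len(corr_axis1))
```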

    Figure 4. PCA projections of full CCS database onto principal axes 1, 2, and 3, colored by chemical class label (A, B), or by cluster (C, D). (E) Plot of CCS vs m/z for full CCS database, colored by cluster. (F) Central structures within each cluster. (G–I) Average predictive performance of models (lasso, forest, svr, respectively) by MDAE and RMSE from five independent trials, trained on the full CCS database (training set = blue, test set = red) or on individual cluster data sets (training set = purple, test set = gold).
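
Figure 4G–I compares LASSO, random forest, and SVR regressors trained on the full database against the same regressors trained on individual clusters, scored by median absolute error (MDAE) and RMSE. A minimal sketch of the single-model half of that comparison on synthetic placeholder data (the per-cluster variant is sketched after Figure 5); the hyperparameters are illustrative, not the tuned values from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, median_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 43))                                  # MQNs + m/z (placeholder)
y = 150.0 + 5.0 * X[:, -1] + rng.normal(scale=3.0, size=500)    # synthetic CCS

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "lasso": Lasso(alpha=0.1),
    "forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "svr": SVR(kernel="rbf", C=10.0),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mdae = median_absolute_error(y_te, pred)
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    print(f"{name}: MDAE={mdae:.2f}, RMSE={rmse:.2f}")
```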

    Figure 5. Workflow describing the process for training and validating the final prediction model. First, MQNs are generated from the compound SMILES structure in addition to the m/z and MS adduct, and this data is stored in a database. The complete data set is randomly partitioned into a training set and test set, preserving the approximate distribution of CCS values between the two sets. The training set is then fit using K-Means clustering to find the dominant groupings within the data set in terms of chemical similarity. The data from each assigned cluster is then used to train an individual predictive model that is specialized for that group of compounds. Finally, the overall CCS prediction performance of this set of models is validated using the test set data by first assigning each sample to one of the fitted clusters then predicting CCS using the corresponding predictive model.
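
Figure 5 describes the final training workflow: MQNs plus m/z and an encoded MS adduct form the feature set, the data are split into training and test sets that preserve the CCS distribution, the training set is partitioned by K-Means clustering, one regression model is trained per cluster, and test compounds are predicted with the model of their assigned cluster. Below is a minimal sketch of that pipeline under assumed settings (synthetic features, four clusters, SVR as the per-cluster regressor); it is not the published configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 44))                                  # MQNs + m/z + adduct (placeholder)
y = 150.0 + 4.0 * X[:, -2] + rng.normal(scale=3.0, size=600)    # synthetic CCS

# split while roughly preserving the CCS distribution (stratify on CCS quartiles)
quartile = np.digitize(y, np.quantile(y, [0.25, 0.5, 0.75]))
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=quartile, random_state=2)

scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# K-Means finds the dominant groupings within the training set
kmeans = KMeans(n_clusters=4, n_init=10, random_state=2).fit(X_tr_s)

# train one specialized regressor per cluster
per_cluster = {}
for k in range(kmeans.n_clusters):
    mask = kmeans.labels_ == k
    per_cluster[k] = SVR(kernel="rbf", C=10.0).fit(X_tr_s[mask], y_tr[mask])

# predict test CCS: assign each sample to a cluster, then use that cluster's model
test_labels = kmeans.predict(X_te_s)
y_pred = np.empty_like(y_te)
for k in range(kmeans.n_clusters):
    mask = test_labels == k
    if mask.any():
        y_pred[mask] = per_cluster[k].predict(X_te_s[mask])

print(y_pred[:5])
```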

    Figure 6. (A–C) Complete performance metrics for final predictive model on training (blue) and test (red) data. (A) R2 (B) mean/median absolute error and root mean squared error (C) proportion of predictions falling within 1, 3, 5, and 10% of reference values. (D–F) Comparison of CCS prediction performance between final model (purple) and DeepCCS (gold) on all data sets used for training DeepCCS. (D) R2 (E) mean/median absolute error and root mean squared error (F) mean/median relative error (MRE and MDRE).
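
Figure 6 summarizes prediction performance with R2, mean and median absolute error, RMSE, and the proportion of predictions falling within 1, 3, 5, and 10% of the reference values (plus mean/median relative error for the DeepCCS comparison). A minimal sketch of those metrics on a pair of hypothetical reference/predicted CCS arrays:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, median_absolute_error, r2_score

# hypothetical reference and predicted CCS values (Å^2); not real results
y_true = np.array([152.3, 168.9, 201.4, 187.2, 243.8])
y_pred = np.array([150.9, 170.1, 205.0, 186.5, 239.7])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mdae = median_absolute_error(y_true, y_pred)
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# relative error metrics: MRE/MDRE and the fraction within 1, 3, 5, 10% (cf. Figure 6C,F)
rel_err = 100.0 * np.abs(y_pred - y_true) / y_true
mre, mdre = float(np.mean(rel_err)), float(np.median(rel_err))
within = {p: float(np.mean(rel_err <= p)) for p in (1, 3, 5, 10)}

print(f"R2={r2:.3f}  MAE={mae:.2f}  MDAE={mdae:.2f}  RMSE={rmse:.2f}")
print(f"MRE={mre:.2f}%  MDRE={mdre:.2f}%  within: {within}")
```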

Supporting Information

    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.9b05772.

    • All code used to assemble the CCS database and train predictive models is available on GitHub (https://github.com/dylanhross/c3sdb); SI includes Experimental Section and schema; MS adduct encodings, molecular quantum numbers (MQNs), top three features contributing to separation along PC3, CCS prediction accuracy of LipidCCS on different chemical classes, and feature selection trials; and additional Results and Discussion (PDF)

