Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data
  • Open Access
Article


  • Helen Sepman
    Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
    Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
  • Louise Malm
    Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
  • Pilleriin Peets
    Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
  • Matthew MacLeod
    Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
  • Jonathan Martin
    Science for Life Laboratory, Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
  • Magnus Breitholtz
    Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
  • Anneli Kruve*
    Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
    Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
    *Email: [email protected]

Analytical Chemistry

Cite this: Anal. Chem. 2023, 95, 33, 12329–12338
https://doi.org/10.1021/acs.analchem.3c01744
Published August 7, 2023

Copyright © 2023 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY 4.0.

Abstract


Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to detection of previously unseen chemicals, and assessing the risk posed by these newly detected chemicals is an important challenge. Assessing exposure and toxicity of chemicals detected with nontarget HRMS is highly dependent on the knowledge of the structure of the chemical. However, the majority of features detected in nontarget screening remain unidentified and therefore the risk assessment with conventional tools is hampered. Here, we developed MS2Quant, a machine learning model that enables prediction of concentration from fragmentation (MS2) spectra of detected, but unidentified chemicals. MS2Quant is an xgbTree algorithm-based regression model developed using ionization efficiency data for 1191 unique chemicals that spans 8 orders of magnitude. The ionization efficiency values are predicted from structural fingerprints that can be computed from the SMILES notation of the identified chemicals or from MS2 spectra of unidentified chemicals using SIRIUS+CSI:FingerID software. The root mean square errors of the training and test sets were 0.55 (3.5×) and 0.80 (6.3×) log-units, respectively. In comparison, ionization efficiency prediction approaches that depend on assigning an unequivocal structure typically yield errors from 2× to 6×. The MS2Quant quantification model was validated on a set of 39 environmental pollutants and resulted in a mean prediction error of 7.4×, a geometric mean of 4.5×, and a median of 4.0×. For comparison, a model based on PaDEL descriptors that depends on unequivocal structural assignment was developed using the same dataset. The latter approach yielded a comparable mean prediction error of 9.5×, a geometric mean of 5.6×, and a median of 5.2× on the validation set chemicals when the top structural assignment was used as input. 
This confirms that MS2Quant enables the extraction of exposure information for unidentified chemicals which, although detected, have thus far been disregarded due to a lack of accurate tools for quantification. The MS2Quant model is available as an R package on GitHub for improving the discovery and monitoring of potentially hazardous environmental pollutants with nontarget screening.


Introduction


Advances in liquid chromatography (LC) coupled to high-resolution mass spectrometry (HRMS) by electrospray ionization (ESI) have revolutionized the detection of unknown chemicals in the environment. Utilizing advanced computational tools and spectral libraries to identify or tentatively annotate the structure of the detected chemicals has facilitated a shift from targeted analysis toward nontarget screening (NTS). NTS with LC-HRMS enables the detection of previously overlooked and emerging contaminants and their transformation products; (2−6) however, to assess the risk posed by chemicals in the environment, it is necessary to know both their intrinsic toxicity and concentration. (1,7)
The concentration of an identified chemical can be accurately determined by calibrating to an analytical standard. However, quantification is more challenging if analytical standards are not available, as the detected signal intensity is poorly correlated to the concentration across different chemicals. The reason is that the ionization efficiency in ESI, and therefore the response factor (the slope of the calibration graph) of individual chemicals, can differ by orders of magnitude. (8−10) The electrospray ionization mechanism is not fully understood; however, intrinsic properties such as hydrophobicity, (8,11,12) proton affinity in gas (13) and liquid phase, (14,15) as well as properties of the mobile phase such as solvent evaporation rate, (16,17) pH, (18,19) and additive type (20) can influence the response factor of a chemical. In addition, the design of the ESI source (17) and instrumental parameters (21) can have an impact on the ionization process.
Determining the response factor of detected chemicals present in the sample is crucial to pinpoint the chemicals posing the highest risk due to high exposure to these substances. Furthermore, these chemicals can be prioritized for identification and quantification. Recently, several approaches for the quantification of chemicals that are tentatively identified with candidate structures in suspect or nontarget screening have been proposed. First, the calibration graph of structurally similar chemicals, (23) such as parent compound for transformation products (24,25) or a homologue, (26) can be used to quantify chemicals detected and identified with NTS. Second, calibration with closely eluting chemicals (27) may be utilized. Lastly, machine learning models can be trained to predict response factors and this prediction can be further used for quantification. (10,28−35) Such quantification methods enable the estimation of concentration as well as prioritization and typically have errors from 2× to 6×. (10,25,27−31,36)
Previous machine learning based quantification approaches require that a candidate structure is first assigned from the NTS data processing, then molecular descriptors are computed for this structure using simplified molecular-input line-entry system (SMILES) notation. The molecular descriptors calculated from the structure are model-specific and may include physicochemical properties, (31) Pharmaceutical Data Exploration Laboratory (PaDEL) descriptors, (10,37) and/or different structural fingerprints. (36) Such machine learning models predict relative ionization efficiency for the candidate structures which need to be further translated into instrument- and method-specific response factors with the help of a set of calibration compounds. Most quantification approaches require a single candidate structure that is assumed to be accurate. However, the variation in rates of correctly identified structures is dependent on the workflow, data quality, and available databases. (4,38−41) For example, Wang et al. detected 335 potential organic micropollutants with suspect and nontarget screening, while 133 candidate structures were successfully confirmed with analytical standards. (38) Furthermore, the fraction of LC-HRMS peaks (features with retention time and two-dimensional MS information) that remain unidentified generally surpasses the number of annotated peaks. (2,4,42,43) As an example, Papazian et al. (4) managed to annotate 17% of 60,300 detected molecular features in air samples from the Indian subcontinent, achieving this high number of annotations by using both liquid and gas chromatography as well as in silico structural predictions. Some of the previously developed quantification approaches allow predictions both for structurally identified as well as unidentified chemicals. For example, Pieke et al. (27) used close eluting standards to quantify detected chemicals with an error up to 4× while Kruve et al. 
(24) observed a mean error of 3.3× and a maximum error of 88×. Additionally, Groff et al. (44) recently evaluated a bounded response factor method where quantile estimates from the distribution of response factors for standards are used to estimate concentration yielding errors up to 150× for positive mode ESI. Machine learning models taking advantage of empirical analytical information of detected chemicals have the potential to overcome high prediction errors for structure-free quantification. Recently, Palm and Kruve (22) showed that a combination of retention time, exact mass, and the response ratio of peak areas in positive and negative mode can be used to predict the ionization efficiency of chemicals in surface water samples with a mean error of 10× using a machine learning model. Although promising, the approach requires measuring one sample with three sets of chromatographic conditions, as well as utilizing both positive and negative ESI mode.
MS2 spectra carry structurally relevant information about functional groups, (45,46) which further provide information about the compounds’ polarity as well as acid–base properties. A machine learning model, MS2Tox, was recently developed to predict toxicity (LC50 values) of unidentified compounds based on structural fingerprints calculated from the MS2 spectrum. (47) Here, we exploit the same principle and develop a novel quantification approach for unidentified chemicals detected with NTS LC-HRMS using only the mobile phase, MS1 and MS2 spectra. SIRIUS+CSI:FingerID (41,45,48−50) was used to predict the probability that structural fingerprints are present in a chemical from measured MS2 spectrum. To predict ionization efficiency from the structural fingerprints, a unified dataset of 1191 unique known chemicals compiled from 13 datasets measured on 13 instruments was used. We compare our MS2-based quantification approach, MS2Quant, against the best performing comparable candidate structure-based model we could develop, a PaDEL based model, for quantification of 39 chemicals that were used in a NORMAN interlaboratory comparison. We show that ionization efficiency predictions from MS2 data are comparable with structure-based predictions and provide a possibility to quantify the exposure of unidentified compounds in LC-HRMS analysis.

Materials and Methods


Data for Training the Ionization Efficiency Model

The final dataset contains in total 1191 unique compounds and 6049 datapoints measured with 13 different instruments with different types of electrospray ionization sources representing differences in experimental conditions. The range of log IE values in the final dataset was −1.49 to 7.49. Details about compiling a dataset with unified log IE values can be found in Supporting Information Chapter S1, Code S1, and Tables S1 and S2.

Calculation of Descriptors

Molecular, structural, and eluent descriptors were used to develop ionization efficiency prediction models. For the development of MS2Quant, a combined set of structural fingerprints (Chemistry Development Kit (CDK) substructure fingerprints, PubChem CACTVS fingerprints, Klekota-Roth fingerprints, (51) FP3 fingerprints and Molecular ACCess System (MACCS) fingerprints; (52) altogether 1263 descriptors) was used. This combined set of structural fingerprints (hereafter referred to as “structural fingerprints” for readability) gives information about functional groups in the structure and can be calculated either from the structure or from MS1 and MS2 spectra with the SIRIUS+CSI:FingerID identification software. All structural fingerprints for chemicals used in the training and testing of the model were calculated using the “rcdk” library and define structural keys of different bit sizes. (53) Performance of other tested descriptors can be found in Table S3 and Chapter S2. Additionally, eluent descriptors such as organic modifier percentage, aqueous pH, polarity index, surface tension, and viscosity were added to the modeling data because the environment is known to strongly affect ionization efficiency in electrospray ionization. These descriptors have been shown to have a strong impact on ionization efficiency in prediction models. (10,18−20)

Data Preprocessing

All features (columns with descriptor values) with more than 10 missing values per descriptor were removed from the dataset, resulting in 633, 1267, 1024, 1024, and 1263 descriptors for PaDEL, Mordred, ECFP2, MAP4, and structural fingerprints, respectively. Similarly, features with near-zero variance were removed from the dataset with the frequency cutoff value of 80/20, leaving 544, 968, 22, 1024, and 184 descriptors for PaDEL, Mordred, ECFP2, MAP4, and structural fingerprints, respectively. Pair-wise correlations were reduced by removing columns with the largest mean absolute correlation in pairs using the correlation cut-off value of 0.75. After preprocessing, the number of descriptors left were 144 for PaDEL, 175 for Mordred, 20 for ECFP2, 630 for MAP4, and 117 for structural fingerprints.
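The three filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the `None` encoding of missing values, and the 10% unique-value threshold are assumptions (the last one mirrors the usual default of caret-style near-zero-variance filters), and the correlation filter is a simplified greedy variant of "remove the column with the largest mean absolute correlation".

```python
from collections import Counter
import math


def drop_sparse(columns, max_missing=10):
    """Keep descriptors with at most `max_missing` missing (None) values."""
    return {name: vals for name, vals in columns.items()
            if sum(v is None for v in vals) <= max_missing}


def is_near_zero_variance(vals, freq_cut=80 / 20, unique_cut=10.0):
    """Flag a descriptor that is dominated by a single value.

    freq_cut: allowed ratio of the most common to the second most common value.
    unique_cut: percentage of distinct values (assumed threshold, see lead-in).
    """
    present = [v for v in vals if v is not None]
    counts = sorted(Counter(present).values(), reverse=True)
    if len(counts) < 2:
        return True  # constant column
    freq_ratio = counts[0] / counts[1]
    percent_unique = 100.0 * len(counts) / len(present)
    return freq_ratio > freq_cut and percent_unique < unique_cut


def pearson(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, cutoff=0.75):
    """Greedy filter: visit columns by decreasing mean |r| and drop a column
    if it still correlates above `cutoff` with any remaining column."""
    names = list(columns)
    corr = {(a, b): abs(pearson(columns[a], columns[b]))
            for a in names for b in names if a != b}
    mean_abs = {a: sum(corr[a, b] for b in names if b != a) / max(len(names) - 1, 1)
                for a in names}
    kept = set(names)
    for a in sorted(names, key=lambda n: -mean_abs[n]):
        if any(corr[a, b] > cutoff for b in kept if b != a):
            kept.discard(a)
    return {n: columns[n] for n in names if n in kept}


def preprocess(columns):
    """Apply the three filters in the order given in the text."""
    columns = drop_sparse(columns)
    columns = {n: v for n, v in columns.items() if not is_near_zero_variance(v)}
    return drop_correlated(columns)
```

Here `columns` maps a descriptor name to its list of values, one per datapoint. After the greedy pass, no pair of remaining descriptors has an absolute correlation above the cut-off.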

Modeling Parameters

The ionization efficiency data were divided into a training and a test set with a ratio of 80/20, giving 4654 and 1395 datapoints, respectively. Splitting was performed based on InChIs to avoid having the same compound measured under different conditions in both sets. Extreme gradient boosting-based algorithms, in which ensemble models are trained additively, (54,55) were tested as these have found use in modeling with structural fingerprints. (47) Using the caret R package, tree-based (xgbTree), linear function (xgbLinear), and dropout additive regression trees (xgbDART) extreme gradient boosting-based algorithms were tested. The hyperparameters were optimized with the “boot” resampling method using 5-fold cross-validation. Additionally, y-randomization analysis was performed on MS2Quant, which showed the model predictions to be better than random (RMSE of training and test set of 1.107 (12.8×) and 1.144 (13.9×) log-units, respectively; R2 of 0.01).
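Splitting by compound rather than by datapoint can be illustrated with a short Python sketch. The record layout (a dictionary with an `inchi` key), the function name, and the seed are hypothetical; only the idea of assigning every datapoint of a compound to the same set comes from the text.

```python
import random


def split_by_compound(datapoints, test_fraction=0.2, seed=0):
    """Split datapoints so that all measurements of one compound
    (identified by its InChI) end up in the same set."""
    compounds = sorted({d["inchi"] for d in datapoints})
    rng = random.Random(seed)
    rng.shuffle(compounds)
    n_test = round(test_fraction * len(compounds))
    test_ids = set(compounds[:n_test])
    train = [d for d in datapoints if d["inchi"] not in test_ids]
    test = [d for d in datapoints if d["inchi"] in test_ids]
    return train, test


# Hypothetical data: 10 compounds with 1 to 10 datapoints each.
datapoints = [
    {"inchi": f"InChI=hypothetical-{i}", "log_ie": float(j)}
    for i in range(10)
    for j in range(i + 1)
]
train, test = split_by_compound(datapoints)
```

Because the 80/20 ratio is applied to compounds, the share of datapoints in the test set need not be exactly 20%, which is consistent with the 4654/1395 datapoint split reported above.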
The model performance was evaluated using the root mean square error of log IE of the training and test set as well as the mean and median of fold errors that were calculated for each datapoint by the following formula:
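$$
\mathrm{error}_{\mathrm{prediction}} =
\begin{cases}
\dfrac{x_{\mathrm{predicted}}}{x_{\mathrm{real}}}, & \text{if } x_{\mathrm{predicted}} > x_{\mathrm{real}} \\[2ex]
\dfrac{x_{\mathrm{real}}}{x_{\mathrm{predicted}}}, & \text{otherwise}
\end{cases}
\tag{1}
$$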
where x corresponds to log IE or concentration for models’ development or validation, respectively. Performance of all developed models can be found in Table S3. Data and codes that were used for modeling can be found on GitHub (https://github.com/kruvelab/MS2Quant).
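As a minimal sketch of eq 1, the fold error and its summary statistics can be computed as follows. The function names are illustrative, and the sketch assumes that the inputs are positive linear-scale values (for example concentrations); how logarithmic values were handled in the authors' code is not specified here.

```python
import math
from statistics import mean, median


def fold_error(predicted, real):
    """Fold error as in eq 1: always the larger value divided by the smaller."""
    if predicted <= 0 or real <= 0:
        raise ValueError("fold error is only defined for positive values")
    return predicted / real if predicted > real else real / predicted


def summarize_fold_errors(predicted, real):
    """Mean, geometric mean, and median of per-datapoint fold errors."""
    errors = [fold_error(p, r) for p, r in zip(predicted, real)]
    geometric_mean = math.exp(mean(math.log(e) for e in errors))
    return {
        "mean": mean(errors),
        "geometric_mean": geometric_mean,
        "median": median(errors),
    }
```

The fold error is symmetric, so a 5-fold over-prediction and a 5-fold under-prediction both give 5. The arithmetic mean is sensitive to a few large errors, which is why the geometric mean and median are reported alongside it.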

Chemicals Used in Validation and Fingerprint Prediction from MS2 Data

Detailed overview of NORMAN interlaboratory comparison, chemicals used in this study, and experimental conditions can be found in Supporting Information Tables S4 and S5 and Chapter S3. Detailed information about how SIRIUS+CSI:FingerID was used for calculating structural fingerprints and identification results can be found in Chapter S4 and Table S8.

Converting Predicted Ionization Efficiency to the Predicted Response Factor

To convert a predicted ionization efficiency value to an instrument- and measurement-specific response factor, the model is calibrated by measuring calibrants in the same experimental run as the suspects. To predict the response factor of a suspect chemical, the ionization efficiency is predicted and converted to the response factor using the regression obtained from the calibration compounds with the following equation:
$$
\log \mathrm{RF}_{\mathrm{predicted}} = \frac{\log \mathrm{IE}_{\mathrm{predicted}} - \mathrm{intercept}}{\mathrm{slope}}
\tag{2}
$$
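The calibration step can be sketched in Python under two stated assumptions: that eq 2 has the form (log IE − intercept)/slope, meaning the calibration line regresses predicted log IE on measured log RF, and that the response factor is the slope of peak area versus concentration, so that concentration = area / RF. All function names and numerical values below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_log_rf(log_ie_predicted, slope, intercept):
    """Invert the calibration line to obtain a predicted log RF (eq 2)."""
    return (log_ie_predicted - intercept) / slope


def estimate_concentration(peak_area, log_rf_predicted):
    """Concentration from peak area, assuming RF = area / concentration."""
    return peak_area / 10 ** log_rf_predicted


# Hypothetical calibrants: measured log RF and model-predicted log IE.
calibrant_log_rf = [14.0, 15.0, 16.0]
calibrant_log_ie = [2.0, 2.5, 3.0]
slope, intercept = fit_line(calibrant_log_rf, calibrant_log_ie)

# Hypothetical suspect with a predicted log IE of 2.75 and a measured peak area.
suspect_log_rf = predict_log_rf(2.75, slope, intercept)
suspect_conc = estimate_concentration(10 ** 15.5 * 1e-7, suspect_log_rf)
```

With these made-up calibrants the fitted line is log IE = 0.5 · log RF − 5, so the suspect's predicted log RF is 15.5 and the estimated concentration is about 1 × 10⁻⁷ in the units used for calibration.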

Results and Discussion


Model Development

MS2Quant has the advantage of estimating concentrations for both identified and unidentified chemicals from nontarget LC-HRMS analysis. In the case of a known or tentatively identified structure, the SMILES notation of a chemical can be used to calculate the structural fingerprints. For unidentified chemicals, the MS2 spectra are first used to predict the probability of presence or absence of structural fingerprints, thereby providing insight into properties of the chemical. (41,47) To evaluate the suitability of structural fingerprints for predicting the ionization efficiency, we trained and validated the MS2Quant model based on structural fingerprints and eluent descriptors. For model training, the SMILES notation of 952 chemicals was used to calculate associated structural fingerprints, and three machine learning algorithms were used for training (xgbTree, xgbLinear, and xgbDART). The models’ performances were evaluated based on a test set of 239 previously measured chemicals. The highest predictive power on the test set was observed for the xgbTree training algorithm. MS2Quant resulted in root mean square errors (RMSE) of 0.55 and 0.80 log-units for the training and test set, respectively. These RMSE values correspond to 3.5× and 6.3× fold errors; see Figure 1A. The mean, geometric mean, and median prediction errors for the test set calculated based on eq 1 were 15.4×, 4.3×, and 3.2×, respectively. This provides significant advances in ionization efficiency predictions compared to the whole range of 100,000,000× for ionization efficiency values within the training dataset used in this work.

Figure 1. Training (gray) and test (green) sets of two best performing models trained with the xgbTree algorithm and based on (A) structural fingerprints in MS2Quant and (B) on PaDEL descriptors. (C) General modeling workflow used here. For all 1191 chemicals, molecular descriptors/fingerprints were calculated from the structure and 80% of the data (training set) was used for modeling. To clean the descriptors, features with more than 10 missing values were removed. Additionally, features with near-zero variance (cut-off 80/20) and pair-wise correlation (cut-off 0.75) were removed. The training set chemicals were then used for modeling and the performance was assessed based on RMSE and fold prediction errors of the test set.

In comparison, a previously published prediction model by Liigand et al. (10) resulted in root mean square errors of 1.9× and 3.0× on the training and test set, respectively. The latter model included 3139 datapoints measured under various eluent compositions and was validated on a set of 35 chemicals with a mean prediction error of 5.4×; however, a direct comparison with the model is impossible as MS2Quant is trained on a significantly larger, more heterogeneous set of chemicals (n = 954 vs n = 353) from 13 datasets. To evaluate the impact of the selection of molecular features on ionization efficiency prediction accuracy, different models using molecular features were trained. Namely, models using PaDEL, Mordred, ECFP2, and MAP4 descriptors were considered. The highest predictive power was observed for PaDEL descriptors with the xgbTree training algorithm, with an RMSE for the training set of 0.56 log-units (3.6×) and an RMSE for the test set of 0.81 log-units (6.5×); see Figure 1B. The mean, geometric mean, and median prediction errors calculated for the test set by eq 1 were 11.7×, 4.4×, and 3.6×, respectively. The difference in the performances of the latter and MS2Quant models was insignificant. This indicates that structural fingerprints can provide similar information about the ionization efficiency of the chemicals as the continuous or hashed molecular features. For a comprehensive comparison of all trained models, please see Table S3.
In principle, both PaDEL descriptors and structural fingerprints (MS2Quant) have a similarly good starting point for ionization efficiency predictions due to the overlap in information incorporated by both features. For example, PaDEL descriptors include information about numbers of hydrogen bond donors and acceptors, solute hydrogen bond basicity and acidity, and atom counts, e.g., for nitrogen which is often the favored protonation site. Certain functional groups described by structural fingerprints in MS2Quant may account for similar information, such as carbonyl or primary, secondary, and tertiary amines, and therefore structural information is similarly beneficial for predicting the ionizability of a compound.
For validation of MS2 spectra-based quantification, the test and training sets were merged and an updated MS2Quant model was trained. Also, a new model based on PaDEL descriptors for structure-based quantification was trained. The model trained on all datapoints using the respective previously optimized hyperparameters is available in the MS2Quant R-package in GitHub.

MS2Quant Performance in NTS Workflow on NORMAN Interlaboratory Comparison Samples

MS2Quant was validated under environmentally relevant conditions. Briefly, the surface water matrix was spiked with a mixture of relevant water pollutants covering ionization efficiency values over more than four orders of magnitude with the peaks spread out over the whole reverse phase chromatography run. MS2 data were acquired in data-dependent acquisition mode with a target inclusion list. The calibration solutions and high and low concentration spiked lake water samples were obtained from NORMAN interlaboratory comparison on quantification in NTS LC-HRMS. (56,57)
For the 36 calibration compounds, molecular fingerprints were computed from SMILES and used to predict ionization efficiency with MS2Quant. Only chemicals observed as protonated molecules or permanently positively charged were considered, as all the training data for the predictive ionization efficiency model use these ions exclusively. Measured response factors and predicted ionization efficiency values were correlated (R2 = 0.40, p = 4.0 × 10⁻⁵) with a residual standard error of 0.85; see Figure 2A.

Figure 2. Workflow for validation of MS2Quant on NORMAN interlaboratory comparison samples. (A) Molecular fingerprints were computed for 36 chemicals in the calibration mix from SMILES notation with the rcdk package in R. Furthermore, MS2Quant was used to predict ionization efficiency values and a linear regression was fit between experimental logarithmic response factors and logarithmic predicted ionization efficiencies. (B) Lake water spiked with 39 suspect compounds in high and low concentrations was measured with LC-HRMS in data-dependent acquisition mode with an inclusion list. SIRIUS+CSI:FingerID was used to predict probabilities of structural fingerprints from MS1 and MS2 spectra and MS2Quant was used to predict ionization efficiencies from these predicted probabilities. Thereafter, the linear regression from the calibration compounds was used to convert the predicted ionization efficiency values to instrument- and method-specific predicted response factors. Concentrations of suspect chemicals were found using predicted response factors as well as integrated areas from LC-HRMS analysis and were compared to the spiked concentrations. For comparison with PaDEL-based quantification, a similar workflow was used with the PaDEL descriptor-based prediction model instead of MS2Quant and identification of suspects was performed with SIRIUS+CSI:FingerID where the top assigned structure was used for ionization efficiency predictions.

The MS2 spectra were recorded for 39 suspects and were used alongside MS1 spectra to predict the probability of structural fingerprints with SIRIUS+CSI:FingerID (41) for each chemical. Thereafter, the fingerprints were used to predict the ionization efficiency of the chemicals, which was converted to predicted response factor values. The suspects were quantified in two lake water samples from the NORMAN interlaboratory comparison, spiked at high and low concentrations. The predicted concentrations ranged from 1.3 × 10⁻⁸ to 1.6 × 10⁻⁵ M and were similar to the real spiked concentrations of the suspects, which ranged from 6.6 × 10⁻⁹ to 2.9 × 10⁻⁶ M.
For validation, the predicted concentrations were compared to spiked concentrations. Generally, the estimated concentrations were over-predicted, as for 78% of datapoints, the predictions exceeded the real concentrations; see Figure 2B. The RMSE between real and predicted concentrations was 5.9×, which was lower than the RMSE of the test set observed for MS2Quant model development (6.3×). The mean prediction error observed for MS2Quant was 7.4×, geometric mean 4.5×, and median error 4.0×, indicating similar performance to the model developed by Liigand et al. (10) which had 5.4× mean prediction error for a validation set of 35 compounds. The compounds with the highest prediction error were omethoate (47.7× and 38.2× for low and high spike, respectively) and metformin (44.6× and 27.8× for low and high spike, respectively).
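The two summary quantities used in this comparison can be sketched in Python. The sketch assumes that a fold-type RMSE is obtained by raising 10 to the RMSE of the log10-transformed values, which is consistent with the correspondence between 0.80 log-units and 6.3× reported for the test set; the function names and the example concentrations are hypothetical.

```python
import math


def fold_rmse(predicted, real):
    """RMSE of log10(predicted) - log10(real), expressed as a fold value."""
    squared = [(math.log10(p) - math.log10(r)) ** 2
               for p, r in zip(predicted, real)]
    return 10 ** math.sqrt(sum(squared) / len(squared))


def fraction_overpredicted(predicted, real):
    """Share of datapoints where the prediction exceeds the real value."""
    return sum(p > r for p, r in zip(predicted, real)) / len(real)


# Hypothetical concentrations in M: one 10-fold over- and one 10-fold under-prediction.
predicted = [1e-6, 1e-8]
real = [1e-7, 1e-7]
rmse_fold = fold_rmse(predicted, real)
share_over = fraction_overpredicted(predicted, real)
```

Because over- and under-predictions contribute equally to the RMSE, the fraction of over-predicted datapoints is a useful complement when the errors are systematically one-sided, as observed here.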
It is important to note that 26 validation set chemicals were also present in the final MS2Quant training data. For the 13 validation set chemicals that were not present in the training set of MS2Quant, the mean, geometric mean, and median errors were 5.5×, 4.0×, and 3.9×, respectively. Meanwhile, for the 26 chemicals present in the model, the respective errors were 8.3×, 4.7×, and 4.0×. The errors are slightly smaller for chemicals that were not present in the model; however, the differences are minor.

Comparison of MS2-Based Quantification and Suggested Structure-Based Quantification

MS2Quant was compared with the PaDEL-based ionization efficiency prediction model developed here, which uses the structural assignment as the basis for quantification. The PaDEL ionization efficiency prediction model was trained on the same chemical space as MS2Quant to allow a fair comparison. Additionally, the structure-based model developed by Liigand et al. (10) was used to compare the application domains. For this, structural assignments were generated for each suspect LC-HRMS peak with SIRIUS+CSI:FingerID using the same MS2 spectra that were previously used as input for MS2Quant. Based on the SMILES of the top structural candidate, PaDEL descriptors were computed and used to predict the ionization efficiencies of the structural candidates. These ionization efficiencies were then used to predict the concentrations, assuming that the ionization efficiency of the top structure represents the correct chemical. A summary of the models’ performances on the validation set and a graphical comparison can be found in Tables S6 and S7.
In general, slightly higher mean, geometric mean, and median errors of 9.5×, 5.6×, and 5.2×, respectively, were observed for PaDEL-based quantification when using the top assigned structure as input; however, based on the pair-wise Wilcoxon rank sum exact test, the difference in quantification errors was statistically insignificant (p-value = 0.13). A similar performance was observed for MS2Quant as well as the model developed by Liigand et al. when top assigned structures were used as input; see Table 1. It is important to note that the training data used to develop the MS2Quant and PaDEL-based models included 27 validation set compounds, while the training data for the model developed by Liigand et al. (10) only included 4 validation set chemicals, which can explain some differences in the performances of these models.
Table 1. Comparison of the Performance of Four Quantification Models on the Validation Set Suspect Compounds Spiked in the Surface Water (a)

Models: (1) MS2Quant (MS2); (2) MS2Quant (structure); (3) PaDEL-based model developed here (954 chemicals); (4) PaDEL-based model developed by Liigand et al. (353 chemicals). A dash marks a cell that is empty in the original table.

Results of “true NTS” (MS2Quant from MS2, others with top suggested structure) (39 chemicals)
Statistic          (1)      (2)      (3)      (4)
RMSE              5.85     7.29     7.42     7.26
R2                0.46     0.37     0.47     0.49
Mean              7.40     9.51     9.51     8.99
Geom. mean        4.45     5.44     5.63     5.40
Q25               2.16     2.48     2.29     2.27
Q50 (Median)      4.02     4.57     5.19     4.87
Q75               8.27    10.54    12.74    13.08
Q90              17.43    25.31    26.09    20.36
Q100 (Max)       47.68    55.26    54.91    45.87

Correct SMILES is used for quantifying suspects (39 chemicals)
Statistic          (1)      (2)      (3)      (4)
RMSE                 -     6.77     7.05     7.99
R2                   -     0.42     0.54     0.46
Mean                 -     8.18     8.61     9.89
Geom. mean           -     5.29     5.44     6.00
Q25                  -     2.42     2.60     2.35
Q50 (Median)         -     5.78     5.49     6.56
Q75                  -     9.71     9.47    15.20
Q90                  -    20.78    23.69    20.87
Q100 (Max)           -    38.73    35.87    55.26

Only suspects that were correctly identified (34 chemicals)
Statistic          (1)      (2)      (3)      (4)
RMSE              6.12     7.57     7.64     7.34
R2                0.43     0.34     0.44     0.48
Mean              7.80     9.91     9.81     8.83
Geom. mean        4.67     5.69     5.83     5.52
Q25               2.28     2.53     2.37     2.30
Q50 (Median)      4.09     5.27     5.20     6.61
Q75               8.36    10.67    13.35    13.09
Q90              17.95    25.78    26.09    19.48
Q100 (Max)       47.68    55.26    54.91    41.25

Only suspects that were incorrectly identified (5 chemicals)
Statistic          (1)      (2)      (3)      (4)
RMSE              4.15     5.51     6.02     6.73
R2                0.68     0.61     0.65     0.55
Mean              4.66     6.78     7.46    10.09
Geom. mean        3.20     4.00     4.47     4.63
Q25               1.72     1.99     1.71     2.29
Q50 (Median)      2.45     2.96     5.12     3.45
Q75               4.70     6.68     6.41     4.91
Q90              11.02    17.64    19.24    32.19
Q100 (Max)       15.71    25.14    27.42    45.87

Only incorrectly identified suspects, but the correct SMILES was used for quantification (5 chemicals)
Statistic          (1)      (2)      (3)      (4)
RMSE                 -     4.39     6.29     6.77
R2                   -     0.59     0.72     0.54
Mean                 -     4.93     7.06     8.52
Geom. mean           -     3.38     4.90     5.41
Q25                  -     1.84     2.00     2.67
Q50 (Median)         -     2.16     6.94     4.69
Q75                  -     6.14     9.46     7.64
Q90                  -    11.04    12.97    21.90
Q100 (Max)           -    15.73    18.48    31.21

(a) The quantification was performed with MS2Quant using MS1 and MS2 spectra as input, MS2Quant using SMILES notation as input, the PaDEL-based model developed in this work using SMILES as input, and the PaDEL-based model developed by Liigand et al. (10) using SMILES notation as input.

Out of 39 suspect compounds for which fragmentation spectra were acquired in DDA, 34 compounds were identified correctly as top assigned structures. Two detected LC-HRMS peaks belonging to carbamazepine-10,11-epoxide and 5-chlorobenzotriazole had the correct structure ranked second. For two other peaks corresponding to sebuthylazine and dazomet, the correct structure ranked third. Additionally, for one peak corresponding to Sudan I, the correct structure was ranked only 223rd. The correct and assigned top structures can be found in Table S8.
For the five chemicals with an incorrectly assigned top structure, the mean, geometric mean, and median prediction errors with MS2Quant calculated from MS2 were lower compared to other models; however, the differences were statistically insignificant according to the pair-wise Wilcoxon rank sum exact test (p-value = 0.44). The prediction errors of MS2 spectra-based quantification with MS2Quant ranged between 1.2× and 15.7×, those of the PaDEL-based model developed here between 1.3× and 27.4×, and those of the model developed by Liigand et al. between 1.4× and 45.9×.
Generally, in five cases of incorrect structural assignment, both MS2Quant and the PaDEL-based model developed here over-estimated the concentrations, see Figure 3. Still, MS2Quant yielded concentrations closer to spiked concentrations in four out of five cases. Only for dazomet a lower prediction error was observed with the PaDEL-based prediction model even when using the wrong structural assignment. Using PaDEL descriptors of the wrong structural assignment of dazomet yielded a 1.8× error while MS2Quant yielded an error of 2.1×. The results from validation indicate that incorrect structural assignment can yield similar or higher prediction error compared to using MS2 spectra directly for quantification. In general, the performance of MS2Quant that is independent of results of identification is comparable to structure-based methods in use.


Figure 3. Predicted concentrations for the high-concentration spiked sample with MS2Quant and the PaDEL-based model for the five incorrectly identified compounds. Real concentrations are marked with a vertical line.

Analysis of Key Features Learned by MS2Quant

The features with highest importance for MS2Quant ionization efficiency predictions were investigated using variable importance and SHapley Additive exPlanations (SHAP) values. First, the eluent descriptors are significant, as all four descriptors (polarity index, surface tension, viscosity, and pH of the aqueous phase) were among the top 10 most impactful features. As seen in Figure 4A,B, a lower polarity index and surface tension result in higher ionization efficiency, which agrees with previously reported findings by Liigand et al. (10) for a PaDEL descriptor-based model. Although continuous eluent descriptors offer more potential splits for tree-based algorithms than binary fingerprints, which can result in higher variable importance, eluent descriptors were also among the top 10 most impactful features for the PaDEL-based model trained here; see Table S9 and Figure S1 for detailed analysis. This similarity in feature importance indicates that accounting for mobile phase composition is of high importance in ionization efficiency predictions, as also observed previously. (18,19,29) Second, chemical properties associated with the basicity of the chemical were among the 10 most impactful features. This is visible through nitrogen-containing fingerprints referring to basic functional groups that, when present, lead to a higher predicted ionization efficiency value. This is consistent with previous studies, where chemicals that are already charged in the mobile phase tend to have higher ionization efficiency. (58) Third, two fingerprints describing more than two six-membered rings and secondary carbon were influential in predicting the ionization efficiency. These fingerprints possibly account for the hydrophobicity of the compound. Generally, previous studies show that chemicals with larger hydrophobic moieties possess higher ionization efficiency both in positive (10) and negative mode. (35) It is important to mention that when a structural fingerprint describes more than one functional group, any of these functional groups may be present and the exact structure cannot be deduced.
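The notion of a marginal contribution that underlies SHAP values can be illustrated with a small, self-contained sketch. It computes exact Shapley values for a hypothetical three-feature toy model by averaging each feature's contribution over all feature orderings, relative to a single baseline input. This is only an illustration under stated assumptions: the toy model, its coefficients, and the feature names are invented here, and practical SHAP implementations for tree ensembles use faster algorithms and a background data set rather than a single baseline.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features are switched from baseline to instance value one at a time,
    and the resulting change in the model output is averaged over all
    possible orderings of the features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_model(x):
    # Hypothetical log IE model: lower polarity index and a basic nitrogen
    # raise the prediction; the last term is an interaction effect.
    return (3.0 - 0.2 * x["polarity_index"] + 1.0 * x["basic_nitrogen"]
            + 0.1 * x["basic_nitrogen"] * x["ring_count"])

baseline = {"polarity_index": 6.0, "basic_nitrogen": 0, "ring_count": 0}
instance = {"polarity_index": 5.0, "basic_nitrogen": 1, "ring_count": 2}
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

By construction, the contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved.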


Figure 4. (A) Top 10 most influential variables in the model and their normalized importance (%); (B) SHAP values representing influence of each top 10 feature and their marginal contribution to the prediction and (C) the test set chemicals assigned to different classes by ClassyFire, where each datapoint represents the geometric mean prediction error of log IE of a unique chemical. The classes are in the descending order based on median geometric mean prediction error of all compounds in the group and only classes with three or more unique representatives were plotted.

To understand the impact of chemical properties on the prediction accuracy, ClassyFire (59) was used for automated classification of chemicals in the test set. This resulted in classification of test set compounds into 14 superclasses and 121 classes. The geometric mean prediction error was calculated for each unique test set chemical, and the distribution for all classes with three or more unique representatives can be seen in Figure 4C. Analysis of the 17 classes with three or more representative chemicals showed significant differences between classes based on the Kruskal–Wallis rank sum test (p-value = 2.2 × 10⁻¹⁶). Pairwise comparisons using the Wilcoxon rank sum test with continuity correction indicated significant differences between all groups except for azobenzenes and fatty acyls. The median prediction error was lower than 10× for all 17 classes.
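The class-wise summary described above can be sketched as follows: per-chemical geometric mean errors are grouped by class, classes with fewer than three unique representatives are dropped, and the remaining classes are ordered by descending median error, as in Figure 4C. The class labels and error values below are hypothetical placeholders and do not come from the test set.

```python
import statistics
from collections import defaultdict

def rank_classes(errors_by_chemical, min_members=3):
    """Median error per class, for classes with at least `min_members`
    chemicals, sorted from highest to lowest median.

    errors_by_chemical: iterable of (chemical_class, geometric_mean_error),
    one entry per unique chemical.
    """
    grouped = defaultdict(list)
    for chem_class, error in errors_by_chemical:
        grouped[chem_class].append(error)
    medians = {
        chem_class: statistics.median(values)
        for chem_class, values in grouped.items()
        if len(values) >= min_members
    }
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-chemical geometric mean fold errors
records = [
    ("class A", 2.0), ("class A", 3.0), ("class A", 10.0),
    ("class B", 1.5), ("class B", 1.2), ("class B", 4.0),
    ("class C", 50.0), ("class C", 2.0),  # only two members, excluded
]
ranking = rank_classes(records)
print(ranking)
```

The grouped values could then be passed to a rank-based test such as the Kruskal–Wallis test mentioned above, for example via `scipy.stats.kruskal` if SciPy is available.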

Limitations

The MS2Quant quantification tool was developed to enable concentration estimations for unidentified chemicals based on information that can be extracted from MS2 data. This approach uses SIRIUS+CSI:FingerID to estimate probabilities of presence or absence of structural fingerprints. In order to extract meaningful structural information, the MS2 spectra that are used as input to SIRIUS must include high mass accuracy data and contain a sufficient number of peaks, which can be achieved by averaging fragmentation spectra over multiple collision energies. (41)
Depending on the sample, matrix effects can occur and affect the response of the chemicals. In target analysis, this could be corrected by isotope-labeled standards that match the analyte; however, this is impossible for unknown chemicals. In a previous study by Wang et al., (60) it was observed that the model prediction error significantly dominates over the error arising from the matrix effects.
In order to use MS2Quant for quantification, a set of calibration chemicals that cover a wide range of ionization efficiencies needs to be measured together with the sample. MS2Quant can be used to estimate the concentration within the chemical space and ionization efficiency range of training set chemicals used in modeling.
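The role of the calibration chemicals can be illustrated with a minimal sketch. It assumes that the logarithm of the measured response factor of each calibrant (signal per unit concentration) is related linearly to its predicted log ionization efficiency, and that the fitted line is then used to convert the predicted log ionization efficiency of an unidentified chemical into a response factor. The linear form, the function names, and all numbers are assumptions made for illustration and are not taken from the MS2Quant implementation.

```python
from statistics import fmean

def fit_calibration(pred_log_ie, measured_log_rf):
    """Least-squares line: log10(response factor) = slope * logIE + intercept."""
    mean_x = fmean(pred_log_ie)
    mean_y = fmean(measured_log_rf)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(pred_log_ie, measured_log_rf))
    sxx = sum((x - mean_x) ** 2 for x in pred_log_ie)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_concentration(peak_area, pred_log_ie, slope, intercept):
    """Concentration = peak area / predicted response factor."""
    log_rf = slope * pred_log_ie + intercept
    return peak_area / 10 ** log_rf

# Hypothetical calibrants: predicted log IE and measured log10 response factor
cal_log_ie = [1.0, 2.0, 3.0, 4.0]
cal_log_rf = [5.0, 6.0, 7.0, 8.0]
slope, intercept = fit_calibration(cal_log_ie, cal_log_rf)

# Hypothetical unidentified feature with peak area 2.0e6 and predicted log IE 2.5
conc = estimate_concentration(2.0e6, 2.5, slope, intercept)
print(slope, intercept, conc)
```

In such a scheme, an unidentified chemical whose predicted log ionization efficiency lies outside the range covered by the calibrants would require extrapolation of the fitted line, which is consistent with the recommendation to choose calibrants that span a wide range of ionization efficiencies.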

Conclusions

A concentration prediction model, MS2Quant, was developed to estimate the exposure to unidentified chemicals detected with LC-HRMS. MS2Quant was tested and validated on positive mode electrospray ionization data from the NORMAN network’s interlaboratory comparison, and an accuracy comparable to structure-based methods was observed. Future prospects include the development and validation of a complementary negative mode electrospray ionization efficiency prediction model to allow exposure estimations in both ESI modes.
Implementing MS2Quant in an NTS workflow allows estimating the exposure to unidentified chemicals that are otherwise discarded from the analysis. Furthermore, it can be used for rapidly pinpointing new, emerging contaminants in risk-based prioritization, alone or in combination with toxicity evaluation. Recently, an MS2Tox machine learning model has been proposed by our group to aid fish LC50 predictions. In combination, exposure and toxicity predictions can be used to evaluate the risk of each chemical detected with LC-HRMS. In the future, it is of interest to evaluate whether the LC-HRMS peaks with the highest risk are identified in NTS and to use the predicted risk to gear the identification toward peaks with the highest impact on the total risk posed by the sample. To use MS2Quant for quantification, a set of calibrants needs to be measured together with the sample to relate the predicted ionization efficiencies to instrument- and method-specific response factors. By providing experimental data for calibrants and unidentified chemicals, LC-HRMS features can be quantified using the pretrained MS2Quant model. This novel quantification method is openly available as an R package, MS2Quant, on GitHub (https://github.com/kruvelab/MS2Quant).

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.3c01744.

  • Data unification process in detail; example code showing how the data were unified based on either dataset 1 or a unified dataset; overview of datasets containing metadata and ionization efficiency information used for modeling; comparison between all tested molecular descriptors or fingerprints and machine learning algorithms; overview of the calibrants and suspects used in validation and experimental conditions; a statistical and graphical overview of MS2Quant and structure-based models’ performances on the validation set; overview of incorrectly identified structures and their highest ranked assigned structure by SIRIUS+CSI:FingerID; SIRIUS calculations and parameters used; top 10 most influential variables in a PaDEL-based model developed here; top 10 most influential variables, their SHAP values, and error distribution of different chemical classes assigned by ClassyFire for PaDEL-based model developed here; and first decision tree of xgbTree algorithm-based models developed using structural fingerprints and PaDEL descriptors (PDF)


Author Information

  • Corresponding Author
    • Anneli Kruve - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden; Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0001-9725-3351; Email: [email protected]
  • Authors
    • Helen Sepman - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden; Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
    • Louise Malm - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
    • Pilleriin Peets - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
    • Matthew MacLeod - Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0003-2562-7339
    • Jonathan Martin - Science for Life Laboratory, Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0001-6265-4294
    • Magnus Breitholtz - Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0002-4984-8323
  • Author Contributions

    H.S. and A.K. designed the research study. H.S. and P.P. developed the method and wrote the code. H.S. and L.M. performed the measurements. H.S., A.K., and M.M. contributed to the data analysis. H.S. and A.K. wrote the paper, and all authors read, contributed to, and approved the manuscript. A.K., J.M., M.M., and M.B. acquired funding for the project.

  • Funding

    The funding has been generously provided by the Swedish Research Council for Sustainable Development, grant 2020-01511.

  • Notes
    The authors declare no competing financial interest.

Acknowledgments

The authors would like to thank Drew Szabo for proofreading the manuscript.

References

This article references 60 other publications.

  1. McCord, J. P.; Groff, L. C.; Sobus, J. R. Quantitative Non-Targeted Analysis: Bridging the Gap between Contaminant Discovery and Risk Characterization. Environ. Int. 2022, 158, 107011. DOI: 10.1016/j.envint.2021.107011
  2. Schymanski, E. L.; Singer, H. P.; Slobodnik, J.; Ipolyi, I. M.; Oswald, P.; Krauss, M.; Schulze, T.; Haglund, P.; Letzel, T.; Grosse, S.; Thomaidis, N. S.; Bletsou, A.; Zwiener, C.; Ibáñez, M.; Portolés, T.; de Boer, R.; Reid, M. J.; Onghena, M.; Kunkel, U.; Schulz, W.; Guillon, A.; Noyon, N.; Leroy, G.; Bados, P.; Bogialli, S.; Stipaničev, D.; Rostkowski, P.; Hollender, J. Non-Target Screening with High-Resolution Mass Spectrometry: Critical Review Using a Collaborative Trial on Water Analysis. Anal. Bioanal. Chem. 2015, 407, 6237–6255. DOI: 10.1007/s00216-015-8681-7
  3. Hollender, J.; Schymanski, E. L.; Singer, H. P.; Ferguson, P. L. Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go? Environ. Sci. Technol. 2017, 51, 11505–11512. DOI: 10.1021/acs.est.7b02184
  4. Papazian, S.; D’Agostino, L. A.; Sadiktsis, I.; Froment, J.; Bonnefille, B.; Sdougkou, K.; Xie, H.; Athanassiadis, I.; Budhavant, K.; Dasari, S.; Andersson, A.; Gustafsson, Ö.; Martin, J. W. Nontarget Mass Spectrometry and in Silico Molecular Characterization of Air Pollution from the Indian Subcontinent. Commun. Earth Environ. 2022, 3, 35. DOI: 10.1038/s43247-022-00365-1
  5. Gago-Ferrero, P.; Schymanski, E. L.; Bletsou, A. A.; Aalizadeh, R.; Hollender, J.; Thomaidis, N. S. Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MS. Environ. Sci. Technol. 2015, 49, 12333–12341. DOI: 10.1021/acs.est.5b03454
  6. Bletsou, A. A.; Jeon, J.; Hollender, J.; Archontaki, E.; Thomaidis, N. S. Targeted and Non-Targeted Liquid Chromatography-Mass Spectrometric Workflows for Identification of Transformation Products of Emerging Pollutants in the Aquatic Environment. TrAC Trends Anal. Chem. 2015, 66, 32–44. DOI: 10.1016/j.trac.2014.11.009
  7. Been, F.; Kruve, A.; Vughs, D.; Meekel, N.; Reus, A.; Zwartsen, A.; Wessel, A.; Fischer, A.; ter Laak, T.; Brunner, A. M. Risk-Based Prioritization of Suspects Detected in Riverine Water Using Complementary Chromatographic Techniques. Water Res. 2021, 204, 117612. DOI: 10.1016/j.watres.2021.117612
  8. Oss, M.; Kruve, A.; Herodes, K.; Leito, I. Electrospray Ionization Efficiency Scale of Organic Compounds. Anal. Chem. 2010, 82, 2865–2872. DOI: 10.1021/ac902856t
  9. Oss, M.; Tshepelevitsh, S.; Kruve, A.; Liigand, P.; Liigand, J.; Rebane, R.; Selberg, S.; Ets, K.; Herodes, K.; Leito, I. Quantitative Electrospray Ionization Efficiency Scale: 10 Years After. Rapid Commun. Mass Spectrom. 2021, 35, e9178. DOI: 10.1002/rcm.9178
  10. Liigand, J.; Wang, T.; Kellogg, J.; Smedsgaard, J.; Cech, N.; Kruve, A. Quantification for Non-Targeted LC/MS Screening without Standard Substances. Sci. Rep. 2020, 10, 5808. DOI: 10.1038/s41598-020-62573-z
  11. Leito, I.; Herodes, K.; Huopolainen, M.; Virro, K.; Künnapas, A.; Kruve, A.; Tanner, R. Towards the Electrospray Ionization Mass Spectrometry Ionization Efficiency Scale of Organic Compounds. Rapid Commun. Mass Spectrom. 2008, 22, 379–384. DOI: 10.1002/rcm.3371
  12. Cech, N. B.; Enke, C. G. Relating Electrospray Ionization Response to Nonpolar Character of Small Peptides. Anal. Chem. 2000, 72, 2717–2723. DOI: 10.1021/ac9914869
  13. Alymatiri, C. M.; Kouskoura, M. G.; Markopoulou, C. K. Decoding the Signal Response of Steroids in Electrospray Ionization Mode (ESI-MS). Anal. Methods 2015, 7, 10433–10444. DOI: 10.1039/C5AY02839F
  14. Kruve, A.; Kaupmees, K. Adduct Formation in ESI/MS by Mobile Phase Additives. J. Am. Soc. Mass Spectrom. 2017, 28, 887–894. DOI: 10.1007/s13361-017-1626-y
  15. Kostiainen, R.; Kauppila, T. J. Effect of Eluent on the Ionization Process in Liquid Chromatography–Mass Spectrometry. J. Chromatogr. A 2009, 1216, 685–699. DOI: 10.1016/j.chroma.2008.08.095
  16. Kebarle, P.; Tang, L. From Ions in Solution to Ions in the Gas Phase - the Mechanism of Electrospray Mass Spectrometry. Anal. Chem. 1993, 65, 972A–986A. DOI: 10.1021/ac00070a001
  17. Kruve, A. Influence of Mobile Phase, Source Parameters and Source Type on Electrospray Ionization Efficiency in Negative Ion Mode: Influence of Mobile Phase in ESI/MS. J. Mass Spectrom. 2016, 51, 596–601. DOI: 10.1002/jms.3790
  18. Liigand, J.; Laaniste, A.; Kruve, A. pH Effects on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2017, 28, 461–469. DOI: 10.1007/s13361-016-1563-1
  19. Liigand, J.; Kruve, A.; Leito, I.; Girod, M.; Antoine, R. Effect of Mobile Phase on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2014, 25, 1853–1861. DOI: 10.1007/s13361-014-0969-x
  20. Ojakivi, M.; Liigand, J.; Kruve, A. Modifying the Acidity of Charged Droplets. ChemistrySelect 2018, 3, 335–338. DOI: 10.1002/slct.201702269
  21. Raji, M. A.; Schug, K. A. Chemometric Study of the Influence of Instrumental Parameters on ESI-MS Analyte Response Using Full Factorial Design. Int. J. Mass Spectrom. 2009, 279, 100–106. DOI: 10.1016/j.ijms.2008.10.013
  22. Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013. DOI: 10.3390/molecules27031013
  23. Kalogiouri, N. P.; Aalizadeh, R.; Thomaidis, N. S. Investigating the Organic and Conventional Production Type of Olive Oil with Target and Suspect Screening by LC-QTOF-MS, a Novel Semi-Quantification Method Using Chemical Similarity and Advanced Chemometrics. Anal. Bioanal. Chem. 2017, 409, 5413–5426. DOI: 10.1007/s00216-017-0395-6
  24. Kruve, A.; Kiefer, K.; Hollender, J. Benchmarking of the Quantification Approaches for the Non-Targeted Screening of Micropollutants and Their Transformation Products in Groundwater. Anal. Bioanal. Chem. 2021, 413, 1549–1559. DOI: 10.1007/s00216-020-03109-2
  25. Dahal, U. P.; Jones, J. P.; Davis, J. A.; Rock, D. A. Small Molecule Quantification by Liquid Chromatography-Mass Spectrometry for Metabolites of Drugs and Drug Candidates. Drug Metab. Dispos. 2011, 39, 2355–2360. DOI: 10.1124/dmd.111.040865
  26. Gyllenhammar, I.; Benskin, J. P.; Sandblom, O.; Berger, U.; Ahrens, L.; Lignell, S.; Wiberg, K.; Glynn, A. Perfluoroalkyl Acids (PFAAs) in Serum from 2–4-Month-Old Infants: Influence of Maternal Serum Concentration, Gestational Age, Breast-Feeding, and Contaminated Drinking Water. Environ. Sci. Technol. 2018, 52, 7101–7110. DOI: 10.1021/acs.est.8b00770
  27. Pieke, E. N.; Granby, K.; Trier, X.; Smedsgaard, J. A Framework to Estimate Concentrations of Potentially Unknown Substances by Semi-Quantification in Liquid Chromatography Electrospray Ionization Mass Spectrometry. Anal. Chim. Acta 2017, 975, 30–41. DOI: 10.1016/j.aca.2017.03.054
  28. Wu, L.; Wu, Y.; Shen, H.; Gong, P.; Cao, L.; Wang, G.; Hao, H. Quantitative Structure–Ion Intensity Relationship Strategy to the Prediction of Absolute Levels without Authentic Standards. Anal. Chim. Acta 2013, 794, 67–75. DOI: 10.1016/j.aca.2013.07.034
  29. Kruve, A.; Kaupmees, K. Predicting ESI/MS Signal Change for Anions in Different Solvents. Anal. Chem. 2017, 89, 5079–5086. DOI: 10.1021/acs.analchem.7b00595
  30. Liigand, P.; Liigand, J.; Cuyckens, F.; Vreeken, R. J.; Kruve, A. Ionisation Efficiencies Can Be Predicted in Complicated Biological Matrices: A Proof of Concept. Anal. Chim. Acta 2018, 1032, 68–74. DOI: 10.1016/j.aca.2018.05.072
  31. Panagopoulos Abrahamsson, D.; Park, J.-S.; Singh, R. R.; Sirota, M.; Woodruff, T. J. Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards. J. Chem. Inf. Model. 2020, 60, 2718–2727. DOI: 10.1021/acs.jcim.9b01096
  32. Aalizadeh, R.; Panara, A.; Thomaidis, N. S. Development and Application of a Novel Semi-Quantification Approach in LC-QToF-MS Analysis of Natural Products. J. Am. Soc. Mass Spectrom. 2021, 32, 1412–1423. DOI: 10.1021/jasms.1c00032
  33. Aalizadeh, R.; Thomaidis, N. S.; Bletsou, A. A.; Gago-Ferrero, P. Quantitative Structure–Retention Relationship Models To Support Nontarget High-Resolution Mass Spectrometric Screening of Emerging Contaminants in Environmental Samples. J. Chem. Inf. Model. 2016, 56, 1384–1398. DOI: 10.1021/acs.jcim.5b00752
  34. Aalizadeh, R.; Alygizakis, N. A.; Schymanski, E. L.; Krauss, M.; Schulze, T.; Ibáñez, M.; McEachran, A. D.; Chao, A.; Williams, A. J.; Gago-Ferrero, P.; Covaci, A.; Moschet, C.; Young, T. M.; Hollender, J.; Slobodnik, J.; Thomaidis, N. S. Development and Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect and Nontarget Screening. Anal. Chem. 2021, 93, 11601–11611. DOI: 10.1021/acs.analchem.1c02348
  35. Kruve, A.; Kaupmees, K.; Liigand, J.; Leito, I. Negative Electrospray Ionization via Deprotonation: Predicting the Ionization Efficiency. Anal. Chem. 2014, 86, 4822–4830. DOI: 10.1021/ac404066v
  36. Mayhew, A. W.; Topping, D. O.; Hamilton, J. F. New Approach Combining Molecular Fingerprints and Machine Learning to Estimate Relative Ionization Efficiency in Electrospray Ionization. ACS Omega 2020, 5, 9510–9516. DOI: 10.1021/acsomega.0c00732
  37. Aalizadeh, R.; Nikolopoulou, V.; Alygizakis, N.; Slobodnik, J.; Thomaidis, N. S. A Novel Workflow for Semi-Quantification of Emerging Contaminants in Environmental Samples Analyzed by LC-HRMS. Anal. Bioanal. Chem. 2022, 414, 7435–7450. DOI: 10.1007/s00216-022-04084-6
  38. Wang, S.; Basijokaite, R.; Murphy, B. L.; Kelleher, C. A.; Zeng, T. Combining Passive Sampling with Suspect and Nontarget Screening to Characterize Organic Micropollutants in Streams Draining Mixed-Use Watersheds. Environ. Sci. Technol. 2022, 56, 16726–16736. DOI: 10.1021/acs.est.2c02938
  39. Krier, J.; Singh, R. R.; Kondić, T.; Lai, A.; Diderich, P.; Zhang, J.; Thiessen, P. A.; Bolton, E. E.; Schymanski, E. L. Discovering Pesticides and Their TPs in Luxembourg Waters Using Open Cheminformatics Approaches. Environ. Int. 2022, 158, 106885. DOI: 10.1016/j.envint.2021.106885
  40. Schymanski, E. L.; Kondić, T.; Neumann, S.; Thiessen, P. A.; Zhang, J.; Bolton, E. E. Empowering Large Chemical Knowledge Bases for Exposomics: PubChemLite Meets MetFrag. J. Cheminformatics 2021, 13, 19. DOI: 10.1186/s13321-021-00489-0
  41. Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302. DOI: 10.1038/s41592-019-0344-8
  42. Paszkiewicz, M.; Godlewska, K.; Lis, H.; Caban, M.; Białk-Bielińska, A.; Stepnowski, P. Advances in Suspect Screening and Non-Target Analysis of Polar Emerging Contaminants in the Environmental Monitoring. TrAC Trends Anal. Chem. 2022, 154, 116671. DOI: 10.1016/j.trac.2022.116671
  43. Meng, D.; Fan, D.; Gu, W.; Wang, Z.; Chen, Y.; Bu, H.; Liu, J. Development of an integral strategy for non-target and target analysis of site-specific potential contaminants in surface water: A case study of Dianshan Lake, China. Chemosphere 2020, 243, 125367. DOI: 10.1016/j.chemosphere.2019.125367
  44. Groff, L. C.; Grossman, J. N.; Kruve, A.; Minucci, J. M.; Lowe, C. N.; McCord, J. P.; Kapraun, D. F.; Phillips, K. A.; Purucker, S. T.; Chao, A.; Ring, C. L.; Williams, A. J.; Sobus, J. R. Uncertainty Estimation Strategies for Quantitative Non-Targeted Analysis. Anal. Bioanal. Chem. 2022, 414, 4919–4933. DOI: 10.1007/s00216-022-04118-z
  45. Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341. DOI: 10.1093/bioinformatics/bts437
  46. Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 5071–5080. DOI: 10.1021/acs.analchem.0c04473
  47. Peets, P.; Wang, W.-C.; MacLeod, M.; Breitholtz, M.; Martin, J. W.; Kruve, A. MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS. Environ. Sci. Technol. 2022, 56, 15508–15517. DOI: 10.1021/acs.est.2c02536
  48. Hoffmann, M. A.; Nothias, L.-F.; Ludwig, M.; Fleischauer, M.; Gentry, E. C.; Witting, M.; Dorrestein, P. C.; Dührkop, K.; Böcker, S. High-Confidence Structural Annotation of Metabolites Absent from Spectral Libraries. Nat. Biotechnol. 2022, 40, 411–421. DOI: 10.1038/s41587-021-01045-9
  49. Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 12580–12585. DOI: 10.1073/pnas.1509788112
  50. Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J. Cheminformatics 2016, 8, 5. DOI: 10.1186/s13321-016-0116-8
  51. Klekota, J.; Roth, F. P. Chemical Substructures That Enrich for Biological Activity. Bioinformatics 2008, 24, 2518–2525. DOI: 10.1093/bioinformatics/btn479
  52. Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273–1280. DOI: 10.1021/ci010132r
  53. Guha, R. Chemical Informatics Functionality in R. J. Stat. Softw. 2007, 18, 1–16. DOI: 10.18637/jss.v018.i05
  54. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Francisco, California, USA, 2016; pp 785–794.
  55. Rashmi, K. V.; Gilad-Bachrach, R. DART: Dropouts Meet Multiple Additive Regression Trees. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, CA, USA, 2015; Vol. 38, pp 489–497.
  56. Kruve, A.; Aalizadeh, R.; Malm, L.; Alygizakis, N.; Thomaidis, N. S. Interlaboratory Comparison on Strategies for Semi-Quantitative Non-Targeted LC-ESI-HRMS, 2020. https://www.norman-network.net/sites/default/files/files/QA-QC%20Issues/Invitation%20letter%20JPA%202020%20semi-quant%20inter%20lab%20%28002%29.pdf.
  57. NORMAN Network; Aalizadeh, R.; Alygizakis, N.; Schymanski, E.; Slobodnik, J.; Fischer, S.; Cirka, L. S0 | SUSDAT | Merged NORMAN Suspect List: SusDat. 2022. DOI: 10.5281/ZENODO.2664077
  58. Gao, S.; Zhang, Z.; Karnes, H. Sensitivity Enhancement in Liquid Chromatography/Atmospheric Pressure Ionization Mass Spectrometry Using Derivatization and Mobile Phase Additives. J. Chromatogr., B 2005, 825, 98–110. DOI: 10.1016/j.jchromb.2005.04.021
  59. Djoumbou Feunang, Y.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S. ClassyFire: Automated Chemical Classification with a Comprehensive Computable Taxonomy. J. Cheminformatics 2016, 8, 61. DOI: 10.1186/s13321-016-0174-y
  60. Wang, T.; Liigand, J.; Frandsen, H. L.; Smedsgaard, J.; Kruve, A. Standard Substances Free Quantification Makes LC/ESI/MS Non-Targeted Screening of Pesticides in Cereals Comparable between Labs. Food Chem. 2020, 318, 126460. DOI: 10.1016/j.foodchem.2020.126460

Cited By

This article is cited by 26 publications.

  1. Jack A. Brand, Jake M. Martin, Marcus Michelangeli, Eli S.J. Thoré, Natalia Sandoval-Herrera, Erin S. McCallum, Drew Szabo, Damien L. Callahan, Timothy D. Clark, Michael G. Bertram, Tomas Brodin. Advancing the Spatiotemporal Dimension of Wildlife–Pollution Interactions. Environmental Science & Technology Letters 2025, 12 (4) , 358-370. https://doi.org/10.1021/acs.estlett.5c00042
  2. Yingying Yang, Qing Zhang, Adrian Covaci, Yanna Liu, Yilin Xiao, Yu Xiao, Shangwei Zhang, Xiaoman Jiang, Xinghui Xia. Unraveling the Composition Profile and Ecological Risk of Triazine Herbicides and Their Transformation Products in Urban Sewage Discharge. Environmental Science & Technology 2025, 59 (12) , 6235-6246. https://doi.org/10.1021/acs.est.4c12910
  3. Nienke Meekel, Anneli Kruve, Marja H. Lamoree, Frederic M. Been. Machine Learning-based Classification for the Prioritization of Potentially Hazardous Chemicals with Structural Alerts in Nontarget Screening. Environmental Science & Technology 2025, 59 (10) , 5056-5065. https://doi.org/10.1021/acs.est.4c10498
  4. Rick Helmus, Ingrida Bagdonaite, Pim de Voogt, Maarten R. van Bommel, Emma L. Schymanski, Annemarie P. van Wezel, Thomas L. ter Laak. Comprehensive Mass Spectrometry Workflows to Systematically Elucidate Transformation Processes of Organic Micropollutants: A Case Study on the Photodegradation of Four Pharmaceuticals. Environmental Science & Technology 2025, 59 (7) , 3723-3736. https://doi.org/10.1021/acs.est.4c09121
  5. Alexandria Van Grouw, Markace A. Rainey, Olivia K. Reid, Molly M. Ogle, Samuel G. Moore, Johnna S. Temenoff, Facundo M. Fernández. Toward Machine Learning Electrospray Ionization Sensitivity Prediction for Semiquantitative Lipidomics in Stem Cells. Journal of Chemical Information and Modeling 2025, 65 (4) , 1826-1836. https://doi.org/10.1021/acs.jcim.4c02040
  6. Jingrun Hu, Yitao Lyu, Yi Liu, Xiuqi You, Damian E. Helbling, Weiling Sun. Incorporating Transformation Products for an Integrated Assessment of Antibiotic Pollution and Risks in Surface Water. Environmental Science & Technology 2025, 59 (5) , 2815-2826. https://doi.org/10.1021/acs.est.4c12926
  7. Iker Alvarez-Mora, Aset Muratuly, Sarah Johann, Katarzyna Arturi, Florian Jünger, Carolin Huber, Henner Hollert, Martin Krauss, Werner Brack, Melis Muz. High-Throughput Effect-Directed Analysis of Androgenic Compounds in Hospital Wastewater: Identifying Effect Drivers through Non-Target Screening Supported by Toxicity Prediction. Environmental Science & Technology 2025, Article ASAP.
  8. Rhianna L. Evans, Daniel J. Bryant, Aristeidis Voliotis, Dawei Hu, HuiHui Wu, Sara Aisyah Syafira, Osayomwanbor E. Oghama, Gordon McFiggans, Jacqueline F. Hamilton, Andrew R. Rickard. A Semi-Quantitative Approach to Nontarget Compositional Analysis of Complex Samples. Analytical Chemistry 2024, 96 (46) , 18349-18358. https://doi.org/10.1021/acs.analchem.4c00819
  9. Louise Malm, Jaanus Liigand, Reza Aalizadeh, Nikiforos Alygizakis, Kelsey Ng, Emil Egede Fro̷kjær, Mulatu Yohannes Nanusha, Martin Hansen, Merle Plassmann, Stefan Bieber, Thomas Letzel, Lydia Balest, Pier Paolo Abis, Michele Mazzetti, Barbara Kasprzyk-Hordern, Nicola Ceolotto, Sangeeta Kumari, Stephan Hann, Sven Kochmann, Teresa Steininger-Mairinger, Coralie Soulier, Giuseppe Mascolo, Sapia Murgolo, Manuel Garcia-Vara, Miren López de Alda, Juliane Hollender, Katarzyna Arturi, Gianluca Coppola, Massimo Peruzzo, Hanna Joerss, Carla van der Neut-Marchand, Eelco N. Pieke, Pablo Gago-Ferrero, Ruben Gil-Solsona, Viktória Licul-Kucera, Claudio Roscioli, Sara Valsecchi, Austeja Luckute, Jan H. Christensen, Selina Tisler, Dennis Vughs, Nienke Meekel, Begoña Talavera Andújar, Dagny Aurich, Emma L. Schymanski, Gianfranco Frigerio, André Macherius, Uwe Kunkel, Tobias Bader, Pawel Rostkowski, Hans Gundersen, Belinda Valdecanas, W. Clay Davis, Bastian Schulze, Sarit Kaserzon, Martijn Pijnappels, Mar Esperanza, Aurélie Fildier, Emmanuelle Vulliet, Laure Wiest, Adrian Covaci, Alicia Macan Schönleben, Lidia Belova, Alberto Celma, Lubertus Bijlsma, Emilie Caupos, Emmanuelle Mebold, Julien Le Roux, Eugenie Troia, Eva de Rijke, Rick Helmus, Gaëla Leroy, Niels Haelewyck, David Chrastina, Milan Verwoert, Nikolaos S. Thomaidis, Anneli Kruve. Quantification Approaches in Non-Target LC/ESI/HRMS Analysis: An Interlaboratory Comparison. Analytical Chemistry 2024, 96 (41) , 16215-16226. https://doi.org/10.1021/acs.analchem.4c02902
  10. Drew Szabo, Stellan Fischer, Aji P. Mathew, Anneli Kruve. Prioritization, Identification, and Quantification of Emerging Contaminants in Recycled Textiles Using Non-Targeted and Suspect Screening Workflows by LC-ESI-HRMS. Analytical Chemistry 2024, 96 (35) , 14150-14159. https://doi.org/10.1021/acs.analchem.4c02041
  11. Amina Souihi, Anneli Kruve. Estimating LoD-s Based on the Ionization Efficiency Values for the Reporting and Harmonization of Amenable Chemical Space in Nontargeted Screening LC/ESI/HRMS. Analytical Chemistry 2024, 96 (28) , 11263-11272. https://doi.org/10.1021/acs.analchem.4c01002
  12. Corina Meyer, Michael A. Stravs, Juliane Hollender. How Wastewater Reflects Human Metabolism─Suspect Screening of Pharmaceutical Metabolites in Wastewater Influent. Environmental Science & Technology 2024, 58 (22) , 9828-9839. https://doi.org/10.1021/acs.est.4c00968
  13. Drew Szabo, Travis M. Falconer, Christine M. Fisher, Ted Heise, Allison L. Phillips, Gyorgy Vas, Antony J. Williams, Anneli Kruve. Online and Offline Prioritization of Chemicals of Interest in Suspect Screening and Non-targeted Screening with High-Resolution Mass Spectrometry. Analytical Chemistry 2024, 96 (9) , 3707-3716. https://doi.org/10.1021/acs.analchem.3c05705
  14. Jonathan Zweigle, Selina Tisler, Marta Bevilacqua, Giorgio Tomasi, Nikoline J. Nielsen, Nadine Gawlitta, Josephine S. Lübeck, Age K. Smilde, Jan H. Christensen. Prioritization strategies for non-target screening in environmental samples by chromatography – High-resolution mass spectrometry: A tutorial. Journal of Chromatography A 2025, 1751 , 465944. https://doi.org/10.1016/j.chroma.2025.465944
  15. Jason Devers, David I. Pattison, Asger B. Hansen, Jan H. Christensen. New strategies for non-targeted quantification in comprehensive two-dimensional gas chromatography: The potential of reconstructed TIC response factor surfaces. Journal of Chromatography A 2025, 1747 , 465811. https://doi.org/10.1016/j.chroma.2025.465811
  16. Alexa Canchola, Lillian N. Tran, Wonsik Woo, Linhui Tian, Ying-Hsuan Lin, Wei-Chun Chou. Advancing non-target analysis of emerging environmental contaminants with machine learning: Current status and future implications. Environment International 2025, 198 , 109404. https://doi.org/10.1016/j.envint.2025.109404
  17. Chiara Spaggiari, Isa Sara Aimee Hiemstra, Antoinette Kazbar, Gabriele Costantino, Laura Righetti. Towards eco-metabolomics: NADES-guided extraction enables semi-quantitative metabolomics for Melissa officinalis. Advances in Sample Preparation 2025, 13 , 100154. https://doi.org/10.1016/j.sampre.2025.100154
  18. Iker Alvarez-Mora, Katarzyna Arturi, Frederic Béen, Sebastian Buchinger, Abd El Rahman El Mais, Christine Gallampois, Meike Hahn, Juliane Hollender, Corine Houtman, Sarah Johann, Martin Krauss, Marja Lamoree, Maria Margalef, Riccardo Massei, Werner Brack, Melis Muz. Progress, applications, and challenges in high-throughput effect-directed analysis for toxicity driver identification — is it time for HT-EDA?. Analytical and Bioanalytical Chemistry 2025, 417 (3) , 451-472. https://doi.org/10.1007/s00216-024-05424-4
  19. Haotian Wang, Laijin Zhong, Wenyuan Su, Ting Ruan, Guibin Jiang. Machine learning-assisted identification of environmental pollutants by liquid chromatography coupled with high-resolution mass spectrometry. TrAC Trends in Analytical Chemistry 2024, 180 , 117988. https://doi.org/10.1016/j.trac.2024.117988
  20. Marie Rønne Aggerbeck, Emil Egede Frøkjær, Anders Johansen, Lea Ellegaard-Jensen, Lars Hestbjerg Hansen, Martin Hansen. Non-target analysis of Danish wastewater treatment plant effluent: Statistical analysis of chemical fingerprinting as a step toward a future monitoring tool. Environmental Research 2024, 257 , 119242. https://doi.org/10.1016/j.envres.2024.119242
  21. Matthias Hof, Milo L. de Baat, Jantien Noorda, Willie J.G.M. Peijnenburg, Annemarie P. van Wezel, Agnes G. Oomen. Informing the public about chemical mixtures in the local environment: Currently applied indicators in the Netherlands and ways forward. Journal of Environmental Management 2024, 368 , 122108. https://doi.org/10.1016/j.jenvman.2024.122108
  22. Shuai Wang, Upendra A. Argikar, Maria Chatzopoulou, Sungjoon Cho, Rachel D. Crouch, Deepika Dhaware, Ting-Jia Gu, Carley J. S. Heck, Kevin M. Johnson, Amit S. Kalgutkar, Joyce Liu, Bin Ma, Grover P. Miller, Jessica A. Rowley, Herana Kamal Seneviratne, Donglu Zhang, S. Cyrus Khojasteh. Bioactivation and reactivity research advances – 2023 year in review. Drug Metabolism Reviews 2024, 56 (3) , 247-284. https://doi.org/10.1080/03602532.2024.2376023
  23. Žiga Tkalec, Jean-Philippe Antignac, Nicole Bandow, Frederic M. Béen, Lidia Belova, Jos Bessems, Bruno Le Bizec, Werner Brack, German Cano-Sancho, Jade Chaker, Adrian Covaci, Nicolas Creusot, Arthur David, Laurent Debrauwer, Gaud Dervilly, Radu Corneliu Duca, Valérie Fessard, Joan O. Grimalt, Thierry Guerin, Baninia Habchi, Helge Hecht, Juliane Hollender, Emilien L. Jamin, Jana Klánová, Tina Kosjek, Martin Krauss, Marja Lamoree, Gwenaelle Lavison-Bompard, Jeroen Meijer, Ruth Moeller, Hans Mol, Sophie Mompelat, An Van Nieuwenhuyse, Herbert Oberacher, Julien Parinet, Christof Van Poucke, Robert Roškar, Anne Togola, Jurij Trontelj, Elliott J. Price. Innovative analytical methodologies for characterizing chemical exposure with a view to next-generation risk assessment. Environment International 2024, 186 , 108585. https://doi.org/10.1016/j.envint.2024.108585
  24. Shirley Pu, James P. McCord, Jacqueline Bangma, Jon R. Sobus. Establishing performance metrics for quantitative non-targeted analysis: a demonstration using per- and polyfluoroalkyl substances. Analytical and Bioanalytical Chemistry 2024, 416 (5) , 1249-1267. https://doi.org/10.1007/s00216-023-05117-4
  25. Varvara Nikolopoulou, Nikolaos S. Thomaidis, Reza Aalizadeh. From Chemical Similarity to Ionization Efficiency and Beyond: Toward Semi-Quantitative Analysis of Small Molecules and Its Integration in Non-targeted Screening. 2024. https://doi.org/10.1007/698_2024_1188
  26. Helen Sepman, Pilleriin Peets, Lisa Jonsson, Louise Malm, Malte Posselt, Matthew MacLeod, Jonathan Martin, Magnus Breitholtz, Michael McLachlan, Anneli Kruve. Machine Learning Tools Can Pinpoint High-Risk Water Pollutants. 2023, 68. https://doi.org/10.3390/proceedings2023092068

Analytical Chemistry

Cite this: Anal. Chem. 2023, 95, 33, 12329–12338
https://doi.org/10.1021/acs.analchem.3c01744
Published August 7, 2023

Copyright © 2023 The Authors. Published by American Chemical Society. This publication is licensed under

CC-BY 4.0 .

Article Views

4857


  • Abstract

    Figure 1

    Figure 1. Training (gray) and test (green) sets of the two best-performing models trained with the xgbTree algorithm, based on (A) structural fingerprints in MS2Quant and (B) PaDEL descriptors. (C) General modeling workflow used here. For all 1191 chemicals, molecular descriptors/fingerprints were calculated from the structure, and 80% of the data (training set) was used for modeling. To clean the descriptors, features with more than 10 missing values were removed. Additionally, features with near-zero variance (cut-off 80/20) and features with high pair-wise correlation (cut-off 0.75) were removed. The training set chemicals were then used for modeling, and the performance was assessed based on the RMSE and fold prediction errors of the test set.
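
The descriptor-cleaning steps named in this caption can be illustrated with a minimal Python sketch. It is not the authors' code; the function names, the toy fingerprint columns, and the simplified filtering rules are assumptions made for illustration. Here a feature is treated as near-zero variance when its most common value outnumbers the second most common by more than 80/20, and a feature is dropped when its absolute Pearson correlation with an already retained feature exceeds 0.75.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation over the positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0
    return sxy / sqrt(sxx * syy)


def near_zero_variance(values, freq_cut=80 / 20):
    """True if the most common value dominates the runner-up by more than freq_cut."""
    top_two = Counter(values).most_common(2)
    if len(top_two) < 2:
        return True  # zero or one distinct value
    return top_two[0][1] / top_two[1][1] > freq_cut


def clean_features(features, max_missing=10, freq_cut=80 / 20, corr_cut=0.75):
    """Apply the three filters in order; returns the retained columns."""
    kept = {}
    for name, values in features.items():
        # 1. Too many missing values.
        if sum(v is None for v in values) > max_missing:
            continue
        # 2. Near-zero variance (assumed frequency-ratio rule).
        if near_zero_variance([v for v in values if v is not None], freq_cut):
            continue
        # 3. Highly correlated with a feature that was already kept (greedy rule).
        if any(abs(pearson(values, other)) > corr_cut for other in kept.values()):
            continue
        kept[name] = values
    return kept


# Hypothetical fingerprint bits for 20 chemicals (made-up data).
toy = {
    "fp_a": [0, 1] * 10,
    "fp_b": [0, 1] * 10,                              # duplicate of fp_a
    "fp_c": [0] * 19 + [1],                           # almost constant
    "fp_d": [None] * 11 + [0, 1, 0, 1, 0, 1, 0, 1, 1],  # 11 missing values
    "fp_e": [0, 0, 1, 1] * 5,
}
cleaned = clean_features(toy)
print(list(cleaned))
```

The greedy correlation rule keeps whichever feature of a correlated pair appears first, which is simpler than choosing the member with the larger mean correlation; the choice only matters when the column order is arbitrary.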

    Figure 2

    Figure 2. Workflow for validation of MS2Quant on NORMAN interlaboratory comparison samples. (A) Molecular fingerprints were computed for 36 chemicals in the calibration mix from SMILES notation with the rcdk package in R. MS2Quant was then used to predict ionization efficiency values, and a linear regression was fit between the experimental logarithmic response factors and the logarithmic predicted ionization efficiencies. (B) Lake water spiked with 39 suspect compounds at high and low concentrations was measured with LC-HRMS in data-dependent acquisition mode with an inclusion list. SIRIUS+CSI:FingerID was used to predict probabilities of structural fingerprints from MS1 and MS2 spectra, and MS2Quant was used to predict ionization efficiencies from these predicted probabilities. Thereafter, the linear regression from the calibration compounds was used to convert the predicted ionization efficiency values to instrument- and method-specific predicted response factors. Concentrations of suspect chemicals were calculated from the predicted response factors and the integrated areas from LC-HRMS analysis and were compared to the spiked concentrations. For comparison with PaDEL-based quantification, a similar workflow was used with the PaDEL descriptor-based prediction model instead of MS2Quant; suspects were identified with SIRIUS+CSI:FingerID, and the top assigned structure was used for ionization efficiency predictions.
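
The conversion from predicted ionization efficiency to concentration described in this caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the MS2Quant implementation: the function names and all numbers are hypothetical, the response factor is assumed to be peak area per unit concentration (so that concentration = area / response factor), and the fold error is taken as the larger of predicted/actual and actual/predicted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_concentration(area, log_ie_pred, slope, intercept):
    """Convert predicted log IE to a response factor, then area to concentration."""
    log_rf_pred = slope * log_ie_pred + intercept
    return area / 10 ** log_rf_pred  # assumes RF = area / concentration


def fold_error(predicted, actual):
    """Symmetric fold error; always >= 1."""
    return max(predicted / actual, actual / predicted)


# Made-up calibrants: predicted log IE vs. experimental log response factor.
calib_log_ie = [1.0, 2.0, 3.0]
calib_log_rf = [13.0, 14.0, 15.0]
slope, intercept = fit_line(calib_log_ie, calib_log_rf)

# Made-up suspect: integrated peak area and predicted log IE.
conc = estimate_concentration(area=1e8, log_ie_pred=2.0,
                              slope=slope, intercept=intercept)
print(conc, fold_error(conc, 2e-6))
```

Fitting the line on calibrants measured in the same run is what makes the predicted response factors instrument- and method-specific; the predicted ionization efficiencies themselves are on a transferable scale.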

    Figure 3

    Figure 3. Predicted concentrations in the high-concentration spiked sample, obtained with MS2Quant and the PaDEL-based model, for five incorrectly identified compounds. Real concentrations are marked with a vertical line.

    Figure 4

    Figure 4. (A) Top 10 most influential variables in the model and their normalized importance (%); (B) SHAP values representing the influence of each of the top 10 features and their marginal contribution to the prediction; and (C) the test set chemicals assigned to different classes by ClassyFire, where each data point represents the geometric mean prediction error of log IE of a unique chemical. The classes are shown in descending order of the median geometric mean prediction error of all compounds in the group, and only classes with three or more unique representatives are plotted.

  • References


    This article references 60 other publications.

    1. McCord, J. P.; Groff, L. C.; Sobus, J. R. Quantitative Non-Targeted Analysis: Bridging the Gap between Contaminant Discovery and Risk Characterization. Environ. Int. 2022, 158, 107011, DOI: 10.1016/j.envint.2021.107011
    2. Schymanski, E. L.; Singer, H. P.; Slobodnik, J.; Ipolyi, I. M.; Oswald, P.; Krauss, M.; Schulze, T.; Haglund, P.; Letzel, T.; Grosse, S.; Thomaidis, N. S.; Bletsou, A.; Zwiener, C.; Ibáñez, M.; Portolés, T.; de Boer, R.; Reid, M. J.; Onghena, M.; Kunkel, U.; Schulz, W.; Guillon, A.; Noyon, N.; Leroy, G.; Bados, P.; Bogialli, S.; Stipaničev, D.; Rostkowski, P.; Hollender, J. Non-Target Screening with High-Resolution Mass Spectrometry: Critical Review Using a Collaborative Trial on Water Analysis. Anal. Bioanal. Chem. 2015, 407, 6237–6255, DOI: 10.1007/s00216-015-8681-7
    3. Hollender, J.; Schymanski, E. L.; Singer, H. P.; Ferguson, P. L. Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go?. Environ. Sci. Technol. 2017, 51, 11505–11512, DOI: 10.1021/acs.est.7b02184
    4. Papazian, S.; D’Agostino, L. A.; Sadiktsis, I.; Froment, J.; Bonnefille, B.; Sdougkou, K.; Xie, H.; Athanassiadis, I.; Budhavant, K.; Dasari, S.; Andersson, A.; Gustafsson, Ö.; Martin, J. W. Nontarget Mass Spectrometry and in Silico Molecular Characterization of Air Pollution from the Indian Subcontinent. Commun. Earth Environ. 2022, 3, 35, DOI: 10.1038/s43247-022-00365-1
    5. Gago-Ferrero, P.; Schymanski, E. L.; Bletsou, A. A.; Aalizadeh, R.; Hollender, J.; Thomaidis, N. S. Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MS. Environ. Sci. Technol. 2015, 49, 12333–12341, DOI: 10.1021/acs.est.5b03454
    6. Bletsou, A. A.; Jeon, J.; Hollender, J.; Archontaki, E.; Thomaidis, N. S. Targeted and Non-Targeted Liquid Chromatography-Mass Spectrometric Workflows for Identification of Transformation Products of Emerging Pollutants in the Aquatic Environment. TrAC Trends Anal. Chem. 2015, 66, 32–44, DOI: 10.1016/j.trac.2014.11.009
    7. Been, F.; Kruve, A.; Vughs, D.; Meekel, N.; Reus, A.; Zwartsen, A.; Wessel, A.; Fischer, A.; ter Laak, T.; Brunner, A. M. Risk-Based Prioritization of Suspects Detected in Riverine Water Using Complementary Chromatographic Techniques. Water Res. 2021, 204, 117612, DOI: 10.1016/j.watres.2021.117612
    8. Oss, M.; Kruve, A.; Herodes, K.; Leito, I. Electrospray Ionization Efficiency Scale of Organic Compounds. Anal. Chem. 2010, 82, 2865–2872, DOI: 10.1021/ac902856t
    9. Oss, M.; Tshepelevitsh, S.; Kruve, A.; Liigand, P.; Liigand, J.; Rebane, R.; Selberg, S.; Ets, K.; Herodes, K.; Leito, I. Quantitative Electrospray Ionization Efficiency Scale: 10 Years After. Rapid Commun. Mass Spectrom. 2021, 35, e9178, DOI: 10.1002/rcm.9178
    10. Liigand, J.; Wang, T.; Kellogg, J.; Smedsgaard, J.; Cech, N.; Kruve, A. Quantification for Non-Targeted LC/MS Screening without Standard Substances. Sci. Rep. 2020, 10, 5808, DOI: 10.1038/s41598-020-62573-z
    11. Leito, I.; Herodes, K.; Huopolainen, M.; Virro, K.; Künnapas, A.; Kruve, A.; Tanner, R. Towards the Electrospray Ionization Mass Spectrometry Ionization Efficiency Scale of Organic Compounds. Rapid Commun. Mass Spectrom. 2008, 22, 379–384, DOI: 10.1002/rcm.3371
    12. Cech, N. B.; Enke, C. G. Relating Electrospray Ionization Response to Nonpolar Character of Small Peptides. Anal. Chem. 2000, 72, 2717–2723, DOI: 10.1021/ac9914869
    13. Alymatiri, C. M.; Kouskoura, M. G.; Markopoulou, C. K. Decoding the Signal Response of Steroids in Electrospray Ionization Mode (ESI-MS). Anal. Methods 2015, 7, 10433–10444, DOI: 10.1039/C5AY02839F
    14. Kruve, A.; Kaupmees, K. Adduct Formation in ESI/MS by Mobile Phase Additives. J. Am. Soc. Mass Spectrom. 2017, 28, 887–894, DOI: 10.1007/s13361-017-1626-y
    15. Kostiainen, R.; Kauppila, T. J. Effect of Eluent on the Ionization Process in Liquid Chromatography–Mass Spectrometry. J. Chromatogr. A 2009, 1216, 685–699, DOI: 10.1016/j.chroma.2008.08.095
    16. Kebarle, P.; Tang, L. From Ions in Solution to Ions in the Gas Phase - the Mechanism of Electrospray Mass Spectrometry. Anal. Chem. 1993, 65, 972A–986A, DOI: 10.1021/ac00070a001
    17. Kruve, A. Influence of Mobile Phase, Source Parameters and Source Type on Electrospray Ionization Efficiency in Negative Ion Mode: Influence of Mobile Phase in ESI/MS. J. Mass Spectrom. 2016, 51, 596–601, DOI: 10.1002/jms.3790
    18. Liigand, J.; Laaniste, A.; Kruve, A. pH Effects on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2017, 28, 461–469, DOI: 10.1007/s13361-016-1563-1
    19. Liigand, J.; Kruve, A.; Leito, I.; Girod, M.; Antoine, R. Effect of Mobile Phase on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2014, 25, 1853–1861, DOI: 10.1007/s13361-014-0969-x
    20. Ojakivi, M.; Liigand, J.; Kruve, A. Modifying the Acidity of Charged Droplets. ChemistrySelect 2018, 3, 335–338, DOI: 10.1002/slct.201702269
    21. Raji, M. A.; Schug, K. A. Chemometric Study of the Influence of Instrumental Parameters on ESI-MS Analyte Response Using Full Factorial Design. Int. J. Mass Spectrom. 2009, 279, 100–106, DOI: 10.1016/j.ijms.2008.10.013
    22. Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013, DOI: 10.3390/molecules27031013
    23. Kalogiouri, N. P.; Aalizadeh, R.; Thomaidis, N. S. Investigating the Organic and Conventional Production Type of Olive Oil with Target and Suspect Screening by LC-QTOF-MS, a Novel Semi-Quantification Method Using Chemical Similarity and Advanced Chemometrics. Anal. Bioanal. Chem. 2017, 409, 5413–5426, DOI: 10.1007/s00216-017-0395-6
    24. Kruve, A.; Kiefer, K.; Hollender, J. Benchmarking of the Quantification Approaches for the Non-Targeted Screening of Micropollutants and Their Transformation Products in Groundwater. Anal. Bioanal. Chem. 2021, 413, 1549–1559, DOI: 10.1007/s00216-020-03109-2
    25. Dahal, U. P.; Jones, J. P.; Davis, J. A.; Rock, D. A. Small Molecule Quantification by Liquid Chromatography-Mass Spectrometry for Metabolites of Drugs and Drug Candidates. Drug Metab. Dispos. 2011, 39, 2355–2360, DOI: 10.1124/dmd.111.040865
    26. Gyllenhammar, I.; Benskin, J. P.; Sandblom, O.; Berger, U.; Ahrens, L.; Lignell, S.; Wiberg, K.; Glynn, A. Perfluoroalkyl Acids (PFAAs) in Serum from 2–4-Month-Old Infants: Influence of Maternal Serum Concentration, Gestational Age, Breast-Feeding, and Contaminated Drinking Water. Environ. Sci. Technol. 2018, 52, 7101–7110, DOI: 10.1021/acs.est.8b00770
    27. Pieke, E. N.; Granby, K.; Trier, X.; Smedsgaard, J. A Framework to Estimate Concentrations of Potentially Unknown Substances by Semi-Quantification in Liquid Chromatography Electrospray Ionization Mass Spectrometry. Anal. Chim. Acta 2017, 975, 30–41, DOI: 10.1016/j.aca.2017.03.054
    28. Wu, L.; Wu, Y.; Shen, H.; Gong, P.; Cao, L.; Wang, G.; Hao, H. Quantitative Structure–Ion Intensity Relationship Strategy to the Prediction of Absolute Levels without Authentic Standards. Anal. Chim. Acta 2013, 794, 67–75, DOI: 10.1016/j.aca.2013.07.034
    29. Kruve, A.; Kaupmees, K. Predicting ESI/MS Signal Change for Anions in Different Solvents. Anal. Chem. 2017, 89, 5079–5086, DOI: 10.1021/acs.analchem.7b00595
    30. Liigand, P.; Liigand, J.; Cuyckens, F.; Vreeken, R. J.; Kruve, A. Ionisation Efficiencies Can Be Predicted in Complicated Biological Matrices: A Proof of Concept. Anal. Chim. Acta 2018, 1032, 68–74, DOI: 10.1016/j.aca.2018.05.072
    31. Panagopoulos Abrahamsson, D.; Park, J.-S.; Singh, R. R.; Sirota, M.; Woodruff, T. J. Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards. J. Chem. Inf. Model. 2020, 60, 2718–2727, DOI: 10.1021/acs.jcim.9b01096
    32. Aalizadeh, R.; Panara, A.; Thomaidis, N. S. Development and Application of a Novel Semi-Quantification Approach in LC-QToF-MS Analysis of Natural Products. J. Am. Soc. Mass Spectrom. 2021, 32, 1412–1423, DOI: 10.1021/jasms.1c00032
    33. Aalizadeh, R.; Thomaidis, N. S.; Bletsou, A. A.; Gago-Ferrero, P. Quantitative Structure–Retention Relationship Models To Support Nontarget High-Resolution Mass Spectrometric Screening of Emerging Contaminants in Environmental Samples. J. Chem. Inf. Model. 2016, 56, 1384–1398, DOI: 10.1021/acs.jcim.5b00752
    34. Aalizadeh, R.; Alygizakis, N. A.; Schymanski, E. L.; Krauss, M.; Schulze, T.; Ibáñez, M.; McEachran, A. D.; Chao, A.; Williams, A. J.; Gago-Ferrero, P.; Covaci, A.; Moschet, C.; Young, T. M.; Hollender, J.; Slobodnik, J.; Thomaidis, N. S. Development and Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect and Nontarget Screening. Anal. Chem. 2021, 93, 11601–11611, DOI: 10.1021/acs.analchem.1c02348
    35. Kruve, A.; Kaupmees, K.; Liigand, J.; Leito, I. Negative Electrospray Ionization via Deprotonation: Predicting the Ionization Efficiency. Anal. Chem. 2014, 86, 4822–4830, DOI: 10.1021/ac404066v
    36. Mayhew, A. W.; Topping, D. O.; Hamilton, J. F. New Approach Combining Molecular Fingerprints and Machine Learning to Estimate Relative Ionization Efficiency in Electrospray Ionization. ACS Omega 2020, 5, 9510–9516, DOI: 10.1021/acsomega.0c00732
    37. Aalizadeh, R.; Nikolopoulou, V.; Alygizakis, N.; Slobodnik, J.; Thomaidis, N. S. A Novel Workflow for Semi-Quantification of Emerging Contaminants in Environmental Samples Analyzed by LC-HRMS. Anal. Bioanal. Chem. 2022, 414, 7435–7450, DOI: 10.1007/s00216-022-04084-6
    38. Wang, S.; Basijokaite, R.; Murphy, B. L.; Kelleher, C. A.; Zeng, T. Combining Passive Sampling with Suspect and Nontarget Screening to Characterize Organic Micropollutants in Streams Draining Mixed-Use Watersheds. Environ. Sci. Technol. 2022, 56, 16726–16736, DOI: 10.1021/acs.est.2c02938
    39. Krier, J.; Singh, R. R.; Kondić, T.; Lai, A.; Diderich, P.; Zhang, J.; Thiessen, P. A.; Bolton, E. E.; Schymanski, E. L. Discovering Pesticides and Their TPs in Luxembourg Waters Using Open Cheminformatics Approaches. Environ. Int. 2022, 158, 106885, DOI: 10.1016/j.envint.2021.106885
    40. Schymanski, E. L.; Kondić, T.; Neumann, S.; Thiessen, P. A.; Zhang, J.; Bolton, E. E. Empowering Large Chemical Knowledge Bases for Exposomics: PubChemLite Meets MetFrag. J. Cheminformatics 2021, 13, 19, DOI: 10.1186/s13321-021-00489-0
    41. Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302, DOI: 10.1038/s41592-019-0344-8
    42. Paszkiewicz, M.; Godlewska, K.; Lis, H.; Caban, M.; Białk-Bielińska, A.; Stepnowski, P. Advances in Suspect Screening and Non-Target Analysis of Polar Emerging Contaminants in the Environmental Monitoring. TrAC Trends Anal. Chem. 2022, 154, 116671, DOI: 10.1016/j.trac.2022.116671
    43. Meng, D.; Fan, D.; Gu, W.; Wang, Z.; Chen, Y.; Bu, H.; Liu, J. Development of an integral strategy for non-target and target analysis of site-specific potential contaminants in surface water: A case study of Dianshan Lake, China. Chemosphere 2020, 243, 125367, DOI: 10.1016/j.chemosphere.2019.125367
    44. Groff, L. C.; Grossman, J. N.; Kruve, A.; Minucci, J. M.; Lowe, C. N.; McCord, J. P.; Kapraun, D. F.; Phillips, K. A.; Purucker, S. T.; Chao, A.; Ring, C. L.; Williams, A. J.; Sobus, J. R. Uncertainty Estimation Strategies for Quantitative Non-Targeted Analysis. Anal. Bioanal. Chem. 2022, 414, 4919–4933, DOI: 10.1007/s00216-022-04118-z
    45. Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341, DOI: 10.1093/bioinformatics/bts437
    46. Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 5071–5080, DOI: 10.1021/acs.analchem.0c04473
    47. Peets, P.; Wang, W.-C.; MacLeod, M.; Breitholtz, M.; Martin, J. W.; Kruve, A. MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS. Environ. Sci. Technol. 2022, 56, 15508–15517, DOI: 10.1021/acs.est.2c02536
    48. Hoffmann, M. A.; Nothias, L.-F.; Ludwig, M.; Fleischauer, M.; Gentry, E. C.; Witting, M.; Dorrestein, P. C.; Dührkop, K.; Böcker, S. High-Confidence Structural Annotation of Metabolites Absent from Spectral Libraries. Nat. Biotechnol. 2022, 40, 411–421, DOI: 10.1038/s41587-021-01045-9
    49. Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 12580–12585, DOI: 10.1073/pnas.1509788112
    50. Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J. Cheminformatics 2016, 8, 5, DOI: 10.1186/s13321-016-0116-8
    51. Klekota, J.; Roth, F. P. Chemical Substructures That Enrich for Biological Activity. Bioinformatics 2008, 24, 2518–2525, DOI: 10.1093/bioinformatics/btn479
    52. Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273–1280, DOI: 10.1021/ci010132r
    53. Guha, R. Chemical Informatics Functionality in R. J. Stat. Softw. 2007, 18, 1–16, DOI: 10.18637/jss.v018.i05
    54. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Francisco, California, USA, 2016; pp. 785–794.
    55. Rashmi, K. V.; Gilad-Bachrach, R. DART: Dropouts Meet Multiple Additive Regression Trees. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, CA, USA, 2015; Vol. 38, pp. 489–497.
    56. Kruve, A.; Aalizadeh, R.; Malm, L.; Alygizakis, N.; Thomaidis, N. S. Interlaboratory Comparison on Strategies for Semi-Quantitative Non-Targeted LC-ESI-HRMS, 2020. https://www.norman-network.net/sites/default/files/files/QA-QC%20Issues/Invitation%20letter%20JPA%202020%20semi-quant%20inter%20lab%20%28002%29.pdf.
    57. NORMAN Network; Aalizadeh, R.; Alygizakis, N.; Schymanski, E.; Slobodnik, J.; Fischer, S.; Cirka, L. S0 | SUSDAT | Merged NORMAN Suspect List: SusDat. 2022, DOI: 10.5281/ZENODO.2664077
    58. Gao, S.; Zhang, Z.; Karnes, H. Sensitivity Enhancement in Liquid Chromatography/Atmospheric Pressure Ionization Mass Spectrometry Using Derivatization and Mobile Phase Additives. J. Chromatogr., B 2005, 825, 98–110, DOI: 10.1016/j.jchromb.2005.04.021
    59. Djoumbou Feunang, Y.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S. ClassyFire: Automated Chemical Classification with a Comprehensive Computable Taxonomy. J. Cheminformatics 2016, 8, 61, DOI: 10.1186/s13321-016-0174-y
    60. Wang, T.; Liigand, J.; Frandsen, H. L.; Smedsgaard, J.; Kruve, A. Standard Substances Free Quantification Makes LC/ESI/MS Non-Targeted Screening of Pesticides in Cereals Comparable between Labs. Food Chem. 2020, 318, 126460, DOI: 10.1016/j.foodchem.2020.126460
  • Supporting Information

    Supporting Information


    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.3c01744.

    • Data unification process in detail; example code showing how the data were unified based on either dataset 1 or a unified dataset; overview of datasets containing metadata and ionization efficiency information used for modeling; comparison between all tested molecular descriptors or fingerprints and machine learning algorithms; overview of the calibrants and suspects used in validation and experimental conditions; a statistical and graphical overview of MS2Quant and structure-based models’ performances on the validation set; overview of incorrectly identified structures and their highest ranked assigned structure by SIRIUS+CSI:FingerID; SIRIUS calculations and parameters used; top 10 most influential variables in a PaDEL-based model developed here; top 10 most influential variables, their SHAP values, and error distribution of different chemical classes assigned by ClassyFire for PaDEL-based model developed here; and first decision tree of xgbTree algorithm-based models developed using structural fingerprints and PaDEL descriptors (PDF)

