Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data
- Helen Sepman, Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden; Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
- Louise Malm, Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
- Pilleriin Peets, Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden
- Matthew MacLeod, Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
- Jonathan Martin, Science for Life Laboratory, Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
- Magnus Breitholtz, Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
- Anneli Kruve* (Email: [email protected]), Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius väg 16, 106 91 Stockholm, Sweden; Department of Environmental Science, Stockholm University, Svante Arrhenius väg 8, 106 91 Stockholm, Sweden
Abstract
Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to detection of previously unseen chemicals, and assessing the risk posed by these newly detected chemicals is an important challenge. Assessing exposure and toxicity of chemicals detected with nontarget HRMS is highly dependent on the knowledge of the structure of the chemical. However, the majority of features detected in nontarget screening remain unidentified and therefore the risk assessment with conventional tools is hampered. Here, we developed MS2Quant, a machine learning model that enables prediction of concentration from fragmentation (MS2) spectra of detected, but unidentified chemicals. MS2Quant is an xgbTree algorithm-based regression model developed using ionization efficiency data for 1191 unique chemicals that spans 8 orders of magnitude. The ionization efficiency values are predicted from structural fingerprints that can be computed from the SMILES notation of the identified chemicals or from MS2 spectra of unidentified chemicals using SIRIUS+CSI:FingerID software. The root mean square errors of the training and test sets were 0.55 (3.5×) and 0.80 (6.3×) log-units, respectively. In comparison, ionization efficiency prediction approaches that depend on assigning an unequivocal structure typically yield errors from 2× to 6×. The MS2Quant quantification model was validated on a set of 39 environmental pollutants and resulted in a mean prediction error of 7.4×, a geometric mean of 4.5×, and a median of 4.0×. For comparison, a model based on PaDEL descriptors that depends on unequivocal structural assignment was developed using the same dataset. The latter approach yielded a comparable mean prediction error of 9.5×, a geometric mean of 5.6×, and a median of 5.2× on the validation set chemicals when the top structural assignment was used as input. 
This confirms that MS2Quant enables the extraction of exposure information for unidentified chemicals that, although detected, have thus far been disregarded for lack of accurate quantification tools. The MS2Quant model is available as an R package on GitHub for improving the discovery and monitoring of potentially hazardous environmental pollutants with nontarget screening.
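The abstract reports each root mean square error both in log-units and as a fold error, for example 0.55 (3.5×). A minimal sketch of that conversion follows, assuming the log-units are base 10 (which the paired values in the abstract are consistent with); the function name is illustrative and not part of MS2Quant.

```python
def log_units_to_fold(error_log10: float) -> float:
    """Convert an error in log10 units into a multiplicative (fold) error."""
    return 10.0 ** error_log10


# RMSE values quoted in the abstract for the training and test sets.
train_fold = log_units_to_fold(0.55)
test_fold = log_units_to_fold(0.80)

print(f"training set: {train_fold:.1f}x, test set: {test_fold:.1f}x")
```

An RMSE of 0.80 log-units therefore means that a typical prediction lies within a factor of about 6.3 of the measured ionization efficiency.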
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
Materials and Methods
Data for Training the Ionization Efficiency Model
Calculation of Descriptors
Data Preprocessing
Modeling Parameters
Chemicals Used in Validation and Fingerprint Prediction from MS2 Data
Converting Predicted Response Factor to the Predicted Ionization Efficiency
Results and Discussion
Model Development
Figure 1. Training (gray) and test (green) sets of the two best-performing models trained with the xgbTree algorithm and based on (A) structural fingerprints in MS2Quant and (B) PaDEL descriptors. (C) General modeling workflow used here. For all 1191 chemicals, molecular descriptors/fingerprints were calculated from the structure, and 80% of the data (training set) was used for modeling. To clean the descriptors, features with more than 10 missing values were removed. Additionally, features with near-zero variance (cut-off 80/20) or high pairwise correlation (cut-off 0.75) were removed. The training set chemicals were then used for modeling, and the performance was assessed based on the RMSE and fold prediction errors of the test set.
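The descriptor cleaning and data split described in the caption can be sketched as follows. This is an illustration in plain Python, not the authors' R code. It makes three assumptions: the 80/20 near-zero-variance cut-off is read as the frequency ratio between the most and second most common value; of two correlated features, the one encountered first is kept; and cleaning is shown independently of the split, since the caption does not fix their order. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def too_many_missing(column, max_missing=10):
    """True if the feature has more than `max_missing` missing (None) values."""
    return sum(v is None for v in column) > max_missing


def near_zero_variance(column, freq_cut=80 / 20):
    """True if the most common value dominates the second most common one."""
    counts = Counter(v for v in column if v is not None).most_common(2)
    if len(counts) < 2:
        return True  # constant feature
    return counts[0][1] / counts[1][1] > freq_cut


def pearson(x, y):
    """Pearson correlation over the positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def clean_features(features, max_missing=10, freq_cut=80 / 20, corr_cut=0.75):
    """Drop features that are sparse, near-constant, or correlated with a kept one."""
    kept = {}
    for name, column in features.items():
        if too_many_missing(column, max_missing):
            continue
        if near_zero_variance(column, freq_cut):
            continue
        if any(abs(pearson(column, other)) > corr_cut for other in kept.values()):
            continue
        kept[name] = column
    return kept


def split_indices(n, train_fraction=0.8, seed=1):
    """Random split of n chemicals into training and test indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


# Toy fingerprint matrix for 20 hypothetical chemicals (columns are features).
features = {
    "fp_a": [0, 1] * 10,
    "fp_b": [1, 0] * 10,                           # perfectly anticorrelated with fp_a
    "fp_const": [0] * 19 + [1],                    # near-zero variance (19:1)
    "fp_sparse": [None] * 11 + [0, 1] * 4 + [0],   # 11 missing values
    "fp_c": [0, 0, 1, 1] * 5,                      # uncorrelated with fp_a
}
kept = clean_features(features)
train_idx, test_idx = split_indices(20)
print(list(kept), len(train_idx), len(test_idx))
```

The greedy "keep the first of a correlated pair" rule is a simplification chosen for brevity; other tie-breaking rules would retain a different but equally uncorrelated feature set.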
MS2Quant Performance in NTS Workflow on NORMAN Interlaboratory Comparison Samples
Figure 2. Workflow for validation of MS2Quant on NORMAN interlaboratory comparison samples. (A) Molecular fingerprints were computed for 36 chemicals in the calibration mix from SMILES notation with the rcdk package in R. MS2Quant was then used to predict ionization efficiency values, and a linear regression was fit between the experimental logarithmic response factors and the logarithmic predicted ionization efficiencies. (B) Lake water spiked with 39 suspect compounds at high and low concentrations was measured with LC-HRMS in data-dependent acquisition mode with an inclusion list. SIRIUS+CSI:FingerID was used to predict probabilities of structural fingerprints from MS1 and MS2 spectra, and MS2Quant was used to predict ionization efficiencies from these predicted probabilities. Thereafter, the linear regression from the calibration compounds was used to convert the predicted ionization efficiency values to instrument- and method-specific predicted response factors. Concentrations of suspect chemicals were calculated from the predicted response factors and the integrated areas from the LC-HRMS analysis and were compared to the spiked concentrations. For comparison with PaDEL-based quantification, a similar workflow was used with the PaDEL descriptor-based prediction model instead of MS2Quant; suspects were identified with SIRIUS+CSI:FingerID, and the top assigned structure was used for ionization efficiency predictions.
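The calibration step in this workflow can be illustrated with a short sketch. It assumes that the response factor is the ratio of integrated area to concentration and that logarithms are base 10; neither is stated in the caption. The numbers are invented toy values, and the function names are hypothetical rather than taken from the MS2Quant package.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def predict_concentration(area, log_ie_pred, slope, intercept):
    """Convert predicted log IE to a response factor, then area to concentration."""
    log_rf = slope * log_ie_pred + intercept  # instrument-specific log response factor
    return area / 10.0 ** log_rf              # assumes RF = area / concentration


# Toy calibrants: predicted log IE and experimental log response factors.
cal_log_ie = [1.0, 2.0, 3.0]
cal_log_rf = [14.0, 15.0, 16.0]
slope, intercept = fit_line(cal_log_ie, cal_log_rf)

# Toy suspect: predicted log IE of 2.5 and an integrated peak area of 1e9.
conc = predict_concentration(1e9, 2.5, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, concentration={conc:.2e}")
```

Because the regression is refit from calibrants measured on each instrument and method, the predicted ionization efficiencies themselves do not need to be retrained for a new laboratory setup.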
Comparison of MS2-Based Quantification and Suggested Structure-Based Quantification
| Scenario | Metric | MS2Quant (MS2) | MS2Quant (structure) | PaDEL-based model developed here (954 chemicals) | PaDEL-based model developed by Liigand et al. (353 chemicals) |
|---|---|---|---|---|---|
| results of “true NTS” (MS2Quant from MS2, others with top suggested structure) (39 chemicals) | RMSE | 5.85 | 7.29 | 7.42 | 7.26 |
| | R² | 0.46 | 0.37 | 0.47 | 0.49 |
| | Mean | 7.40 | 9.51 | 9.51 | 8.99 |
| | Geom. mean | 4.45 | 5.44 | 5.63 | 5.40 |
| | Q25 | 2.16 | 2.48 | 2.29 | 2.27 |
| | Q50 (Median) | 4.02 | 4.57 | 5.19 | 4.87 |
| | Q75 | 8.27 | 10.54 | 12.74 | 13.08 |
| | Q90 | 17.43 | 25.31 | 26.09 | 20.36 |
| | Q100 (Max) | 47.68 | 55.26 | 54.91 | 45.87 |
| correct SMILES is used for quantifying suspects (39 chemicals) | RMSE | | 6.77 | 7.05 | 7.99 |
| | R² | | 0.42 | 0.54 | 0.46 |
| | Mean | | 8.18 | 8.61 | 9.89 |
| | Geom. mean | | 5.29 | 5.44 | 6.00 |
| | Q25 | | 2.42 | 2.60 | 2.35 |
| | Q50 (Median) | | 5.78 | 5.49 | 6.56 |
| | Q75 | | 9.71 | 9.47 | 15.20 |
| | Q90 | | 20.78 | 23.69 | 20.87 |
| | Q100 (Max) | | 38.73 | 35.87 | 55.26 |
| only suspects that were correctly identified (34 chemicals) | RMSE | 6.12 | 7.57 | 7.64 | 7.34 |
| | R² | 0.43 | 0.34 | 0.44 | 0.48 |
| | Mean | 7.80 | 9.91 | 9.81 | 8.83 |
| | Geom. mean | 4.67 | 5.69 | 5.83 | 5.52 |
| | Q25 | 2.28 | 2.53 | 2.37 | 2.30 |
| | Q50 (Median) | 4.09 | 5.27 | 5.20 | 6.61 |
| | Q75 | 8.36 | 10.67 | 13.35 | 13.09 |
| | Q90 | 17.95 | 25.78 | 26.09 | 19.48 |
| | Q100 (Max) | 47.68 | 55.26 | 54.91 | 41.25 |
| only suspects that were incorrectly identified (5 chemicals) | RMSE | 4.15 | 5.51 | 6.02 | 6.73 |
| | R² | 0.68 | 0.61 | 0.65 | 0.55 |
| | Mean | 4.66 | 6.78 | 7.46 | 10.09 |
| | Geom. mean | 3.20 | 4.00 | 4.47 | 4.63 |
| | Q25 | 1.72 | 1.99 | 1.71 | 2.29 |
| | Q50 (Median) | 2.45 | 2.96 | 5.12 | 3.45 |
| | Q75 | 4.70 | 6.68 | 6.41 | 4.91 |
| | Q90 | 11.02 | 17.64 | 19.24 | 32.19 |
| | Q100 (Max) | 15.71 | 25.14 | 27.42 | 45.87 |
| only incorrectly identified suspects, but the correct SMILES was used for quantification (5 chemicals) | RMSE | | 4.39 | 6.29 | 6.77 |
| | R² | | 0.59 | 0.72 | 0.54 |
| | Mean | | 4.93 | 7.06 | 8.52 |
| | Geom. mean | | 3.38 | 4.90 | 5.41 |
| | Q25 | | 1.84 | 2.00 | 2.67 |
| | Q50 (Median) | | 2.16 | 6.94 | 4.69 |
| | Q75 | | 6.14 | 9.46 | 7.64 |
| | Q90 | | 11.04 | 12.97 | 21.90 |
| | Q100 (Max) | | 15.73 | 18.48 | 31.21 |
The quantification was performed with MS2Quant using MS1 and MS2 spectra as input, MS2Quant using SMILES notation as input, the PaDEL-based model developed in this work using SMILES notation as input, and the PaDEL-based model developed by Liigand et al. (10) using SMILES notation as input.
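The summary statistics in the table (mean, geometric mean, median, and maximum of the prediction errors) can be computed as in the sketch below. The fold error is taken here as the larger of predicted/spiked and spiked/predicted, so that over- and underestimation are treated symmetrically. That definition is an assumption, since this section does not state it, and the concentrations are invented toy values.

```python
import statistics


def fold_error(predicted: float, spiked: float) -> float:
    """Symmetric fold error: always >= 1, regardless of error direction."""
    return max(predicted / spiked, spiked / predicted)


def summarize(fold_errors):
    """Mean, geometric mean, median, and maximum of a list of fold errors."""
    return {
        "mean": statistics.mean(fold_errors),
        "geom_mean": statistics.geometric_mean(fold_errors),
        "median": statistics.median(fold_errors),
        "max": max(fold_errors),
    }


# Toy spiked and predicted concentrations (arbitrary units).
spiked = [1.0, 2.0, 4.0, 10.0]
predicted = [2.0, 1.0, 16.0, 10.0]

errors = [fold_error(p, s) for p, s in zip(predicted, spiked)]
summary = summarize(errors)
print(errors, summary)
```

The geometric mean and the median are less sensitive than the arithmetic mean to a few badly predicted chemicals, which is why the table reports all three.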
Figure 3. Predicted concentrations for the high-concentration spiked sample with MS2Quant and the PaDEL-based model for the five incorrectly identified compounds. Real concentrations are marked with a vertical line.
Analysis of Key Features Learned by MS2Quant
Figure 4. (A) Top 10 most influential variables in the model and their normalized importance (%); (B) SHAP values representing influence of each top 10 feature and their marginal contribution to the prediction and (C) the test set chemicals assigned to different classes by ClassyFire, where each datapoint represents the geometric mean prediction error of log IE of a unique chemical. The classes are in the descending order based on median geometric mean prediction error of all compounds in the group and only classes with three or more unique representatives were plotted.
Limitations
Conclusions
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.3c01744.
Data unification process in detail; example code showing how the data were unified based on either dataset 1 or a unified dataset; overview of datasets containing metadata and ionization efficiency information used for modeling; comparison between all tested molecular descriptors or fingerprints and machine learning algorithms; overview of the calibrants and suspects used in validation and experimental conditions; a statistical and graphical overview of MS2Quant and structure-based models’ performances on the validation set; overview of incorrectly identified structures and their highest ranked assigned structure by SIRIUS+CSI:FingerID; SIRIUS calculations and parameters used; top 10 most influential variables in a PaDEL-based model developed here; top 10 most influential variables, their SHAP values, and error distribution of different chemical classes assigned by ClassyFire for PaDEL-based model developed here; and first decision tree of xgbTree algorithm-based models developed using structural fingerprints and PaDEL descriptors (PDF)
Acknowledgments
The authors would like to thank Drew Szabo for proofreading the manuscript.
References
This article references 60 other publications.
- 1McCord, J. P.; Groff, L. C.; Sobus, J. R. Quantitative Non-Targeted Analysis: Bridging the Gap between Contaminant Discovery and Risk Characterization. Environ. Int. 2022, 158, 107011 DOI: 10.1016/j.envint.2021.107011Google Scholar1Quantitative non-targeted analysis: Bridging the gap between contaminant discovery and risk characterizationMcCord James P; Groff Louis C 2nd; Sobus Jon R; Groff Louis C 2ndEnvironment international (2022), 158 (), 107011 ISSN:.Chemical risk assessments follow a long-standing paradigm that integrates hazard, dose-response, and exposure information to facilitate quantitative risk characterization. Targeted analytical measurement data directly support risk assessment activities, as well as downstream risk management and compliance monitoring efforts. Yet, targeted methods have struggled to keep pace with the demands for data regarding the vast, and growing, number of known chemicals. Many contemporary monitoring studies therefore utilize non-targeted analysis (NTA) methods to screen for known chemicals with limited risk information. Qualitative NTA data has enabled identification of previously unknown compounds and characterization of data-poor compounds in support of hazard identification and exposure assessment efforts. In spite of this, NTA data have seen limited use in risk-based decision making due to uncertainties surrounding their quantitative interpretation. Significant efforts have been made in recent years to bridge this quantitative gap. Based on these advancements, quantitative NTA data, when coupled with other high-throughput data streams and predictive models, are poised to directly support 21st-century risk-based decisions. 
This article highlights components of the chemical risk assessment process that are influenced by NTA data, surveys the existing literature for approaches to derive quantitative estimates of chemicals from NTA measurements, and presents a conceptual framework for incorporating NTA data into contemporary risk assessment frameworks.
- 2Schymanski, E. L.; Singer, H. P.; Slobodnik, J.; Ipolyi, I. M.; Oswald, P.; Krauss, M.; Schulze, T.; Haglund, P.; Letzel, T.; Grosse, S.; Thomaidis, N. S.; Bletsou, A.; Zwiener, C.; Ibáñez, M.; Portolés, T.; de Boer, R.; Reid, M. J.; Onghena, M.; Kunkel, U.; Schulz, W.; Guillon, A.; Noyon, N.; Leroy, G.; Bados, P.; Bogialli, S.; Stipaničev, D.; Rostkowski, P.; Hollender, J. Non-Target Screening with High-Resolution Mass Spectrometry: Critical Review Using a Collaborative Trial on Water Analysis. Anal. Bioanal. Chem. 2015, 407, 6237– 6255, DOI: 10.1007/s00216-015-8681-7Google Scholar2Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysisSchymanski, Emma L.; Singer, Heinz P.; Slobodnik, Jaroslav; Ipolyi, Ildiko M.; Oswald, Peter; Krauss, Martin; Schulze, Tobias; Haglund, Peter; Letzel, Thomas; Grosse, Sylvia; Thomaidis, Nikolaos S.; Bletsou, Anna; Zwiener, Christian; Ibanez, Maria; Portoles, Tania; de Boer, Ronald; Reid, Malcolm J.; Onghena, Matthias; Kunkel, Uwe; Schulz, Wolfgang; Guillon, Amelie; Noyon, Naike; Leroy, Gaela; Bados, Philippe; Bogialli, Sara; Stipanicev, Drazenka; Rostkowski, Pawel; Hollender, JulianeAnalytical and Bioanalytical Chemistry (2015), 407 (21), 6237-6255CODEN: ABCNBP; ISSN:1618-2642. (Springer)A review is given. A dataset from a collaborative non-target screening trial organized by the NORMAN Assocn. is used to review the state-of-the-art and discuss future perspectives of non-target screening using high-resoln. mass spectrometry in water anal. A total of 18 institutes from 12 European countries analyzed an ext. of the same water sample collected from the River Danube with either one or both of liq. and gas chromatog. coupled with mass spectrometry detection. This article focuses mainly on the use of high resoln. screening techniques with target, suspect, and non-target workflows to identify substances in environmental samples. 
Specific examples are given to emphasize major challenges including isobaric and co-eluting substances, dependence on target and suspect lists, formula assignment, the use of retention information, and the confidence of identification. Approaches and methods applicable to unit resoln. data are also discussed. Although most substances were identified using high resoln. data with target and suspect-screening approaches, some participants proposed tentative non-target identifications. This comprehensive dataset revealed that non-target anal. techniques are already substantially harmonized between the participants, but the data processing remains time-consuming. Although the objective of a fully-automated identification workflow remains elusive in the short term, important steps in this direction have been taken, exemplified by the growing popularity of suspect screening approaches. Major recommendations to improve non-target screening include better integration and connection of desired features into software packages, the exchange of target and suspect lists, and the contribution of more spectra from std. substances into (openly accessible) databases.
- 3Hollender, J.; Schymanski, E. L.; Singer, H. P.; Ferguson, P. L. Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go?. Environ. Sci. Technol. 2017, 51, 11505– 11512, DOI: 10.1021/acs.est.7b02184Google Scholar3Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go?Hollender, Juliane; Schymanski, Emma L.; Singer, Heinz P.; Ferguson, P. LeeEnvironmental Science & Technology (2017), 51 (20), 11505-11512CODEN: ESTHAG; ISSN:0013-936X. (American Chemical Society)The vast, diverse universe of org. pollutants is a formidable challenge for environmental sciences, engineering, and regulation. Nontarget screening (NTS) based on high resoln. mass spectrometry (HRMS) has enormous potential to help characterize this universe. Here, we argue that development of mass spectrometers with increasingly high resoln. and novel couplings to both liq. and gas chromatog., combined with the integration of high performance computing, have significantly widened our anal. window and have enabled increasingly sophisticated data processing strategies, indicating a bright future for NTS. NTS has great potential for treatment assessment and pollutant prioritization within regulatory applications, as highlighted here by the case of real-time pollutant monitoring on the River Rhine. We discuss challenges for the future, including the transition from research toward soln.-centered and robust, harmonized applications.
- 4Papazian, S.; D’Agostino, L. A.; Sadiktsis, I.; Froment, J.; Bonnefille, B.; Sdougkou, K.; Xie, H.; Athanassiadis, I.; Budhavant, K.; Dasari, S.; Andersson, A.; Gustafsson, Ö.; Martin, J. W. Nontarget Mass Spectrometry and in Silico Molecular Characterization of Air Pollution from the Indian Subcontinent. Commun. Earth Environ. 2022, 3, 35, DOI: 10.1038/s43247-022-00365-1Google ScholarThere is no corresponding record for this reference.
- 5Gago-Ferrero, P.; Schymanski, E. L.; Bletsou, A. A.; Aalizadeh, R.; Hollender, J.; Thomaidis, N. S. Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MS. Environ. Sci. Technol. 2015, 49, 12333– 12341, DOI: 10.1021/acs.est.5b03454Google Scholar5Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MSGago-Ferrero, Pablo; Schymanski, Emma L.; Bletsou, Anna A.; Aalizadeh, Reza; Hollender, Juliane; Thomaidis, Nikolaos S.Environmental Science & Technology (2015), 49 (20), 12333-12341CODEN: ESTHAG; ISSN:0013-936X. (American Chemical Society)An integrated workflow based on liq. chromatog. coupled to a quadrupole-time-of-flight mass spectrometer (LC-QTOF-MS) was developed and applied to detect and identify suspect and unknown contaminants in Greek wastewater. Tentative identifications were initially based on mass accuracy, isotopic pattern, plausibility of the chromatog. retention time and MS/MS spectral interpretation (comparison with spectral libraries, in silico fragmentation). New specific strategies for the identification of metabolites were applied to obtain extra confidence including the comparison of diurnal and/or weekly concn. trends of the metabolite and parent compds. and the complementary use of HILIC. Thirteen of 284 predicted and literature metabolites of selected pharmaceuticals and nicotine were tentatively identified in influent samples from Athens and seven were finally confirmed with ref. stds. Here, 34 nontarget compds. were tentatively identified, 4 were also confirmed. The sulfonated surfactant diglycol ether sulfate was identified along with others in the homologous series (SO4C2H4(OC2H4)xOH), which have not been previously reported in wastewater. As many surfactants were originally found as nontargets, these compds. were studied in detail through retrospective anal.
- 6Bletsou, A. A.; Jeon, J.; Hollender, J.; Archontaki, E.; Thomaidis, N. S. Targeted and Non-Targeted Liquid Chromatography-Mass Spectrometric Workflows for Identification of Transformation Products of Emerging Pollutants in the Aquatic Environment. TrAC Trends Anal. Chem. 2015, 66, 32– 44, DOI: 10.1016/j.trac.2014.11.009Google Scholar6Targeted and non-targeted liquid chromatography-mass spectrometric workflows for identification of transformation products of emerging pollutants in the aquatic environmentBletsou, Anna A.; Jeon, Junho; Hollender, Juliane; Archontaki, Eleni; Thomaidis, Nikolaos S.TrAC, Trends in Analytical Chemistry (2015), 66 (), 32-44CODEN: TTAEDJ; ISSN:0165-9936. (Elsevier B. V.)A review with 92 refs. Identification of transformation products (TPs) of emerging pollutants is challenging, due to the vast no. of compds., mostly unknown, the complexity of the matrixes and their often low concns., requiring highly selective, highly sensitive techniques. We compile background information on biotic and abiotic formation of TPs and anal. developments over the past five years. We present a database of biotic or abiotic TPs compiled from those identified in recent years. We discuss mass spectrometric (MS) techniques and workflows for target, suspect and non-target screening of TPs with emphasis on liq. chromatog. coupled to MS (LC-MS). Both low- and high-resoln. (HR) mass analyzers have been applied, but HR-MS is the technique of choice, due to its high confirmatory capabilities, derived from the high resolving power and the mass accuracy in MS and MS/MS modes, and the sophisticated software developed.
- 7Been, F.; Kruve, A.; Vughs, D.; Meekel, N.; Reus, A.; Zwartsen, A.; Wessel, A.; Fischer, A.; ter Laak, T.; Brunner, A. M. Risk-Based Prioritization of Suspects Detected in Riverine Water Using Complementary Chromatographic Techniques. Water Res. 2021, 204, 117612 DOI: 10.1016/j.watres.2021.117612Google Scholar7Risk-based prioritization of suspects detected in riverine water using complementary chromatographic techniquesBeen, Frederic; Kruve, Anneli; Vughs, Dennis; Meekel, Nienke; Reus, Astrid; Zwartsen, Anne; Wessel, Arnoud; Fischer, Astrid; ter Laak, Thomas; Brunner, Andrea M.Water Research (2021), 204 (), 117612CODEN: WATRAG; ISSN:0043-1354. (Elsevier Ltd.)Surface waters are widely used as drinking water sources and hence their quality needs to be continuously monitored. However, current routine monitoring programs are not comprehensive as they generally cover only a limited no. of known pollutants and emerging contaminants. This study presents a risk-based approach combining suspect and non-target screening (NTS) to help extend the coverage of current monitoring schemes. In particular, the coverage of NTS was widened by combining three complementary sepns. modes: Reverse phase (RP), Hydrophilic interaction liq. chromatog. (HILIC) and Mixed-mode chromatog. (MMC). Suspect lists used were compiled from databases of relevant substances of very high concern (e.g., SVHCs) and the concn. of detected suspects was evaluated based on ionization efficiency prediction. Results show that suspect candidates can be prioritized based on their potential risk (i.e., hazard and exposure) by combining ionization efficiency-based concn. estn., in vitro toxicity data or, if not available, structural alerts and QSAR.based toxicity predictions. The acquired information shows that NTS analyses have the potential to complement target analyses, allowing to update and adapt current monitoring programs, ultimately leading to improved monitoring of drinking water sources.
- 8Oss, M.; Kruve, A.; Herodes, K.; Leito, I. Electrospray Ionization Efficiency Scale of Organic Compounds. Anal. Chem. 2010, 82, 2865– 2872, DOI: 10.1021/ac902856tGoogle Scholar8Electrospray Ionization Efficiency Scale of Organic CompoundsOss, Merit; Kruve, Anneli; Herodes, Koit; Leito, IvoAnalytical Chemistry (Washington, DC, United States) (2010), 82 (7), 2865-2872CODEN: ANCHAM; ISSN:0003-2700. (American Chemical Society)Ionization efficiency (IE) of different compds. in electrospray ionization (ESI) source differs widely, leading to widely differing sensitivities of ESI-MS to different analytes. An approach for quantifying ESI efficiencies (as logIE values) and setting up a self-consistent quant. exptl. ESI efficiency scale of org. compds. under predefined ionization conditions (ionization by monoprotonation) has been developed recently. Using this approach a logIE scale contg. 62 compds. of different chem. nature and ranging for 6 orders of magnitude has been established. The scale is based on over 400 relative IE (ΔlogIE) measurements between more than 250 different pairs of compds. To evaluate which mol. parameters contribute the most to the IE of a compd. linear regression anal. logIE values and different mol. parameters were carried out. The two most influential parameters in predicting the IE in ESI source are the pKa and the mol. vol. of the compd. This scale and the whole approach can be a tool for practicing liq. chromatographists and mass spectrometrists. It can be used in any mass-spectrometry lab. and we encourage practitioners to characterize their analytes with the logIE values so that a broad knowledge base on electrospray ionization efficiencies of compds. would eventually develop.
- 9Oss, M.; Tshepelevitsh, S.; Kruve, A.; Liigand, P.; Liigand, J.; Rebane, R.; Selberg, S.; Ets, K.; Herodes, K.; Leito, I. Quantitative Electrospray Ionization Efficiency Scale: 10 Years After. Rapid Commun. Mass Spectrom. 2021, 35, e9178 DOI: 10.1002/rcm.9178Google Scholar9Quantitative electrospray ionization efficiency scale: 10 years afterOss, Merit; Tshepelevitsh, Sofja; Kruve, Anneli; Liigand, Piia; Liigand, Jaanus; Rebane, Riin; Selberg, Sigrid; Ets, Kristel; Herodes, Koit; Leito, IvoRapid Communications in Mass Spectrometry (2021), 35 (21), e9178CODEN: RCMSEF; ISSN:0951-4198. (John Wiley & Sons Ltd.)The first comprehensive quant. scale of the efficiency of electrospray ionization (ESI) in the pos. mode by monoprotonation, contg. 62 compds., was published in 2010. Several trends were found between the compd. structure and ionization efficiency (IE) but, possibly because of the limited diversity of the compds., some questions remained. This work undertakes to align the new data with the originally published IE scale and carry out statistical anal. of the resulting more extensive and diverse data set to derive more grounded relationships and offer a possibility of predicting logIE values. Recently, several new IE studies with numerous compds. have been conducted. In several of them, more detailed investigations of the influence of compd. structure, solvent properties, or instrument settings have been conducted. IE data from these studies and results from this work were combined, and the multilinear regression method was applied to relate IE to various compd. parameters. The most comprehensive IE scale available, contg. 334 compds. of highly diverse chem. nature and spanning 6 orders of magnitude of IE, has been compiled. Several useful trends were revealed. The ESI ionization efficiency of a compd. by protonation is mainly affected by three factors: basicity (expressed by pKaH in water), mol. 
size (expressed by molar volume or surface area), and hydrophobicity of the ion (expressed by charge delocalization in the ion or its partition coeff. between a water-acetonitrile mixt. and hexane). The presented models can be used for tentative prediction of logIE of new compds. (under the used conditions) from parameters that can be computed using com. available software. The root mean square error of prediction is in the range of 0.7-0.8 log units.
- 10Liigand, J.; Wang, T.; Kellogg, J.; Smedsgaard, J.; Cech, N.; Kruve, A. Quantification for Non-Targeted LC/MS Screening without Standard Substances. Sci. Rep. 2020, 10, 5808, DOI: 10.1038/s41598-020-62573-zGoogle Scholar10Quantification for non-targeted LC/MS screening without standard substancesLiigand, Jaanus; Wang, Tingting; Kellogg, Joshua; Smedsgaard, Joern; Cech, Nadja; Kruve, AnneliScientific Reports (2020), 10 (1), 5808CODEN: SRCEC3; ISSN:2045-2322. (Nature Research)Non-targeted and suspect analyses with liq. chromatog./electrospray/high-resoln. mass spectrometry (LC/ESI/HRMS) are gaining importance as they enable identification of hundreds or even thousands of compds. in a single sample. Here, we present an approach to address the challenge to quantify compds. identified from LC/HRMS data without authentic stds. The approach uses random forest regression to predict the response of the compds. in ESI/HRMS with a mean error of 2.2 and 2.0 times for ESI pos. and neg. mode, resp. We observe that the predicted responses can be transferred between different instruments via a regression approach. Furthermore, we applied the predicted responses to est. the concn. of the compds. without the std. substances. The approach was validated by quantifying pesticides and mycotoxins in six different cereal samples. For applicability, the accuracy of the concn. prediction needs to be compatible with the effect (e.g. toxicol.) predictions. We achieved the av. quantification error of 5.4 times, which is well compatible with the accuracy of the toxicol. predictions.
- 11. Leito, I.; Herodes, K.; Huopolainen, M.; Virro, K.; Künnapas, A.; Kruve, A.; Tanner, R. Towards the Electrospray Ionization Mass Spectrometry Ionization Efficiency Scale of Organic Compounds. Rapid Commun. Mass Spectrom. 2008, 22, 379–384, DOI: 10.1002/rcm.3371
- 12. Cech, N. B.; Enke, C. G. Relating Electrospray Ionization Response to Nonpolar Character of Small Peptides. Anal. Chem. 2000, 72, 2717–2723, DOI: 10.1021/ac9914869
- 13. Alymatiri, C. M.; Kouskoura, M. G.; Markopoulou, C. K. Decoding the Signal Response of Steroids in Electrospray Ionization Mode (ESI-MS). Anal. Methods 2015, 7, 10433–10444, DOI: 10.1039/C5AY02839F
- 14. Kruve, A.; Kaupmees, K. Adduct Formation in ESI/MS by Mobile Phase Additives. J. Am. Soc. Mass Spectrom. 2017, 28, 887–894, DOI: 10.1007/s13361-017-1626-y
- 15. Kostiainen, R.; Kauppila, T. J. Effect of Eluent on the Ionization Process in Liquid Chromatography–Mass Spectrometry. J. Chromatogr. A 2009, 1216, 685–699, DOI: 10.1016/j.chroma.2008.08.095
- 16. Kebarle, P.; Tang, L. From Ions in Solution to Ions in the Gas Phase - the Mechanism of Electrospray Mass Spectrometry. Anal. Chem. 1993, 65, 972A–986A, DOI: 10.1021/ac00070a001
- 17. Kruve, A. Influence of Mobile Phase, Source Parameters and Source Type on Electrospray Ionization Efficiency in Negative Ion Mode: Influence of Mobile Phase in ESI/MS. J. Mass Spectrom. 2016, 51, 596–601, DOI: 10.1002/jms.3790
- 18. Liigand, J.; Laaniste, A.; Kruve, A. pH Effects on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2017, 28, 461–469, DOI: 10.1007/s13361-016-1563-1
- 19. Liigand, J.; Kruve, A.; Leito, I.; Girod, M.; Antoine, R. Effect of Mobile Phase on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2014, 25, 1853–1861, DOI: 10.1007/s13361-014-0969-x
- 20. Ojakivi, M.; Liigand, J.; Kruve, A. Modifying the Acidity of Charged Droplets. ChemistrySelect 2018, 3, 335–338, DOI: 10.1002/slct.201702269
- 21. Raji, M. A.; Schug, K. A. Chemometric Study of the Influence of Instrumental Parameters on ESI-MS Analyte Response Using Full Factorial Design. Int. J. Mass Spectrom. 2009, 279, 100–106, DOI: 10.1016/j.ijms.2008.10.013
- 22. Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013, DOI: 10.3390/molecules27031013
- 23. Kalogiouri, N. P.; Aalizadeh, R.; Thomaidis, N. S. Investigating the Organic and Conventional Production Type of Olive Oil with Target and Suspect Screening by LC-QTOF-MS, a Novel Semi-Quantification Method Using Chemical Similarity and Advanced Chemometrics. Anal. Bioanal. Chem. 2017, 409, 5413–5426, DOI: 10.1007/s00216-017-0395-6
- 24. Kruve, A.; Kiefer, K.; Hollender, J. Benchmarking of the Quantification Approaches for the Non-Targeted Screening of Micropollutants and Their Transformation Products in Groundwater. Anal. Bioanal. Chem. 2021, 413, 1549–1559, DOI: 10.1007/s00216-020-03109-2
- 25. Dahal, U. P.; Jones, J. P.; Davis, J. A.; Rock, D. A. Small Molecule Quantification by Liquid Chromatography-Mass Spectrometry for Metabolites of Drugs and Drug Candidates. Drug Metab. Dispos. 2011, 39, 2355–2360, DOI: 10.1124/dmd.111.040865
- 26. Gyllenhammar, I.; Benskin, J. P.; Sandblom, O.; Berger, U.; Ahrens, L.; Lignell, S.; Wiberg, K.; Glynn, A. Perfluoroalkyl Acids (PFAAs) in Serum from 2–4-Month-Old Infants: Influence of Maternal Serum Concentration, Gestational Age, Breast-Feeding, and Contaminated Drinking Water. Environ. Sci. Technol. 2018, 52, 7101–7110, DOI: 10.1021/acs.est.8b00770
- 27. Pieke, E. N.; Granby, K.; Trier, X.; Smedsgaard, J. A Framework to Estimate Concentrations of Potentially Unknown Substances by Semi-Quantification in Liquid Chromatography Electrospray Ionization Mass Spectrometry. Anal. Chim. Acta 2017, 975, 30–41, DOI: 10.1016/j.aca.2017.03.054
- 28. Wu, L.; Wu, Y.; Shen, H.; Gong, P.; Cao, L.; Wang, G.; Hao, H. Quantitative Structure–Ion Intensity Relationship Strategy to the Prediction of Absolute Levels without Authentic Standards. Anal. Chim. Acta 2013, 794, 67–75, DOI: 10.1016/j.aca.2013.07.034
- 29. Kruve, A.; Kaupmees, K. Predicting ESI/MS Signal Change for Anions in Different Solvents. Anal. Chem. 2017, 89, 5079–5086, DOI: 10.1021/acs.analchem.7b00595
- 30. Liigand, P.; Liigand, J.; Cuyckens, F.; Vreeken, R. J.; Kruve, A. Ionisation Efficiencies Can Be Predicted in Complicated Biological Matrices: A Proof of Concept. Anal. Chim. Acta 2018, 1032, 68–74, DOI: 10.1016/j.aca.2018.05.072
- 31. Panagopoulos Abrahamsson, D.; Park, J.-S.; Singh, R. R.; Sirota, M.; Woodruff, T. J. Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards. J. Chem. Inf. Model. 2020, 60, 2718–2727, DOI: 10.1021/acs.jcim.9b01096
- 32. Aalizadeh, R.; Panara, A.; Thomaidis, N. S. Development and Application of a Novel Semi-Quantification Approach in LC-QToF-MS Analysis of Natural Products. J. Am. Soc. Mass Spectrom. 2021, 32, 1412–1423, DOI: 10.1021/jasms.1c00032
- 33. Aalizadeh, R.; Thomaidis, N. S.; Bletsou, A. A.; Gago-Ferrero, P. Quantitative Structure–Retention Relationship Models To Support Nontarget High-Resolution Mass Spectrometric Screening of Emerging Contaminants in Environmental Samples. J. Chem. Inf. Model. 2016, 56, 1384–1398, DOI: 10.1021/acs.jcim.5b00752
- 34. Aalizadeh, R.; Alygizakis, N. A.; Schymanski, E. L.; Krauss, M.; Schulze, T.; Ibáñez, M.; McEachran, A. D.; Chao, A.; Williams, A. J.; Gago-Ferrero, P.; Covaci, A.; Moschet, C.; Young, T. M.; Hollender, J.; Slobodnik, J.; Thomaidis, N. S. Development and Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect and Nontarget Screening. Anal. Chem. 2021, 93, 11601–11611, DOI: 10.1021/acs.analchem.1c02348
- 35. Kruve, A.; Kaupmees, K.; Liigand, J.; Leito, I. Negative Electrospray Ionization via Deprotonation: Predicting the Ionization Efficiency. Anal. Chem. 2014, 86, 4822–4830, DOI: 10.1021/ac404066v
- 36. Mayhew, A. W.; Topping, D. O.; Hamilton, J. F. New Approach Combining Molecular Fingerprints and Machine Learning to Estimate Relative Ionization Efficiency in Electrospray Ionization. ACS Omega 2020, 5, 9510–9516, DOI: 10.1021/acsomega.0c00732
- 37. Aalizadeh, R.; Nikolopoulou, V.; Alygizakis, N.; Slobodnik, J.; Thomaidis, N. S. A Novel Workflow for Semi-Quantification of Emerging Contaminants in Environmental Samples Analyzed by LC-HRMS. Anal. Bioanal. Chem. 2022, 414, 7435–7450, DOI: 10.1007/s00216-022-04084-6
- 38. Wang, S.; Basijokaite, R.; Murphy, B. L.; Kelleher, C. A.; Zeng, T. Combining Passive Sampling with Suspect and Nontarget Screening to Characterize Organic Micropollutants in Streams Draining Mixed-Use Watersheds. Environ. Sci. Technol. 2022, 56, 16726–16736, DOI: 10.1021/acs.est.2c02938
- 39. Krier, J.; Singh, R. R.; Kondić, T.; Lai, A.; Diderich, P.; Zhang, J.; Thiessen, P. A.; Bolton, E. E.; Schymanski, E. L. Discovering Pesticides and Their TPs in Luxembourg Waters Using Open Cheminformatics Approaches. Environ. Int. 2022, 158, 106885, DOI: 10.1016/j.envint.2021.106885
- 40. Schymanski, E. L.; Kondić, T.; Neumann, S.; Thiessen, P. A.; Zhang, J.; Bolton, E. E. Empowering Large Chemical Knowledge Bases for Exposomics: PubChemLite Meets MetFrag. J. Cheminformatics 2021, 13, 19, DOI: 10.1186/s13321-021-00489-0
- 41. Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302, DOI: 10.1038/s41592-019-0344-8
- 42. Paszkiewicz, M.; Godlewska, K.; Lis, H.; Caban, M.; Białk-Bielińska, A.; Stepnowski, P. Advances in Suspect Screening and Non-Target Analysis of Polar Emerging Contaminants in the Environmental Monitoring. TrAC Trends Anal. Chem. 2022, 154, 116671, DOI: 10.1016/j.trac.2022.116671
- 43. Meng, D.; Fan, D.; Gu, W.; Wang, Z.; Chen, Y.; Bu, H.; Liu, J. Development of an Integral Strategy for Non-Target and Target Analysis of Site-Specific Potential Contaminants in Surface Water: A Case Study of Dianshan Lake, China. Chemosphere 2020, 243, 125367, DOI: 10.1016/j.chemosphere.2019.125367
- 44. Groff, L. C.; Grossman, J. N.; Kruve, A.; Minucci, J. M.; Lowe, C. N.; McCord, J. P.; Kapraun, D. F.; Phillips, K. A.; Purucker, S. T.; Chao, A.; Ring, C. L.; Williams, A. J.; Sobus, J. R. Uncertainty Estimation Strategies for Quantitative Non-Targeted Analysis. Anal. Bioanal. Chem. 2022, 414, 4919–4933, DOI: 10.1007/s00216-022-04118-z
- 45. Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341, DOI: 10.1093/bioinformatics/bts437
- 46. Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 5071–5080, DOI: 10.1021/acs.analchem.0c04473
- 47Peets, P.; Wang, W.-C.; MacLeod, M.; Breitholtz, M.; Martin, J. W.; Kruve, A. MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS. Environ. Sci. Technol. 2022, 56, 15508– 15517, DOI: 10.1021/acs.est.2c02536Google Scholar47MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMSPeets, Pilleriin; Wang, Wei-Chieh; MacLeod, Matthew; Breitholtz, Magnus; Martin, Jonathan W.; Kruve, AnneliEnvironmental Science & Technology (2022), 56 (22), 15508-15517CODEN: ESTHAG; ISSN:1520-5851. (American Chemical Society)To achieve water quality objectives of the zero pollution action plan in Europe, rapid methods are needed to identify the presence of toxic substances in complex water samples. However, only a small fraction of chems. detected with nontarget high-resoln. mass spectrometry can be identified, and fewer have ecotoxicol. data available. We hypothesized that ecotoxicol. data could be predicted for unknown mol. features in data-rich high-resoln. mass spectrometry (HRMS) spectra, thereby circumventing time-consuming steps of mol. identification and rapidly flagging mols. of potentially high toxicity in complex samples. Here, we present MS2Tox, a machine learning method, to predict the toxicity of unidentified chems. based on high-resoln. accurate mass tandem mass spectra (MS2). The MS2Tox model for fish toxicity was trained and tested on 647 lethal concn. (LC50) values from the CompTox database and validated for 219 chems. and 420 MS2 spectra from MassBank. The root mean square error (RMSE) of MS2Tox predictions was below 0.89 log-mM, while the exptl. repeatability of LC50 values in CompTox was 0.44 log-mM. MS2Tox allowed accurate prediction of fish LC50 values for 22 chems. detected in water samples, and empirical evidence suggested the right directionality for another 68 chems. 
Moreover, by incorporating structural information, e.g., the presence of carbonyl-benzene, amide moieties, or hydroxyl groups, MS2Tox outperforms baseline models that use only the exact mass or log KOW.
- 48Hoffmann, M. A.; Nothias, L.-F.; Ludwig, M.; Fleischauer, M.; Gentry, E. C.; Witting, M.; Dorrestein, P. C.; Dührkop, K.; Böcker, S. High-Confidence Structural Annotation of Metabolites Absent from Spectral Libraries. Nat. Biotechnol. 2022, 40, 411– 421, DOI: 10.1038/s41587-021-01045-9Google Scholar48High-confidence structural annotation of metabolites absent from spectral librariesHoffmann, Martin A.; Nothias, Louis-Felix; Ludwig, Marcus; Fleischauer, Markus; Gentry, Emily C.; Witting, Michael; Dorrestein, Pieter C.; Duehrkop, Kai; Boecker, SebastianNature Biotechnology (2022), 40 (3), 411-421CODEN: NABIF9; ISSN:1087-0156. (Nature Portfolio)Untargeted metabolomics expts. rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel d. P value estn. and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial no. of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic stds. In human samples, we annotated and manually validated 315 mol. structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics expts. led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
- 49Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 12580– 12585, DOI: 10.1073/pnas.1509788112Google Scholar49Searching molecular structure databases with tandem mass spectra using CSI:FingerIDDuehrkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Boecker, SebastianProceedings of the National Academy of Sciences of the United States of America (2015), 112 (41), 12580-12585CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics expts. usually rely on tandem MS to identify the thousands of compds. in a biol. sample. Today, the vast majority of metabolites remain unknown. The authors present a method for searching mol. structure databases using tandem MS data of small mols. The authors' method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown mol. The authors use the fragmentation tree to predict the mol. structure fingerprint of the unknown compd. using machine learning. This fingerprint is then used to search a mol. structure database such as PubChem. The authors' method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
- 50Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J. Cheminformatics 2016, 8, 5, DOI: 10.1186/s13321-016-0116-8Google Scholar50Fragmentation trees reloadedBocker Sebastian; Duhrkop KaiJournal of cheminformatics (2016), 8 (), 5 ISSN:1758-2946.BACKGROUND: Untargeted metabolomics commonly uses liquid chromatography mass spectrometry to measure abundances of metabolites; subsequent tandem mass spectrometry is used to derive information about individual compounds. One of the bottlenecks in this experimental setup is the interpretation of fragmentation spectra to accurately and efficiently identify compounds. Fragmentation trees have become a powerful tool for the interpretation of tandem mass spectrometry data of small molecules. These trees are determined from the data using combinatorial optimization, and aim at explaining the experimental data via fragmentation cascades. Fragmentation tree computation does not require spectral or structural databases. To obtain biochemically meaningful trees, one needs an elaborate optimization function (scoring). RESULTS: We present a new scoring for computing fragmentation trees, transforming the combinatorial optimization into a Maximum A Posteriori estimator. We demonstrate the superiority of the new scoring for two tasks: both for the de novo identification of molecular formulas of unknown compounds, and for searching a database for structurally similar compounds, our method SIRIUS 3, performs significantly better than the previous version of our method, as well as other methods for this task. CONCLUSION: SIRIUS 3 can be a part of an untargeted metabolomics workflow, allowing researchers to investigate unknowns using automated computational methods.Graphical abstractWe present a new scoring for computing fragmentation trees from tandem mass spectrometry data based on Bayesian statistics. The best scoring fragmentation tree most likely explains the molecular formula of the measured parent ion.
- 51Klekota, J.; Roth, F. P. Chemical Substructures That Enrich for Biological Activity. Bioinformatics 2008, 24, 2518– 2525, DOI: 10.1093/bioinformatics/btn479Google Scholar51Chemical substructures that enrich for biological activityKlekota, Justin; Roth, Frederick P.Bioinformatics (2008), 24 (21), 2518-2525CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)Certain chem. substructures are present in many drugs. This has led to the claim of 'privileged' substructures which are predisposed to bioactivity. Because bias in screening library construction could explain this phenomenon, the existence of privilege was controversial. Using diverse phenotypic assays, we defined bioactivity for multiple compd. libraries. Many substructures were assocd. with bioactivity even after accounting for substructure prevalence in the library, thus validating the privileged substructure concept. Detns. of privilege were confirmed in independent assays and libraries. Our anal. also revealed 'underprivileged' substructures and conditional privilege'-rules relating combinations of substructure to bioactivity. Most previously reported substructures were flat arom. ring systems. Although we validated such substructures, we also identified 3D privileged substructures. Most privileged substructures display a wide variety of substituents suggesting an entropic mechanism of privilege. Compds. contg. privileged substructures had a doubled rate of bioactivity, suggesting practical consequences for pharmaceutical discovery.
- 52Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273– 1280, DOI: 10.1021/ci010132rGoogle Scholar52Reoptimization of MDL Keys for Use in Drug DiscoveryDurant, Joseph L.; Leland, Burton A.; Henry, Douglas R.; Nourse, James G.Journal of Chemical Information and Computer Sciences (2002), 42 (6), 1273-1280CODEN: JCISD8; ISSN:0095-2338. (American Chemical Society)For a no. of years MDL products have exposed both 166 bit and 960 bit keysets based on 2D descriptors. These keysets were originally constructed and optimized for substructure searching. We report on improvements in the performance of MDL keysets which are reoptimized for use in mol. similarity. Classification performance for a test data set of 957 compds. was increased from 0.65 for the 166 bit keyset and 0.67 for the 960 bit keyset to 0.71 for a surprisal S/N pruned keyset contg. 208 bits and 0.71 for a genetic algorithm optimized keyset contg. 548 bits. We present an overview of the underlying technol. supporting the definition of descriptors and the encoding of these descriptors into keysets. This technol. allows definition of descriptors as combinations of atom properties, bond properties, and at. neighborhoods at various topol. sepns. as well as supporting a no. of custom descriptors. These descriptors can then be used to set one or more bits in a keyset. We constructed various keysets and optimized their performance in clustering bioactive substances. Performance was measured using methodol. developed by Briem and Lessel. "Directed pruning" was carried out by eliminating bits from the keysets on the basis of random selection, values of the surprisal of the bit, or values of the surprisal S/N ratio of the bit. The random pruning expt. highlighted the insensitivity of keyset performance for keyset lengths of more than 1000 bits. 
Contrary to initial expectations, pruning on the basis of the surprisal values of the various bits resulted in keysets which underperformed those resulting from random pruning. In contrast, pruning on the basis of the surprisal S/N ratio was found to yield keysets which performed better than those resulting from random pruning. We also explored the use of genetic algorithms in the selection of optimal keysets. Once more the performance was only a weak function of keyset size, and the optimizations failed to identify a single globally optimal keyset. Instead multiple, equally optimal keysets could be produced which had relatively low overlap of the descriptors they encoded.
- 53Guha, R. Chemical Informatics Functionality in R. J. Stat. Softw. 2007, 18, 1– 16, DOI: 10.18637/jss.v018.i05Google ScholarThere is no corresponding record for this reference.
- 54Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Francisco, California, USA, 2016; pp. 785– 794.Google ScholarThere is no corresponding record for this reference.
- 55Rashmi, K. V.; Gilad-Bachrach, R. DART: Dropouts Meet Multiple Additive Regression Trees. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, CA, USA, 2015; Vol. 38, pp. 489– 497.Google ScholarThere is no corresponding record for this reference.
- 56Kruve, A.; Aalizadeh, R.; Malm, L.; Alygizakis, N.; Thomaidis, N. S. Interlaboratory Comparison on Strategies for Semi-Quantitative Non-Targeted LC-ESI-HRMS, 2020. https://www.norman-network.net/sites/default/files/files/QA-QC%20Issues/Invitation%20letter%20JPA%202020%20semi-quant%20inter%20lab%20%28002%29.pdf.Google ScholarThere is no corresponding record for this reference.
- 57NORMAN Network; Aalizadeh, R.; Alygizakis, N.; Schymanski, E.; Slobodnik, J.; Fischer, S.; Cirka, L. S0 | SUSDAT | Merged NORMAN Suspect List: SusDat. 2022, DOI: 10.5281/ZENODO.2664077 .Google ScholarThere is no corresponding record for this reference.
- 58Gao, S.; Zhang, Z.; Karnes, H. Sensitivity Enhancement in Liquid Chromatography/Atmospheric Pressure Ionization Mass Spectrometry Using Derivatization and Mobile Phase Additives. J. Chromatogr., B 2005, 825, 98– 110, DOI: 10.1016/j.jchromb.2005.04.021Google Scholar58Sensitivity enhancement in liquid chromatography/atmospheric pressure ionization mass spectrometry using derivatization and mobile phase additivesGao, Songmei; Zhang, Zong-Ping; Karnes, H. T.Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences (2005), 825 (2), 98-110CODEN: JCBAAI; ISSN:1570-0232. (Elsevier B.V.)A review. High performance liq. chromatog. with atm. pressure ionization (API) mass spectrometry has been essential to a large no. of quant. anal. applications for a variety of compds. Poor detection sensitivity however is a problem obsd. for a no. of analytes because detection sensitivity can be affected by many factors. The two most crit. factors are the chem. and phys. properties of the analyte and the compn. of the mobile phase. To address these crit. factors which may lead to poor sensitivity, either the structure of the analyte must be modified or the mobile phase compn. optimized. The introduction of permanently charged moieties or readily ionized species may dramatically improve the ionization efficiency for electrospray ionization (ESI), and thus the sensitivity of detection. Detection sensitivity may also be enhanced via introduction of moieties with high proton affinity or electron affinity. Mobile phase component modification is an alternative way to enhance sensitivity by changing the form of the analytes in soln. thereby improving ionization efficiency. PH adjustment and adduct formation have been commonly used to optimize detection conditions. The sensitivity of detection for analytes in bio-matrixes could also be enhanced by decreasing ion-suppression from the matrix through derivatization or mobile phase addn. 
In this review, the authors will discuss detection-oriented derivatization as well as the application of mobile phase additives to enhance the sensitivity of detection in liq. chromatograph/atm. ionization/mass spectrometry (LC/API/MS), focusing in particular on the applications involving small mols. in bio-matrixes.
- 59. Djoumbou Feunang, Y.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S. ClassyFire: Automated Chemical Classification with a Comprehensive Computable Taxonomy. J. Cheminformatics 2016, 8, 61, DOI: 10.1186/s13321-016-0174-y
- 60. Wang, T.; Liigand, J.; Frandsen, H. L.; Smedsgaard, J.; Kruve, A. Standard Substances Free Quantification Makes LC/ESI/MS Non-Targeted Screening of Pesticides in Cereals Comparable between Labs. Food Chem. 2020, 318, 126460, DOI: 10.1016/j.foodchem.2020.126460
Figure 1. Training (gray) and test (green) sets of the two best-performing models trained with the xgbTree algorithm, based on (A) structural fingerprints in MS2Quant and (B) PaDEL descriptors. (C) General modeling workflow used here. For all 1191 chemicals, molecular descriptors/fingerprints were calculated from the structure, and 80% of the data (training set) was used for modeling. To clean the descriptors, features with more than 10 missing values were removed. Additionally, features with near-zero variance (cut-off 80/20) and high pairwise correlation (cut-off 0.75) were removed. The training set chemicals were then used for modeling, and the performance was assessed based on the RMSE and fold prediction errors of the test set.
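The descriptor cleaning steps named in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy data, and the greedy order of the correlation filter are assumptions made for illustration, and the 80/20 cut-off is read here as a frequency ratio between the most common and second most common value (in the style of caret's nearZeroVar, without its additional unique-value criterion).

```python
import math

def pearson(x, y):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def clean_features(columns, max_missing=10, freq_ratio_cut=80 / 20, corr_cut=0.75):
    """Return the names of features that survive the three cleaning steps.

    columns: dict mapping feature name -> list of values (None = missing).
    """
    kept = {}
    for name, values in columns.items():
        # Step 1: drop features with more than `max_missing` missing values.
        if sum(v is None for v in values) > max_missing:
            continue
        # Step 2 (assumed reading of "80/20"): drop near-zero-variance features,
        # i.e. constant columns or columns dominated by a single value.
        present = [v for v in values if v is not None]
        counts = sorted((present.count(v) for v in set(present)), reverse=True)
        if len(counts) < 2 or counts[0] / counts[1] > freq_ratio_cut:
            continue
        kept[name] = values
    # Step 3 (simplified): keep a feature only if its absolute correlation
    # with every feature kept so far does not exceed `corr_cut`.
    selected = []
    for name, values in kept.items():
        if all(abs(pearson(values, kept[other])) <= corr_cut for other in selected):
            selected.append(name)
    return selected

# Hypothetical fingerprint bits for 12 chemicals.
example = {
    "bit_a": [0, 1] * 6,                 # balanced, kept
    "bit_b": [1, 0] * 6,                 # perfectly anti-correlated with bit_a
    "bit_c": [0] * 11 + [1],             # near-zero variance (11:1)
    "bit_d": [None] * 11 + [1],          # 11 missing values
    "bit_e": [0, 0, 1, 1] * 3,           # uncorrelated with bit_a, kept
}
selected = clean_features(example)
```

With the toy data, only bit_a and bit_e remain. The greedy filter depends on feature order; a production workflow would more likely rely on an established implementation of these filters.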
Figure 2. Workflow for validation of MS2Quant on NORMAN interlaboratory comparison samples. (A) Molecular fingerprints were computed for 36 chemicals in the calibration mix from SMILES notation with the rcdk package in R. MS2Quant was then used to predict ionization efficiency values, and a linear regression was fitted between the experimental logarithmic response factors and the logarithmic predicted ionization efficiencies. (B) Lake water spiked with 39 suspect compounds at high and low concentrations was measured with LC-HRMS in data-dependent acquisition mode with an inclusion list. SIRIUS+CSI:FingerID was used to predict probabilities of structural fingerprints from MS1 and MS2 spectra, and MS2Quant was used to predict ionization efficiencies from these predicted probabilities. Thereafter, the linear regression from the calibration compounds was used to convert the predicted ionization efficiency values to instrument- and method-specific predicted response factors. Concentrations of suspect chemicals were calculated from the predicted response factors and the integrated areas from the LC-HRMS analysis and were compared to the spiked concentrations. For comparison with PaDEL-based quantification, a similar workflow was used with the PaDEL descriptor-based prediction model instead of MS2Quant; identification of suspects was performed with SIRIUS+CSI:FingerID, and the top assigned structure was used for ionization efficiency predictions.
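The calibration and quantification steps in the Figure 2 caption can be sketched as follows. This is an illustrative sketch only: the function names and all numbers are hypothetical, and it assumes that the response factor is defined as peak area divided by concentration, so that concentration equals area divided by the predicted response factor.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def estimate_concentration(peak_area, pred_log_ie, slope, intercept):
    """Map a predicted log IE to a log response factor, then area to concentration."""
    log_rf = slope * pred_log_ie + intercept
    return peak_area / 10 ** log_rf

# Hypothetical calibration compounds: predicted log IE and measured log RF.
cal_pred_log_ie = [1.0, 2.0, 3.0, 4.0]
cal_log_rf = [5.0, 6.0, 7.0, 8.0]  # constructed so that log RF = log IE + 4

slope, intercept = fit_line(cal_pred_log_ie, cal_log_rf)

# Hypothetical suspect: integrated area 2.0e6, predicted log IE 2.0.
conc = estimate_concentration(2.0e6, 2.0, slope, intercept)
```

Here the fit recovers a slope of 1 and an intercept of 4, so the suspect's predicted response factor is 10^6 and the estimated concentration is 2.0 in the concentration units of the calibration. The regression makes the ionization efficiency predictions specific to one instrument and method, which is why it has to be refitted for each measurement setup.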
Figure 3. Predicted concentrations in the high-concentration spiked sample obtained with MS2Quant and the PaDEL-based model for five incorrectly identified compounds. Real concentrations are marked with a vertical line.
Figure 4. (A) Top 10 most influential variables in the model and their normalized importance (%); (B) SHAP values representing the influence of each of the top 10 features and their marginal contribution to the prediction; and (C) the test set chemicals assigned to different classes by ClassyFire, where each data point represents the geometric mean prediction error of log IE of a unique chemical. The classes are shown in descending order of the median geometric mean prediction error of all compounds in the group, and only classes with three or more unique representatives were plotted.
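The error metrics used in the captions and the abstract (fold prediction error, its mean, geometric mean and median, and the conversion of an RMSE in log-units to a fold value) can be illustrated with a short Python sketch. The definition of fold error as the larger of predicted/true and true/predicted is an assumption about the convention, and the function names and example values are hypothetical.

```python
from statistics import geometric_mean, mean, median

def fold_error(predicted, true):
    """Fold error, always >= 1: how many times off the prediction is."""
    return max(predicted / true, true / predicted)

def summarize(predicted, true):
    """Mean, geometric mean and median fold error over paired values."""
    errors = [fold_error(p, t) for p, t in zip(predicted, true)]
    return {
        "mean": mean(errors),
        "geometric_mean": geometric_mean(errors),
        "median": median(errors),
    }

def rmse_log_to_fold(rmse_log10):
    """Express an RMSE given in log10 units as a multiplicative factor."""
    return 10 ** rmse_log10

# Hypothetical predicted and spiked concentrations (same units).
summary = summarize([2.0, 5.0, 40.0], [1.0, 10.0, 10.0])
train_fold = rmse_log_to_fold(0.55)
test_fold = rmse_log_to_fold(0.80)
```

For the toy values the fold errors are 2, 2 and 4. Converting the RMSE values quoted in the abstract gives about 3.5× for 0.55 log-units and about 6.3× for 0.80 log-units, matching the figures reported there. Because a few large errors inflate the arithmetic mean, the geometric mean and median are usually the more robust summaries.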
References
This article references 60 other publications.
- 1. McCord, J. P.; Groff, L. C.; Sobus, J. R. Quantitative Non-Targeted Analysis: Bridging the Gap between Contaminant Discovery and Risk Characterization. Environ. Int. 2022, 158, 107011, DOI: 10.1016/j.envint.2021.107011
- 2. Schymanski, E. L.; Singer, H. P.; Slobodnik, J.; Ipolyi, I. M.; Oswald, P.; Krauss, M.; Schulze, T.; Haglund, P.; Letzel, T.; Grosse, S.; Thomaidis, N. S.; Bletsou, A.; Zwiener, C.; Ibáñez, M.; Portolés, T.; de Boer, R.; Reid, M. J.; Onghena, M.; Kunkel, U.; Schulz, W.; Guillon, A.; Noyon, N.; Leroy, G.; Bados, P.; Bogialli, S.; Stipaničev, D.; Rostkowski, P.; Hollender, J. Non-Target Screening with High-Resolution Mass Spectrometry: Critical Review Using a Collaborative Trial on Water Analysis. Anal. Bioanal. Chem. 2015, 407, 6237–6255, DOI: 10.1007/s00216-015-8681-7
- 3. Hollender, J.; Schymanski, E. L.; Singer, H. P.; Ferguson, P. L. Nontarget Screening with High Resolution Mass Spectrometry in the Environment: Ready to Go? Environ. Sci. Technol. 2017, 51, 11505–11512, DOI: 10.1021/acs.est.7b02184
- 4. Papazian, S.; D’Agostino, L. A.; Sadiktsis, I.; Froment, J.; Bonnefille, B.; Sdougkou, K.; Xie, H.; Athanassiadis, I.; Budhavant, K.; Dasari, S.; Andersson, A.; Gustafsson, Ö.; Martin, J. W. Nontarget Mass Spectrometry and in Silico Molecular Characterization of Air Pollution from the Indian Subcontinent. Commun. Earth Environ. 2022, 3, 35, DOI: 10.1038/s43247-022-00365-1
- 5. Gago-Ferrero, P.; Schymanski, E. L.; Bletsou, A. A.; Aalizadeh, R.; Hollender, J.; Thomaidis, N. S. Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MS. Environ. Sci. Technol. 2015, 49, 12333–12341, DOI: 10.1021/acs.est.5b03454
- 6. Bletsou, A. A.; Jeon, J.; Hollender, J.; Archontaki, E.; Thomaidis, N. S. Targeted and Non-Targeted Liquid Chromatography-Mass Spectrometric Workflows for Identification of Transformation Products of Emerging Pollutants in the Aquatic Environment. TrAC Trends Anal. Chem. 2015, 66, 32–44, DOI: 10.1016/j.trac.2014.11.009
- 7. Been, F.; Kruve, A.; Vughs, D.; Meekel, N.; Reus, A.; Zwartsen, A.; Wessel, A.; Fischer, A.; ter Laak, T.; Brunner, A. M. Risk-Based Prioritization of Suspects Detected in Riverine Water Using Complementary Chromatographic Techniques. Water Res. 2021, 204, 117612, DOI: 10.1016/j.watres.2021.117612
- 8. Oss, M.; Kruve, A.; Herodes, K.; Leito, I. Electrospray Ionization Efficiency Scale of Organic Compounds. Anal. Chem. 2010, 82, 2865–2872, DOI: 10.1021/ac902856t
- 9. Oss, M.; Tshepelevitsh, S.; Kruve, A.; Liigand, P.; Liigand, J.; Rebane, R.; Selberg, S.; Ets, K.; Herodes, K.; Leito, I. Quantitative Electrospray Ionization Efficiency Scale: 10 Years After. Rapid Commun. Mass Spectrom. 2021, 35, e9178, DOI: 10.1002/rcm.9178
- 10. Liigand, J.; Wang, T.; Kellogg, J.; Smedsgaard, J.; Cech, N.; Kruve, A. Quantification for Non-Targeted LC/MS Screening without Standard Substances. Sci. Rep. 2020, 10, 5808, DOI: 10.1038/s41598-020-62573-z
- 11. Leito, I.; Herodes, K.; Huopolainen, M.; Virro, K.; Künnapas, A.; Kruve, A.; Tanner, R. Towards the Electrospray Ionization Mass Spectrometry Ionization Efficiency Scale of Organic Compounds. Rapid Commun. Mass Spectrom. 2008, 22, 379–384, DOI: 10.1002/rcm.3371
- 12. Cech, N. B.; Enke, C. G. Relating Electrospray Ionization Response to Nonpolar Character of Small Peptides. Anal. Chem. 2000, 72, 2717–2723, DOI: 10.1021/ac9914869
- 13. Alymatiri, C. M.; Kouskoura, M. G.; Markopoulou, C. K. Decoding the Signal Response of Steroids in Electrospray Ionization Mode (ESI-MS). Anal. Methods 2015, 7, 10433–10444, DOI: 10.1039/C5AY02839F
- 14. Kruve, A.; Kaupmees, K. Adduct Formation in ESI/MS by Mobile Phase Additives. J. Am. Soc. Mass Spectrom. 2017, 28, 887–894, DOI: 10.1007/s13361-017-1626-y
- 15. Kostiainen, R.; Kauppila, T. J. Effect of Eluent on the Ionization Process in Liquid Chromatography–Mass Spectrometry. J. Chromatogr. A 2009, 1216, 685–699, DOI: 10.1016/j.chroma.2008.08.095
- 16. Kebarle, P.; Tang, L. From Ions in Solution to Ions in the Gas Phase - the Mechanism of Electrospray Mass Spectrometry. Anal. Chem. 1993, 65, 972A–986A, DOI: 10.1021/ac00070a001
- 17. Kruve, A. Influence of Mobile Phase, Source Parameters and Source Type on Electrospray Ionization Efficiency in Negative Ion Mode: Influence of Mobile Phase in ESI/MS. J. Mass Spectrom. 2016, 51, 596–601, DOI: 10.1002/jms.3790
- 18. Liigand, J.; Laaniste, A.; Kruve, A. pH Effects on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2017, 28, 461–469, DOI: 10.1007/s13361-016-1563-1
- 19. Liigand, J.; Kruve, A.; Leito, I.; Girod, M.; Antoine, R. Effect of Mobile Phase on Electrospray Ionization Efficiency. J. Am. Soc. Mass Spectrom. 2014, 25, 1853–1861, DOI: 10.1007/s13361-014-0969-x
- 20. Ojakivi, M.; Liigand, J.; Kruve, A. Modifying the Acidity of Charged Droplets. ChemistrySelect 2018, 3, 335–338, DOI: 10.1002/slct.201702269
- 21. Raji, M. A.; Schug, K. A. Chemometric Study of the Influence of Instrumental Parameters on ESI-MS Analyte Response Using Full Factorial Design. Int. J. Mass Spectrom. 2009, 279, 100–106, DOI: 10.1016/j.ijms.2008.10.013
- 22. Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013, DOI: 10.3390/molecules27031013
- 23. Kalogiouri, N. P.; Aalizadeh, R.; Thomaidis, N. S. Investigating the Organic and Conventional Production Type of Olive Oil with Target and Suspect Screening by LC-QTOF-MS, a Novel Semi-Quantification Method Using Chemical Similarity and Advanced Chemometrics. Anal. Bioanal. Chem. 2017, 409, 5413–5426, DOI: 10.1007/s00216-017-0395-6
- 24. Kruve, A.; Kiefer, K.; Hollender, J. Benchmarking of the Quantification Approaches for the Non-Targeted Screening of Micropollutants and Their Transformation Products in Groundwater. Anal. Bioanal. Chem. 2021, 413, 1549–1559, DOI: 10.1007/s00216-020-03109-2
- 25. Dahal, U. P.; Jones, J. P.; Davis, J. A.; Rock, D. A. Small Molecule Quantification by Liquid Chromatography-Mass Spectrometry for Metabolites of Drugs and Drug Candidates. Drug Metab. Dispos. 2011, 39, 2355–2360, DOI: 10.1124/dmd.111.040865
- 26. Gyllenhammar, I.; Benskin, J. P.; Sandblom, O.; Berger, U.; Ahrens, L.; Lignell, S.; Wiberg, K.; Glynn, A. Perfluoroalkyl Acids (PFAAs) in Serum from 2–4-Month-Old Infants: Influence of Maternal Serum Concentration, Gestational Age, Breast-Feeding, and Contaminated Drinking Water. Environ. Sci. Technol. 2018, 52, 7101–7110, DOI: 10.1021/acs.est.8b00770
- 27. Pieke, E. N.; Granby, K.; Trier, X.; Smedsgaard, J. A Framework to Estimate Concentrations of Potentially Unknown Substances by Semi-Quantification in Liquid Chromatography Electrospray Ionization Mass Spectrometry. Anal. Chim. Acta 2017, 975, 30–41, DOI: 10.1016/j.aca.2017.03.054
- 28. Wu, L.; Wu, Y.; Shen, H.; Gong, P.; Cao, L.; Wang, G.; Hao, H. Quantitative Structure–Ion Intensity Relationship Strategy to the Prediction of Absolute Levels without Authentic Standards. Anal. Chim. Acta 2013, 794, 67–75, DOI: 10.1016/j.aca.2013.07.034
- 29. Kruve, A.; Kaupmees, K. Predicting ESI/MS Signal Change for Anions in Different Solvents. Anal. Chem. 2017, 89, 5079–5086, DOI: 10.1021/acs.analchem.7b00595
- 30. Liigand, P.; Liigand, J.; Cuyckens, F.; Vreeken, R. J.; Kruve, A. Ionisation Efficiencies Can Be Predicted in Complicated Biological Matrices: A Proof of Concept. Anal. Chim. Acta 2018, 1032, 68–74, DOI: 10.1016/j.aca.2018.05.072
- 31. Panagopoulos Abrahamsson, D.; Park, J.-S.; Singh, R. R.; Sirota, M.; Woodruff, T. J. Applications of Machine Learning to In Silico Quantification of Chemicals without Analytical Standards. J. Chem. Inf. Model. 2020, 60, 2718–2727, DOI: 10.1021/acs.jcim.9b01096
- 32. Aalizadeh, R.; Panara, A.; Thomaidis, N. S. Development and Application of a Novel Semi-Quantification Approach in LC-QToF-MS Analysis of Natural Products. J. Am. Soc. Mass Spectrom. 2021, 32, 1412–1423, DOI: 10.1021/jasms.1c00032
- 33. Aalizadeh, R.; Thomaidis, N. S.; Bletsou, A. A.; Gago-Ferrero, P. Quantitative Structure–Retention Relationship Models To Support Nontarget High-Resolution Mass Spectrometric Screening of Emerging Contaminants in Environmental Samples. J. Chem. Inf. Model. 2016, 56, 1384–1398, DOI: 10.1021/acs.jcim.5b00752
- 34. Aalizadeh, R.; Alygizakis, N. A.; Schymanski, E. L.; Krauss, M.; Schulze, T.; Ibáñez, M.; McEachran, A. D.; Chao, A.; Williams, A. J.; Gago-Ferrero, P.; Covaci, A.; Moschet, C.; Young, T. M.; Hollender, J.; Slobodnik, J.; Thomaidis, N. S. Development and Application of Liquid Chromatographic Retention Time Indices in HRMS-Based Suspect and Nontarget Screening. Anal. Chem. 2021, 93, 11601–11611, DOI: 10.1021/acs.analchem.1c02348
- 35. Kruve, A.; Kaupmees, K.; Liigand, J.; Leito, I. Negative Electrospray Ionization via Deprotonation: Predicting the Ionization Efficiency. Anal. Chem. 2014, 86, 4822–4830, DOI: 10.1021/ac404066v
- 36. Mayhew, A. W.; Topping, D. O.; Hamilton, J. F. New Approach Combining Molecular Fingerprints and Machine Learning to Estimate Relative Ionization Efficiency in Electrospray Ionization. ACS Omega 2020, 5, 9510–9516, DOI: 10.1021/acsomega.0c00732
- 37. Aalizadeh, R.; Nikolopoulou, V.; Alygizakis, N.; Slobodnik, J.; Thomaidis, N. S. A Novel Workflow for Semi-Quantification of Emerging Contaminants in Environmental Samples Analyzed by LC-HRMS. Anal. Bioanal. Chem. 2022, 414, 7435–7450, DOI: 10.1007/s00216-022-04084-6
- 38. Wang, S.; Basijokaite, R.; Murphy, B. L.; Kelleher, C. A.; Zeng, T. Combining Passive Sampling with Suspect and Nontarget Screening to Characterize Organic Micropollutants in Streams Draining Mixed-Use Watersheds. Environ. Sci. Technol. 2022, 56, 16726–16736, DOI: 10.1021/acs.est.2c02938
- 39. Krier, J.; Singh, R. R.; Kondić, T.; Lai, A.; Diderich, P.; Zhang, J.; Thiessen, P. A.; Bolton, E. E.; Schymanski, E. L. Discovering Pesticides and Their TPs in Luxembourg Waters Using Open Cheminformatics Approaches. Environ. Int. 2022, 158, 106885, DOI: 10.1016/j.envint.2021.106885
- 40. Schymanski, E. L.; Kondić, T.; Neumann, S.; Thiessen, P. A.; Zhang, J.; Bolton, E. E. Empowering Large Chemical Knowledge Bases for Exposomics: PubChemLite Meets MetFrag. J. Cheminformatics 2021, 13, 19, DOI: 10.1186/s13321-021-00489-0
- 41. Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302, DOI: 10.1038/s41592-019-0344-8
- 42. Paszkiewicz, M.; Godlewska, K.; Lis, H.; Caban, M.; Białk-Bielińska, A.; Stepnowski, P. Advances in Suspect Screening and Non-Target Analysis of Polar Emerging Contaminants in the Environmental Monitoring. TrAC Trends Anal. Chem. 2022, 154, 116671, DOI: 10.1016/j.trac.2022.116671
- 43. Meng, D.; Fan, D.; Gu, W.; Wang, Z.; Chen, Y.; Bu, H.; Liu, J. Development of an integral strategy for non-target and target analysis of site-specific potential contaminants in surface water: A case study of Dianshan Lake, China. Chemosphere 2020, 243, 125367, DOI: 10.1016/j.chemosphere.2019.125367
- 44. Groff, L. C.; Grossman, J. N.; Kruve, A.; Minucci, J. M.; Lowe, C. N.; McCord, J. P.; Kapraun, D. F.; Phillips, K. A.; Purucker, S. T.; Chao, A.; Ring, C. L.; Williams, A. J.; Sobus, J. R. Uncertainty Estimation Strategies for Quantitative Non-Targeted Analysis. Anal. Bioanal. Chem. 2022, 414, 4919–4933, DOI: 10.1007/s00216-022-04118-z
- 45. Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341, DOI: 10.1093/bioinformatics/bts437
- 46. Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 5071–5080, DOI: 10.1021/acs.analchem.0c04473
- 47. Peets, P.; Wang, W.-C.; MacLeod, M.; Breitholtz, M.; Martin, J. W.; Kruve, A. MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS. Environ. Sci. Technol. 2022, 56, 15508–15517, DOI: 10.1021/acs.est.2c02536
- 48. Hoffmann, M. A.; Nothias, L.-F.; Ludwig, M.; Fleischauer, M.; Gentry, E. C.; Witting, M.; Dorrestein, P. C.; Dührkop, K.; Böcker, S. High-Confidence Structural Annotation of Metabolites Absent from Spectral Libraries. Nat. Biotechnol. 2022, 40, 411–421, DOI: 10.1038/s41587-021-01045-9
- 49. Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, 12580–12585, DOI: 10.1073/pnas.1509788112
- 50. Böcker, S.; Dührkop, K. Fragmentation Trees Reloaded. J. Cheminformatics 2016, 8, 5, DOI: 10.1186/s13321-016-0116-8
- 51. Klekota, J.; Roth, F. P. Chemical Substructures That Enrich for Biological Activity. Bioinformatics 2008, 24, 2518–2525, DOI: 10.1093/bioinformatics/btn479
- 52. Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273–1280, DOI: 10.1021/ci010132r
- 53. Guha, R. Chemical Informatics Functionality in R. J. Stat. Softw. 2007, 18, 1–16, DOI: 10.18637/jss.v018.i05
- 54. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Francisco, California, USA, 2016; pp. 785–794.
- 55. Rashmi, K. V.; Gilad-Bachrach, R. DART: Dropouts Meet Multiple Additive Regression Trees. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, CA, USA, 2015; Vol. 38, pp. 489–497.
- 56. Kruve, A.; Aalizadeh, R.; Malm, L.; Alygizakis, N.; Thomaidis, N. S. Interlaboratory Comparison on Strategies for Semi-Quantitative Non-Targeted LC-ESI-HRMS, 2020. https://www.norman-network.net/sites/default/files/files/QA-QC%20Issues/Invitation%20letter%20JPA%202020%20semi-quant%20inter%20lab%20%28002%29.pdf.
- 57. NORMAN Network; Aalizadeh, R.; Alygizakis, N.; Schymanski, E.; Slobodnik, J.; Fischer, S.; Cirka, L. S0 | SUSDAT | Merged NORMAN Suspect List: SusDat. 2022, DOI: 10.5281/ZENODO.2664077.
- 58. Gao, S.; Zhang, Z.; Karnes, H. Sensitivity Enhancement in Liquid Chromatography/Atmospheric Pressure Ionization Mass Spectrometry Using Derivatization and Mobile Phase Additives. J. Chromatogr., B 2005, 825, 98–110, DOI: 10.1016/j.jchromb.2005.04.021
- 59. Djoumbou Feunang, Y.; Eisner, R.; Knox, C.; Chepelev, L.; Hastings, J.; Owen, G.; Fahy, E.; Steinbeck, C.; Subramanian, S.; Bolton, E.; Greiner, R.; Wishart, D. S. ClassyFire: Automated Chemical Classification with a Comprehensive Computable Taxonomy. J. Cheminf. 2016, 8, 61, DOI: 10.1186/s13321-016-0174-y
- 60. Wang, T.; Liigand, J.; Frandsen, H. L.; Smedsgaard, J.; Kruve, A. Standard Substances Free Quantification Makes LC/ESI/MS Non-Targeted Screening of Pesticides in Cereals Comparable between Labs. Food Chem. 2020, 318, 126460, DOI: 10.1016/j.foodchem.2020.126460
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.3c01744.
Data unification process in detail; example code showing how the data were unified based on either dataset 1 or a unified dataset; overview of datasets containing metadata and ionization efficiency information used for modeling; comparison between all tested molecular descriptors or fingerprints and machine learning algorithms; overview of the calibrants and suspects used in validation and experimental conditions; a statistical and graphical overview of the performance of MS2Quant and the structure-based models on the validation set; overview of incorrectly identified structures and their highest ranked assigned structure by SIRIUS+CSI:FingerID; SIRIUS calculations and parameters used; top 10 most influential variables in a PaDEL-based model developed here; top 10 most influential variables, their SHAP values, and error distribution of different chemical classes assigned by ClassyFire for the PaDEL-based model developed here; and first decision tree of the xgbTree algorithm-based models developed using structural fingerprints and PaDEL descriptors (PDF)