MS2Tox Machine Learning Tool for Predicting the Ecotoxicity of Unidentified Chemicals in Water by Nontarget LC-HRMS

Cite this: Environ. Sci. Technol. 2022, 56, 22, 15508–15517
Publication Date (Web):October 21, 2022
https://doi.org/10.1021/acs.est.2c02536

Copyright © 2022 The Authors. Published by American Chemical Society. This publication is licensed under

CC-BY 4.0.
  • Open Access

Abstract

To achieve water quality objectives of the zero pollution action plan in Europe, rapid methods are needed to identify the presence of toxic substances in complex water samples. However, only a small fraction of chemicals detected with nontarget high-resolution mass spectrometry can be identified, and fewer have ecotoxicological data available. We hypothesized that ecotoxicological data could be predicted for unknown molecular features in data-rich high-resolution mass spectrometry (HRMS) spectra, thereby circumventing time-consuming steps of molecular identification and rapidly flagging molecules of potentially high toxicity in complex samples. Here, we present MS2Tox, a machine learning method, to predict the toxicity of unidentified chemicals based on high-resolution accurate mass tandem mass spectra (MS2). The MS2Tox model for fish toxicity was trained and tested on 647 lethal concentration (LC50) values from the CompTox database and validated for 219 chemicals and 420 MS2 spectra from MassBank. The root mean square error (RMSE) of MS2Tox predictions was below 0.89 log-mM, while the experimental repeatability of LC50 values in CompTox was 0.44 log-mM. MS2Tox allowed accurate prediction of fish LC50 values for 22 chemicals detected in water samples, and empirical evidence suggested the right directionality for another 68 chemicals. Moreover, by incorporating structural information, e.g., the presence of carbonyl-benzene, amide moieties, or hydroxyl groups, MS2Tox outperforms baseline models that use only the exact mass or log KOW.

Synopsis

Thousands of toxic chemicals are overlooked in targeted analysis of water samples. This work presents MS2Tox to pinpoint toxic chemicals in water samples with the aid of machine learning.

Introduction

The quality of our water resources depends to a large extent on the inherent toxicity and concentration of chemical pollutants that are present. (1) Modern nontarget analysis by liquid chromatography high-resolution mass spectrometry (LC-HRMS) (2) allows rapid profiling of thousands of molecular features in drinking water, (3,4) surface water, (5−7) and wastewater. (8−11) These substances are complex mixtures present in water, including anthropogenic substances (e.g., pesticides and industrial chemicals), (2,7,8,12) natural substances (e.g., natural dissolved organic matter), (6) endogenous human metabolites, (13) and a multitude of associated transformation products with unknown aquatic toxicities. (4)
Identification of the molecular structure associated with each mass spectrometry feature remains a bottleneck in nontarget analyses. This process is slow and highly laborious, ultimately requiring confirmation by authentic chemical standards to reach “Level 1” confidence. (14) Moreover, only a very small fraction of the thousands of detected features are identifiable in this manner. For example, in a nontarget analysis of household dust, only 33 chemicals could be unequivocally identified from over 5000 features, (15) and in Swiss wastewater, only 1.2% of the features could be assigned an unequivocal chemical structure. (16) Nevertheless, there is normally a large amount of relevant data from each feature that could potentially inform toxicological knowledge, even if the molecule cannot be unequivocally identified. These relevant data include chromatographic retention times and mass spectral (MS) information, such as accurate mass and isotope patterns in full scan spectra (MS1) and data-rich structural information in associated fragmentation spectra (MS2). (8,17)
Even among the molecules that can be unequivocally identified by nontarget analysis, there may not be adequate aquatic toxicological information available. (9) Whole effluent toxicity testing with relevant in vivo bioassays is the most accurate means of testing the toxicity of real-world effluents, but this method is not rapid and does not identify the chemical source(s) of toxicity without follow-up effect-directed analyses, which are themselves expensive and highly laborious. For chemicals identified with nontarget analysis that lack experimental toxicity values, read-across methodologies and quantitative structure–activity relationship (QSAR) models can be used to predict toxicity. (18−25) Read-across methods are based on the knowledge of common toxicophores, that is, distinct structural features or moieties that bestow a toxic property. (26,27) Other QSAR (28) models may use physicochemical properties that are either measured or estimated from structure, along with functional groups present in the chemical, to predict toxicity. (22) Application of such read-across and QSAR models has, so far, not been effectively paired with nontarget LC-HRMS analysis because of the presumed need to first have an unequivocal molecular structure.
The analytical behavior of an analyte in LC-HRMS, its physicochemical properties, and its toxicity are all connected to its molecular structure, and thus toxicity could be correlated directly with analytical data. There is reason for optimism: retention time is correlated with log KOW, (29,30) the accurate mass and isotope pattern of the molecular ion reveal the elemental composition, including the presence of heteroatoms and halogens, (31) and MS2 spectra may indicate the presence of functional groups (e.g., −NH2, −OH, −CO2H) by fragments and neutral losses. (12) These data are sufficient to characterize many known toxicophores and are similar to the structural alert systems used in read-across (32,33) and certain QSAR models. (34)
Here, we test the hypotheses that analytical characteristics from typical LC-HRMS nontarget data acquisition of water samples could be used to predict the toxicity of unidentified molecules and moreover that machine learning strategies could be applied to unravel these structural dependencies. For structural characterization, fingerprints that show the presence or absence of functional groups and structural parts can be used. (35,36) Here, we developed a method and tool to predict toxicity values from nontarget LC-HRMS data to enable hazard evaluation for unidentified organic molecules. We use the tool to predict the in vivo 50% lethal concentration (LC50) or effective concentration (EC50) (19) in ecotoxicologically relevant test organisms. A machine learning approach combines the molecular mass and molecular fingerprints from structure for training and testing (Figure 1A) to predict LC50 in fish species (Lepomis macrochirus, Pimephales promelas, Oncorhynchus mykiss combined) and water flea (25 different clones and species combined) as well as EC50 predictions for water flea (8 different clones and species combined, see the Supporting Information section “Toxicity Data from CompTox”) and green algae (10 different clones and species combined, see the SI section “Toxicity Data from CompTox”). For validation, fingerprints from MS1 and MS2 (HRMS) spectra are used (Figure 1B). The resulting root mean square error (RMSE) was less than an order of magnitude in log-mM for the best fish LC50 model and water flea EC50 model, demonstrating proof of concept that toxicity can be predicted from HRMS spectra. The model could thus be a useful tool for flagging inherently toxic molecules in complex mixtures.

Figure 1. Workflow for development of the MS2Tox prediction models. Organic chemicals with toxicity values and MS2 spectra in available databases were used as the validation set, while chemicals with available toxicity values but no MS2 spectra were randomly divided into training and test sets. (A) In the initial training and testing of MS2Tox, molecular fingerprints (FPs) were calculated from chemical structure using rcdk and used to train a gradient-boosted prediction model using xgbDART. (B) In the validation stage, fingerprints were calculated from empirical HRMS spectra using SIRIUS+CSI:FingerID software, and these fingerprints were then used to predict toxicity with the xgbDART model that was trained in (A). The same workflow was repeated for all ecotoxicological endpoints: static fish LC50, flow-through fish LC50, water flea LC50, water flea EC50, and algae EC50.

Materials and Methods

Toxicity Data

LC50 and EC50 values for fish in static and flow-through exposures and for water flea and algae were obtained from the EPA CompTox Chemicals Dashboard. (37) For each endpoint used, the parameters are presented in SI Table S1. All concentrations for predictions were expressed as mM and converted to a logarithmic scale. For chemicals where one species had multiple independently measured LC50 and EC50 values, the median was used. The reported experimental LC50 and EC50 values for one chemical varied significantly with a pooled standard deviation of 0.25–0.44 log-mM depending on the endpoint. Chemicals with experimental standard deviation larger than 1.5 log-mM were deemed unreliable and excluded from datasets.
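The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input unit (mg/L) and the function names `to_log_mM` and `aggregate` are hypothetical and do not come from the MS2Tox package; only the median rule and the 1.5 log-mM exclusion threshold are taken from the text.

```python
import math
from statistics import median, stdev


def to_log_mM(conc_mg_per_L, molar_mass_g_per_mol):
    """Convert a concentration to log10(mM).

    Assumes the input is in mg/L; (mg/L) / (g/mol) gives mmol/L, i.e., mM.
    """
    return math.log10(conc_mg_per_L / molar_mass_g_per_mol)


def aggregate(log_values, max_sd=1.5):
    """Summarize replicate log-mM values for one chemical.

    Returns (median, sd), where sd is None for a single measurement,
    or None if the replicate SD exceeds max_sd (chemical excluded).
    """
    sd = stdev(log_values) if len(log_values) > 1 else None
    if sd is not None and sd > max_sd:
        return None
    return median(log_values), sd
```

A chemical with replicate values of 0.0, 0.2, and 0.4 log-mM would be kept with a median of 0.2, whereas replicates spanning −3 to 3 log-mM would be excluded.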

Training and Testing MS2Tox

For training and testing the model, fingerprints were calculated from structure using the get.fingerprint() function from the R package rcdk. (38) Substructure, MACCS, PubChem, Klekota–Roth, custom-made SMARTS, and ring system fingerprints were calculated. For model training, the xgbDART method (xgboost library) was used through the caret package. Prior to training, chemicals that had MS2 spectra in MassBank (datasets from Eawag, University of Athens, and LCSB) were filtered out and assigned to the validation set. The remaining chemicals were divided into training (80%) and test (20%) sets. For hyperparameter tuning during model training, 10-fold cross-validation was additionally used. A weight was given to the median toxicity value based on the standard deviation of the individual toxicity values such that chemicals with more precise toxicity values had a higher impact in the model training. Chemicals with only one measured value were given a weight corresponding to the average standard deviation. For analyzing how much each fingerprint affected the model, the varImp() and dummyVars() functions from caret were used. For variable importance, a SHapley Additive exPlanations (SHAP) graph of the 10 most important variables was produced using shap.score.rank() from the GitHub package “Shap visualization for xgboost”. (39) For comparing predicted and measured toxicity values, R2, Q2, and RMSE were used.
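The weighting and splitting steps can be illustrated with a short Python sketch. The text does not state the exact weighting formula, so the inverse-SD weight used here is an assumption, as are the `floor` parameter (which guards against division by very small SDs) and the function names; the actual work was done in R with caret.

```python
import random


def sample_weights(sds, floor=0.05):
    """Inverse-SD weights (assumed form, not confirmed by the source).

    Entries that are None (only one measured value) receive the weight
    corresponding to the average SD of the other chemicals.
    """
    known = [s for s in sds if s is not None]
    mean_sd = sum(known) / len(known)
    return [1.0 / max(s if s is not None else mean_sd, floor) for s in sds]


def split_train_test(ids, test_fraction=0.2, seed=1):
    """Randomly divide chemical identifiers into training and test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With SDs of 0.1, 0.3, and a single-measurement chemical, the third chemical is weighted as if its SD were 0.2, so it sits between the precise and the imprecise case.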

Validation

MS2 data from MassBank (40) measured by Eawag, LCSB, and the University of Athens were used for validation of MS2Tox. The datasets were selected so that a reasonable proportion of data was left for training, testing, and validation. For example, for the fish static LC50 dataset, the amounts were 517, 130, and 219, respectively. MS2 spectra that had identical LC-HRMS measurement parameters but different collision energies were written into one .ms file for fingerprint calculations. This was done since SIRIUS+CSI:FingerID software uses MS2 data to build fragmentation trees, and adding spectra measured on lower and higher collision voltages together gives more characteristic fragmentation trees (SI section “HRMS Data from MassBank” for detailed explanation). Due to the lack of MS1 spectra in MassBank, the isotope pattern was calculated from the chemical formula. Multiple fingerprint predictions were obtained for one chemical if both negative and positive mode spectra were available or by different research groups or with different LC-HRMS parameters. For validation, fingerprints were calculated using SIRIUS+CSI:FingerID version 4.9.5. (41,42) All of the chosen parameters for SIRIUS+CSI:FingerID are described in the SI section “Fingerprints Calculated with SIRIUS Software”.
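As an illustration of how spectra acquired at several collision energies might be combined into a single input file, the sketch below builds the text of one .ms file. The header keywords (`>compound`, `>formula`, `>parentmass`, `>ionization`, `>collision`) follow the SIRIUS .ms text format as commonly documented and should be checked against the documentation of the SIRIUS version in use; the function name and the example values are hypothetical.

```python
def build_ms_text(name, formula, parent_mz, adduct, spectra):
    """Build the text of a single .ms file from several MS2 spectra.

    spectra: dict mapping collision energy -> list of (mz, intensity).
    All spectra are assumed to share the same LC-HRMS parameters and
    differ only in collision energy.
    """
    lines = [
        f">compound {name}",
        f">formula {formula}",
        f">parentmass {parent_mz}",
        f">ionization {adduct}",
        "",
    ]
    for energy in sorted(spectra):
        # One block per collision energy, lowest energy first
        lines.append(f">collision {energy}")
        lines.extend(f"{mz} {intensity}" for mz, intensity in spectra[energy])
        lines.append("")
    return "\n".join(lines)


# Hypothetical example: two spectra of one compound at 10 and 30 eV
example = build_ms_text(
    "demo", "C6H6O", 95.0491, "[M+H]+",
    {30: [(77.0386, 100.0)], 10: [(95.0491, 50.0)]},
)
```

Keeping the formula line corresponds to the situation described above, where the isotope pattern had to be derived from the known formula because MS1 spectra were missing.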

Final Model

For the final application, structural fingerprints were also calculated for chemicals in the validation set, and all training, test, and validation set data were used for model training. The function FishLC50Prediction() for predicting fish static and flow-through LC50 values is available in R package MS2Tox on GitHub (https://github.com/kruvelab/MS2Tox).

Application Measurements

For testing the final model, in-house HRMS acquisitions were used. Three solutions containing 152 unique spiked chemicals were analyzed by a Thermo Scientific Dionex Ultimate 3000 (Thermo Fisher Scientific) with an RS binary pump and Thermo Scientific Q Exactive Orbitrap. Solvents were acetonitrile and MilliQ water with 0.1% formic acid (pH = 2.7), and separation was carried out on a reversed-phase column Kinetex 2.6 μm EVO C18 (150 × 3.0 mm) from Phenomenex in gradient elution mode. For the acquisition of MS2 data, an inclusion list with the theoretical exact mass of all of the spiked chemicals was used. Detailed instrumental parameters and a list of all of the spiked chemicals are given in the SI section “Application Solutions”.

Results and Discussion

Machine learning algorithms work best when they are trained with large sets of accurate data. Therefore, publicly available toxicity data in CompTox (37) were screened to evaluate the size and quality of available data. The number of experimentally measured toxicity values available for model training was smaller than that for chemical parameters (e.g., log KOW, pKa), but promising datasets suitable for training machine learning models were flow-through LC50 values for fathead minnow and water flea (640 and 387 unique chemicals, respectively) and EC50 values for water flea and algae (730 and 353 unique chemicals, respectively). Data cleaning and preprocessing revealed that LC50 values for fathead minnow were highly correlated with LC50 values for rainbow trout and bluegill, considering 91 and 74 chemicals for which LC50 values with both species were available (Figure 2A). Assuming that the mode of action for these species is similar, we combined the LC50 values into a common “fish” dataset using a generalized additive model (GAM). (43) The GAM allowed the toxicity values available for bluegill and rainbow trout to be converted to the scale of fathead minnow, whereby a Gaussian regression fitted over common chemicals accounts for sensitivity differences between species. As a result, five datasets were obtained after data processing: fish static LC50, fish flow-through LC50, water flea LC50, water flea EC50, and algae EC50, having 871, 841, 379, 728, and 353 unique chemicals, respectively.
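The idea of rescaling one species to another can be illustrated with a deliberately simplified stand-in. The sketch below uses an ordinary least-squares line fitted over the common chemicals instead of the GAM used in the study, so it is not the authors' method; function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    x: log-mM LC50 values for the species to be converted
    y: log-mM LC50 values for the reference species (same chemicals)
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_reference_scale(values, slope, intercept):
    """Map values of the other species onto the reference species scale."""
    return [slope * v + intercept for v in values]
```

A nonzero intercept in such a fit corresponds to a constant sensitivity difference between the two species; the GAM additionally allows that relationship to be nonlinear.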

Figure 2. Data selection and processing steps. (A) Correlation of LC50 values for bluegill and rainbow trout with fathead minnow. The black line shows an ideal agreement of the LC50 values for the two fish species, indicating equal sensitivity. (B) Cross table of the fingerprints calculated from the SMILES of the chemical and fingerprints predicted by fragmentation trees and support vector machines from SIRIUS+CSI:FingerID software. The number on top represents all calculated fingerprints, while the number below in parentheses is the number of these fingerprints that were actually used by the final xgbDART machine learning model for predicting fish static LC50 values. (C) Agreement between measured and predicted toxicities depending on the assigned molecular formulas by fragmentation trees. (D) Heat map of the cosine similarity of fingerprints for chemicals in training and validation sets. The similarity ranged from zero (red) to one (green), and the color gradient is shown in 10 equal steps. The abundance of green areas for each column in the graphs shows that for a chemical in the validation set, a highly similar chemical exists in the training set; here, we see abundant green areas for a majority of the chemicals.

Model Training and Testing

To train the model for predicting toxicity from the HRMS spectra (MS2 and MS1), the HRMS spectra need to be converted into information relevant to the presence or absence of endpoint-related toxicophores. The structural fingerprints calculated from HRMS spectra (42) yield a probability of the presence of specific structural moieties and are promising as a means to flag the presence of toxicophores; however, they have not yet been tested for such a purpose. Training a machine learning model directly from the experimental HRMS spectra and toxicity values is impractical due to a modest intersection of chemicals with publicly available experimental toxicity endpoints and MS2 spectra. To overcome this hurdle, we first trained a machine learning model based on the molecular fingerprints calculated from the structure of all of the chemicals for which experimental toxicity values were available; to allow predictions for unknown chemicals later in the validation stages, we only used the molecular fingerprints that could also be deduced from HRMS spectra (Figure 1). Thus, the dataset available for model training was limited only by the experimental toxicity values; to train the machine learning models, we considered fingerprints that (1) could be calculated with fragmentation trees and support vector machine (SVM) in SIRIUS+CSI:FingerID (42) from HRMS spectra and with the rcdk package (38) from structure and (2) could be calculated both in negative and positive ionization modes. Altogether 1263 fingerprints for each chemical were calculated and used alongside the exact mass. Thereafter, highly correlated fingerprints and fingerprints with near-zero variance were removed, and approximately 200 fingerprints remained for model training, depending on the organism dataset.
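The two filtering steps at the end of this paragraph can be sketched as follows. The thresholds (`min_minority`, `max_corr`) and the simplified near-zero-variance criterion are assumptions for illustration; the study used R tooling whose exact settings are not given in the text.

```python
from statistics import mean, pstdev


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    sd_a, sd_b = pstdev(a), pstdev(b)
    if sd_a == 0 or sd_b == 0:
        return 0.0
    mean_a, mean_b = mean(a), mean(b)
    cov = mean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])
    return cov / (sd_a * sd_b)


def filter_fingerprints(columns, min_minority=0.05, max_corr=0.9):
    """Return names of fingerprint bits kept for model training.

    columns: dict mapping fingerprint name -> list of 0/1 values,
             one value per chemical.
    A bit is dropped if its rarer value occurs in fewer than
    min_minority of the chemicals (near-zero variance) or if it is
    highly correlated with a bit that was already kept.
    """
    kept = []
    for name, bits in columns.items():
        frac_ones = sum(bits) / len(bits)
        if min(frac_ones, 1 - frac_ones) < min_minority:
            continue
        if any(abs(pearson(bits, columns[k])) > max_corr for k in kept):
            continue
        kept.append(name)
    return kept
```

Because correlated bits are compared only against bits already kept, the result depends on column order; the first member of each correlated group is retained.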
Various regression models, such as SVM, random forest regression, and different gradient boosting algorithms, were trained with the caret package in R. (44) The trained models were compared based on root mean squared error (RMSE) and squared correlation coefficients (R2 and Q2) observed from independent training and test sets (see SI). The extreme gradient boosting Dropouts Additive Regression Trees (xgbDART) (45) method was chosen as the best-performing algorithm. Altogether, five different prediction models were trained: static and flow-through LC50 models for fish, LC50 and EC50 models for water flea, and an EC50 model for algae. For all five models, the RMSE for the test set was close to 1.0 log-mM unit, with values ranging from 0.79 to 1.12 log-mM (see the TEST column in Figure 3). With the exception of the LC50 (flow-through) model, RMSE values for the training set were less than a factor of two lower than those for the test set, indicating that the models did not suffer from excessive overtraining (Figure 3).
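For reference, the two summary statistics used to compare models can be written out explicitly. This minimal sketch assumes that R2 and Q2 are both squared Pearson correlation coefficients, computed on the training and test sets, respectively, as the wording above suggests; the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    soo = sum((o - mean_o) ** 2 for o in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sop = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    return sop ** 2 / (soo * spp)
```

Note that a squared correlation can be high even when predictions are systematically offset, which is why the RMSE is reported alongside it.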

Figure 3. Overview of the performance of MS2Tox models for training, test, and validation sets for fish (rows 1 and 2), water flea (rows 3 and 4), and algae (row 5). For comparison, prediction models trained only with log KOW and exact mass using linear regression are visualized on the right. Blue dots in the graph represent the training set, pink dots represent the test set, and green dots represent the validation set. SDExp is the experimental standard deviation from logarithmic endpoint concentrations retrieved from CompTox. The root mean square error (RMSE) shows the difference between experimental and predicted toxicity values, while R2 and Q2 evaluate the correlation. The darker middle line on the graph shows an ideal case where predicted toxicity values agree with the experimental values. Lighter lines mark the difference of 1 and 2 log-mM from ideal prediction. The first two rows show the result for the final models that are represented in the MS2Tox package.

Toxicity Predictions from Mass Spectra

Available mass spectral datasets in MassBank (40) from Eawag, University of Athens, and LCSB (University of Luxembourg) were used to validate the toxicity predictions from MS2 spectra. None of the chemicals used for validation had been used in the training or test sets, and thus the validation performance indicates the performance under near real-world conditions where the chemicals of interest are unknown. Models predicting static LC50 values for fish and EC50 values for water flea yielded a mean prediction error below 1.0 log-mM; however, for the LC50 model of water flea, an RMSE of more than 1.1 log-mM was observed (Figure 3). The results in positive and negative ionization modes did not differ significantly. For the fish static LC50 model, the RMSE for negative mode (0.85 log-mM) was slightly lower than that for positive mode (0.90 log-mM) despite the fact that the positive mode dataset was three times larger than the negative mode dataset. The RMSE for the static fish LC50 model (0.88 log-mM) was only twice as large as the standard deviation of experimental values (0.44 log-mM) for the same dataset, which is small in relation to the wide range of toxicities, from −6.3 to 2.9 log-mM. The best-performing model was the static LC50 model for fish, wherein 75% of the predicted toxicities differed by less than one log-mM unit from the experimental values, and correspondingly 98% of predicted toxicities differed by less than two log-mM units. The highest error was for permethrin, which was predicted to be ∼1300× less toxic than its corresponding experimental LC50 value. In this specific case, fragmentation trees and SVM in SIRIUS+CSI:FingerID assigned the wrong formula, and thus incorrect fingerprints, which explains the incorrect toxicity prediction. The correct formula of permethrin ranked second, and using this formula, permethrin was predicted to be only 6× less toxic than the experimental LC50 indicates. For the salicylanilide spectra from the LCSB dataset, LC50 was also underpredicted by 1000×, even though the correct formula was assigned by SIRIUS+CSI:FingerID and 98% of all fingerprints were correctly calculated.
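Fold errors and log-unit errors are used interchangeably above, so a short sketch of the relationship may help: a factor of 10 corresponds to one log-mM unit, which places a ∼1300× error at roughly 3.1 log-mM. The helper names below are illustrative only.

```python
import math


def fold_to_log_units(fold_error):
    """Convert a multiplicative error (e.g., 1000x) to log10 units."""
    return math.log10(fold_error)


def fraction_within(observed, predicted, limit):
    """Fraction of predictions whose absolute error is below `limit`.

    observed, predicted: toxicity values in log-mM.
    """
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) < limit)
    return hits / len(observed)
```

Calling `fraction_within` with limits of 1.0 and 2.0 gives the kind of summary quoted for the fish static LC50 model (the share of chemicals predicted within one and two log-mM units).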
The prediction errors of MS2Tox depended on a number of factors: (1) the accuracy of the fingerprints predicted with fragmentation trees and SVM; (2) the similarity of chemicals of interest and the chemicals used for model training; and (3) accuracy of the xgbDART machine learning model in predicting toxicity from the fingerprints. The molecular fingerprints calculated from the fragmentation trees may have inaccuracies due to a low number of fragments observed in some MS2 spectra. The number of structurally meaningful fragments in MS2 spectra depends on molecular structure and the collision energy, sensitivity, scanning range, and mass resolution of the instrument. (46,47) Over all datasets, 98.4% of the 813,372 fingerprints calculated with SIRIUS+CSI:FingerID were correctly predicted. Here, correct predictions mean that if a specific fingerprint was present in the structure, its predicted probability was above 0.5 as the xgbDART treats all values below 0.5 as zeros and values above 0.5 as ones (Figure 2B). For positive ionization mode spectra, the results were slightly better than those for the negative mode: 98.6% correct fingerprints compared to 97.6%, respectively. Nevertheless, in the two worst cases, almost 200 of 1263 possible fingerprints were incorrectly calculated for the chemical (2R,6S)-fenpropimorph based on two MS2 spectra measured by two different research groups, Athens University and Eawag. In both cases, the fragmentation spectra seemed to contain a sufficient number of peaks (22 selected by SIRIUS+CSI:FingerID). Some of the most incorrectly calculated fingerprints belonged to amide and carbonyl groups (SI Table S8).
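The definition of a correctly predicted fingerprint can be made concrete with a small sketch. The text specifies only that values below 0.5 are treated as zeros and values above 0.5 as ones; treating a probability of exactly 0.5 as zero is an assumption made here, and the function names are hypothetical.

```python
def binarize(probabilities, threshold=0.5):
    """Turn predicted fingerprint probabilities into 0/1 bits."""
    return [1 if p > threshold else 0 for p in probabilities]


def fingerprint_accuracy(true_bits, probabilities, threshold=0.5):
    """Fraction of bits whose thresholded prediction matches the
    bit calculated from the known structure."""
    predicted = binarize(probabilities, threshold)
    matches = sum(1 for t, p in zip(true_bits, predicted) if t == p)
    return matches / len(true_bits)
```

Applied per chemical, this gives the share of its fingerprint bits recovered from the spectrum; pooled over all chemicals and bits, it corresponds to the overall percentage reported above.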
Insufficiently characteristic MS2 spectra may ambiguously correspond to multiple fragmentation trees and, therefore, different structural fingerprints. Moreover, for some spectra, more than 10 molecular formulas were predicted. If the fingerprint probabilities differ significantly for alternative molecular formulas, machine learning will also predict different toxicity values for each molecular formula. The predicted molecular formulas are ranked based on plausibility by SIRIUS+CSI:FingerID (i.e., “SiriusScore”); however, correct formulas may not always be ranked highest. It was of interest to see the impact of the assigned molecular formula on the probabilities of molecular fingerprints and therefore the LC50 values. For this reason, three different approaches were evaluated here: (1) toxicity values were predicted for all obtained molecular formulas and averaged; (2) toxicity values were predicted for three highest-ranked molecular formulas and averaged; and (3) toxicity was predicted only for the molecular formula ranked highest. For all cases, the RMSEs were between 0.86 and 0.88 log-mM with no significant difference. As results with the highest-ranked molecular formula and with the correct molecular formula were most similar (Figure 2C), the fingerprints corresponding to the highest-ranked molecular formula were used for final toxicity predictions. This indicates that even if the correct formula is not ranked the highest, the fingerprints can still be predicted accurately. Histogram (SI Figure S3) and explanation showing good cosine similarities between fingerprints from the same MS2 data but different assigned formulas are given in the SI section “Fingerprints Calculated with SIRIUS Software”.
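The three ways of handling alternative molecular formulas can be sketched as one function. It assumes that each candidate formula comes with a score where a higher value means a better rank, and that a toxicity prediction has already been made for each candidate; the strategy labels and function name are hypothetical.

```python
def aggregate_prediction(candidates, strategy="top1"):
    """Combine toxicity predictions over candidate molecular formulas.

    candidates: list of (score, predicted_log_lc50) tuples, one per
                candidate formula; a higher score is assumed to mean
                a better rank.
    strategy:   "all"  -> average over every candidate
                "top3" -> average over the three highest-ranked
                "top1" -> use only the highest-ranked candidate
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    if strategy == "all":
        chosen = ranked
    elif strategy == "top3":
        chosen = ranked[:3]
    elif strategy == "top1":
        chosen = ranked[:1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sum(pred for _, pred in chosen) / len(chosen)
```

When the fingerprints of alternative formulas are similar, the three strategies return similar values, which is consistent with the nearly identical RMSEs reported for the three approaches.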
The model prediction accuracy was also influenced by the similarity of the “unknown chemicals” in the validation set relative to chemicals used for model training. The structural similarity of the chemicals in the validation set and the training set was evaluated based on the cosine similarity of calculated fingerprints and principal component analysis (see Figure S2 in SI). Heat map analysis (Figure 2D) showed that 70% of validation chemicals had at least 20 training set chemicals with the cosine similarity exceeding 0.5 for fish static LC50. The good similarity observed justifies using the chosen fingerprints and training set chemicals for training the MS2Tox model with application on chemicals detectable with LC-HRMS. On the other hand, the model accuracy may also be affected by the similarity of the clones and species combined into one dataset for the water flea and algae model. Here, 25, 8, and 10 species and clones were combined for the water flea LC50, water flea EC50, and algae EC50 model, respectively, and the heterogeneity of species included may have significantly worsened the learning power of the machine learning models. Unfortunately, no single clone or species has a sufficient number of measured toxicity values available for training a model.
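The similarity check behind the heat map can be sketched as follows: compute the cosine similarity between the fingerprint of a validation chemical and each training fingerprint, then count how many exceed a cutoff. The 0.5 cutoff is taken from the text; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_similar(query, training_set, cutoff=0.5):
    """Number of training fingerprints with similarity above the cutoff."""
    return sum(1 for fp in training_set if cosine_similarity(query, fp) > cutoff)
```

A validation chemical for which `count_similar` returns 20 or more would fall into the 70% of chemicals described above as well covered by the training set.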

Model Interpretation

To understand the information obtained from the xgbDART algorithm, the importance and contribution of each fingerprint to the prediction was evaluated with SHapley Additive exPlanations (SHAP). For the fish static LC50 MS2Tox model trained on the dataset extracted from CompTox, the most important descriptor, by far, was the molecular mass, with the relative importance of the next variable (carbonyl-benzene moiety) being only 12% of the former. From the SHAP plot in Figure 4, it can be seen that the predicted LC50 value drops by almost 3 log-mM when the molecular mass increases from 26 to 400; however, a further increase in molecular mass does not decrease the predicted LC50 values. The high importance of the molecular mass variable is not surprising given that (1) it is the only continuous variable in the dataset and (2) increasing molecular mass is known to be correlated with increased acute toxicity, (48) since the baseline mechanism of acute toxicity is narcosis, i.e., nonspecific reversible disturbance of membrane functioning. (49)

Figure 4. Variable importance for the 10 most important variables in the fish static LC50 model. The bar chart in the top left corner shows the variable importance relative to the most important variable, exact mass. The associated SHAP graphs of each variable show the magnitude and directionality of each variable on the predicted LC50. For each of the molecular fingerprints, the x-axis indicates the absence (0) or presence (1) of the respective structural fragment, and lower SHAP values indicate lower predicted LC50 values (higher toxicity), assuming all other parameters are constant. The line shows the directionality of the impact of the descriptors. Fingerprint naming “Un” refers to the absolute index numbering system in SIRIUS+CSI:FingerID. Graphical descriptions from SIRIUS+CSI:FingerID (42) software are also shown for each fingerprint SHAP plot.

Seventeen fingerprints had a relative importance of more than 2% of the importance of molecular mass in the fish static LC50 model. The low relative importance of fingerprints (maximum 11.8% of that of molecular mass) indicates that the information about specific functional groups and elements present in the molecule fine-tunes the toxicity predictions. The most important fingerprints for fine-tuning the fish static LC50 model were the presence of a carbonyl-benzene moiety and the presence of a ketone, amide, or hydroxyl group. A detailed investigation of SHAP (Figure 4) indicated that an increase in molecular mass alongside the presence of aromatic carboxylic acids, ketone groups, and double-bonded sulfur increased the toxicity of the chemicals. For example, the addition of aromatic carboxylic acids decreased the LC50 value on average by 0.4 log-mM.
Given the high variable importance and SHAP values of molecular mass, we evaluated the extent to which structural fingerprints added predictive power to the machine learning models and thus may allow the prediction of specific modes of toxicity in addition to narcosis. To assess this, a simplified linear model with only exact mass was compared with the xgbDART model using exact mass together with fingerprints. The RMSE decreased with fingerprint addition, suggesting that a machine learning model could be feasibly used to predict specific modes of toxicity beyond narcosis. To probe this evidence further and to evaluate if the trained models are able to predict more than just hydrophobicity-driven narcosis, a linear model between log KOW values retrieved from CompTox and toxicity values was investigated. (50) It was observed that predictions based only on log KOW yielded higher errors in both the high- and low-toxicity regions. The RMSE for the log KOW test model was 0.98 log-mM, compared to 0.79 log-mM for the main model. Moreover, a poor correlation compared to the machine learning model (R2 0.57 for the training set vs 0.93, and Q2 0.64 for the test set vs 0.79 (Figure 3)) was observed when comparing predicted and measured toxicity values using log KOW.

Application to Spiked Aqueous Solution

The MS2Tox model for predicting fish static LC50 was applied to spiked water samples to evaluate its potential for predicting the toxicity of unknown substances. We used three spiked water samples (see SI Tables S4–S6) that were distributed in a NORMAN interlaboratory comparison and had previously been analyzed in-house by data-dependent MS2 LC-HRMS acquisition on an Orbitrap instrument. For the LC50 predictions, the MS2Tox model was retrained with all available LC50 values to yield the model most suitable for practical application. For 90 unique chemicals, 121 HRMS data-dependent MS2 spectra were recorded in negative and positive electrospray ionization mode and treated as belonging to unidentified chemicals. For 22 of the chemicals, experimental LC50 values were available in CompTox and were compared with the predicted values, yielding an RMSE below 0.5 log-mM. For the remaining 68 chemicals, no experimental fish static LC50 values were available for comparison. However, even for these chemicals, the plausibility of the predictions can be assessed against expectations based on toxicity tests in other organisms and the identity of the substances. For some of the chemicals, median lethal dose (LD50) values for rats were available in CompTox. (37) Although LC50 values for fish and LD50 values for rats may differ significantly because of different uptake pathways, a comparison between the two effect values could still be used to classify substances as having high, medium, or low toxicity. For example, chemicals such as ketoprofen had relatively high predicted toxicity (i.e., low LC50) for fish and similarly high experimental toxicity for rats. Substances at the least-toxic end of the predicted spectrum included histamine, guanidine carboxamide, and adenosine (Figure 5).
Although corresponding LD50 values for these latter chemicals are unavailable, histamine and adenosine are endogenous in living organisms and are assumed to have low toxicity relative to other substances in the dataset. (51,52) These results suggest that the developed model has strong potential for application to nontarget water sample analysis.
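One way to make the high, medium, and low comparison between predicted fish LC50 and rat LD50 concrete is to bin each set of values by rank, which sidesteps the different units (mM vs dose per body weight). The Python sketch below is an illustration only: the tertile cut-offs, the function name, and all values are hypothetical assumptions and are not taken from the paper or from CompTox.

```python
def toxicity_classes(values):
    """Label each value 'high', 'medium', or 'low' toxicity by rank tertile.

    Lower effect concentrations or doses mean higher toxicity, so the
    lowest third of the values is labelled 'high'.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        if rank < n / 3:
            labels[idx] = "high"
        elif rank < 2 * n / 3:
            labels[idx] = "medium"
        else:
            labels[idx] = "low"
    return labels

# Hypothetical values for six placeholder chemicals (NOT real data).
names = ["A", "B", "C", "D", "E", "F"]
predicted_fish_log_lc50 = [-2.1, -1.4, -0.6, 0.2, 0.9, 1.5]  # log-mM
rat_log_ld50 = [1.8, 2.9, 2.4, 3.1, 3.6, 3.3]                # log(mg/kg)

fish_classes = toxicity_classes(predicted_fish_log_lc50)
rat_classes = toxicity_classes(rat_log_ld50)

# Fraction of chemicals placed in the same class by both endpoints.
matches = [f == r for f, r in zip(fish_classes, rat_classes)]
agreement = sum(matches) / len(matches)

for name, f, r in zip(names, fish_classes, rat_classes):
    print(f"{name}: fish = {f}, rat = {r}")
print(f"class agreement: {agreement:.2f}")
```

Rank-based bins depend on which chemicals are in the set, so the classes are relative to the dataset rather than absolute hazard categories.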

Figure 5. Final model training of the fish static LC50 prediction model and application to authentic HRMS MS2 spectral data from water sample analyses. (A) Toxicity values from training, test, and validation sets were compiled into a larger training set, and the model was retrained based on the structural fingerprints. (B) From in-house measured HRMS (Orbitrap) spectra, fingerprints were calculated using SIRIUS+CSI:FingerID software, and toxicity was predicted with the final trained model. Chemicals with known toxicity were compared. (C) Green data points are MS2Tox-predicted toxicity values with a corresponding experimental value in the database (last graph in part (B)). Chemicals that did not have a fish static LC50 value are represented as gray data points; validation of these points is not possible. Transparent points are used to show the regions in which more points overlap. As an indicative validation, structures, names, and rat LD50 values are given below for some of the chemicals that did not have LC50 values for fish.

Limitations and Future Perspectives

In this work, a machine learning-based approach, which we call MS2Tox, was developed for predicting toxicity values of unidentified chemicals from HRMS data acquired in nontarget LC-HRMS analysis in both positive and negative ionization modes. Given that generating new toxicity values in the laboratory can be time-consuming and expensive, MS2Tox relied on existing datasets of LC50 and EC50 values for fish, water flea, and algae from CompTox. However, in its selection of chemicals and coverage of chemical space, MS2Tox depends on previous laboratory toxicology studies, which may not be representative of the wide range of natural and anthropogenic chemicals detectable with nontarget LC-HRMS. Because such toxicity data are limited and the existing database is likely biased toward toxic substances, some of the relations between toxicity and mass spectrometry variables may also be biased.
The RMSE of predicted toxicity values for the validation sets ranged from 0.79 to 1.33 log-mM depending on the species and endpoint. Fish LC50 was predicted more accurately than the other tested endpoints. For the EC50 of algae and water flea, the relationship between toxicity value and structure was ambiguous; therefore, the MS2Tox package includes the prediction of fish LC50 only.
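Because the errors are reported in log10 units, it can help to translate them into fold differences on the concentration scale. The short Python sketch below does this for the two ends of the reported RMSE range and also converts a log-mM value to mg/L. The function names and the molar mass of 250 g/mol are hypothetical and chosen only for illustration.

```python
def fold_error(rmse_log10):
    """Typical multiplicative error implied by an RMSE in log10 units."""
    return 10 ** rmse_log10

def log_mM_to_mg_per_L(log_mM, molar_mass_g_per_mol):
    """Convert a log10(mM) concentration to mg/L (mmol/L * g/mol = mg/L)."""
    return 10 ** log_mM * molar_mass_g_per_mol

# Ends of the validation-set RMSE range quoted in the text above.
for rmse_value in (0.79, 1.33):
    print(f"RMSE {rmse_value} log-mM ~ factor of {fold_error(rmse_value):.1f}")

# Hypothetical example: predicted log LC50 of -1.0 log-mM, molar mass 250 g/mol.
print(f"{log_mM_to_mg_per_L(-1.0, 250.0):.1f} mg/L")
```

Read this way, an RMSE of 1 log-mM corresponds to predictions that are typically within about one order of magnitude of the measured value.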
Additionally, MS2Tox uses the structural fingerprints predicted from MS2 spectra with SIRIUS+CSI:FingerID software, and the accuracy of MS2Tox is therefore related to the accuracy of these fingerprint predictions. In the case of poor MS2 data (low number of fragment peaks, noisy spectrum), incorrectly calculated fingerprints can strongly affect the predicted toxicity.
In the future, we envision that MS2Tox can be advanced to evaluate the toxic effects of complex mixtures, such as those present in drinking water or wastewater, as well as the contribution of individual compounds to the toxicity of these samples. Such rapidly obtained knowledge will be essential for identifying toxic substances and mitigating environmental harm in real-world scenarios, such as for effluents. No measurements beyond the nontarget LC-HRMS analysis itself should be required, and laboratories analyzing water samples with LC-HRMS can use MS2Tox without additional equipment for toxicity testing or pure test compounds. Together with concentration predictions, (53) MS2Tox might be used to evaluate the hazard of different waters as well as to prioritize chemicals for identification and removal.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.2c02536.

  • More detailed description of datasets, measurement parameters, and standards used; additional figures for other trained models (PDF)

Author Information

  • Corresponding Author
    • Anneli Kruve - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0001-9725-3351; Email: [email protected]
  • Authors
    • Pilleriin Peets - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0002-7095-661X
    • Wei-Chieh Wang - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden
    • Matthew MacLeod - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0003-2562-7339
    • Magnus Breitholtz - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0002-4984-8323
    • Jonathan W. Martin - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 16, SE-106 91 Stockholm, Sweden; ORCID: https://orcid.org/0000-0001-6265-4294
  • Author Contributions

    A.K., J.W.M., M.M., M.B., and P.P. designed the research study. P.P. developed the method and wrote the code. W.C.W. performed calculations with SIRIUS+CSI:FingerID software. P.P., A.K., J.W.M., M.M., and M.B. wrote the paper. All authors read and approved the manuscript.

  • Funding

    Funding was generously provided by the Swedish Research Council for Sustainable Development (grant 2020-01511).

  • Notes
    The authors declare no competing financial interest.

Acknowledgments

The authors thank Emma Palm and Merle Plassmann for helping with in-house LC-HRMS measurements. They also thank Chimnaz Emrah and Lisa Jonsson for the final testing of the MS2Tox package.

References

This article references 53 other publications.

  1. Dsikowitzky, L.; Schwarzbauer, J. Industrial Organic Contaminants: Identification, Toxicity and Fate in the Environment. Environ. Chem. Lett. 2014, 12, 371–386, DOI: 10.1007/s10311-014-0467-1
  2. Gosetti, F.; Mazzucco, E.; Gennaro, M. C.; Marengo, E. Contaminants in Water: Non-Target UHPLC/MS Analysis. Environ. Chem. Lett. 2016, 14, 51–65, DOI: 10.1007/s10311-015-0527-1
  3. Brunner, A. M.; Bertelkamp, C.; Dingemans, M. M. L.; Kolkman, A.; Wols, B.; Harmsen, D.; Siegers, W.; Martijn, B. J.; Oorthuizen, W. A.; ter Laak, T. L. Integration of Target Analyses, Non-Target Screening and Effect-Based Monitoring to Assess OMP Related Water Quality Changes in Drinking Water Treatment. Sci. Total Environ. 2020, 705, 135779, DOI: 10.1016/j.scitotenv.2019.135779
  4. Postigo, C.; Gil-Solsona, R.; Herrera-Batista, M. F.; Gago-Ferrero, P.; Alygizakis, N.; Ahrens, L.; Wiberg, K. A Step Forward in the Detection of Byproducts of Anthropogenic Organic Micropollutants in Chlorinated Water. Trends Environ. Anal. Chem. 2021, 32, e00148, DOI: 10.1016/j.teac.2021.e00148
  5. Krauss, M.; Hug, C.; Bloch, R.; Schulze, T.; Brack, W. Prioritising Site-Specific Micropollutants in Surface Water from LC-HRMS Non-Target Screening Data Using a Rarity Score. Environ. Sci. Eur. 2019, 31, 45, DOI: 10.1186/s12302-019-0231-z
  6. Nanusha, M. Y.; Krauss, M.; Brack, W. Non-Target Screening for Detecting the Occurrence of Plant Metabolites in River Waters. Environ. Sci. Eur. 2020, 32, 130, DOI: 10.1186/s12302-020-00415-5
  7. Kiefer, K.; Müller, A.; Singer, H.; Hollender, J. New Relevant Pesticide Transformation Products in Groundwater Detected Using Target and Suspect Screening for Agricultural and Urban Micropollutants with LC-HRMS. Water Res. 2019, 165, 114972, DOI: 10.1016/j.watres.2019.114972
  8. Letzel, T.; Bayer, A.; Schulz, W.; Heermann, A.; Lucke, T.; Greco, G.; Grosse, S.; Schüssler, W.; Sengl, M.; Letzel, M. LC–MS Screening Techniques for Wastewater Analysis and Analytical Data Handling Strategies: Sartans and Their Transformation Products as an Example. Chemosphere 2015, 137, 198–206, DOI: 10.1016/j.chemosphere.2015.06.083
  9. Escher, B. I.; Stapleton, H. M.; Schymanski, E. L. Tracking Complex Mixtures of Chemicals in Our Changing Environment. Science 2020, 367, 388–392, DOI: 10.1126/science.aay6636
  10. Schollée, J. E.; Hollender, J.; McArdell, C. S. Characterization of Advanced Wastewater Treatment with Ozone and Activated Carbon Using LC-HRMS Based Non-Target Screening with Automated Trend Assignment. Water Res. 2021, 200, 117209, DOI: 10.1016/j.watres.2021.117209
  11. Lopez-Herguedas, N.; González-Gaya, B.; Castelblanco-Boyacá, N.; Rico, A.; Etxebarria, N.; Olivares, M.; Prieto, A.; Zuloaga, O. Characterization of the Contamination Fingerprint of Wastewater Treatment Plant Effluents in the Henares River Basin (Central Spain) Based on Target and Suspect Screening Analysis. Sci. Total Environ. 2022, 806, 151262, DOI: 10.1016/j.scitotenv.2021.151262
  12. Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 5071–5080, DOI: 10.1021/acs.analchem.0c04473
  13. Gil-Solsona, R.; Nika, M.-C.; Bustamante, M.; Villanueva, C. M.; Foraster, M.; Cosin-Tomás, M.; Alygizakis, N.; Gómez-Roig, M. D.; Llurba-Olive, E.; Sunyer, J.; Thomaidis, N. S.; Dadvand, P.; Gago-Ferrero, P. The Potential of Sewage Sludge to Predict and Evaluate the Human Chemical Exposome. Environ. Sci. Technol. Lett. 2021, 8, 1077–1084, DOI: 10.1021/acs.estlett.1c00848
  14. Schymanski, E. L.; Jeon, J.; Gulde, R.; Fenner, K.; Ruff, M.; Singer, H. P.; Hollender, J. Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environ. Sci. Technol. 2014, 48, 2097–2098, DOI: 10.1021/es5002105
  15. Rager, J. E.; Strynar, M. J.; Liang, S.; McMahen, R. L.; Richard, A. M.; Grulke, C. M.; Wambaugh, J. F.; Isaacs, K. K.; Judson, R.; Williams, A. J.; Sobus, J. R. Linking High Resolution Mass Spectrometry Data with Exposure and Toxicity Forecasts to Advance High-Throughput Environmental Monitoring. Environ. Int. 2016, 88, 269–280, DOI: 10.1016/j.envint.2015.12.008
  16. Schymanski, E. L.; Singer, H. P.; Longrée, P.; Loos, M.; Ruff, M.; Stravs, M. A.; Ripollés Vidal, C.; Hollender, J. Strategies to Characterize Polar Organic Contamination in Wastewater: Exploring the Capability of High Resolution Mass Spectrometry. Environ. Sci. Technol. 2014, 48, 1811–1818, DOI: 10.1021/es4044374
  17. Kaserzon, S. L.; Heffernan, A. L.; Thompson, K.; Mueller, J. F.; Gomez Ramos, M. J. Rapid Screening and Identification of Chemical Hazards in Surface and Drinking Water Using High Resolution Mass Spectrometry and a Case-Control Filter. Chemosphere 2017, 182, 656–664, DOI: 10.1016/j.chemosphere.2017.05.071
  18. Judson, R. Public Databases Supporting Computational Toxicology. J. Toxicol. Environ. Health, Part B 2010, 13, 218–231, DOI: 10.1080/10937404.2010.483937
  19. Raies, A. B.; Bajic, V. B. In Silico Toxicology: Computational Methods for the Prediction of Chemical Toxicity. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2016, 6, 147–172, DOI: 10.1002/wcms.1240
  20. Tang, W.; Chen, J.; Wang, Z.; Xie, H.; Hong, H. Deep Learning for Predicting Toxicity of Chemicals: A Mini Review. J. Environ. Sci. Health, Part C 2018, 36, 252–271, DOI: 10.1080/10590501.2018.1537563
  21. Idakwo, G.; Luttrell, J.; Chen, M.; Hong, H.; Zhou, Z.; Gong, P.; Zhang, C. A Review on Machine Learning Methods for in Silico Toxicity Prediction. J. Environ. Sci. Health, Part C 2018, 36, 169–191, DOI: 10.1080/10590501.2018.1537118
  22. Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: Toxicity Prediction Using Deep Learning. Front. Environ. Sci. 2016, 3, 80, DOI: 10.3389/fenvs.2015.00080
  23. Chen, X.; Dang, L.; Yang, H.; Huang, X.; Yu, X. Machine Learning-Based Prediction of Toxicity of Organic Compounds towards Fathead Minnow. RSC Adv. 2020, 10, 36174–36180, DOI: 10.1039/D0RA05906D
  24. Zhao, C.; Zhang, H.; Zhang, X.; Liu, M.; Hu, Z.; Fan, B. Application of Support Vector Machine (SVM) for Prediction Toxic Activity of Different Data Sets. Toxicology 2006, 217, 105–119, DOI: 10.1016/j.tox.2005.08.019
  25. Wu, K.; Wei, G.-W. Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks. J. Chem. Inf. Model. 2018, 58, 520–531, DOI: 10.1021/acs.jcim.7b00558
  26. Williams, D. P.; Naisbitt, D. J. Toxicophores: Groups and Metabolic Routes Associated with Increased Safety Risk. Curr. Opin. Drug Discovery Dev. 2002, 5, 104–115
  27. Alves, V. M.; Muratov, E. N.; Capuzzi, S. J.; Politi, R.; Low, Y.; Braga, R. C.; Zakharov, A. V.; Sedykh, A.; Mokshyna, E.; Farag, S.; Andrade, C. H.; Kuz’min, V. E.; Fourches, D.; Tropsha, A. Alarms about Structural Alerts. Green Chem. 2016, 18, 4348–4360, DOI: 10.1039/C6GC01492E
  28. Öberg, T. A QSAR for Baseline Toxicity: Validation, Domain of Application, and Prediction. Chem. Res. Toxicol. 2004, 17, 1630–1637, DOI: 10.1021/tx0498253
  29. Randazzo, G. M.; Tonoli, D.; Hambye, S.; Guillarme, D.; Jeanneret, F.; Nurisso, A.; Goracci, L.; Boccard, J.; Rudaz, S. Prediction of Retention Time in Reversed-Phase Liquid Chromatography as a Tool for Steroid Identification. Anal. Chim. Acta 2016, 916, 8–16, DOI: 10.1016/j.aca.2016.02.014
  30. Zhao, M.; Li, Z.; Wu, Y.; Tang, Y.-R.; Wang, C.; Zhang, Z.; Peng, S. Studies on LogP, Retention Time and QSAR of 2-Substituted Phenylnitronyl Nitroxides as Free Radical Scavengers. Eur. J. Med. Chem. 2007, 42, 955–965, DOI: 10.1016/j.ejmech.2006.12.027
  31. Zhu, P.; Tong, W.; Alton, K.; Chowdhury, S. An Accurate-Mass-Based Spectral-Averaging Isotope-Pattern-Filtering Algorithm for Extraction of Drug Metabolites Possessing a Distinct Isotope Pattern from LC-MS Data. Anal. Chem. 2009, 81, 5910–5917, DOI: 10.1021/ac900626d
  32. Schultz, T. W.; Amcoff, P.; Berggren, E.; Gautier, F.; Klaric, M.; Knight, D. J.; Mahony, C.; Schwarz, M.; White, A.; Cronin, M. T. D. A Strategy for Structuring and Reporting a Read-across Prediction of Toxicity. Regul. Toxicol. Pharmacol. 2015, 72, 586–601, DOI: 10.1016/j.yrtph.2015.05.016
  33. Kazius, J.; McGuire, R.; Bursi, R. Derivation and Validation of Toxicophores for Mutagenicity Prediction. J. Med. Chem. 2005, 48, 312–320, DOI: 10.1021/jm040835a
  34. Pradeep, P.; Judson, R.; DeMarini, D. M.; Keshava, N.; Martin, T. M.; Dean, J.; Gibbons, C. F.; Simha, A.; Warren, S. H.; Gwinn, M. R.; Patlewicz, G. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Comput. Toxicol. 2021, 18, 100167, DOI: 10.1016/j.comtox.2021.100167
  35. Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341, DOI: 10.1093/bioinformatics/bts437
  36. O’Boyle, N. M.; Sayle, R. A. Comparing Structural Fingerprints Using a Literature-Based Similarity Benchmark. J. Cheminf. 2016, 8, 36, DOI: 10.1186/s13321-016-0148-0
  37. Williams, A. J.; Grulke, C. M.; Edwards, J.; McEachran, A. D.; Mansouri, K.; Baker, N. C.; Patlewicz, G.; Shah, I.; Wambaugh, J. F.; Judson, R. S.; Richard, A. M. The CompTox Chemistry Dashboard: A Community Data Resource for Environmental Chemistry. J. Cheminf. 2017, 9, 61, DOI: 10.1186/s13321-017-0247-6
  38. Guha, R. Chemical Informatics Functionality in R. J. Stat. Software 2007, 18, 1–16, DOI: 10.18637/jss.v018.i05
  39. Yang, L. SHAP Visualization in R 2018.
  40. Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T. MassBank: A Public Repository for Sharing Mass Spectral Data for Life Sciences. J. Mass Spectrom. 2010, 45, 703–714, DOI: 10.1002/jms.1777
  41. Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 12580–12585, DOI: 10.1073/pnas.1509788112
  42. Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302, DOI: 10.1038/s41592-019-0344-8
  43. Guisan, A.; Edwards, T. C.; Hastie, T. Generalized Linear and Generalized Additive Models in Studies of Species Distributions: Setting the Scene. Ecol. Modell. 2002, 157, 89–100, DOI: 10.1016/S0304-3800(02)00204-1
  44. Kuhn, M. caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret (accessed Dec 09, 2020).
  45. Rashmi, K. V.; Gilad-Bachrach, R. In DART: Dropouts Meet Multiple Additive Regression Trees, Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, California, USA, 2015; Vol. 38, pp 489–497.
  46. Stieger, C. E.; Doppler, P.; Mechtler, K. Optimized Fragmentation Improves the Identification of Peptides Cross-Linked by MS-Cleavable Reagents. J. Proteome Res. 2019, 18, 1363–1370, DOI: 10.1021/acs.jproteome.8b00947
  47. Vaniya, A.; Fiehn, O. Using Fragmentation Trees and Mass Spectral Trees for Identifying Unknown Compounds in Metabolomics. TrAC, Trends Anal. Chem. 2015, 69, 52–61, DOI: 10.1016/j.trac.2015.04.002
  48. Delistraty, D. Acute Toxicity to Rats and Trout with a Focus on Inhalation and Aquatic Exposures. Ecotoxicol. Environ. Saf. 2000, 46, 225–233, DOI: 10.1006/eesa.1999.1906
  49. Klüver, N.; Vogs, C.; Altenburger, R.; Escher, B. I.; Scholz, S. Development of a General Baseline Toxicity QSAR Model for the Fish Embryo Acute Toxicity Test. Chemosphere 2016, 164, 164–173, DOI: 10.1016/j.chemosphere.2016.08.079
  50. Mackay, D.; McCarty, L. S.; MacLeod, M. On the Validity of Classifying Chemicals for Persistence, Bioaccumulation, Toxicity, and Potential for Long-range Transport. Environ. Toxicol. Chem. 2001, 20, 1491–1498, DOI: 10.1002/etc.5620200711
  51. Maintz, L.; Novak, N. Histamine and Histamine Intolerance. Am. J. Clin. Nutr. 2007, 85, 1185–1196, DOI: 10.1093/ajcn/85.5.1185
  52. Cronstein, B. N. Adenosine, an Endogenous Anti-Inflammatory Agent. J. Appl. Physiol. 1994, 76, 5–13, DOI: 10.1152/jappl.1994.76.1.5
  53. Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013, DOI: 10.3390/molecules27031013

Cited By

This article is cited by 9 publications.

  1. Fei Cheng, Beate I. Escher, Huizhen Li, Maria König, Yujun Tong, Jiehui Huang, Liwei He, Xinyan Wu, Xiaohan Lou, Dali Wang, Fan Wu, Yuanyuan Pei, Zhiqiang Yu, Bryan W. Brooks, Eddy Y. Zeng, Jing You. Deep Learning Bridged Bioactivity, Structure, and GC-HRMS-Readable Evidence to Decipher Nontarget Toxicants in Sediments. Environmental Science & Technology 2024, Article ASAP.
  2. Ida Rahu, Meelis Kull, Anneli Kruve. Predicting the Activity of Unidentified Chemicals in Complementary Bioassays from the HRMS Data to Pinpoint Potential Endocrine Disruptors. Journal of Chemical Information and Modeling 2024, 64 (8) , 3093-3104. https://doi.org/10.1021/acs.jcim.3c02050
  3. Drew Szabo, Travis M. Falconer, Christine M. Fisher, Ted Heise, Allison L. Phillips, Gyorgy Vas, Antony J. Williams, Anneli Kruve. Online and Offline Prioritization of Chemicals of Interest in Suspect Screening and Non-targeted Screening with High-Resolution Mass Spectrometry. Analytical Chemistry 2024, 96 (9) , 3707-3716. https://doi.org/10.1021/acs.analchem.3c05705
  4. Katarzyna Arturi, Juliane Hollender. Machine Learning-Based Hazard-Driven Prioritization of Features in Nontarget Screening of Environmental High-Resolution Mass Spectrometry Data. Environmental Science & Technology 2023, 57 (46) , 18067-18079. https://doi.org/10.1021/acs.est.3c00304
  5. S. Codrean, B. Kruit, N. Meekel, D. Vughs, F. Béen. Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning. Analytical Chemistry 2023, 95 (42) , 15810-15817. https://doi.org/10.1021/acs.analchem.3c03470
  6. Helen Sepman, Louise Malm, Pilleriin Peets, Matthew MacLeod, Jonathan Martin, Magnus Breitholtz, Anneli Kruve. Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS2 Data. Analytical Chemistry 2023, 95 (33) , 12329-12338. https://doi.org/10.1021/acs.analchem.3c01744
  7. Bo-Yang Huang, Qi-Xin Lü, Zhi-Xian Tang, Zhong Tang, Hong-Ping Chen, Xin-Ping Yang, Fang-Jie Zhao, Peng Wang. Machine learning methods to predict cadmium (Cd) concentration in rice grain and support soil management at a regional scale. Fundamental Research 2023, 12 https://doi.org/10.1016/j.fmre.2023.02.016
  8. Guillaume Delaittre, Georg Dierkes, Johanna Heine, Constantin Hoch, Ullrich Jahn, Hajo Kries, Björn Meermann, Erik Strub, Carl Christoph Tzschucke. Notizen aus der Chemie. Nachrichten aus der Chemie 2023, 71 (2) , 42-45. https://doi.org/10.1002/nadc.20234134634
  9. Anneli Kruve, Pilleriin Peets. Machine Learning and Nontargeted Liquid Chromatography–Mass Spectrometry to Assess Ecotoxicity. LCGC Europe 2023, 29-31. https://doi.org/10.56530/lcgc.eu.wg5784a9

    Figure 1. Workflow for development of the MS2Tox prediction models. Organic chemicals with toxicity values and MS2 spectra in available databases were used as the validation set, while chemicals with available toxicity values but no MS2 spectra were randomly divided into training and test sets. (A) In the initial training and testing of MS2Tox, molecular fingerprints (FPs) were calculated from chemical structure using rcdk and used to train a gradient-boosted prediction model using xgbDART. (B) In the validation stage, fingerprints were calculated from empirical HRMS spectra using SIRIUS+CSI:FingerID software, and these fingerprints were then used to predict toxicity with the xgbDART model that was trained in (A). The same workflow was repeated for all ecotoxicological endpoints: static fish LC50, flow-through fish LC50, water flea LC50, water flea EC50, and algae EC50.

    Figure 2. Data selection and processing steps. (A) Correlation of LC50 values for bluegill and rainbow trout with fathead minnow. The black line shows ideal agreement of the LC50 values for the two fish species, indicating equal sensitivity. (B) Cross table of the fingerprints calculated from the SMILES of the chemical and fingerprints predicted by fragmentation trees and support vector machines from SIRIUS+CSI:FingerID software. The number on top represents all calculated fingerprints, while the number below in parentheses is the number of these fingerprints that were actually used by the final xgbDART model for predicting fish static LC50 values. (C) Agreement between measured and predicted toxicities depending on the molecular formulas assigned by fragmentation trees. (D) Heat map of the cosine similarity of fingerprints for chemicals in training and validation sets. The similarity ranged from zero (red) to one (green), and the color gradient is shown in 10 equal steps. An abundance of green areas in a column shows that, for a chemical in the validation set, a highly similar chemical exists in the training set; here, we see abundant green areas for a majority of the chemicals.

    Figure 3. Overview of the performance of MS2Tox models for training, test, and validation sets for fish (rows 1 and 2), water flea (rows 3 and 4), and algae (row 5). For comparison, prediction models trained only with log KOW and exact mass using linear regression are visualized on the right. Blue dots in the graph represent the training set, pink dots represent the test set, and green dots represent the validation set. SDExp is the experimental standard deviation from logarithmic endpoint concentrations retrieved from CompTox. The root mean square error (RMSE) shows the difference between experimental and predicted toxicity values, while R2 and Q2 evaluate the correlation. The darker middle line on the graph shows an ideal case where predicted toxicity values agree with the experimental values. Lighter lines mark the difference of 1 and 2 log-mM from ideal prediction. The first two rows show the result for the final models that are represented in the MS2Tox package.

    Figure 4. Variable importance for the 10 most important variables in the fish static LC50 model. The bar chart in the top left corner shows the variable importance relative to the most important variable, exact mass. The associated SHAP graphs show the magnitude and directionality of the effect of each variable on the predicted LC50. For each of the molecular fingerprints, the x-axis indicates the absence (0) or presence (1) of the respective structural fragment, and lower SHAP values indicate lower predicted LC50 values (higher toxicity), assuming all other parameters are constant. The line shows the directionality of the impact of the descriptors. Fingerprint naming “Un” refers to the absolute index numbering system in SIRIUS+CSI:FingerID. Graphical descriptions from SIRIUS+CSI:FingerID (42) software are also shown for each fingerprint SHAP plot.

    Figure 5

    Figure 5. Final model training of the fish static LC50 prediction model and application to authentic HRMS MS2 spectral data from water sample analyses. (A) Toxicity values from training, test, and validation sets were compiled into a larger training set, and the model was retrained based on the structural fingerprints. (B) From in-house measured HRMS (Orbitrap) spectra, fingerprints were calculated using SIRIUS+CSI:FingerID software, and toxicity was predicted with the final trained model. Chemicals with known toxicity were compared. (C) Green data points are MS2Tox-predicted toxicity values with a corresponding experimental value in the database (last graph in part (B)). Chemicals that did not have fish static LC50 value are represented as gray data points, and thus validation of these points is not possible. Transparent points are used to show in which regions more overlapping points are present. Structures, names, and rat LD50 values are given below for some of the chemicals that did not have LC50 values for fish as an indicative validation.

  • References

    ARTICLE SECTIONS
    Jump To

    This article references 53 other publications.

    1. 1
      Dsikowitzky, L.; Schwarzbauer, J. Industrial Organic Contaminants: Identification, Toxicity and Fate in the Environment. Environ. Chem. Lett. 2014, 12, 371386,  DOI: 10.1007/s10311-014-0467-1
    2. 2
      Gosetti, F.; Mazzucco, E.; Gennaro, M. C.; Marengo, E. Contaminants in Water: Non-Target UHPLC/MS Analysis. Environ. Chem. Lett. 2016, 14, 5165,  DOI: 10.1007/s10311-015-0527-1
    3. 3
      Brunner, A. M.; Bertelkamp, C.; Dingemans, M. M. L.; Kolkman, A.; Wols, B.; Harmsen, D.; Siegers, W.; Martijn, B. J.; Oorthuizen, W. A.; ter Laak, T. L. Integration of Target Analyses, Non-Target Screening and Effect-Based Monitoring to Assess OMP Related Water Quality Changes in Drinking Water Treatment. Sci. Total Environ. 2020, 705, 135779  DOI: 10.1016/j.scitotenv.2019.135779
    4. 4
      Postigo, C.; Gil-Solsona, R.; Herrera-Batista, M. F.; Gago-Ferrero, P.; Alygizakis, N.; Ahrens, L.; Wiberg, K. A Step Forward in the Detection of Byproducts of Anthropogenic Organic Micropollutants in Chlorinated Water. Trends Environ. Anal. Chem. 2021, 32, e00148  DOI: 10.1016/j.teac.2021.e00148
    5. 5
      Krauss, M.; Hug, C.; Bloch, R.; Schulze, T.; Brack, W. Prioritising Site-Specific Micropollutants in Surface Water from LC-HRMS Non-Target Screening Data Using a Rarity Score. Environ. Sci. Eur. 2019, 31, 45  DOI: 10.1186/s12302-019-0231-z
    6. 6
      Nanusha, M. Y.; Krauss, M.; Brack, W. Non-Target Screening for Detecting the Occurrence of Plant Metabolites in River Waters. Environ Sci Eur 2020, 32, 130  DOI: 10.1186/s12302-020-00415-5
    7. 7
      Kiefer, K.; Müller, A.; Singer, H.; Hollender, J. New Relevant Pesticide Transformation Products in Groundwater Detected Using Target and Suspect Screening for Agricultural and Urban Micropollutants with LC-HRMS. Water Res. 2019, 165, 114972  DOI: 10.1016/j.watres.2019.114972
    8. 8
      Letzel, T.; Bayer, A.; Schulz, W.; Heermann, A.; Lucke, T.; Greco, G.; Grosse, S.; Schüssler, W.; Sengl, M.; Letzel, M. LC–MS Screening Techniques for Wastewater Analysis and Analytical Data Handling Strategies: Sartans and Their Transformation Products as an Example. Chemosphere 2015, 137, 198206,  DOI: 10.1016/j.chemosphere.2015.06.083
    9. 9
      Escher, B. I.; Stapleton, H. M.; Schymanski, E. L. Tracking Complex Mixtures of Chemicals in Our Changing Environment. Science 2020, 367, 388392,  DOI: 10.1126/science.aay6636
    10. 10
      Schollée, J. E.; Hollender, J.; McArdell, C. S. Characterization of Advanced Wastewater Treatment with Ozone and Activated Carbon Using LC-HRMS Based Non-Target Screening with Automated Trend Assignment. Water Res. 2021, 200, 117209  DOI: 10.1016/j.watres.2021.117209
    11. 11
      Lopez-Herguedas, N.; González-Gaya, B.; Castelblanco-Boyacá, N.; Rico, A.; Etxebarria, N.; Olivares, M.; Prieto, A.; Zuloaga, O. Characterization of the Contamination Fingerprint of Wastewater Treatment Plant Effluents in the Henares River Basin (Central Spain) Based on Target and Suspect Screening Analysis. Sci. Total Environ. 2022, 806, 151262  DOI: 10.1016/j.scitotenv.2021.151262
    12. 12
      Meekel, N.; Vughs, D.; Béen, F.; Brunner, A. M. Online Prioritization of Toxic Compounds in Water Samples through Intelligent HRMS Data Acquisition. Anal. Chem. 2021, 93, 50715080,  DOI: 10.1021/acs.analchem.0c04473
    13. 13
      Gil-Solsona, R.; Nika, M.-C.; Bustamante, M.; Villanueva, C. M.; Foraster, M.; Cosin-Tomás, M.; Alygizakis, N.; Gómez-Roig, M. D.; Llurba-Olive, E.; Sunyer, J.; Thomaidis, N. S.; Dadvand, P.; Gago-Ferrero, P. The Potential of Sewage Sludge to Predict and Evaluate the Human Chemical Exposome. Environ. Sci. Technol. Lett. 2021, 8, 10771084,  DOI: 10.1021/acs.estlett.1c00848
    14. 14
      Schymanski, E. L.; Jeon, J.; Gulde, R.; Fenner, K.; Ruff, M.; Singer, H. P.; Hollender, J. Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environ. Sci. Technol. 2014, 48, 20972098,  DOI: 10.1021/es5002105
    15.
      Rager, J. E.; Strynar, M. J.; Liang, S.; McMahen, R. L.; Richard, A. M.; Grulke, C. M.; Wambaugh, J. F.; Isaacs, K. K.; Judson, R.; Williams, A. J.; Sobus, J. R. Linking High Resolution Mass Spectrometry Data with Exposure and Toxicity Forecasts to Advance High-Throughput Environmental Monitoring. Environ. Int. 2016, 88, 269–280, DOI: 10.1016/j.envint.2015.12.008
    16.
      Schymanski, E. L.; Singer, H. P.; Longrée, P.; Loos, M.; Ruff, M.; Stravs, M. A.; Ripollés Vidal, C.; Hollender, J. Strategies to Characterize Polar Organic Contamination in Wastewater: Exploring the Capability of High Resolution Mass Spectrometry. Environ. Sci. Technol. 2014, 48, 1811–1818, DOI: 10.1021/es4044374
    17.
      Kaserzon, S. L.; Heffernan, A. L.; Thompson, K.; Mueller, J. F.; Gomez Ramos, M. J. Rapid Screening and Identification of Chemical Hazards in Surface and Drinking Water Using High Resolution Mass Spectrometry and a Case-Control Filter. Chemosphere 2017, 182, 656–664, DOI: 10.1016/j.chemosphere.2017.05.071
    18.
      Judson, R. Public Databases Supporting Computational Toxicology. J. Toxicol. Environ. Health, Part B 2010, 13, 218–231, DOI: 10.1080/10937404.2010.483937
    19.
      Raies, A. B.; Bajic, V. B. In Silico Toxicology: Computational Methods for the Prediction of Chemical Toxicity. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2016, 6, 147–172, DOI: 10.1002/wcms.1240
    20.
      Tang, W.; Chen, J.; Wang, Z.; Xie, H.; Hong, H. Deep Learning for Predicting Toxicity of Chemicals: A Mini Review. J. Environ. Sci. Health, Part C 2018, 36, 252–271, DOI: 10.1080/10590501.2018.1537563
    21.
      Idakwo, G.; Luttrell, J.; Chen, M.; Hong, H.; Zhou, Z.; Gong, P.; Zhang, C. A Review on Machine Learning Methods for in Silico Toxicity Prediction. J. Environ. Sci. Health, Part C 2018, 36, 169–191, DOI: 10.1080/10590501.2018.1537118
    22.
      Mayr, A.; Klambauer, G.; Unterthiner, T.; Hochreiter, S. DeepTox: Toxicity Prediction Using Deep Learning. Front. Environ. Sci. 2016, 3, 80, DOI: 10.3389/fenvs.2015.00080
    23.
      Chen, X.; Dang, L.; Yang, H.; Huang, X.; Yu, X. Machine Learning-Based Prediction of Toxicity of Organic Compounds towards Fathead Minnow. RSC Adv. 2020, 10, 36174–36180, DOI: 10.1039/D0RA05906D
    24.
      Zhao, C.; Zhang, H.; Zhang, X.; Liu, M.; Hu, Z.; Fan, B. Application of Support Vector Machine (SVM) for Prediction Toxic Activity of Different Data Sets. Toxicology 2006, 217, 105–119, DOI: 10.1016/j.tox.2005.08.019
    25.
      Wu, K.; Wei, G.-W. Quantitative Toxicity Prediction Using Topology Based Multitask Deep Neural Networks. J. Chem. Inf. Model. 2018, 58, 520–531, DOI: 10.1021/acs.jcim.7b00558
    26.
      Williams, D. P.; Naisbitt, D. J. Toxicophores: Groups and Metabolic Routes Associated with Increased Safety Risk. Curr. Opin. Drug Discovery Dev. 2002, 5, 104–115
    27.
      Alves, V. M.; Muratov, E. N.; Capuzzi, S. J.; Politi, R.; Low, Y.; Braga, R. C.; Zakharov, A. V.; Sedykh, A.; Mokshyna, E.; Farag, S.; Andrade, C. H.; Kuz’min, V. E.; Fourches, D.; Tropsha, A. Alarms about Structural Alerts. Green Chem. 2016, 18, 4348–4360, DOI: 10.1039/C6GC01492E
    28.
      Öberg, T. A QSAR for Baseline Toxicity: Validation, Domain of Application, and Prediction. Chem. Res. Toxicol. 2004, 17, 1630–1637, DOI: 10.1021/tx0498253
    29.
      Randazzo, G. M.; Tonoli, D.; Hambye, S.; Guillarme, D.; Jeanneret, F.; Nurisso, A.; Goracci, L.; Boccard, J.; Rudaz, S. Prediction of Retention Time in Reversed-Phase Liquid Chromatography as a Tool for Steroid Identification. Anal. Chim. Acta 2016, 916, 8–16, DOI: 10.1016/j.aca.2016.02.014
    30.
      Zhao, M.; Li, Z.; Wu, Y.; Tang, Y.-R.; Wang, C.; Zhang, Z.; Peng, S. Studies on LogP, Retention Time and QSAR of 2-Substituted Phenylnitronyl Nitroxides as Free Radical Scavengers. Eur. J. Med. Chem. 2007, 42, 955–965, DOI: 10.1016/j.ejmech.2006.12.027
    31.
      Zhu, P.; Tong, W.; Alton, K.; Chowdhury, S. An Accurate-Mass-Based Spectral-Averaging Isotope-Pattern-Filtering Algorithm for Extraction of Drug Metabolites Possessing a Distinct Isotope Pattern from LC-MS Data. Anal. Chem. 2009, 81, 5910–5917, DOI: 10.1021/ac900626d
    32.
      Schultz, T. W.; Amcoff, P.; Berggren, E.; Gautier, F.; Klaric, M.; Knight, D. J.; Mahony, C.; Schwarz, M.; White, A.; Cronin, M. T. D. A Strategy for Structuring and Reporting a Read-across Prediction of Toxicity. Regul. Toxicol. Pharmacol. 2015, 72, 586–601, DOI: 10.1016/j.yrtph.2015.05.016
    33.
      Kazius, J.; McGuire, R.; Bursi, R. Derivation and Validation of Toxicophores for Mutagenicity Prediction. J. Med. Chem. 2005, 48, 312–320, DOI: 10.1021/jm040835a
    34.
      Pradeep, P.; Judson, R.; DeMarini, D. M.; Keshava, N.; Martin, T. M.; Dean, J.; Gibbons, C. F.; Simha, A.; Warren, S. H.; Gwinn, M. R.; Patlewicz, G. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Comput. Toxicol. 2021, 18, 100167, DOI: 10.1016/j.comtox.2021.100167
    35.
      Heinonen, M.; Shen, H.; Zamboni, N.; Rousu, J. Metabolite Identification and Molecular Fingerprint Prediction through Machine Learning. Bioinformatics 2012, 28, 2333–2341, DOI: 10.1093/bioinformatics/bts437
    36.
      O’Boyle, N. M.; Sayle, R. A. Comparing Structural Fingerprints Using a Literature-Based Similarity Benchmark. J. Cheminf. 2016, 8, 36, DOI: 10.1186/s13321-016-0148-0
    37.
      Williams, A. J.; Grulke, C. M.; Edwards, J.; McEachran, A. D.; Mansouri, K.; Baker, N. C.; Patlewicz, G.; Shah, I.; Wambaugh, J. F.; Judson, R. S.; Richard, A. M. The CompTox Chemistry Dashboard: A Community Data Resource for Environmental Chemistry. J. Cheminf. 2017, 9, 61, DOI: 10.1186/s13321-017-0247-6
    38.
      Guha, R. Chemical Informatics Functionality in R. J. Stat. Software 2007, 18, 1–16, DOI: 10.18637/jss.v018.i05
    39.
      Yang, L. SHAP Visualization in R 2018.
    40.
      Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T. MassBank: A Public Repository for Sharing Mass Spectral Data for Life Sciences. J. Mass Spectrom. 2010, 45, 703–714, DOI: 10.1002/jms.1777
    41.
      Dührkop, K.; Shen, H.; Meusel, M.; Rousu, J.; Böcker, S. Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID. Proc. Natl. Acad. Sci. U.S.A. 2015, 112, 12580–12585, DOI: 10.1073/pnas.1509788112
    42.
      Dührkop, K.; Fleischauer, M.; Ludwig, M.; Aksenov, A. A.; Melnik, A. V.; Meusel, M.; Dorrestein, P. C.; Rousu, J.; Böcker, S. SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information. Nat. Methods 2019, 16, 299–302, DOI: 10.1038/s41592-019-0344-8
    43.
      Guisan, A.; Edwards, T. C.; Hastie, T. Generalized Linear and Generalized Additive Models in Studies of Species Distributions: Setting the Scene. Ecol. Modell. 2002, 157, 89–100, DOI: 10.1016/S0304-3800(02)00204-1
    44.
      Kuhn, M. caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret (accessed Dec 09, 2020).
    45.
      Rashmi, K. V.; Gilad-Bachrach, R. DART: Dropouts Meet Multiple Additive Regression Trees. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics; PMLR: San Diego, California, USA, 2015; Vol. 38, pp 489–497.
    46.
      Stieger, C. E.; Doppler, P.; Mechtler, K. Optimized Fragmentation Improves the Identification of Peptides Cross-Linked by MS-Cleavable Reagents. J. Proteome Res. 2019, 18, 1363–1370, DOI: 10.1021/acs.jproteome.8b00947
    47.
      Vaniya, A.; Fiehn, O. Using Fragmentation Trees and Mass Spectral Trees for Identifying Unknown Compounds in Metabolomics. TrAC, Trends Anal. Chem. 2015, 69, 52–61, DOI: 10.1016/j.trac.2015.04.002
    48.
      Delistraty, D. Acute Toxicity to Rats and Trout with a Focus on Inhalation and Aquatic Exposures. Ecotoxicol. Environ. Saf. 2000, 46, 225–233, DOI: 10.1006/eesa.1999.1906
    49.
      Klüver, N.; Vogs, C.; Altenburger, R.; Escher, B. I.; Scholz, S. Development of a General Baseline Toxicity QSAR Model for the Fish Embryo Acute Toxicity Test. Chemosphere 2016, 164, 164–173, DOI: 10.1016/j.chemosphere.2016.08.079
    50.
      Mackay, D.; Mccarty, L. S.; Macleod, M. On the Validity of Classifying Chemicals for Persistence, Bioaccumulation, Toxicity, and Potential for Long-range Transport. Environ. Toxicol. Chem. 2001, 20, 1491–1498, DOI: 10.1002/etc.5620200711
    51.
      Maintz, L.; Novak, N. Histamine and Histamine Intolerance. Am. J. Clin. Nutr. 2007, 85, 1185–1196, DOI: 10.1093/ajcn/85.5.1185
    52.
      Cronstein, B. N. Adenosine, an Endogenous Anti-Inflammatory Agent. J. Appl. Physiol. 1994, 76, 5–13, DOI: 10.1152/jappl.1994.76.1.5
    53.
      Palm, E.; Kruve, A. Machine Learning for Absolute Quantification of Unidentified Compounds in Non-Targeted LC/HRMS. Molecules 2022, 27, 1013, DOI: 10.3390/molecules27031013
  • Supporting Information

    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.2c02536.

    • More detailed description of datasets, measurement parameters, and standards used; additional figures for other trained models (PDF)


