Cyto-Safe: A Machine Learning Tool for Early Identification of Cytotoxic Compounds in Drug Discovery
- Francisco L. Feitosa: Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
- Victoria F. Cabral: Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
- Igor H. Sanches: Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
- Sabrina Silva-Mendonca: Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
- Joyce V. V. B. Borba: Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
- Rodolpho C. Braga
- Carolina Horta Andrade* (Telephone: +55 62 3209-6451; E-mail: [email protected]): Laboratory for Molecular Modeling and Drug Design (LabMol), Faculdade de Farmácia, Universidade Federal de Goiás, Goiânia, Goiás 74605-220, Brazil; Center for the Research and Advancement in Fragments and molecular Targets (CRAFT), School of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Ribeirão Preto, São Paulo 05508-220, Brazil; Center for Excellence in Artificial Intelligence (CEIA), Institute of Informatics, Universidade Federal de Goiás, Goiânia, Goiás 74605-170, Brazil
Abstract
Cytotoxicity assessment is essential in drug discovery, as it allows toxic compounds to be identified early in screening campaigns and thus minimizes toxicological risk. In vitro assays support high-throughput screening, enabling efficient detection of toxic substances while considerably reducing the need for animal testing. In addition, AI-based quantitative structure–activity relationship (AI-QSAR) models strengthen early-stage prediction by estimating the cytotoxic potential of molecular structures, which helps prioritize low-risk compounds for further validation. We present Cyto-Safe, a freely accessible web application that identifies potentially cytotoxic compounds with machine-learning QSAR models built on a data set of approximately 90,000 compounds tested against two cell lines, 3T3 and HEK 293. Users can enter a SMILES string, upload CSV or SDF files, or sketch molecules. The output includes a binary prediction for each cell line, a confidence percentage, and an explainable AI (XAI) analysis. Cyto-Safe web-app version 1.0 is available at http://insightai.labmol.com.br/.
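The abstract describes the per-compound output only at a high level. The sketch below shows one plausible way to turn a classifier's probability of cytotoxicity for each cell line into a binary call plus a confidence percentage. It is an assumption, not Cyto-Safe's code: the 0.5 threshold, the rule "confidence = probability assigned to the predicted class", and the names `summarize` and `CellLinePrediction` are all hypothetical.

```python
from dataclasses import dataclass

# Cell lines covered by the models, as named in the abstract.
CELL_LINES = ("3T3", "HEK 293")


@dataclass(frozen=True)
class CellLinePrediction:  # hypothetical output record
    cell_line: str
    cytotoxic: bool          # binary call
    confidence_pct: float    # confidence in that call, 0-100


def summarize(probabilities: dict[str, float], threshold: float = 0.5) -> list[CellLinePrediction]:
    """Convert per-cell-line probabilities of cytotoxicity into binary calls.

    Assumption: confidence is the probability assigned to the predicted class,
    expressed as a percentage (never below 50% at the default threshold of 0.5).
    """
    results = []
    for cell_line in CELL_LINES:
        p = probabilities[cell_line]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {cell_line}: {p}")
        cytotoxic = p >= threshold
        confidence = p if cytotoxic else 1.0 - p
        results.append(CellLinePrediction(cell_line, cytotoxic, round(100 * confidence, 1)))
    return results


if __name__ == "__main__":
    # Hypothetical probabilities for a single query molecule.
    for row in summarize({"3T3": 0.87, "HEK 293": 0.22}):
        print(row)
```

Reporting the probability of the predicted class (rather than the raw probability of cytotoxicity) keeps the percentage meaningful for both "cytotoxic" and "noncytotoxic" calls; a real deployment might instead calibrate probabilities or combine them with an applicability-domain check.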
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
CYTO-SAFE
Data Collection
Data Cleaning and Curation
QSAR Modeling
Y-Randomization
Deployment
Explainable AI (XAI)
Results and Discussion
Modeling
| Model | BACC | AUC | F1 | MCC | Precision | Se | Sp |
|---|---|---|---|---|---|---|---|
| 3T3 Unbalanced | 0.81 | 0.81 | 0.69 | 0.68 | 0.78 | 0.63 | 0.99 |
| 3T3 1:1 | 0.80 | 0.80 | 0.80 | 0.59 | 0.80 | 0.79 | 0.81 |
| 3T3 1:5 | 0.92 | 0.92 | 0.90 | 0.88 | 0.96 | 0.84 | 0.99 |
| HEK Unbalanced | 0.83 | 0.83 | 0.73 | 0.71 | 0.81 | 0.67 | 0.98 |
| HEK 1:1 | 0.81 | 0.81 | 0.81 | 0.63 | 0.81 | 0.81 | 0.81 |
| HEK 1:5 | 0.90 | 0.90 | 0.87 | 0.84 | 0.92 | 0.82 | 0.99 |
BACC: balanced accuracy; AUC: area under the ROC curve; F1: F1 score; MCC: Matthews correlation coefficient; Se: sensitivity; Sp: specificity.
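To reproduce the columns of the table from one's own predictions, the sketch below computes each metric from a binary confusion matrix using the standard definitions (label 1 = cytotoxic), with AUC in its rank (Mann–Whitney) form. A useful consistency check: BACC is (Se + Sp)/2, so for 3T3 Unbalanced (0.63 + 0.99)/2 = 0.81, as reported. Moreover, when AUC is computed from hard 0/1 predictions rather than continuous scores, it reduces exactly to (Se + Sp)/2; every row of the table reports identical BACC and AUC, which is consistent with (though not proof of) AUC having been computed that way. The function names are illustrative and not taken from the authors' pipeline.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """BACC, F1, MCC, precision, Se and Sp from binary labels (1 = cytotoxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    se = _ratio(tp, tp + fn)          # sensitivity (recall)
    sp = _ratio(tn, tn + fp)          # specificity
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * se, precision + se)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"BACC": (se + sp) / 2, "F1": f1, "MCC": mcc,
            "Precision": precision, "Se": se, "Sp": sp}


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    m = classification_metrics(y_true, y_pred)
    print(m)
    # Using the hard 0/1 predictions as scores makes AUC equal to BACC.
    print(roc_auc(y_true, y_pred), m["BACC"])
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for illustration; for data sets on the scale of ~90,000 compounds a sort-based rank formula (or an established metrics library) is the practical choice.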
Usability
Figure 1. General scheme of the Cyto-Safe web app: usage, outcome, and XAI.
Explainable AI (XAI) with Molecular Diagrams and Heatmaps
Figure 2. Explainable AI (XAI) molecular diagrams illustrating the models’ predictions for doxorubicin with the 3T3 (A) and HEK-293 (B) models, and for ibuprofen with the 3T3 (C) and HEK-293 (D) models. Red contours highlight regions with a strong positive influence on predicted cytotoxicity, whereas green contours highlight regions with a strong positive influence on predicted nontoxicity. Contour color intensity reflects the magnitude of the influence, with darker shades indicating a greater impact on the prediction.
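The caption describes signed, per-region contributions drawn as red or green contours whose intensity tracks magnitude. One established way to obtain such atom weights for fingerprint-based models is the similarity-map approach of Riniker and Landrum: an atom's weight is the change in predicted probability when the fingerprint features involving that atom are removed. The sketch below illustrates that idea with a toy logistic "model" over hypothetical fragment features (`frag_A`, `frag_B`, `frag_C`) and then maps the weights onto the red/green, intensity-by-magnitude convention of the caption. Whether Cyto-Safe computes its diagrams exactly this way is an assumption; a real implementation would use actual fingerprint bits and the trained classifier (RDKit's `SimilarityMaps` module implements this idea for real molecules).

```python
import math


def toy_probability(features: set[str], weights: dict[str, float], bias: float = -1.0) -> float:
    """Hypothetical stand-in for a trained classifier: logistic score over present features."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


def atom_contributions(atom_features: dict[int, set[str]], weights: dict[str, float],
                       bias: float = -1.0) -> dict[int, float]:
    """Atom weight = P(all features) - P(all features minus those the atom takes part in)."""
    all_features = set().union(*atom_features.values())
    p_full = toy_probability(all_features, weights, bias)
    return {atom: p_full - toy_probability(all_features - feats, weights, bias)
            for atom, feats in atom_features.items()}


def contour_style(contrib: dict[int, float]) -> dict[int, tuple[str, float]]:
    """Positive -> red (toward cytotoxic), negative -> green (toward nontoxic).

    Intensity is |weight| divided by the largest magnitude, so it lies in [0, 1].
    """
    scale = max((abs(w) for w in contrib.values()), default=0.0) or 1.0
    return {atom: ("red" if w > 0 else "green" if w < 0 else "none", abs(w) / scale)
            for atom, w in contrib.items()}


if __name__ == "__main__":
    # Hypothetical molecule: atom index -> fingerprint features that include the atom.
    atom_features = {0: {"frag_A"}, 1: {"frag_A", "frag_B"}, 2: {"frag_C"}}
    weights = {"frag_A": 2.0, "frag_B": 0.2, "frag_C": -1.5}  # hypothetical learned weights
    for atom, style in contour_style(atom_contributions(atom_features, weights)).items():
        print(atom, style)
```

Because an atom can belong to several overlapping fragments, its weight reflects every feature it participates in; normalizing by the largest magnitude makes the darkest contour correspond to the most influential atom within each molecule, not across molecules.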
Limitations
Comparative Analysis of QSAR Models for Cytotoxicity Prediction
Conclusions
Data Availability
All molecular structures used for each data set modeled are provided in the Supporting Information. The workflows used to calculate descriptors, split the data, train, and validate the models are available at https://github.com/LabMolUFG/cheminformatics_pipeline. The models are available at https://github.com/LabMolUFG/cytosafe.
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c01811.
Full training and test data sets for the 3T3 and HEK 293 cell lines (XLSX)
Supplementary methods, results, and figures, including the models’ hyperparameters, clustering and chemical space analysis, applicability domain threshold definition, and explainable AI heatmaps (PDF)
Data sets of compounds clustered by structural similarity (XLSX)
Acknowledgments
The authors would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the recognition received through the 21st Edition of the award “Prêmio Destaque na Iniciação Científica e Tecnológica” granted to FLF. This work has been funded by CNPq (grants #440373/2022-0, #140631/2021-6 and #441038/2020-4), FAPEG (#202010267000272), and CAPES (finance code 001). CHA is a CNPq research fellow.
References
This article references 30 other publications.
- 1Khalef, L.; Lydia, R.; Filicia, K.; Moussa, B. Cell Viability and Cytotoxicity Assays: Biochemical Elements and Cellular Compartments. Cell Biochem Funct 2024, 42 (3), e4007, DOI: 10.1002/cbf.4007Google ScholarThere is no corresponding record for this reference.
- 2Aslantürk, Ö. S. In Vitro Cytotoxicity and Cell Viability Assays: Principles, Advantages, and Disadvantages. In Genotoxicity - A Predictable Risk to Our Actual World; InTech, 2018. DOI: 10.5772/intechopen.71923 .Google ScholarThere is no corresponding record for this reference.
- 3Sams-Dodd, F. Target-Based Drug Discovery: Is Something Wrong?. Drug Discov Today 2005, 10 (2), 139– 147, DOI: 10.1016/S1359-6446(04)03316-1Google ScholarThere is no corresponding record for this reference.
- 4Swinney, D. C. Phenotypic vs. Target-Based Drug Discovery for First-in-Class Medicines. Clin Pharmacol Ther 2013, 93 (4), 299– 301, DOI: 10.1038/clpt.2012.236Google Scholar4Phenotypic vs. Target-Based Drug Discovery for First-in-Class MedicinesSwinney, D. C.Clinical Pharmacology & Therapeutics (New York, NY, United States) (2013), 93 (4), 299-301CODEN: CLPTAT; ISSN:0009-9236. (Nature Publishing Group)A review. Current drug discovery strategies include both mol. and empirical approaches. The mol. approaches are predominantly hypothesis-driven and are referred to as target-based. The empirical approaches are referred to as phenotypic because they rely on phenotypic measures of response. A recent anal. revealed the phenotypic approaches to be the more successful strategy for small-mol., first-in-class medicines. The rationalization for this success was the unbiased identification of the mol. mechanism of action (MMOA). Clin. Pharmacol. & Therapeutics (2013); 93 4, 299-301. doi:10.1038/clpt.2012.236.
- 5Riss, T.; Niles, A.; Moravec, R.; Karassina, N.; Vidugiriene, J. Cytotoxicity Assays: In Vitro Methods to Measure Dead Cells. In Assay Guidance Manual [Internet]; Eli Lilly & Company and the National Center for Advancing Translational Sciences, Bethesda (MD), 2019.Google ScholarThere is no corresponding record for this reference.
- 6Clark, A. M.; Dole, K.; Coulon-Spektor, A.; McNutt, A.; Grass, G.; Freundlich, J. S.; Reynolds, R. C.; Ekins, S. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J. Chem. Inf Model 2015, 55 (6), 1231– 1245, DOI: 10.1021/acs.jcim.5b00143Google Scholar6Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery DatasetsClark, Alex M.; Dole, Krishna; Coulon-Spektor, Anna; McNutt, Andrew; Grass, George; Freundlich, Joel S.; Reynolds, Robert C.; Ekins, SeanJournal of Chemical Information and Modeling (2015), 55 (6), 1231-1245CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)On the order of hundreds of absorption, distribution, metab., excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a ref. implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chem. Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochem. properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid prodn. of robust machine learning models from public data or the user's own datasets. 
The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.
- 7Braga, R. C.; Alves, V. M.; Silva, M. F. B.; Muratov, E.; Fourches, D.; Lião, L. M.; Tropsha, A.; Andrade, C. H. Pred-hERG: A Novel Web-Accessible Computational Tool for Predicting Cardiac Toxicity. Mol. Inf. 2015, 34 (10), 698– 701, DOI: 10.1002/minf.201500040Google Scholar7Pred-hERG: A Novel web-Accessible Computational Tool for Predicting Cardiac ToxicityBraga, Rodolpho C.; Alves, Vinicius M.; Silva, Meryck F. B.; Muratov, Eugene; Fourches, Denis; Liao, Luciano M.; Tropsha, Alexander; Andrade, Carolina H.Molecular Informatics (2015), 34 (10), 698-701CODEN: MIONBS; ISSN:1868-1743. (Wiley-VCH Verlag GmbH & Co. KGaA)The blockage of the hERG K+ channels is closely assocd. with lethal cardiac arrhythmia. The notorious ligand promiscuity of this channel earmarked hERG as one of the most important antitargets to be considered in early stages of drug development process. Herein the authors report on the development of an innovative and freely accessible web server for early identification of putative hERG blockers and non-blockers in chem. libraries. The authors have collected the largest publicly available curated hERG dataset of 5984 compds. The authors succeed in developing robust and externally predictive binary (CCR≈0.8) and multiclass models (accuracy≈0.7). These models are available as a web-service freely available for public at http://labmol.farmacia.ufg.br/predherg/. Three following outcomes are available for the users: prediction by binary model, prediction by multi-class model, and the probability maps of at. contribution. The Pred-hERG will be continuously updated and upgraded as new information became available.
- 8Sanches, I. H.; Braga, R. C.; Alves, V. M.; Andrade, C. H. Enhancing HERG Risk Assessment with Interpretable Classificatory and Regression Models. Chem. Res. Toxicol. 2024, 37 (6), 910– 922, DOI: 10.1021/acs.chemrestox.3c00400Google ScholarThere is no corresponding record for this reference.
- 9Borba, J. V. B.; Braga, R. C.; Alves, V. M.; Muratov, E. N.; Kleinstreuer, N.; Tropsha, A.; Andrade, C. H. Pred-Skin: A Web Portal for Accurate Prediction of Human Skin Sensitizers. Chem. Res. Toxicol. 2021, 34 (2), 258– 267, DOI: 10.1021/acs.chemrestox.0c00186Google Scholar9Pred-Skin: A Web Portal for Accurate Prediction of Human Skin SensitizersBorba, Joyce V. B.; Braga, Rodolpho C.; Alves, Vinicius M.; Muratov, Eugene N.; Kleinstreuer, Nicole; Tropsha, Alexander; Andrade, Carolina HortaChemical Research in Toxicology (2021), 34 (2), 258-267CODEN: CRTOEC; ISSN:0893-228X. (American Chemical Society)Safety assessment is an essential component of the regulatory acceptance of industrial chems. Previously, we have developed a model to predict the skin sensitization potential of chems. for two assays, the human patch test and murine local lymph node assay, and implemented this model in a web portal. Here, we report on the substantially revised and expanded freely available web tool, Pred-Skin version 3.0. This up-to-date version of Pred-Skin incorporates multiple quant. structure-activity relationship (QSAR) models developed with in vitro, in chemico, and mice and human in vivo data, integrated into a consensus naive Bayes model that predicts human effects. Individual QSAR models were generated using skin sensitization data derived from human repeat insult patch tests, human maximization tests, and mouse local lymph node assays. In addn., data for three validated alternative methods, the direct peptide reactivity assay, KeratinoSens, and the human cell line activation test, were employed as well. Models were developed using open-source tools and rigorously validated according to the best practices of QSAR modeling. Predictions obtained from these models were then used to build a naive Bayes model for predicting human skin sensitization with the following external prediction accuracy: correct classification rate (89%), sensitivity (94%), pos. 
predicted value (91%), specificity (84%), and neg. predicted value (89%). As an addnl. assessment of model performance, we identified 11 cosmetic ingredients known to cause skin sensitization but were not included in our training set, and nine of them were accurately predicted as sensitizers by our models. Pred-Skin can be used as a reliable alternative to animal tests for predicting human skin sensitization.
- 10Braga, R. C.; Alves, V. M.; Muratov, E. N.; Strickland, J.; Kleinstreuer, N.; Trospsha, A.; Andrade, C. H. Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of Chemicals. J. Chem. Inf. Model. 2017, 57 (5), 1013– 1017, DOI: 10.1021/acs.jcim.7b00194Google Scholar10Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of ChemicalsBraga, Rodolpho C.; Alves, Vinicius M.; Muratov, Eugene N.; Strickland, Judy; Kleinstreuer, Nicole; Trospsha, Alexander; Andrade, Carolina HortaJournal of Chemical Information and Modeling (2017), 57 (5), 1013-1017CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Chem. induced skin sensitization is a complex immunol. disease with a profound impact on quality of life and working ability. Despite some progress in developing alternative methods for assessing the skin sensitization potential of chem. substances, there is no in vitro test that correlates well with human data. Computational QSAR models provide a rapid screening approach and contribute valuable information for the assessment of chem. toxicity. The authors describe the development of a freely accessible web-based and mobile application for the identification of potential skin sensitizers. The application is based on previously developed binary QSAR models of skin sensitization potential from human (109 compds.) and murine local lymph node assay (LLNA, 515 compds.) data with good external correct classification rate (0.70-0.81 and 0.72-0.84, resp.). The authors also included a multiclass skin sensitization potency model based on LLNA data (accuracy ranging between 0.73 and 0.76). When a user evaluates a compd. in the web app, the outputs are (i) binary predictions of human and murine skin sensitization potential; (ii) multiclass prediction of murine skin sensitization; and (ii) probability maps illustrating the predicted contribution of chem. fragments. 
The app is the first tool available that incorporates quant. structure-activity relationship (QSAR) models based on human data as well as multiclass models for LLNA. The Pred-Skin web app version 1.0 is freely available for the web, iOS, and Android (in development) at the LabMol web portal (http://labmol.com.br/predskin/), in the Apple Store, and on Google Play, resp. The authors will continuously update the app as new skin sensitization data and resp. models become available.
- 11Lee, O. W.; Austin, S.; Gamma, M.; Cheff, D. M.; Lee, T. D.; Wilson, K. M.; Johnson, J.; Travers, J.; Braisted, J. C.; Guha, R.; Klumpp-Thomas, C.; Shen, M.; Hall, M. D. Cytotoxic Profiling of Annotated and Diverse Chemical Libraries Using Quantitative High-Throughput Screening. SLAS Discov 2020, 25 (1), 9– 20, DOI: 10.1177/2472555219873068Google ScholarThere is no corresponding record for this reference.
- 12Fourches, D.; Muratov, E.; Tropsha, A. Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research. J. Chem. Inf Model 2010, 50 (7), 1189– 1204, DOI: 10.1021/ci100176xGoogle Scholar12Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling ResearchFourches, Denis; Muratov, Eugene; Tropsha, AlexanderJournal of Chemical Information and Modeling (2010), 50 (7), 1189-1204CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)With the recent advent of high-throughput technologies for both compd. synthesis and biol. screening, there is no shortage of publicly or com. available data sets and databases that can be used for computational drug discovery applications. Rapid growth of large, publicly available databases (such as PubChem or ChemSpider contg. more than 20 million mol. records each) enabled by exptl. projects such as NIH's Mol. Libraries and Imaging Initiative provides new opportunities for the development of chemoinformatics methodologies and their application to knowledge discovery in mol. databases. A fundamental assumption of any chemoinformatics study is the correctness of the input data generated by exptl. scientists and available in various data sets. In another recent study, the authors investigated several public and com. databases to calc. their error rates: the latter were ranging from 0.1 to 3.4% depending on the database. How significant is the problem of accurate structure representation (given that the error rates in current databases may appear relatively low) since it concerns exploratory chemoinformatics and mol. modeling research. Recent investigations by a large group of collaborators from six labs. have clearly demonstrated that the type of chem. descriptors has much greater influence on the prediction performance of QSAR models than the nature of model optimization techniques.
- 13Fourches, D.; Muratov, E.; Tropsha, A. Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation. J. Chem. Inf Model 2016, 56 (7), 1243– 1252, DOI: 10.1021/acs.jcim.6b00129Google Scholar13Trust, but Verify II: A Practical Guide to Chemogenomics Data CurationFourches, Denis; Muratov, Eugene; Tropsha, AlexanderJournal of Chemical Information and Modeling (2016), 56 (7), 1243-1252CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)There is a growing public concern about the lack of reproducibility of exptl. data published in peer-reviewed scientific literature. Herein, we review the most recent alerts regarding exptl. data quality and discuss initiatives taken thus far to address this problem, esp. in the area of chem. genomics. Going beyond just acknowledging the issue, we propose a chem. and biol. data curation workflow that relies on existing cheminformatics approaches to flag, and when appropriate, correct possibly erroneous entries in large chemogenomics data sets. We posit that the adherence to the best practices for data curation is important for both exptl. scientists who generate primary data and deposit them in chem. genomics databases and computational researchers who rely on these data for model development.
- 14Zhang, J.; Mani, I. KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction. In Workshop on Learning from Imbalanced Data Sets II ; Washington, 2003.Google ScholarThere is no corresponding record for this reference.
- 15Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inform 2010, 29 (6–7), 476– 488, DOI: 10.1002/minf.201000061Google Scholar15Best Practices for QSAR Model Development, Validation, and ExploitationTropsha, AlexanderMolecular Informatics (2010), 29 (6-7), 476-488CODEN: MIONBS; ISSN:1868-1743. (Wiley-VCH Verlag GmbH & Co. KGaA)A review. After nearly 5 decades "in the making", QSAR modeling has established itself as one of the major computational mol. modeling methodologies. As any mature research discipline, QSAR modeling can be characterized by a collection of well defined protocols and procedures that enable the expert application of the method for exploring and exploiting ever growing collections of biol. active chem. compds. This review examines most crit. QSAR modeling routines that we regard as best practices in the field. We discuss these procedures in the context of integrative predictive QSAR modeling workflow that is focused on achieving models of the highest statistical rigor and external predictive power. Specific elements of the workflow consist of data prepn. including chem. structure (and when possible, assocd. biol. data) curation, outlier detection, dataset balancing, and model validation. We esp. emphasize procedures used to validate models, both internally and externally, as well as the need to define model applicability domains that should be used when models are employed for the prediction of external compds. or compd. libraries. Finally, we present several examples of successful applications of QSAR models for virtual screening to identify exptl. confirmed hits.
- 16OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models; OECD, 2014. DOI: 10.1787/9789264085442-en .Google ScholarThere is no corresponding record for this reference.
- 17Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf Model 2010, 50 (5), 742– 754, DOI: 10.1021/ci100050tGoogle Scholar17Extended-Connectivity FingerprintsRogers, David; Hahn, MathewJournal of Chemical Information and Modeling (2010), 50 (5), 742-754CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Extended-connectivity fingerprints (ECFPs) are a novel class of topol. fingerprints for mol. characterization. Historically, topol. fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a no. of useful qualities: they can be very rapidly calcd.; they are not predefined and can represent an essentially infinite no. of different mol. features (including stereochem. information); their features represent the presence of particular substructures, allowing easier interpretation of anal. results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
- 18RDKit: Open-Source Cheminformatics. https://www.rdkit.org/.Google ScholarThere is no corresponding record for this reference.
- 19Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems; NIPS’17; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp 3149– 3157.Google ScholarThere is no corresponding record for this reference.
- 20Riniker, S.; Landrum, G. A. Similarity Maps - a Visualization Strategy for Molecular Fingerprints and Machine-Learning Methods. J. Cheminform 2013, 5 (1), 43, DOI: 10.1186/1758-2946-5-43Google Scholar20Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methodsRiniker Sereina; Landrum Gregory AJournal of cheminformatics (2013), 5 (1), 43 ISSN:1758-2946.: Fingerprint similarity is a common method for comparing chemical structures. Similarity is an appealing approach because, with many fingerprint types, it provides intuitive results: a chemist looking at two molecules can understand why they have been determined to be similar. This transparency is partially lost with the fuzzier similarity methods that are often used for scaffold hopping and tends to vanish completely when molecular fingerprints are used as inputs to machine-learning (ML) models. Here we present similarity maps, a straightforward and general strategy to visualize the atomic contributions to the similarity between two molecules or the predicted probability of a ML model. We show the application of similarity maps to a set of dopamine D3 receptor ligands using atom-pair and circular fingerprints as well as two popular ML methods: random forests and naive Bayes. An open-source implementation of the method is provided.
- 21Gadaleta, D.; Mangiatordi, G. F.; Catto, M.; Carotti, A.; Nicolotti, O. Applicability Domain for QSAR Models. IJQSPR 2016, 1 (1), 45– 63, DOI: 10.4018/IJQSPR.2016010102Google ScholarThere is no corresponding record for this reference.
- 22Ukelis, U.; Kramer, P.-J.; Olejniczak, K.; Mueller, S. O. Replacement of in Vivo Acute Oral Toxicity Studies by in Vitro Cytotoxicity Methods: Opportunities, Limits and Regulatory Status. Regul. Toxicol. Pharmacol. 2008, 51 (1), 108– 118, DOI: 10.1016/j.yrtph.2008.02.002Google Scholar22Replacement of in vivo acute oral toxicity studies by in vitro cytotoxicity methods: Opportunities, limits and regulatory statusUkelis, Ute; Kramer, Peter-Juergen; Olejniczak, Klaus; Mueller, Stefan O.Regulatory Toxicology and Pharmacology (2008), 51 (1), 108-118CODEN: RTOPDW; ISSN:0273-2300. (Elsevier Inc.)A review. The development of a new medicinal product is a long and costly process in particular due to the regulatory requirements for quality, safety and efficacy. There is a common interest to increase the efficiency of drug development and to provide new, better quality medicinal products much faster to the public. One possible way to economize time and costs, as well as to consider animal protection issues, is to introduce new alternative methods into non-clin. toxicity testing. Currently, animal tests are mandatory for the evaluation of acute toxicity of chems. and new drugs. The replacement of the in vivo tests by alternative in vitro assays would offer the opportunity to screen and assess numerous compds. at the same time, to predict acute oral toxicity and thus accelerate drug development. Moreover, the substitution of in vivo tests by in vitro methods shows a proactive pursuit of ethical and animal welfare issues. Importantly, the implementation of in vitro assays for acute oral toxicity would require the establishment of common test guidelines across the EU, USA and Japan, i.e., the regions of ICH (International Conference on Harmonisation of Tech. Requirements for Registration of Pharmaceuticals for Human Use). Presently, alternative in vitro tests are being investigated internationally. 
Yet, in order to achieve regulatory acceptance and implementation of in vitro assays, convincing results from validation studies are required. In this review, we discuss the current regulatory status of acute oral toxicity testing and point out achievements of alternative methods. We describe the application of in vitro tests, correlating in vitro with in vivo data. The use of in vitro data to predict in vivo acute oral toxicity is analyzed using the Registry of Cytotoxicity, an official independent database. We have then analyzed opportunities and drawbacks for future implementation of in vitro test methods, with particular focus on industrial use.
- (23) Niepel, M.; Hafner, M.; Mills, C. E.; Subramanian, K.; Williams, E. H.; Chung, M.; Gaudio, B.; Barrette, A. M.; Stern, A. D.; Hu, B.; Korkola, J. E.; Gray, J. W.; Birtwistle, M. R.; Heiser, L. M.; Sorger, P. K.; Shamu, C. E.; Jayaraman, G.; Azeloglu, E. U.; Iyengar, R.; Sobie, E. A.; Mills, G. B.; Liby, T.; Jaffe, J. D.; Alimova, M.; Davison, D.; Lu, X.; Golub, T. R.; Subramanian, A.; Shelley, B.; Svendsen, C. N.; Ma’ayan, A.; Medvedovic, M.; Feiler, H. S.; Smith, R.; Devlin, K. A Multi-Center Study on the Reproducibility of Drug-Response Assays in Mammalian Cell Lines. Cell Syst. 2019, 9 (1), 35–48, DOI: 10.1016/j.cels.2019.06.005
- (24) Clothier, R. H. The FRAME Cytotoxicity Test (Kenacid Blue). In In Vitro Toxicity Testing Protocols; Humana Press: Totowa, NJ, 1995; pp 109–118, DOI: 10.1385/0-89603-282-5:109
- (25) Langdon, S. R.; Mulgrew, J.; Paolini, G. V.; Van Hoorn, W. P. Predicting Cytotoxicity from Heterogeneous Data Sources with Bayesian Learning. J. Cheminf. 2010, 2 (1), 11, DOI: 10.1186/1758-2946-2-11
- (26) Banerjee, P.; Kemmler, E.; Dunkel, M.; Preissner, R. ProTox 3.0: A Webserver for the Prediction of Toxicity of Chemicals. Nucleic Acids Res. 2024, 52 (W1), W513–W520, DOI: 10.1093/nar/gkae303
- (27) Yin, Z.; Ai, H.; Zhang, L.; Ren, G.; Wang, Y.; Zhao, Q.; Liu, H. Predicting the Cytotoxicity of Chemicals Using Ensemble Learning Methods and Molecular Fingerprints. J. Appl. Toxicol. 2019, 39 (10), 1366–1377, DOI: 10.1002/jat.3785
- (28) Liu, Q.; He, D.; Fan, M.; Wang, J.; Cui, Z.; Wang, H.; Mi, Y.; Li, N.; Meng, Q.; Hou, Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J. Chem. Inf. Model. 2024, DOI: 10.1021/acs.jcim.4c00366
- (29) Sun, H.; Wang, Y.; Cheff, D. M.; Hall, M. D.; Shen, M. Predictive Models for Estimating Cytotoxicity on the Basis of Chemical Structures. Bioorg. Med. Chem. 2020, 28 (10), 115422, DOI: 10.1016/j.bmc.2020.115422
- (30) Webel, H. E.; Kimber, T. B.; Radetzki, S.; Neuenschwander, M.; Nazaré, M.; Volkamer, A. Revealing Cytotoxic Substructures in Molecules Using Deep Learning. J. Comput.-Aided Mol. Des. 2020, 34 (7), 731–746, DOI: 10.1007/s10822-020-00310-4
Figure 1. General scheme of the Cyto-Safe web app: how it is used, the prediction outputs it returns, and its explainable AI (XAI) visualizations.
Figure 2. Explainable AI (XAI) molecular diagrams illustrating the model predictions for doxorubicin with the 3T3 (A) and HEK-293 (B) models, and for ibuprofen with the 3T3 (C) and HEK-293 (D) models. Red contours highlight regions that strongly push the prediction toward cytotoxicity, whereas green contours highlight regions that strongly push it toward non-toxicity. Contour intensity reflects the magnitude of each contribution, with darker shades indicating a greater effect on the prediction.
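The red and green contours in Figure 2 are consistent with the atom-attribution idea behind similarity maps (Riniker and Landrum, ref 20): an atom's weight is the change in predicted probability when the fingerprint bits it contributes to are removed, so positive weights push toward "cytotoxic" and negative weights toward "non-cytotoxic". The minimal Python sketch below illustrates that idea under clearly stated assumptions: each molecule is given as a hypothetical atom-to-bit mapping rather than real ECFP bits, and a toy logistic scorer (`make_logistic_model`, hypothetical) stands in for the trained Cyto-Safe models, whose internals are not shown here. For real molecules, RDKit's `SimilarityMaps` module (for example `GetAtomicWeightsForModel`) provides this kind of attribution.

```python
import math
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical representation: atom index -> fingerprint bits that atom helps set.
AtomBits = Dict[int, Set[int]]


def atomic_weights(
    atom_bits: AtomBits,
    predict_proba: Callable[[FrozenSet[int]], float],
) -> Dict[int, float]:
    """Weight of atom i = P(toxic | all bits) - P(toxic | bits without atom i)."""
    full = frozenset(b for bits in atom_bits.values() for b in bits)
    base = predict_proba(full)
    weights = {}
    for atom, bits in atom_bits.items():
        # Remove every bit this atom participates in, then re-score the molecule.
        reduced = full - frozenset(bits)
        weights[atom] = base - predict_proba(reduced)
    return weights


def color_atoms(
    weights: Dict[int, float], eps: float = 1e-9
) -> Dict[int, Tuple[str, float]]:
    """Map each weight to (colour, intensity in [0, 1]) as in the Figure 2 legend."""
    max_abs = max((abs(w) for w in weights.values()), default=0.0)
    result = {}
    for atom, w in weights.items():
        # Normalise by the largest contribution so the strongest atom gets 1.0.
        intensity = abs(w) / max_abs if max_abs > 0 else 0.0
        if w > eps:
            colour = "red"  # raises the predicted probability of cytotoxicity
        elif w < -eps:
            colour = "green"  # lowers it, i.e. favours non-toxicity
        else:
            colour = "neutral"
        result[atom] = (colour, intensity)
    return result


def make_logistic_model(
    coefs: Dict[int, float], bias: float
) -> Callable[[FrozenSet[int]], float]:
    """Toy stand-in for a trained classifier: logistic score over present bits."""
    def predict_proba(bits: FrozenSet[int]) -> float:
        z = bias + sum(coefs.get(b, 0.0) for b in bits)
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


if __name__ == "__main__":
    # Hypothetical three-atom molecule and hypothetical bit coefficients.
    atom_bits = {0: {1, 2}, 1: {2, 3}, 2: {4}}
    model = make_logistic_model({1: 2.0, 2: 0.5, 3: -1.5, 4: 0.0}, bias=-0.5)
    colours = color_atoms(atomic_weights(atom_bits, model))
    for atom, (colour, intensity) in colours.items():
        print(atom, colour, round(intensity, 3))
```

In this toy case atom 0 comes out red at full intensity, atom 1 a lighter green, and atom 2 neutral because its only bit carries zero weight. Normalising by the largest absolute weight mirrors the caption's convention that darker shades mark larger contributions; a real implementation would draw these weights as contours over the 2D depiction instead of printing them.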
References
This article references 30 other publications.
- 1Khalef, L.; Lydia, R.; Filicia, K.; Moussa, B. Cell Viability and Cytotoxicity Assays: Biochemical Elements and Cellular Compartments. Cell Biochem Funct 2024, 42 (3), e4007, DOI: 10.1002/cbf.4007There is no corresponding record for this reference.
- 2Aslantürk, Ö. S. In Vitro Cytotoxicity and Cell Viability Assays: Principles, Advantages, and Disadvantages. In Genotoxicity - A Predictable Risk to Our Actual World; InTech, 2018. DOI: 10.5772/intechopen.71923 .There is no corresponding record for this reference.
- 3Sams-Dodd, F. Target-Based Drug Discovery: Is Something Wrong?. Drug Discov Today 2005, 10 (2), 139– 147, DOI: 10.1016/S1359-6446(04)03316-1There is no corresponding record for this reference.
- 4Swinney, D. C. Phenotypic vs. Target-Based Drug Discovery for First-in-Class Medicines. Clin Pharmacol Ther 2013, 93 (4), 299– 301, DOI: 10.1038/clpt.2012.2364Phenotypic vs. Target-Based Drug Discovery for First-in-Class MedicinesSwinney, D. C.Clinical Pharmacology & Therapeutics (New York, NY, United States) (2013), 93 (4), 299-301CODEN: CLPTAT; ISSN:0009-9236. (Nature Publishing Group)A review. Current drug discovery strategies include both mol. and empirical approaches. The mol. approaches are predominantly hypothesis-driven and are referred to as target-based. The empirical approaches are referred to as phenotypic because they rely on phenotypic measures of response. A recent anal. revealed the phenotypic approaches to be the more successful strategy for small-mol., first-in-class medicines. The rationalization for this success was the unbiased identification of the mol. mechanism of action (MMOA). Clin. Pharmacol. & Therapeutics (2013); 93 4, 299-301. doi:10.1038/clpt.2012.236.
- 5Riss, T.; Niles, A.; Moravec, R.; Karassina, N.; Vidugiriene, J. Cytotoxicity Assays: In Vitro Methods to Measure Dead Cells. In Assay Guidance Manual [Internet]; Eli Lilly & Company and the National Center for Advancing Translational Sciences, Bethesda (MD), 2019.There is no corresponding record for this reference.
- 6Clark, A. M.; Dole, K.; Coulon-Spektor, A.; McNutt, A.; Grass, G.; Freundlich, J. S.; Reynolds, R. C.; Ekins, S. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J. Chem. Inf Model 2015, 55 (6), 1231– 1245, DOI: 10.1021/acs.jcim.5b001436Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery DatasetsClark, Alex M.; Dole, Krishna; Coulon-Spektor, Anna; McNutt, Andrew; Grass, George; Freundlich, Joel S.; Reynolds, Robert C.; Ekins, SeanJournal of Chemical Information and Modeling (2015), 55 (6), 1231-1245CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)On the order of hundreds of absorption, distribution, metab., excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a ref. implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chem. Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochem. properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid prodn. of robust machine learning models from public data or the user's own datasets. 
The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.
- 7Braga, R. C.; Alves, V. M.; Silva, M. F. B.; Muratov, E.; Fourches, D.; Lião, L. M.; Tropsha, A.; Andrade, C. H. Pred-hERG: A Novel Web-Accessible Computational Tool for Predicting Cardiac Toxicity. Mol. Inf. 2015, 34 (10), 698– 701, DOI: 10.1002/minf.2015000407Pred-hERG: A Novel web-Accessible Computational Tool for Predicting Cardiac ToxicityBraga, Rodolpho C.; Alves, Vinicius M.; Silva, Meryck F. B.; Muratov, Eugene; Fourches, Denis; Liao, Luciano M.; Tropsha, Alexander; Andrade, Carolina H.Molecular Informatics (2015), 34 (10), 698-701CODEN: MIONBS; ISSN:1868-1743. (Wiley-VCH Verlag GmbH & Co. KGaA)The blockage of the hERG K+ channels is closely assocd. with lethal cardiac arrhythmia. The notorious ligand promiscuity of this channel earmarked hERG as one of the most important antitargets to be considered in early stages of drug development process. Herein the authors report on the development of an innovative and freely accessible web server for early identification of putative hERG blockers and non-blockers in chem. libraries. The authors have collected the largest publicly available curated hERG dataset of 5984 compds. The authors succeed in developing robust and externally predictive binary (CCR≈0.8) and multiclass models (accuracy≈0.7). These models are available as a web-service freely available for public at http://labmol.farmacia.ufg.br/predherg/. Three following outcomes are available for the users: prediction by binary model, prediction by multi-class model, and the probability maps of at. contribution. The Pred-hERG will be continuously updated and upgraded as new information became available.
- 8Sanches, I. H.; Braga, R. C.; Alves, V. M.; Andrade, C. H. Enhancing HERG Risk Assessment with Interpretable Classificatory and Regression Models. Chem. Res. Toxicol. 2024, 37 (6), 910– 922, DOI: 10.1021/acs.chemrestox.3c00400There is no corresponding record for this reference.
- 9Borba, J. V. B.; Braga, R. C.; Alves, V. M.; Muratov, E. N.; Kleinstreuer, N.; Tropsha, A.; Andrade, C. H. Pred-Skin: A Web Portal for Accurate Prediction of Human Skin Sensitizers. Chem. Res. Toxicol. 2021, 34 (2), 258– 267, DOI: 10.1021/acs.chemrestox.0c001869Pred-Skin: A Web Portal for Accurate Prediction of Human Skin SensitizersBorba, Joyce V. B.; Braga, Rodolpho C.; Alves, Vinicius M.; Muratov, Eugene N.; Kleinstreuer, Nicole; Tropsha, Alexander; Andrade, Carolina HortaChemical Research in Toxicology (2021), 34 (2), 258-267CODEN: CRTOEC; ISSN:0893-228X. (American Chemical Society)Safety assessment is an essential component of the regulatory acceptance of industrial chems. Previously, we have developed a model to predict the skin sensitization potential of chems. for two assays, the human patch test and murine local lymph node assay, and implemented this model in a web portal. Here, we report on the substantially revised and expanded freely available web tool, Pred-Skin version 3.0. This up-to-date version of Pred-Skin incorporates multiple quant. structure-activity relationship (QSAR) models developed with in vitro, in chemico, and mice and human in vivo data, integrated into a consensus naive Bayes model that predicts human effects. Individual QSAR models were generated using skin sensitization data derived from human repeat insult patch tests, human maximization tests, and mouse local lymph node assays. In addn., data for three validated alternative methods, the direct peptide reactivity assay, KeratinoSens, and the human cell line activation test, were employed as well. Models were developed using open-source tools and rigorously validated according to the best practices of QSAR modeling. Predictions obtained from these models were then used to build a naive Bayes model for predicting human skin sensitization with the following external prediction accuracy: correct classification rate (89%), sensitivity (94%), pos. 
predicted value (91%), specificity (84%), and neg. predicted value (89%). As an addnl. assessment of model performance, we identified 11 cosmetic ingredients known to cause skin sensitization but were not included in our training set, and nine of them were accurately predicted as sensitizers by our models. Pred-Skin can be used as a reliable alternative to animal tests for predicting human skin sensitization.
- 10Braga, R. C.; Alves, V. M.; Muratov, E. N.; Strickland, J.; Kleinstreuer, N.; Trospsha, A.; Andrade, C. H. Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of Chemicals. J. Chem. Inf. Model. 2017, 57 (5), 1013– 1017, DOI: 10.1021/acs.jcim.7b0019410Pred-Skin: A Fast and Reliable Web Application to Assess Skin Sensitization Effect of ChemicalsBraga, Rodolpho C.; Alves, Vinicius M.; Muratov, Eugene N.; Strickland, Judy; Kleinstreuer, Nicole; Trospsha, Alexander; Andrade, Carolina HortaJournal of Chemical Information and Modeling (2017), 57 (5), 1013-1017CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Chem. induced skin sensitization is a complex immunol. disease with a profound impact on quality of life and working ability. Despite some progress in developing alternative methods for assessing the skin sensitization potential of chem. substances, there is no in vitro test that correlates well with human data. Computational QSAR models provide a rapid screening approach and contribute valuable information for the assessment of chem. toxicity. The authors describe the development of a freely accessible web-based and mobile application for the identification of potential skin sensitizers. The application is based on previously developed binary QSAR models of skin sensitization potential from human (109 compds.) and murine local lymph node assay (LLNA, 515 compds.) data with good external correct classification rate (0.70-0.81 and 0.72-0.84, resp.). The authors also included a multiclass skin sensitization potency model based on LLNA data (accuracy ranging between 0.73 and 0.76). When a user evaluates a compd. in the web app, the outputs are (i) binary predictions of human and murine skin sensitization potential; (ii) multiclass prediction of murine skin sensitization; and (ii) probability maps illustrating the predicted contribution of chem. fragments. The app is the first tool available that incorporates quant. 
structure-activity relationship (QSAR) models based on human data as well as multiclass models for LLNA. The Pred-Skin web app version 1.0 is freely available for the web, iOS, and Android (in development) at the LabMol web portal (http://labmol.com.br/predskin/), in the Apple Store, and on Google Play, resp. The authors will continuously update the app as new skin sensitization data and resp. models become available.
- 11Lee, O. W.; Austin, S.; Gamma, M.; Cheff, D. M.; Lee, T. D.; Wilson, K. M.; Johnson, J.; Travers, J.; Braisted, J. C.; Guha, R.; Klumpp-Thomas, C.; Shen, M.; Hall, M. D. Cytotoxic Profiling of Annotated and Diverse Chemical Libraries Using Quantitative High-Throughput Screening. SLAS Discov 2020, 25 (1), 9– 20, DOI: 10.1177/2472555219873068There is no corresponding record for this reference.
- 12Fourches, D.; Muratov, E.; Tropsha, A. Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research. J. Chem. Inf Model 2010, 50 (7), 1189– 1204, DOI: 10.1021/ci100176x12Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling ResearchFourches, Denis; Muratov, Eugene; Tropsha, AlexanderJournal of Chemical Information and Modeling (2010), 50 (7), 1189-1204CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)With the recent advent of high-throughput technologies for both compd. synthesis and biol. screening, there is no shortage of publicly or com. available data sets and databases that can be used for computational drug discovery applications. Rapid growth of large, publicly available databases (such as PubChem or ChemSpider contg. more than 20 million mol. records each) enabled by exptl. projects such as NIH's Mol. Libraries and Imaging Initiative provides new opportunities for the development of chemoinformatics methodologies and their application to knowledge discovery in mol. databases. A fundamental assumption of any chemoinformatics study is the correctness of the input data generated by exptl. scientists and available in various data sets. In another recent study, the authors investigated several public and com. databases to calc. their error rates: the latter were ranging from 0.1 to 3.4% depending on the database. How significant is the problem of accurate structure representation (given that the error rates in current databases may appear relatively low) since it concerns exploratory chemoinformatics and mol. modeling research. Recent investigations by a large group of collaborators from six labs. have clearly demonstrated that the type of chem. descriptors has much greater influence on the prediction performance of QSAR models than the nature of model optimization techniques.
- 13Fourches, D.; Muratov, E.; Tropsha, A. Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation. J. Chem. Inf Model 2016, 56 (7), 1243– 1252, DOI: 10.1021/acs.jcim.6b0012913Trust, but Verify II: A Practical Guide to Chemogenomics Data CurationFourches, Denis; Muratov, Eugene; Tropsha, AlexanderJournal of Chemical Information and Modeling (2016), 56 (7), 1243-1252CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)There is a growing public concern about the lack of reproducibility of exptl. data published in peer-reviewed scientific literature. Herein, we review the most recent alerts regarding exptl. data quality and discuss initiatives taken thus far to address this problem, esp. in the area of chem. genomics. Going beyond just acknowledging the issue, we propose a chem. and biol. data curation workflow that relies on existing cheminformatics approaches to flag, and when appropriate, correct possibly erroneous entries in large chemogenomics data sets. We posit that the adherence to the best practices for data curation is important for both exptl. scientists who generate primary data and deposit them in chem. genomics databases and computational researchers who rely on these data for model development.
- 14Zhang, J.; Mani, I. KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction. In Workshop on Learning from Imbalanced Data Sets II ; Washington, 2003.There is no corresponding record for this reference.
- 15Tropsha, A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol. Inform 2010, 29 (6–7), 476– 488, DOI: 10.1002/minf.20100006115Best Practices for QSAR Model Development, Validation, and ExploitationTropsha, AlexanderMolecular Informatics (2010), 29 (6-7), 476-488CODEN: MIONBS; ISSN:1868-1743. (Wiley-VCH Verlag GmbH & Co. KGaA)A review. After nearly 5 decades "in the making", QSAR modeling has established itself as one of the major computational mol. modeling methodologies. As any mature research discipline, QSAR modeling can be characterized by a collection of well defined protocols and procedures that enable the expert application of the method for exploring and exploiting ever growing collections of biol. active chem. compds. This review examines most crit. QSAR modeling routines that we regard as best practices in the field. We discuss these procedures in the context of integrative predictive QSAR modeling workflow that is focused on achieving models of the highest statistical rigor and external predictive power. Specific elements of the workflow consist of data prepn. including chem. structure (and when possible, assocd. biol. data) curation, outlier detection, dataset balancing, and model validation. We esp. emphasize procedures used to validate models, both internally and externally, as well as the need to define model applicability domains that should be used when models are employed for the prediction of external compds. or compd. libraries. Finally, we present several examples of successful applications of QSAR models for virtual screening to identify exptl. confirmed hits.
- 16OECD. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] Models; OECD, 2014. DOI: 10.1787/9789264085442-en .There is no corresponding record for this reference.
- 17Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf Model 2010, 50 (5), 742– 754, DOI: 10.1021/ci100050t17Extended-Connectivity FingerprintsRogers, David; Hahn, MathewJournal of Chemical Information and Modeling (2010), 50 (5), 742-754CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Extended-connectivity fingerprints (ECFPs) are a novel class of topol. fingerprints for mol. characterization. Historically, topol. fingerprints were developed for substructure and similarity searching. ECFPs were developed specifically for structure-activity modeling. ECFPs are circular fingerprints with a no. of useful qualities: they can be very rapidly calcd.; they are not predefined and can represent an essentially infinite no. of different mol. features (including stereochem. information); their features represent the presence of particular substructures, allowing easier interpretation of anal. results; and the ECFP algorithm can be tailored to generate different types of circular fingerprints, optimized for different uses. While the use of ECFPs has been widely adopted and validated, a description of their implementation has not previously been presented in the literature.
- 18RDKit: Open-Source Cheminformatics. https://www.rdkit.org/.There is no corresponding record for this reference.
- 19Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems; NIPS’17; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp 3149– 3157.There is no corresponding record for this reference.
- 20Riniker, S.; Landrum, G. A. Similarity Maps - a Visualization Strategy for Molecular Fingerprints and Machine-Learning Methods. J. Cheminform 2013, 5 (1), 43, DOI: 10.1186/1758-2946-5-4320Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methodsRiniker Sereina; Landrum Gregory AJournal of cheminformatics (2013), 5 (1), 43 ISSN:1758-2946.: Fingerprint similarity is a common method for comparing chemical structures. Similarity is an appealing approach because, with many fingerprint types, it provides intuitive results: a chemist looking at two molecules can understand why they have been determined to be similar. This transparency is partially lost with the fuzzier similarity methods that are often used for scaffold hopping and tends to vanish completely when molecular fingerprints are used as inputs to machine-learning (ML) models. Here we present similarity maps, a straightforward and general strategy to visualize the atomic contributions to the similarity between two molecules or the predicted probability of a ML model. We show the application of similarity maps to a set of dopamine D3 receptor ligands using atom-pair and circular fingerprints as well as two popular ML methods: random forests and naive Bayes. An open-source implementation of the method is provided.
- 21Gadaleta, D.; Mangiatordi, G. F.; Catto, M.; Carotti, A.; Nicolotti, O. Applicability Domain for QSAR Models. IJQSPR 2016, 1 (1), 45– 63, DOI: 10.4018/IJQSPR.2016010102There is no corresponding record for this reference.
- 22Ukelis, U.; Kramer, P.-J.; Olejniczak, K.; Mueller, S. O. Replacement of in Vivo Acute Oral Toxicity Studies by in Vitro Cytotoxicity Methods: Opportunities, Limits and Regulatory Status. Regul. Toxicol. Pharmacol. 2008, 51 (1), 108– 118, DOI: 10.1016/j.yrtph.2008.02.00222Replacement of in vivo acute oral toxicity studies by in vitro cytotoxicity methods: Opportunities, limits and regulatory statusUkelis, Ute; Kramer, Peter-Juergen; Olejniczak, Klaus; Mueller, Stefan O.Regulatory Toxicology and Pharmacology (2008), 51 (1), 108-118CODEN: RTOPDW; ISSN:0273-2300. (Elsevier Inc.)A review. The development of a new medicinal product is a long and costly process in particular due to the regulatory requirements for quality, safety and efficacy. There is a common interest to increase the efficiency of drug development and to provide new, better quality medicinal products much faster to the public. One possible way to economize time and costs, as well as to consider animal protection issues, is to introduce new alternative methods into non-clin. toxicity testing. Currently, animal tests are mandatory for the evaluation of acute toxicity of chems. and new drugs. The replacement of the in vivo tests by alternative in vitro assays would offer the opportunity to screen and assess numerous compds. at the same time, to predict acute oral toxicity and thus accelerate drug development. Moreover, the substitution of in vivo tests by in vitro methods shows a proactive pursuit of ethical and animal welfare issues. Importantly, the implementation of in vitro assays for acute oral toxicity would require the establishment of common test guidelines across the EU, USA and Japan, i.e., the regions of ICH (International Conference on Harmonisation of Tech. Requirements for Registration of Pharmaceuticals for Human Use). Presently, alternative in vitro tests are being investigated internationally. 
Yet, in order to achieve regulatory acceptance and implementation of in vitro assays, convincing results from validation studies are required. In this review, we discuss the current regulatory status of acute oral toxicity testing and point out achievements of alternative methods. We describe the application of in vitro tests, correlating in vitro with in vivo data. The use of in vitro data to predict in vivo acute oral toxicity is analyzed using the Registry of Cytotoxicity, an official independent database. We have then analyzed opportunities and drawbacks for future implementation of in vitro test methods, with particular focus on industrial use.
- 23Niepel, M.; Hafner, M.; Mills, C. E.; Subramanian, K.; Williams, E. H.; Chung, M.; Gaudio, B.; Barrette, A. M.; Stern, A. D.; Hu, B.; Korkola, J. E.; Gray, J. W.; Birtwistle, M. R.; Heiser, L. M.; Sorger, P. K.; Shamu, C. E.; Jayaraman, G.; Azeloglu, E. U.; Iyengar, R.; Sobie, E. A.; Mills, G. B.; Liby, T.; Jaffe, J. D.; Alimova, M.; Davison, D.; Lu, X.; Golub, T. R.; Subramanian, A.; Shelley, B.; Svendsen, C. N.; Ma’ayan, A.; Medvedovic, M.; Feiler, H. S.; Smith, R.; Devlin, K. A Multi-Center Study on the Reproducibility of Drug-Response Assays in Mammalian Cell Lines. Cell Syst. 2019, 9 (1), 35–48, DOI: 10.1016/j.cels.2019.06.005
- 24Clothier, R. H. The FRAME Cytotoxicity Test (Kenacid Blue). In In Vitro Toxicity Testing Protocols; Humana Press: Totowa, NJ, 1995; pp 109–118, DOI: 10.1385/0-89603-282-5:109
- 25Langdon, S. R.; Mulgrew, J.; Paolini, G. V.; Van Hoorn, W. P. Predicting Cytotoxicity from Heterogeneous Data Sources with Bayesian Learning. J. Cheminform. 2010, 2 (1), 11, DOI: 10.1186/1758-2946-2-11
- 26Banerjee, P.; Kemmler, E.; Dunkel, M.; Preissner, R. ProTox 3.0: A Webserver for the Prediction of Toxicity of Chemicals. Nucleic Acids Res. 2024, 52 (W1), W513–W520, DOI: 10.1093/nar/gkae303
- 27Yin, Z.; Ai, H.; Zhang, L.; Ren, G.; Wang, Y.; Zhao, Q.; Liu, H. Predicting the Cytotoxicity of Chemicals Using Ensemble Learning Methods and Molecular Fingerprints. J. Appl. Toxicol. 2019, 39 (10), 1366–1377, DOI: 10.1002/jat.3785
- 28Liu, Q.; He, D.; Fan, M.; Wang, J.; Cui, Z.; Wang, H.; Mi, Y.; Li, N.; Meng, Q.; Hou, Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J. Chem. Inf. Model. 2024, DOI: 10.1021/acs.jcim.4c00366
- 29Sun, H.; Wang, Y.; Cheff, D. M.; Hall, M. D.; Shen, M. Predictive Models for Estimating Cytotoxicity on the Basis of Chemical Structures. Bioorg. Med. Chem. 2020, 28 (10), 115422, DOI: 10.1016/j.bmc.2020.115422
- 30Webel, H. E.; Kimber, T. B.; Radetzki, S.; Neuenschwander, M.; Nazaré, M.; Volkamer, A. Revealing Cytotoxic Substructures in Molecules Using Deep Learning. J. Comput.-Aided Mol. Des. 2020, 34 (7), 731–746, DOI: 10.1007/s10822-020-00310-4
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.4c01811.
Full training and test data sets for the 3T3 and HEK 293 cell lines (XLSX)
Supplementary methods, results, and figures, including model hyperparameters, clustering and chemical space analyses, the applicability domain threshold definition, and Explainable AI heatmaps (PDF)
Data sets of compounds clustered by structural similarity (XLSX)