Machine Learning ADME Models in Practice: Four Guidelines from a Successful Lead Optimization Case Study
- Alexander S. Rich* (E-mail: [email protected]), Inductive Bio, Inc., 550 Vanderbilt Ave, #730, Brooklyn, New York 11238, United States
- Yvonne H. Chan, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- Benjamin Birnbaum, Inductive Bio, Inc., 550 Vanderbilt Ave, #730, Brooklyn, New York 11238, United States
- Kamran Haider, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- Joshua Haimson, Inductive Bio, Inc., 550 Vanderbilt Ave, #730, Brooklyn, New York 11238, United States
- Michael Hale, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- Yongxin Han, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- William Hickman, Inductive Bio, Inc., 550 Vanderbilt Ave, #730, Brooklyn, New York 11238, United States
- Klaus P. Hoeflich, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- Daniel Ortwine, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- Ayşegül Özen, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
- David B. Belanger, Nested Therapeutics, 1030 Mass Ave, Suite 410, Cambridge, Massachusetts 02138, United States
Abstract
Optimization of the ADME properties and pharmacokinetic (PK) profile of compounds is one of the critical activities in any medicinal chemistry campaign to discover a future clinical candidate. Finding ways to expedite the process to address ADME/PK shortcomings and reduce the number of compounds to synthesize is highly valuable. This article provides practical guidelines and a case study on the use of ML ADME models to guide compound design in small molecule lead optimization. These guidelines highlight that ML models cannot have an impact in a vacuum: they help advance a program when they have the trust of users, are tuned to the needs of the program, and are integrated into decision-making processes in a way that complements and augments the expertise of chemists.
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Special Issue
Published as part of ACS Medicinal Chemistry Letters virtual special issue “Exploring the Use of AI/ML Technologies in Medicinal Chemistry and Drug Discovery”.
Guideline 1: Regular Time-Based and Series-Level Evaluation Gives a Realistic Picture of Model Performance and Builds Trust to Use ML Models As a Tool in the Design Process
Guideline 2: Training on a Combination of “Global” Curated Data and “Local” Program Data Leads to the Best Model Performance
Figure 1. Performance of models trained on program data only (local only), nonprogram data only (global only), and on combined data (fine-tuned global), on temporally split test sets for HLM, RLM, MDCK AB, and MDCK ER. Error bars represent 68% bootstrapped confidence intervals. MAE (mean absolute error) units are in log10 (mL/min/kg) for HLM and RLM, log10 (μcm/s) for MDCK AB, and log10 (ratio) for ER.
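The evaluation summarized in the Figure 1 caption (a temporal split, MAE on log10-transformed values, and a 68% bootstrapped confidence interval) can be illustrated with a minimal sketch. The article does not describe its implementation, so everything here is an assumption made for illustration: the function names `temporal_split`, `mae_log10`, and `bootstrap_ci`, the percentile form of the bootstrap, the cutoff date, and the records themselves, which are invented numbers rather than data from the study.

```python
import math
import random
from datetime import date


def temporal_split(records, cutoff):
    """Put records dated before the cutoff in train, the rest in test."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def mae_log10(measured, predicted):
    """Mean absolute error between log10-transformed measured and predicted values."""
    errors = [abs(math.log10(m) - math.log10(p)) for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


def bootstrap_ci(measured, predicted, n_boot=2000, level=0.68, seed=0):
    """Percentile bootstrap interval for the log10 MAE (an assumed bootstrap variant)."""
    rng = random.Random(seed)
    n = len(measured)
    stats = []
    for _ in range(n_boot):
        # Resample compound indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mae_log10([measured[i] for i in idx], [predicted[i] for i in idx]))
    stats.sort()
    low = stats[int((1 - level) / 2 * n_boot)]
    high = stats[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return low, high


# Hypothetical records: assay date, measured value, and a model prediction in the same units.
records = [
    {"date": date(2023, 1, 10), "measured": 20.0, "predicted": 25.0},
    {"date": date(2023, 3, 5), "measured": 8.0, "predicted": 6.0},
    {"date": date(2023, 6, 20), "measured": 10.0, "predicted": 100.0},
    {"date": date(2023, 8, 2), "measured": 100.0, "predicted": 100.0},
]

# In a real workflow the model would be fit on `train` only and then scored on `test`;
# here the predictions are supplied directly to keep the sketch short.
train, test = temporal_split(records, cutoff=date(2023, 6, 1))
measured = [r["measured"] for r in test]
predicted = [r["predicted"] for r in test]

mae = mae_log10(measured, predicted)
ci_low, ci_high = bootstrap_ci(measured, predicted)
print(f"log10 MAE = {mae:.2f}, 68% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Resampling measured and predicted values as pairs keeps each compound's error intact, so the interval reflects variation across test compounds and not across model retrainings.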
Guideline 3: Frequent Model Retraining Enables ML Models to Learn Local SAR As a Program Shifts into New Chemical Space and Encounters Activity Cliffs
Guideline 4: To Maximize Impact on the Design Process, ML Models Should Be Interactive, Interpretable, and Integrated with Other Tools
compound # | target engagement assay (nM) | HLM T1/2 (min) | RLM T1/2 (min) | dog LM T1/2 (min) | MDCK Papp (ER) | projected human dose |
---|---|---|---|---|---|---|
1 | 752 | 83 | 37 | 2 | 13.8 (0.8) | |
2 | 100 | 82 | 44 | 22 | 3.6 (2.6) | |
3 | 263 | 82 | 32 | 13 | 4.7 (2.2) | |
4 | 137 | 65 | 65 | 57 | 8.1 (0.9) | 4× higher than desired |
5 | 124 | 83 | 72 | 60 | 7.4 (0.8) | desired |
Figure 2. A screenshot of the interactive design environment with ML ADME predictions, comparison to a reference compound, and highlights of sites of likely metabolism.
Conclusion
Figure 1
Figure 1. Performance of models trained on program data only (local only), nonprogram data only (global only), and on combined data (fine-tuned global), on temporally split test sets for HLM, RLM, MDCK AB, and MDCK ER. Error bars represent 68% bootstrapped confidence intervals. MAE (mean absolute error) units are in log 10 (mL/min/kg) for HLM and RLM, log 10 (μcm/s) for MDCK AB, and log 10 (ratio) for ER.
Figure 2
Figure 2. A screenshot of the interactive design environment with ML ADME predictions, comparison to a reference compound, and highlights of sites of likely metabolism.
References
This article references 18 other publications.
- 1Ortwine, D. F.; Aliagas, I. Physicochemical and DMPK in silico models: facilitating their use by medicinal chemists. Mol. Pharmaceutics 2013, 10 (4), 1153– 1161, DOI: 10.1021/mp30061931Physicochemical and DMPK In Silico Models: Facilitating Their Use by Medicinal ChemistsOrtwine, Daniel F.; Aliagas, IgnacioMolecular Pharmaceutics (2013), 10 (4), 1153-1161CODEN: MPOHBP; ISSN:1543-8384. (American Chemical Society)A review. It is known that the developability of drugs is related to their physicochem. and DMPK properties. Given the time and expense involved in discovering and developing new drugs, maximizing the chance of success by calcg. properties ahead of chem. synthesis and testing, and only acting on those candidates whose properties fall into a desired range, would seem to make sense. This paper provides an overview of calculable physicochem. and DMPK properties, an assessment of their relative difficulty of their calcn. and accuracy, and available software. Methods companies have employed to communicate results will be discussed, including the use of composite scoring functions and ranking schemes. Calcns. do no good if chemists will not use them to prioritize synthesis decisions. Strategies are presented for facilitating model usage. An approach adopted at Genentech for presenting results that involves the close coupling of property calcns. with 3D structure based drug design is described.
- 2Goller, A. H.; Kuhnke, L.; Montanari, F.; Bonin, A.; Schneckener, S.; ter Laak, A.; Wichard, J.; Lobell, M.; Hillisch, A. Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discovery Today 2020, 25 (9), 1702– 1709, DOI: 10.1016/j.drudis.2020.07.0012Bayer's in silico ADMET platform: a journey of machine learning over the past two decadesGoller Andreas H; Bonin Anne; Lobell Mario; Kuhnke Lara; Ter Laak Antonius; Montanari Floriane; Schneckener Sebastian; Wichard Jorg; Hillisch AlexanderDrug discovery today (2020), 25 (9), 1702-1709 ISSN:.Over the past two decades, an in silico absorption, distribution, metabolism, and excretion (ADMET) platform has been created at Bayer Pharma with the goal to generate models for a variety of pharmacokinetic and physicochemical endpoints in early drug discovery. These tools are accessible to all scientists within the company and can be a useful in assisting with the selection and design of novel leads, as well as the process of lead optimization. Here. we discuss the development of machine-learning (ML) approaches with special emphasis on data, descriptors, and algorithms. We show that high company internal data quality and tailored descriptors, as well as a thorough understanding of the experimental endpoints, are essential to the utility of our models. We discuss the recent impact of deep neural networks and show selected application examples.
- 3Wu, Z.; Ramsundar, B.; Feinberg, E. N.; Gomes, J.; Geniesse, C.; Pappu, A. S.; Leswing, K.; Pande, V. MoleculeNet: a benchmark for molecular machine learning. Chemical science 2018, 9 (2), 513– 530, DOI: 10.1039/C7SC02664A3MoleculeNet: a benchmark for molecular machine learningWu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl; Pande, VijayChemical Science (2018), 9 (2), 513-530CODEN: CSHCCN; ISSN:2041-6520. (Royal Society of Chemistry)Mol. machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about mol. properties. However, algorithmic progress has been limited due to the lack of a std. benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for mol. machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed mol. featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for mol. machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mech. and biophys. datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
- 4Sheridan, R. P. Time-split cross-validation as a method for estimating the goodness of prospective prediction. J. Chem. Inf. Model. 2013, 53 (4), 783– 790, DOI: 10.1021/ci400084k4Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction.Sheridan, Robert P.Journal of Chemical Information and Modeling (2013), 53 (4), 783-790CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Cross-validation is a common method to validate a QSAR model. In cross-validation, some compds. are held out as a test set, while the remaining compds. form a training set. A model is built from the training set, and the test set compds. are predicted on that model. The agreement of the predicted and obsd. activity values of the test set (measured by, say, R2) is an est. of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This est. of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending how compds. in the test set are selected. Here, we show that time-split selection gives an R2 that is more like that of true prospective prediction than the R2 from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addn. to random selection as a std. for cross-validation in QSAR model building.
- 5. Fluetsch, A.; Di Lascio, E.; Gerebtzoff, G.; Rodríguez-Pérez, R. Adapting Deep Learning QSPR Models to Specific Drug Discovery Projects. Molecular Pharmaceutics 2024, 21, 1817, DOI: 10.1021/acs.molpharmaceut.3c01124
- 6. Muratov, E. N.; Bajorath, J.; Sheridan, R. P.; Tetko, I. V.; Filimonov, D.; Poroikov, V.; Oprea, T. I.; Baskin, I. I.; Varnek, A.; Roitberg, A.; Isayev, O.; Curtalolo, S.; Fourches, D.; Cohen, Y.; Aspuru-Guzik, A.; Winkler, D. A.; Agrafiotis, D.; Cherkasov, A.; Tropsha, A. QSAR without borders. Chem. Soc. Rev. 2020, 49 (11), 3525–3564, DOI: 10.1039/D0CS00098A
- 7. Xiong, G.; Wu, Z.; Yi, J.; Fu, L.; Yang, Z.; Hsieh, C.; Yin, M.; Zeng, X.; Wu, C.; Lu, A.; Chen, X.; Hou, T.; Cao, D. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Research 2021, 49 (W1), W5–W14, DOI: 10.1093/nar/gkab255
- 8. Kawashima, H.; Watanabe, R.; Esaki, T.; Kuroda, M.; Nagao, C.; Natsume-Kitatani, Y.; Ohashi, R.; Komura, H.; Mizuguchi, K. DruMAP: A novel drug metabolism and pharmacokinetics analysis platform. J. Med. Chem. 2023, 66 (14), 9697–9709, DOI: 10.1021/acs.jmedchem.3c00481
- 9. Di Lascio, E.; Gerebtzoff, G.; Rodríguez-Pérez, R. Systematic evaluation of local and global machine learning models for the prediction of ADME properties. Mol. Pharmaceutics 2023, 20 (3), 1758–1767, DOI: 10.1021/acs.molpharmaceut.2c00962
- 10. Aliagas, I.; Gobbi, A.; Heffron, T.; Lee, M. L.; Ortwine, D. F.; Zak, M.; Khojasteh, S. C. A probabilistic method to report predictions from a human liver microsomes stability QSAR model: a practical tool for drug discovery. Journal of Computer-Aided Molecular Design 2015, 29, 327–338, DOI: 10.1007/s10822-015-9838-3
- 11. Wieder, O.; Kohlbacher, S.; Kuenemann, M.; Garon, A.; Ducrot, P.; Seidel, T.; Langer, T. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies 2020, 37, 1–12, DOI: 10.1016/j.ddtec.2020.11.009
- 12. Karmaker, S. K.; Hassan, M. M.; Smith, M. J.; Xu, L.; Zhai, C.; Veeramachaneni, K. AutoML to Date and Beyond: Challenges and Opportunities. ACM Computing Surveys (CSUR) 2022, 54 (8), 1–36, DOI: 10.1145/3470918
- 13. Yu, J.; Wang, D.; Zheng, M. Uncertainty quantification: Can we trust artificial intelligence in drug discovery? iScience 2022, 25 (8), 104814, DOI: 10.1016/j.isci.2022.104814
- 14. Sheridan, R. P.; Culberson, J. C.; Joshi, E.; Tudor, M.; Karnachi, P. Prediction accuracy of production ADMET models as a function of version: activity cliffs rule. J. Chem. Inf. Model. 2022, 62 (14), 3275–3280, DOI: 10.1021/acs.jcim.2c00699
- 15. Aleksić, S.; Seeliger, D.; Brown, J. B. ADMET Predictability at Boehringer Ingelheim: State of the Art, and Do Bigger Datasets or Algorithms Make a Difference? Molecular Informatics 2022, 41 (2), 2100113, DOI: 10.1002/minf.202100113
- 16. Fang, C.; Wang, Y.; Grater, R.; Kapadnis, S.; Black, C.; Trapa, P.; Sciabola, S. Prospective validation of machine learning algorithms for absorption, distribution, metabolism, and excretion prediction: An industrial perspective. J. Chem. Inf. Model. 2023, 63 (11), 3263–3274, DOI: 10.1021/acs.jcim.3c00160
- 17. Sheridan, R. P. Stability of prediction in production ADMET models as a function of version: why and when predictions change. J. Chem. Inf. Model. 2022, 62 (15), 3477–3485, DOI: 10.1021/acs.jcim.2c00803
- 18. Rydberg, P.; Gloriam, D. E.; Zaretzki, J.; Breneman, C.; Olsen, L. SMARTCyp: a 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Medicinal Chemistry Letters 2010, 1 (3), 96–100, DOI: 10.1021/ml100016x