PlayMolecule Glimpse: Understanding Protein–Ligand Property Predictions with Interpretable Neural Networks
- Alejandro Varela-Rial, Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- Iain Maryanow
- Maciej Majewski, Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Stefan Doerr
- Nikolai Schapin, Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain
- José Jiménez-Luna, Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
- Gianni De Fabritiis*, Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain; Acellera Labs, Doctor Trueta 183, 08005 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain. *Email: [email protected]
Abstract
Deep learning has been successfully applied to structure-based protein–ligand affinity prediction, yet the black-box nature of these models makes their predictions hard to interpret. In a previous study, we presented KDEEP, a convolutional neural network that predicts the binding affinity of a given protein–ligand complex with state-of-the-art performance. However, it was unclear what this model was learning. In this work, we present a new application that visualizes the contribution of each input atom to the prediction made by the convolutional neural network, aiding the interpretation of such predictions. The results suggest that KDEEP is able to learn meaningful chemistry signals from the data, but they also expose inaccuracies in the current model, serving as a guideline for further optimization of our prediction tools.
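The idea of attributing a prediction to individual inputs can be illustrated with integrated gradients, the method named in this article's Methods outline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the toy sigmoid model, the weights, the all-zero baseline, and the function names are all hypothetical stand-ins for a real network and its voxelized input.

```python
import math


def model(x, w, b):
    """Toy differentiable scorer (assumed): sigmoid of a weighted sum."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to each input."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        avg_grad = [a + g / steps for a, g in zip(avg_grad, grad)]
    # Scale the path-averaged gradient by the input-baseline difference.
    return [(xi - bi) * a for xi, bi, a in zip(x, baseline, avg_grad)]


# Hypothetical input, baseline, and model parameters.
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
w = [0.8, -0.3, 0.0]
b = 0.1

attributions = integrated_gradients(x, baseline, w, b)
# Completeness: attributions should sum to model(x) - model(baseline).
gap = sum(attributions) - (model(x, w, b) - model(baseline, w, b))
```

In this toy setting, a feature with zero weight receives zero attribution, and the attributions sum (up to numerical error) to the difference between the prediction at the input and at the baseline, which is the property that makes per-input contributions comparable.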
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
Introduction
Methods
Model Training
Implementation
Integrated Gradients
Graphical User Interface
Usage
Analysis
Results
Clash Detector
Docking Pose Classifier
KDEEP
Quantitative Analysis
Conclusion
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c00691.
Additional information for the “Model Training” and “Quantitative Analysis” sections; Figures S1–S6, distribution of distances between the two voxels with the highest absolute attribution values for the different channel combinations studied; Figures S7 and S8, examples of protein residues far from the ligand with high attribution values; Figure S9, correlation between the magnitude of the attributions of the two best voxels and the distance between them; and Figures S10–S13, attribution consistency distributions (PDF)
Acknowledgments
The authors thank Acellera Ltd. for funding. G.D.F. acknowledges support from PID2020-116564GB-I00/MICIN/AEI/10.13039/501100011033 Ministerio de Ciencia e Innovación. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 823712 (CompBioMed2) and from the Industrial Doctorates Plan of the Secretariat of Universities and Research of the Department of Economy and Knowledge of the Generalitat of Catalonia.
References
This article references 34 other publications.
- 1Dudek, A. Z.; Arodz, T.; Gálvez, J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb. Chem. High Throughput Screening 2006, 9, 213– 228, DOI: 10.2174/138620706776055539Google Scholar1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XisVSqtbo%253D&md5=230cc86a6228a8f9a97c666e925a5efeComputational methods in developing quantitative structure-activity relationships (QSAR): a reviewDudek, Arkadiusz Z.; Arodz, Tomasz; Galvez, JorgeCombinatorial Chemistry & High Throughput Screening (2006), 9 (3), 213-228CODEN: CCHSFU; ISSN:1386-2073. (Bentham Science Publishers Ltd.)A review. Virtual filtering and screening of combinatorial libraries have recently gained attention as methods complementing the high-throughput screening and combinatorial chem. These chemoinformatic techniques rely heavily on quant. structure-activity relation (QSAR) anal., a field with established methodol. and successful history. In this review, we discuss the computational methods for building QSAR models. We start with outlining their usefulness in high-throughput screening and identifying the general scheme of a QSAR model. Following, we focus on the methodologies in constructing three main components of QSAR model, namely the methods for describing the mol. structure of compds., for selection of informative descriptors and for activity prediction. We present both the well-established methods as well as techniques recently introduced into the QSAR domain.
- 2Cherkasov, A.; Muratov, E. N.; Fourches, D.; Varnek, A.; Baskin, I. I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y. C.; Todeschini, R. QSAR modeling: where have you been? Where are you going to?. J. Med. Chem. 2014, 57, 4977– 5010, DOI: 10.1021/jm4004285Google Scholar2https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhvFCrsLnE&md5=a2d371ee33fbecf48ef59ecf914e4aa8QSAR Modeling: Where Have You Been? Where Are You Going To?Cherkasov, Artem; Muratov, Eugene N.; Fourches, Denis; Varnek, Alexandre; Baskin, Igor I.; Cronin, Mark; Dearden, John; Gramatica, Paola; Martin, Yvonne C.; Todeschini, Roberto; Consonni, Viviana; Kuzmin, Victor E.; Cramer, Richard; Benigni, Romualdo; Yang, Chihae; Rathman, James; Terfloth, Lothar; Gasteiger, Johann; Richard, Ann; Tropsha, AlexanderJournal of Medicinal Chemistry (2014), 57 (12), 4977-5010CODEN: JMCMAR; ISSN:0022-2623. (American Chemical Society)A review. Quant. structure-activity relationship modeling is one of the major computational tools employed in medicinal chem. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and exptl. chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific stds. 
to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
- 3Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug Discovery Today 2018, 23, 1538– 1546, DOI: 10.1016/j.drudis.2018.05.010Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXptFCnuro%253D&md5=d0bef55f60c8ee7bb2a31e42d4a85fcdMachine learning in chemoinformatics and drug discoveryLo, Yu-Chen; Rensi, Stefano E.; Torng, Wen; Altman, Russ B.Drug Discovery Today (2018), 23 (8), 1538-1546CODEN: DDTOFS; ISSN:1359-6446. (Elsevier Ltd.)Chemoinformatics is an established discipline focusing on extg., processing and extrapolating meaningful data from chem. structures. With the rapid explosion of chem. 'big' data from HTS and combinatorial synthesis, machine learning has become an indispensable tool for drug designers to mine chem. information from large compd. databases to design drugs with important biol. properties. To process the chem. data, we first reviewed multiple processing layers in the chemoinformatics pipeline followed by the introduction of commonly used machine learning models in drug discovery and QSAR anal. Here, we present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses; and we discuss limitations and future directions to guide further development in this evolving field.
- 4Neves, B. J.; Braga, R. C.; Melo-Filho, C. C.; Moreira-Filho, J. T.; Muratov, E. N.; Andrade, C. H. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front. Pharmacol. 2018, 9, 1275, DOI: 10.3389/fphar.2018.01275Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhtVGmtbnP&md5=1aa305bf00e34d3ac592d19a7e107d7eQSAR-based virtual screening: advances and applications in drug discoveryNeves, Bruno J.; Braga, Rodolpho C.; Melo-Filho, Cleber C.; Moreira-Filho, Jose Teofilo; Muratov, Eugene N.; Andrade, Carolina HortaFrontiers in Pharmacology (2018), 9 (), 1275CODEN: FPRHAU; ISSN:1663-9812. (Frontiers Media S.A.)A review. Virtual screening (VS) has emerged in drug discovery as a powerful computational approach to screen large libraries of small mols. for new hits with desired properties that can then be tested exptl. Similar to other computational approaches, VS intention is not to replace in vitro or in vivo assays, but to speed up the discovery process, to reduce the no. of candidates to be tested exptl., and to rationalize their choice. Moreover, VS has become very popular in pharmaceutical companies and academic organizations due to its time-, cost-, resources-, and laborsaving. Among the VS approaches, quant. structure-activity relationship (QSAR) anal. is the most powerful method due to its high and fast throughput and good hit rate. As the first preliminary step of a QSAR model development, relevant chemogenomics data are collected from databases and the literature. Then, chem. descriptors are calcd. on different levels of representation of mol. structure, ranging from 1D to nD, and then correlated with the biol. property using machine learning techniques. Once developed and validated, QSAR models are applied to predict the biol. property of novel compds. Although the exptl. 
testing of computational hits is not an inherent part of QSAR methodol., it is highly desired and should be performed as an ultimate validation of developed models. In this minireview, we summarize and critically analyze the recent trends of QSAR-based VS in drug discovery and demonstrate successful applications in identifying perspective compds. with desired properties. Moreover, we provide some recommendations about the best practices for QSAR-based VS along with the future perspectives of this approach.
- 5Zhang, L.; Zhang, H.; Ai, H.; Hu, H.; Li, S.; Zhao, J.; Liu, H. Applications of Machine Learning Methods in Drug Toxicity Prediction. Curr. Top. Med. Chem. (Sharjah, United Arab Emirates) 2018, 18, 987– 997, DOI: 10.2174/1568026618666180727152557Google Scholar5https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhvVejt7fO&md5=e98c7844a12ead052509399b9cf9c6b4Applications of Machine Learning Methods in Drug Toxicity PredictionZhang, Li; Zhang, Hui; Ai, Haixin; Hu, Huan; Li, Shimeng; Zhao, Jian; Liu, HongshengCurrent Topics in Medicinal Chemistry (Sharjah, United Arab Emirates) (2018), 18 (12), 987-997CODEN: CTMCCL; ISSN:1568-0266. (Bentham Science Publishers Ltd.)Toxicity evaluation is an important part of the preclin. safety assessment of new drugs, which is directly related to human health and the fate of drugs. It is of importance to study how to evaluate drug toxicity accurately and economically. The traditional in vitro and in vivo toxicity tests are laborious, time-consuming, highly expensive, and even involve animal welfare issues. Computational methods developed for drug toxicity prediction can compensate for the shortcomings of traditional methods and have been considered useful in the early stages of drug development. Numerous drug toxicity prediction models have been developed using a variety of computational methods. With the advance of the theory of machine learning and mol. representation, more and more drug toxicity prediction models are developed using a variety of machine learning methods, such as support vector machine, random forest, naive Bayesian, back propagation neural network. And significant advances have been made in many toxicity endpoints, such as carcinogenicity, mutagenicity, and hepatotoxicity. In this review, we aimed to provide a comprehensive overview of the machine learning based drug toxicity prediction studies conducted in recent years. 
In addn., we compared the performance of the models proposed in these studies in terms of accuracy, sensitivity, and specificity, providing a view of the current state-of-the-art in this field and highlighting the issues in the current studies.
- 6Ma, H.; An, W.; Wang, Y.; Sun, H.; Huang, R.; Huang, J. Deep Graph Learning with Property Augmentation for Predicting Drug-Induced Liver Injury. Chem. Res. Toxicol. 2021, 34, 495, DOI: 10.1021/acs.chemrestox.0c00322Google Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXis1ehsrrI&md5=1eb843cf35938405c37027987f1daea4Deep Graph Learning with Property Augmentation for Predicting Drug-Induced Liver InjuryMa, Hehuan; An, Weizhi; Wang, Yuhong; Sun, Hongmao; Huang, Ruili; Huang, JunzhouChemical Research in Toxicology (2021), 34 (2), 495-506CODEN: CRTOEC; ISSN:0893-228X. (American Chemical Society)Drug-induced liver injury (DILI) is a crucial factor in detg. the qualification of potential drugs. However, the DILI property is excessively difficult to obtain due to the complex testing process. Consequently, an in silico screening in the early stage of drug discovery would help to reduce the total development cost by filtering those drug candidates with a high risk to cause DILI. To serve the screening goal, we apply several computational techniques to predict the DILI property, including traditional machine learning methods and graph-based deep learning techniques. While deep learning models require large training data to tune huge model parameters, the DILI data set only contains a few hundred annotated mols. To alleviate the data scarcity problem, we propose a property augmentation strategy to include massive training data with other property information. Extensive expts. demonstrate that our proposed method significantly outperforms all existing baselines on the DILI data set by obtaining a 81.4% accuracy using cross-validation with random splitting, 78.7% using leave-one-out cross-validation, and 76.5% using cross-validation with scaffold splitting.
- 7Montanari, F.; Kuhnke, L.; Ter Laak, A.; Clevert, D.-A. Modeling physico-chemical ADMET endpoints with multitask graph convolutional networks. Molecules 2020, 25, 44, DOI: 10.3390/molecules25010044Google Scholar7https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXjsFWqtLk%253D&md5=62240c8f1b151bab7a95d51e20b6c6a5Modeling physico-chemical ADMET endpoints with multitask graph convolutional networksMontanari, Floriane; Kuhnke, Lara; Ter Laak, Antonius; Clevert, Djork-ArneMolecules (2020), 25 (1), 44CODEN: MOLEFW; ISSN:1420-3049. (MDPI AG)Simple physico-chem. properties, like logD, soly., or m.p., can reveal a great deal about how a compd. under development might later behave. These data are typically measured for most compds. in drug discovery projects in a medium throughput fashion. Collecting and assembling all the Bayer inhouse data related to these properties allowed us to apply powerful machine learning techniques to predict the outcome of those assays for new compds. In this paper, we report our finding that, esp. for predicting physicochem. ADMET endpoints, a multitask graph convolutional approach appears a highly competitive choice. For seven endpoints of interest, we compared the performance of that approach to fully connected neural networks and different single task models. The new model shows increased predictive performance compared to previous modeling methods and will allow early prioritization of compds. even before they are synthesized. In addn., our model follows the generalized soly. equation without being explicitly trained under this constraint.
- 8Peng, Y.; Lin, Y.; Jing, X.-Y.; Zhang, H.; Huang, Y.; Luo, G. S. Enhanced Graph Isomorphism Network for Molecular ADMET Properties Prediction. IEEE Access 2020, 8, 168344– 168360, DOI: 10.1109/ACCESS.2020.3022850Google ScholarThere is no corresponding record for this reference.
- 9Feinberg, E. N.; Joshi, E.; Pande, V. S.; Cheng, A. C. Improvement in ADMET prediction with multitask deep featurization. J. Med. Chem. 2020, 63, 8835– 8848, DOI: 10.1021/acs.jmedchem.9b02187Google Scholar9https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXntFCgsLs%253D&md5=845b33af28bfa7b817187c9e423a21ebImprovement in ADMET Prediction with Multitask Deep FeaturizationFeinberg, Evan N.; Joshi, Elizabeth; Pande, Vijay S.; Cheng, Alan C.Journal of Medicinal Chemistry (2020), 63 (16), 8835-8848CODEN: JMCMAR; ISSN:0022-2623. (American Chemical Society)The absorption, distribution, metab., elimination, and toxicity (ADMET) properties of drug candidates are important for their efficacy and safety as therapeutics. Predicting ADMET properties has therefore been of great interest to the computational chem. and medicinal chem. communities in recent decades. Traditional cheminformatics approaches, using learners such as random forests and deep neural networks, leverage fingerprint feature representations of mols. Here, we learn the features most relevant to each chem. task at hand by representing each mol. explicitly as a graph. By applying graph convolutions to this explicit mol. representation, we achieve, to our knowledge, unprecedented accuracy in prediction of ADMET properties. By challenging our methodol. with rigorous cross-validation procedures and prognostic analyses, we show that deep featurization better enables mol. predictors to not only interpolate but also extrapolate to new regions of chem. space.
- 10Skalic, M.; Varela-Rial, A.; Jiménez, J.; Martínez-Rosell, G.; De Fabritiis, G. LigVoxel: inpainting binding pockets using 3D-convolutional neural networks. Bioinformatics 2019, 35, 243– 250, DOI: 10.1093/bioinformatics/bty583Google Scholar10https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXitVWgs73N&md5=e8bf4189689b73707a78dc94d58fa7afLigVoxel: inpainting binding pockets using 3D-convolutional neural networksSkalic, Miha; Varela-Rial, Alejandro; Jimenez, Jose; Martinez-Rosell, Gerard; Fabritiis, Gianni DeBioinformatics (2019), 35 (2), 243-250CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)Motivation: Structure-based drug discovery methods exploit protein structural information to design small mols. binding to given protein pockets. This work proposes a purely data driven, structure-based approach for imaging ligands as spatial fields in target protein pockets. We use an end-to-end deep learning framework trained on exptl. protein-ligand complexes with the intention of mimicking a chemist's intuition at manually placing atoms when designing a new compd. We show that these models can generate spatial images of ligand chem. properties like occupancy, aromaticity and donor-acceptor matching the protein pocket. Results: The predicted fields considerably overlap with those of unseen ligands bound to the target pocket. Maximization of the overlap between the predicted fields and a given ligand on the Astex diverse set recovers the original ligand crystal poses in 70 out of 85 cases within a threshold of 2 Å RMSD. We expect that these models can be used for guiding structure-based drug discovery approaches.
- 11Jiménez, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks. J. Chem. Inf. Model. 2018, 58, 287– 296, DOI: 10.1021/acs.jcim.7b00650Google Scholar11https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXkslSitQ%253D%253D&md5=81943a6732be99e5439e1e3a25fa4414KDEEP: Protein-Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural NetworksJimenez, Jose; Skalic, Miha; Martinez-Rosell, Gerard; De Fabritiis, GianniJournal of Chemical Information and Modeling (2018), 58 (2), 287-296CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Accurately predicting protein-ligand binding affinities is an important problem in computational chem. since it can substantially accelerate drug discovery for virtual screening and lead optimization. We propose here a fast machine-learning approach for predicting binding affinities using state-of-the-art 3D-convolutional neural networks and compare this approach to other machine-learning and scoring methods using several diverse data sets. The results for the std. PDBbind (v.2016) core test-set are state-of-the-art with a Pearson's correlation coeff. of 0.82 and a RMSE of 1.27 in pK units between exptl. and predicted affinity, but accuracy is still very sensitive to the specific protein used. KDEEP is made available via PlayMol.org for users to test easily their own protein-ligand complexes, with each prediction taking a fraction of a second. We believe that the speed, performance, and ease of use of KDEEP makes it already an attractive scoring function for modern computational chem. pipelines.
- 12Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942– 957, DOI: 10.1021/acs.jcim.6b00740Google Scholar12https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXlsVems7Y%253D&md5=9cae97167da0c93e1d896f85339cca7eProtein-Ligand Scoring with Convolutional Neural NetworksRagoza, Matthew; Hochuli, Joshua; Idrobo, Elisa; Sunseri, Jocelyn; Koes, David RyanJournal of Chemical Information and Modeling (2017), 57 (4), 942-957CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Computational approaches to drug discovery can reduce the time and cost assocd. with exptl. assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amt. of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
- 13Skalic, M.; Martínez-Rosell, G.; Jiménez, J.; De Fabritiis, G. PlayMolecule BindScope: large scale CNN-based virtual screening on the web. Bioinformatics 2019, 35, 1237– 1238, DOI: 10.1093/bioinformatics/bty758Google Scholar13https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXitFWksb7K&md5=100cce3db2446853ed2f7c82b96d49dfPlayMolecule BindScope: large scale CNN-based virtual screening on the webSkalic, Miha; Martinez-Rosell, Gerard; Jimenez, Jose; De Fabritiis, GianniBioinformatics (2019), 35 (7), 1237-1238CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)Summary: Virtual screening pipelines are one of the most popular used tools in structure-based drug discovery, since they can can reduce both time and cost assocd. with exptl. assays. Recent advances in deep learning methodologies have shown that these outperform classical scoring functions at discriminating binder protein-ligand complexes. Here, we present BindScope, a web application for large-scale active-inactive classification of compds. based on deep convolutional neural networks. Performance is on a pair with current state-of-the-art pipelines. Users can screen on the order of hundreds of compds. at once and interactively visualize the results.
- 14Yang, S.; Lee, K. H.; Ryu, S. A comprehensive study on the prediction reliability of graph neural networks for virtual screening. 2020, arXiv:2003.07611, arXiv preprint. https://arxiv.org/abs/2003.07611 (accessed 2021-10-29).Google ScholarThere is no corresponding record for this reference.
- 15Sakai, M.; Nagayasu, K.; Shibui, N.; Andoh, C.; Takayama, K.; Shirakawa, H.; Kaneko, S. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci. Rep. 2021, 11, 525, DOI: 10.1038/s41598-020-80113-7Google Scholar15https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3MXhsVemsLo%253D&md5=dfd62a89f9f705f4e740544f40ce705cPrediction of pharmacological activities from chemical structures with graph convolutional neural networksSakai, Miyuki; Nagayasu, Kazuki; Shibui, Norihiro; Andoh, Chihiro; Takayama, Kaito; Shirakawa, Hisashi; Kaneko, ShujiScientific Reports (2021), 11 (1), 525CODEN: SRCEC3; ISSN:2045-2322. (Nature Research)Many therapeutic drugs are compds. that can be represented by simple chem. structures, which contain important determinants of affinity at the site of action. Recently, graph convolutional neural network (GCN) models have exhibited excellent results in classifying the activity of such compds. For models that make quant. predictions of activity, more complex information has been utilized, such as the three-dimensional structures of compds. and the amino acid sequences of their resp. target proteins. As another approach, we hypothesized that if sufficient exptl. data were available and there were enough nodes in hidden layers, a simple compd. representation would quant. predict activity with satisfactory accuracy. In this study, we report that GCN models constructed solely from the two-dimensional structural information of compds. demonstrated a high degree of activity predictability against 127 diverse targets from the ChEMBL database. Using the information entropy as a metric, we also show that the structural diversity had less effect on the prediction performance. 
Finally, we report that virtual screening using the constructed model identified a new serotonin transporter inhibitor with activity comparable to that of a marketed drug in vitro and exhibited antidepressant effects in behavioral studies.
- 16Sieg, J.; Flachsenberg, F.; Rarey, M. Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening. J. Chem. Inf. Model. 2019, 59, 947– 961, DOI: 10.1021/acs.jcim.8b00712Google Scholar16https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXktVOgsrY%253D&md5=6cb48f2acd541c10c8565f202a6eb9e8In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual ScreeningSieg, Jochen; Flachsenberg, Florian; Rarey, MatthiasJournal of Chemical Information and Modeling (2019), 59 (3), 947-961CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)A review. Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, the authors reevaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, the authors demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from std. benchmarks. On the basis of these results, the authors conclude that there is a need for eligible validation expts. and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, the authors provide guidelines for setting up validation expts. and give a perspective on how new data sets could be generated.
- 17DeGrave, A. J.; Janizek, J. D.; Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 2021, 3, 610, DOI: 10.1038/s42256-021-00338-7Google ScholarThere is no corresponding record for this reference.
- 18Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; Zhao, S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery 2019, 18, 463– 477, DOI: 10.1038/s41573-019-0024-5Google Scholar18https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXosF2rtrY%253D&md5=211782aeea3d8b9f50368f89177a70d2Applications of machine learning in drug discovery and developmentVamathevan, Jessica; Clark, Dominic; Czodrowski, Paul; Dunham, Ian; Ferran, Edgardo; Lee, George; Li, Bin; Madabhushi, Anant; Shah, Parantu; Spitzer, Michaela; Zhao, ShanrongNature Reviews Drug Discovery (2019), 18 (6), 463-477CODEN: NRDDAG; ISSN:1474-1776. (Nature Research)A review. Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and anal. of digital pathol. data in clin. trials. Applications have ranged in context and methodol., with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.
- 19. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. 2017, arXiv preprint. https://arxiv.org/abs/1703.01365 (accessed 2021-10-29).
- 20. Henderson, R.; Clevert, D.-A.; Montanari, F. Improving Molecular Graph Neural Network Explainability with Orthonormalization and Induced Sparsity. Proceedings of the 38th International Conference on Machine Learning; 2021.
- 21. Kokhlikyan, N.; Miglani, V.; Martin, M.; Wang, E.; Alsallakh, B.; Reynolds, J.; Melnikov, A.; Kliushkina, N.; Araya, C.; Yan, S.; Reblitz-Richardson, O. Captum: A unified and generic model interpretability library for PyTorch. 2020, arXiv preprint. https://arxiv.org/abs/2009.07896 (accessed 2021-06-17).
- 22. Klaise, J.; Van Looveren, A.; Vacanti, G.; Coca, A. Alibi: Algorithms for monitoring and explaining machine learning models; 2019. https://github.com/SeldonIO/alibi (accessed 2021-09-08).
- 23. Hochuli, J.; Helbling, A.; Skaist, T.; Ragoza, M.; Koes, D. R. Visualizing convolutional neural network protein-ligand scoring. J. Mol. Graphics Modell. 2018, 84, 96–108, DOI: 10.1016/j.jmgm.2018.06.005
- 24. Jiménez, J.; Doerr, S.; Martínez-Rosell, G.; Rose, A. S.; De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017, 33, 3036–3042, DOI: 10.1093/bioinformatics/btx350
- 25. Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2015, 31, 405–412, DOI: 10.1093/bioinformatics/btu626
- 26. Humphrey, W.; Dalke, A.; Schulten, K. VMD – Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33–38, DOI: 10.1016/0263-7855(96)00018-5
- 27. Martínez-Rosell, G.; Giorgino, T.; De Fabritiis, G. PlayMolecule ProteinPrepare: a web application for protein preparation for molecular dynamics simulations. J. Chem. Inf. Model. 2017, 57, 1511–1516, DOI: 10.1021/acs.jcim.7b00190
- 28. Barta, T. E.; Veal, J. M.; Rice, J. W.; Partridge, J. M.; Fadden, R. P.; Ma, W.; Jenks, M.; Geng, L.; Hanson, G. J.; Huang, K. H. Discovery of benzamide tetrahydro-4H-carbazol-4-ones as novel small molecule inhibitors of Hsp90. Bioorg. Med. Chem. Lett. 2008, 18, 3517–3521, DOI: 10.1016/j.bmcl.2008.05.023
- 29. Erlanson, D. A. In Fragment-Based Drug Discovery and X-Ray Crystallography; Davies, T. G., Hyvönen, M., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2012; pp 1–32, DOI: 10.1007/128_2011_180.
- 30. Ruiz-Carmona, S.; Schmidtke, P.; Luque, F. J.; Baker, L.; Matassova, N.; Davis, B.; Roughley, S.; Murray, J.; Hubbard, R.; Barril, X. Dynamic undocking and the quasi-bound state as tools for drug discovery. Nat. Chem. 2017, 9, 201, DOI: 10.1038/nchem.2660
- 31. Hoxie, R. S.; Street, T. O. Hsp90 chaperones have an energetic hot-spot for binding inhibitors. Protein Sci. 2020, 29, 2101–2111, DOI: 10.1002/pro.3933
- 32. Majewski, M.; Barril, X. Structural Stability Predicts the Binding Mode of Protein–Ligand Complexes. J. Chem. Inf. Model. 2020, 60, 1644–1651, DOI: 10.1021/acs.jcim.9b01062
- 33. Hu, L.; Benson, M. L.; Smith, R. D.; Lerner, M. G.; Carlson, H. A. Binding MOAD (Mother of All Databases). Proteins: Struct., Funct., Genet. 2005, 60, 333–340, DOI: 10.1002/prot.20512
- 34. Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; Garmendia-Doval, A. B.; Juhos, S.; Schmidtke, P.; Barril, X.; Hubbard, R. E.; Morley, S. D. rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids. PLoS Comput. Biol. 2014, 10, e1003571, DOI: 10.1371/journal.pcbi.1003571
References
This article references 34 other publications.
- 1Dudek, A. Z.; Arodz, T.; Gálvez, J. Computational methods in developing quantitative structure-activity relationships (QSAR): a review. Comb. Chem. High Throughput Screening 2006, 9, 213– 228, DOI: 10.2174/1386207067760555391https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XisVSqtbo%253D&md5=230cc86a6228a8f9a97c666e925a5efeComputational methods in developing quantitative structure-activity relationships (QSAR): a reviewDudek, Arkadiusz Z.; Arodz, Tomasz; Galvez, JorgeCombinatorial Chemistry & High Throughput Screening (2006), 9 (3), 213-228CODEN: CCHSFU; ISSN:1386-2073. (Bentham Science Publishers Ltd.)A review. Virtual filtering and screening of combinatorial libraries have recently gained attention as methods complementing the high-throughput screening and combinatorial chem. These chemoinformatic techniques rely heavily on quant. structure-activity relation (QSAR) anal., a field with established methodol. and successful history. In this review, we discuss the computational methods for building QSAR models. We start with outlining their usefulness in high-throughput screening and identifying the general scheme of a QSAR model. Following, we focus on the methodologies in constructing three main components of QSAR model, namely the methods for describing the mol. structure of compds., for selection of informative descriptors and for activity prediction. We present both the well-established methods as well as techniques recently introduced into the QSAR domain.
- 2Cherkasov, A.; Muratov, E. N.; Fourches, D.; Varnek, A.; Baskin, I. I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y. C.; Todeschini, R. QSAR modeling: where have you been? Where are you going to?. J. Med. Chem. 2014, 57, 4977– 5010, DOI: 10.1021/jm40042852https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhvFCrsLnE&md5=a2d371ee33fbecf48ef59ecf914e4aa8QSAR Modeling: Where Have You Been? Where Are You Going To?Cherkasov, Artem; Muratov, Eugene N.; Fourches, Denis; Varnek, Alexandre; Baskin, Igor I.; Cronin, Mark; Dearden, John; Gramatica, Paola; Martin, Yvonne C.; Todeschini, Roberto; Consonni, Viviana; Kuzmin, Victor E.; Cramer, Richard; Benigni, Romualdo; Yang, Chihae; Rathman, James; Terfloth, Lothar; Gasteiger, Johann; Richard, Ann; Tropsha, AlexanderJournal of Medicinal Chemistry (2014), 57 (12), 4977-5010CODEN: JMCMAR; ISSN:0022-2623. (American Chemical Society)A review. Quant. structure-activity relationship modeling is one of the major computational tools employed in medicinal chem. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and exptl. chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific stds. 
to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
- 3Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug Discovery Today 2018, 23, 1538– 1546, DOI: 10.1016/j.drudis.2018.05.0103https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXptFCnuro%253D&md5=d0bef55f60c8ee7bb2a31e42d4a85fcdMachine learning in chemoinformatics and drug discoveryLo, Yu-Chen; Rensi, Stefano E.; Torng, Wen; Altman, Russ B.Drug Discovery Today (2018), 23 (8), 1538-1546CODEN: DDTOFS; ISSN:1359-6446. (Elsevier Ltd.)Chemoinformatics is an established discipline focusing on extg., processing and extrapolating meaningful data from chem. structures. With the rapid explosion of chem. 'big' data from HTS and combinatorial synthesis, machine learning has become an indispensable tool for drug designers to mine chem. information from large compd. databases to design drugs with important biol. properties. To process the chem. data, we first reviewed multiple processing layers in the chemoinformatics pipeline followed by the introduction of commonly used machine learning models in drug discovery and QSAR anal. Here, we present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses; and we discuss limitations and future directions to guide further development in this evolving field.
- 4. Neves, B. J.; Braga, R. C.; Melo-Filho, C. C.; Moreira-Filho, J. T.; Muratov, E. N.; Andrade, C. H. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front. Pharmacol. 2018, 9, 1275, DOI: 10.3389/fphar.2018.01275.
- 5. Zhang, L.; Zhang, H.; Ai, H.; Hu, H.; Li, S.; Zhao, J.; Liu, H. Applications of Machine Learning Methods in Drug Toxicity Prediction. Curr. Top. Med. Chem. (Sharjah, United Arab Emirates) 2018, 18, 987–997, DOI: 10.2174/1568026618666180727152557.
- 6. Ma, H.; An, W.; Wang, Y.; Sun, H.; Huang, R.; Huang, J. Deep Graph Learning with Property Augmentation for Predicting Drug-Induced Liver Injury. Chem. Res. Toxicol. 2021, 34, 495–506, DOI: 10.1021/acs.chemrestox.0c00322.
- 7. Montanari, F.; Kuhnke, L.; Ter Laak, A.; Clevert, D.-A. Modeling physico-chemical ADMET endpoints with multitask graph convolutional networks. Molecules 2020, 25, 44, DOI: 10.3390/molecules25010044.
- 8. Peng, Y.; Lin, Y.; Jing, X.-Y.; Zhang, H.; Huang, Y.; Luo, G. S. Enhanced Graph Isomorphism Network for Molecular ADMET Properties Prediction. IEEE Access 2020, 8, 168344–168360, DOI: 10.1109/ACCESS.2020.3022850.
- 9. Feinberg, E. N.; Joshi, E.; Pande, V. S.; Cheng, A. C. Improvement in ADMET prediction with multitask deep featurization. J. Med. Chem. 2020, 63, 8835–8848, DOI: 10.1021/acs.jmedchem.9b02187.
- 10. Skalic, M.; Varela-Rial, A.; Jiménez, J.; Martínez-Rosell, G.; De Fabritiis, G. LigVoxel: inpainting binding pockets using 3D-convolutional neural networks. Bioinformatics 2019, 35, 243–250, DOI: 10.1093/bioinformatics/bty583.
- 11. Jiménez, J.; Škalič, M.; Martínez-Rosell, G.; De Fabritiis, G. KDEEP: Protein-Ligand Absolute Binding Affinity Prediction via 3D-Convolutional Neural Networks. J. Chem. Inf. Model. 2018, 58, 287–296, DOI: 10.1021/acs.jcim.7b00650.
- 12. Ragoza, M.; Hochuli, J.; Idrobo, E.; Sunseri, J.; Koes, D. R. Protein-Ligand Scoring with Convolutional Neural Networks. J. Chem. Inf. Model. 2017, 57, 942–957, DOI: 10.1021/acs.jcim.6b00740.
- 13. Skalic, M.; Martínez-Rosell, G.; Jiménez, J.; De Fabritiis, G. PlayMolecule BindScope: large scale CNN-based virtual screening on the web. Bioinformatics 2019, 35, 1237–1238, DOI: 10.1093/bioinformatics/bty758.
- 14. Yang, S.; Lee, K. H.; Ryu, S. A comprehensive study on the prediction reliability of graph neural networks for virtual screening. 2020, arXiv:2003.07611, arXiv preprint. https://arxiv.org/abs/2003.07611 (accessed 2021-10-29).
- 15. Sakai, M.; Nagayasu, K.; Shibui, N.; Andoh, C.; Takayama, K.; Shirakawa, H.; Kaneko, S. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci. Rep. 2021, 11, 525, DOI: 10.1038/s41598-020-80113-7.
- 16. Sieg, J.; Flachsenberg, F.; Rarey, M. In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening. J. Chem. Inf. Model. 2019, 59, 947–961, DOI: 10.1021/acs.jcim.8b00712.
- 17. DeGrave, A. J.; Janizek, J. D.; Lee, S.-I. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 2021, 3, 610, DOI: 10.1038/s42256-021-00338-7.
- 18. Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; Zhao, S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery 2019, 18, 463–477, DOI: 10.1038/s41573-019-0024-5.
- 19. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. 2017, arXiv:1703.01365, arXiv preprint. https://arxiv.org/abs/1703.01365 (accessed 2021-10-29).
- 20. Henderson, R.; Clevert, D.-A.; Montanari, F. Improving Molecular Graph Neural Network Explainability with Orthonormalization and Induced Sparsity. Proceedings of the 38th International Conference on Machine Learning; 2021.
- 21. Kokhlikyan, N.; Miglani, V.; Martin, M.; Wang, E.; Alsallakh, B.; Reynolds, J.; Melnikov, A.; Kliushkina, N.; Araya, C.; Yan, S.; Reblitz-Richardson, O. Captum: A unified and generic model interpretability library for PyTorch. 2020, arXiv:2009.07896, arXiv preprint. https://arxiv.org/abs/2009.07896 (accessed 2021-06-17).
- 22. Klaise, J.; Van Looveren, A.; Vacanti, G.; Coca, A. Alibi: Algorithms for monitoring and explaining machine learning models; 2019. https://github.com/SeldonIO/alibi (accessed 2021-09-08).
- 23. Hochuli, J.; Helbling, A.; Skaist, T.; Ragoza, M.; Koes, D. R. Visualizing convolutional neural network protein-ligand scoring. J. Mol. Graphics Modell. 2018, 84, 96–108, DOI: 10.1016/j.jmgm.2018.06.005.
- 24. Jiménez, J.; Doerr, S.; Martínez-Rosell, G.; Rose, A. S.; De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017, 33, 3036–3042, DOI: 10.1093/bioinformatics/btx350.
- 25. Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2015, 31, 405–412, DOI: 10.1093/bioinformatics/btu626.
- 26. Humphrey, W.; Dalke, A.; Schulten, K. VMD – Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33–38, DOI: 10.1016/0263-7855(96)00018-5.
- 27. Martínez-Rosell, G.; Giorgino, T.; De Fabritiis, G. PlayMolecule ProteinPrepare: a web application for protein preparation for molecular dynamics simulations. J. Chem. Inf. Model. 2017, 57, 1511–1516, DOI: 10.1021/acs.jcim.7b00190.
- 28. Barta, T. E.; Veal, J. M.; Rice, J. W.; Partridge, J. M.; Fadden, R. P.; Ma, W.; Jenks, M.; Geng, L.; Hanson, G. J.; Huang, K. H.; et al. Discovery of benzamide tetrahydro-4H-carbazol-4-ones as novel small molecule inhibitors of Hsp90. Bioorg. Med. Chem. Lett. 2008, 18, 3517–3521, DOI: 10.1016/j.bmcl.2008.05.023.
- 29. Erlanson, D. A. In Fragment-Based Drug Discovery and X-Ray Crystallography; Davies, T. G., Hyvönen, M., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2012; pp 1–32, DOI: 10.1007/128_2011_180.
- 30. Ruiz-Carmona, S.; Schmidtke, P.; Luque, F. J.; Baker, L.; Matassova, N.; Davis, B.; Roughley, S.; Murray, J.; Hubbard, R.; Barril, X. Dynamic undocking and the quasi-bound state as tools for drug discovery. Nat. Chem. 2017, 9, 201–206, DOI: 10.1038/nchem.2660
- 31. Hoxie, R. S.; Street, T. O. Hsp90 chaperones have an energetic hot-spot for binding inhibitors. Protein Sci. 2020, 29, 2101–2111, DOI: 10.1002/pro.3933
- 32. Majewski, M.; Barril, X. Structural Stability Predicts the Binding Mode of Protein–Ligand Complexes. J. Chem. Inf. Model. 2020, 60, 1644–1651, DOI: 10.1021/acs.jcim.9b01062
- 33. Hu, L.; Benson, M. L.; Smith, R. D.; Lerner, M. G.; Carlson, H. A. Binding MOAD (Mother of All Databases). Proteins: Struct., Funct., Genet. 2005, 60, 333–340, DOI: 10.1002/prot.20512
- 34. Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; Garmendia-Doval, A. B.; Juhos, S.; Schmidtke, P.; Barril, X.; Hubbard, R. E.; Morley, S. D. rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids. PLoS Comput. Biol. 2014, 10, e1003571, DOI: 10.1371/journal.pcbi.1003571
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c00691.
Additional information for the “Model training” and “quantitative analysis” sections; Figures S1–S6, distributions of the distance between the two voxels with the highest absolute attribution values for the different channel combinations studied; Figures S7 and S8, examples of protein residues far from the ligand that have high attribution values; Figure S9, correlation between the magnitude of the attributions of the two best voxels and the distance between them; and Figures S10–S13, attribution consistency distributions (PDF)