Overcoming the Heuristic Nature of k-Means Clustering: Identification and Characterization of Binding Modes from Simulations of Molecular Recognition Complexes
- Parker Ladd Bremer, Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
- Danna De Boer, Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
- Walter Alvarado, Department of Physics & Astronomy, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
- Xavier Martinez, Department of Computer Engineering & Computer Science, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
- Eric J. Sorin*, Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States. *Email: [email protected]. Phone: 562-985-7537.
Abstract
The accurate and reproducible detection and description of thermodynamic states in computational data is a nontrivial problem, particularly when the number of states is unknown a priori and for large, flexible chemical systems and complexes. To this end, we report a novel clustering protocol that combines high-resolution structural representation, brute-force repeat clustering, and optimization of clustering statistics to reproducibly identify the number of clusters present in a data set (k) for simulated ensembles of butyrylcholinesterase in complex with two previously studied organophosphate inhibitors. Each structure within our simulated ensembles was depicted as a high-dimensionality vector with components defined by specific protein–inhibitor contacts at the chemical group level and the magnitudes of these components defined by their respective extents of pair-wise atomic contact, thus allowing for algorithmic differentiation between varying degrees of interaction. These surface-weighted interaction fingerprints were tabulated for each of over 1 million structures from more than 100 μs of all-atom molecular dynamics simulation per complex and used as the input for repetitive k-means clustering. Minimization of cluster population variance and range afforded accurate and reproducible identification of k, thereby allowing for the characterization of discrete binding modes from molecular simulation data in the form of contact tables that concisely encapsulate the observed intermolecular contact motifs. While the protocol presented herein to determine k and achieve non-heuristic clustering is demonstrated on data from massive atomistic simulation, our approach is generalizable to other data types and clustering algorithms, and is tractable with limited computational resources.
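The repeat-clustering idea summarized in the abstract can be illustrated with a minimal sketch. It is one possible reading of "minimization of cluster population variance and range", not the authors' implementation: for each candidate k, a plain Lloyd's k-means is repeated from different random starts, the cluster populations of each run are sorted, and the run-to-run variance and range of those populations are recorded. The function names (kmeans_labels, population_spread, scan_k), the parameter defaults, the averaging over size ranks, and the synthetic data standing in for the interaction fingerprints are all assumptions made for illustration.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means from a random start; returns one label per row of X."""
    # Fancy indexing copies the rows, so X itself is never modified.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
            else:
                # Assumption: re-seed an empty cluster at a random data point.
                centroids[j] = X[rng.integers(len(X))]
    return labels


def population_spread(X, k, n_repeats=20, seed=0):
    """Run-to-run variance and range of sorted cluster populations for one k."""
    rng = np.random.default_rng(seed)
    pops = np.array([
        np.sort(np.bincount(kmeans_labels(X, k, rng), minlength=k))
        for _ in range(n_repeats)
    ])  # shape (n_repeats, k); column i holds the i-th smallest cluster size
    variance = pops.var(axis=0).mean()
    value_range = (pops.max(axis=0) - pops.min(axis=0)).mean()
    return float(variance), float(value_range)


def scan_k(X, candidate_ks, n_repeats=20, seed=0):
    """Map each candidate k to its (variance, range) population statistics."""
    return {k: population_spread(X, k, n_repeats, seed) for k in candidate_ks}


# Hypothetical stand-in for fingerprint vectors: three synthetic groups of
# 100 points in 8 dimensions (NOT simulation data).
_rng = np.random.default_rng(42)
_centers = _rng.uniform(0.0, 10.0, size=(3, 8))
X_demo = np.vstack([c + 0.3 * _rng.normal(size=(100, 8)) for c in _centers])

results = scan_k(X_demo, candidate_ks=[2, 3, 4, 5])
# Candidate with the smallest variance; ties are broken by the smaller range.
best_k = min(results, key=lambda k: results[k])
for k, (var, rng_width) in results.items():
    print(f"k={k}: population variance={var:.2f}, range={rng_width:.2f}")
```

The toy data are only there to make the sketch runnable; no conclusion about the appropriate k should be drawn from them. Raw populations are compared here, and how the statistics should be normalized or combined across k is left open as a further assumption.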