Overcoming the Heuristic Nature of k-Means Clustering: Identification and Characterization of Binding Modes from Simulations of Molecular Recognition Complexes

    • Parker Ladd Bremer
      Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
    • Danna De Boer
      Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
    • Walter Alvarado
      Department of Physics & Astronomy, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
    • Xavier Martinez
      Department of Computer Engineering & Computer Science, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
    • Eric J. Sorin*
      Department of Chemistry & Biochemistry, California State University, Long Beach, 1250 Bellflower Boulevard, Long Beach, California 90840, United States
      *Email: [email protected]. Phone: 562-985-7537.

    Journal of Chemical Information and Modeling

    Cite this: J. Chem. Inf. Model. 2020, 60, 6, 3081–3092
    https://doi.org/10.1021/acs.jcim.9b01137
    Published May 8, 2020
    Copyright © 2020 American Chemical Society

    Abstract

    The accurate and reproducible detection and description of thermodynamic states in computational data is a nontrivial problem, particularly when the number of states is unknown a priori and for large, flexible chemical systems and complexes. To this end, we report a novel clustering protocol that combines high-resolution structural representation, brute-force repeat clustering, and optimization of clustering statistics to reproducibly identify the number of clusters present in a data set (k) for simulated ensembles of butyrylcholinesterase in complex with two previously studied organophosphate inhibitors. Each structure within our simulated ensembles was depicted as a high-dimensionality vector with components defined by specific protein–inhibitor contacts at the chemical group level and the magnitudes of these components defined by their respective extents of pair-wise atomic contact, thus allowing for algorithmic differentiation between varying degrees of interaction. These surface-weighted interaction fingerprints were tabulated for each of over 1 million structures from more than 100 μs of all-atom molecular dynamics simulation per complex and used as the input for repetitive k-means clustering. Minimization of cluster population variance and range afforded accurate and reproducible identification of k, thereby allowing for the characterization of discrete binding modes from molecular simulation data in the form of contact tables that concisely encapsulate the observed intermolecular contact motifs. While the protocol presented herein to determine k and achieve non-heuristic clustering is demonstrated on data from massive atomistic simulation, our approach is generalizable to other data types and clustering algorithms, and is tractable with limited computational resources.
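
The abstract summarizes the protocol only at a high level, so the following is a minimal sketch of one possible reading of "repetitive k-means clustering" combined with "minimization of cluster population variance and range", not the authors' implementation. It assumes that each candidate k is clustered many times from different random starts, that cluster populations are compared across trials after sorting them by size, and that the k whose populations vary least is preferred. The function names (`kmeans_populations`, `population_spread`, `scan_k`), the random initialization, the rank-ordering of populations, and the toy 2D data are all illustrative assumptions; the real input would be the surface-weighted interaction fingerprints.

```python
import random
from statistics import pvariance


def kmeans_populations(points, k, rng, max_iter=100):
    """Run one Lloyd-style k-means trial; return cluster sizes, largest first."""
    centers = rng.sample(points, k)  # assumption: initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so this trial has converged
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sorted((labels.count(j) for j in range(k)), reverse=True)


def population_spread(trials):
    """Worst-case variance and range of rank-ordered populations across trials."""
    by_rank = list(zip(*trials))  # by_rank[i]: size of the i-th largest cluster per trial
    worst_var = max(pvariance(sizes) for sizes in by_rank)
    worst_range = max(max(sizes) - min(sizes) for sizes in by_rank)
    return worst_var, worst_range


def scan_k(points, k_values, n_repeats=20, seed=0):
    """Repeat k-means for each candidate k and report the population spread."""
    rng = random.Random(seed)
    return {
        k: population_spread([kmeans_populations(points, k, rng) for _ in range(n_repeats)])
        for k in k_values
    }


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical toy data: three 2D blobs of unequal size standing in for fingerprints.
    blobs = [((0.0, 0.0), 30), ((5.0, 5.0), 20), ((0.0, 8.0), 10)]
    points = [
        (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
        for (cx, cy), n in blobs
        for _ in range(n)
    ]
    for k, (var, spread) in scan_k(points, range(2, 6)).items():
        print(f"k={k}: worst variance={var:.2f}, worst range={spread}")
```

Under these assumptions, a candidate k whose variance and range are both near zero yields the same cluster populations on every repeat. The scan starts at k = 2 because the single population at k = 1 is trivially constant.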

    Supporting Information

    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.9b01137.

    • Inhibitor charge derivation and charges used; elbow plots for k-means trials performed on varying data sets of n simulations; secondary population distribution matrices for validation of identification of k (PDF)

    Cited By

    This article is cited by 11 publications.

    1. Chanyu Yao, Qiang Wang, Xiaohui Lu, Xiaofeng Chen, Zheng Li. Hydrogel-Based Microdroplet Ensembles Encapsulating Multiplexed EXPAR Assays for Trichromic Digital Profiling of MicroRNAs and in-Depth Classification of Primary Urethral Cancers. Nano Letters 2024, 24 (49), 15861-15869. https://doi.org/10.1021/acs.nanolett.4c04898
    2. Lexin Chen, Daniel R. Roe, Matthew Kochert, Carlos Simmerling, Ramón Alain Miranda-Quintana. k-Means NANI: An Improved Clustering Algorithm for Molecular Dynamics Simulations. Journal of Chemical Theory and Computation 2024, 20 (13), 5583-5597. https://doi.org/10.1021/acs.jctc.4c00308
    3. In Sub M. Han, Kelly M. Thayer. Reconnaissance of Allostery via the Restoration of Native p53 DNA-Binding Domain Dynamics in Y220C Mutant p53 Tumor Suppressor Protein. ACS Omega 2024, 9 (18), 19837-19847. https://doi.org/10.1021/acsomega.3c08509
    4. In Sub M. Han, Dylan Abramson, Kelly M. Thayer. Insights into Rational Design of a New Class of Allosteric Effectors with Molecular Dynamics Markov State Models and Network Theory. ACS Omega 2022, 7 (3), 2831-2841. https://doi.org/10.1021/acsomega.1c05624
    5. Luiz Patrick Cordeiro Josino, Renan Patrick da Penha Valente, Maria Luane de Souza da Silva, Cláudio Nahum Alves, Anderson H. Lima. Molecular dynamics of transferrin receptor binder peptides: unlocking blood-brain barrier for enhanced CNS drug delivery. Journal of Biomolecular Structure and Dynamics 2025, 1-10. https://doi.org/10.1080/07391102.2024.2446676
    6. Mohammed Zakariae El Khattabi, Mostapha El Jai, Youssef Lahmadi, Lahcen Oughdir. Geometry-Inference Based Clustering Heuristic: New k-means Metric for Gaussian Data and Experimental Proof of Concept. Operations Research Forum 2024, 5 (1). https://doi.org/10.1007/s43069-024-00291-2
    7. Patrick Allen, Nguyet Nguyen, Nicholas D. Humphrey, Jia Mao, Daniel Chavez-Bonilla, Eric J. Sorin. A Hands-On Collaboration-Ready Single- or Interdisciplinary Computational Exercise in Molecular Recognition and Drug Design. Education Sciences 2024, 14 (2), 139. https://doi.org/10.3390/educsci14020139
    8. Bayo Lau, Prashant S Emani, Jackson Chapman, Lijing Yao, Tarsus Lam, Paul Merrill, Jonathan Warrell, Mark B Gerstein, Hugo Y K Lam. Insights from incorporating quantum computing into drug design workflows. Bioinformatics 2023, 39 (1). https://doi.org/10.1093/bioinformatics/btac789
    9. Shreyas Kaptan, Ilpo Vattulainen. Machine learning in the analysis of biomolecular simulations. Advances in Physics: X 2022, 7 (1). https://doi.org/10.1080/23746149.2021.2006080
    10. Otávio Augusto Chaves, Carlyle Ribeiro Lima, Natalia Fintelman-Rodrigues, Carolina Q. Sacramento, Caroline S. de Freitas, Leonardo Vazquez, Jairo R. Temerozo, Marco E.N. Rocha, Suelen S.G. Dias, Nicolas Carels, Patrícia T. Bozza, Hugo Caire Castro-Faria-Neto, Thiago Moreno L. Souza. Agathisflavone, a natural biflavonoid that inhibits SARS-CoV-2 replication by targeting its proteases. International Journal of Biological Macromolecules 2022, 222, 1015-1026. https://doi.org/10.1016/j.ijbiomac.2022.09.204
    11. Danna De Boer, Nguyet Nguyen, Jia Mao, Jessica Moore, Eric J. Sorin. A Comprehensive Review of Cholinesterase Modeling and Simulation. Biomolecules 2021, 11 (4), 580. https://doi.org/10.3390/biom11040580
