A Minimum Variance Clustering Approach Produces Robust and Interpretable Coarse-Grained Models
- Brooke E. Husic
- Keri A. McKiernan
- Hannah K. Wayment-Steele
- Mohammad M. Sultan
- Vijay S. Pande
Abstract
Markov state models (MSMs) are a powerful framework for analyzing molecular dynamics data sets, such as protein folding simulations, because of their straightforward construction and statistical rigor. Coarse-graining an MSM into an interpretable number of macrostates is a crucial step for connecting theoretical results with experimental observables. Here we present the minimum variance clustering approach (MVCA) for coarse-graining MSMs into macrostate models. The method uses agglomerative clustering with Ward’s minimum variance objective function and measures the dynamical similarity of two microstates by the Jensen–Shannon divergence between their corresponding rows in the MSM transition probability matrix. We first show that MVCA produces intuitive results for a simple tripeptide system and is robust toward long-duration statistical artifacts. We then apply MVCA to two folding simulations of the same protein in different force fields, demonstrating that a different number of macrostates is appropriate for each model and revealing a misfolded state present in only one of the simulations. Finally, we show that the same method can analyze a data set containing many MSMs from simulations in different force fields by aggregating the models into groups and quantifying their dynamical similarity in the context of force field parameter choices. MVCA with the Jensen–Shannon divergence thus provides a powerful tool for grouping dynamics by similarity, both among the states of a model and among dynamical models themselves.
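The two ingredients named above, a Jensen–Shannon divergence between rows of the transition matrix and Ward-style agglomeration, can be illustrated with a short pure-Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `js_divergence` and `ward_agglomerate` are hypothetical, the square root of the divergence is used as the pairwise dissimilarity (a choice made for this sketch), and Ward's criterion is applied through the Lance–Williams update even though these dissimilarities are not Euclidean distances.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors (at most ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def ward_agglomerate(T, n_macrostates):
    """Merge microstates (rows of T) until n_macrostates clusters remain.

    Assumption of this sketch: the dissimilarity between two rows is
    sqrt(JS divergence), and clusters are merged with Ward's criterion via
    the Lance-Williams update. Returns sorted lists of microstate indices.
    """
    n = len(T)
    if not 1 <= n_macrostates <= n:
        raise ValueError("n_macrostates must be between 1 and the number of rows")

    def key(i, j):
        return (i, j) if i < j else (j, i)

    clusters = {i: [i] for i in range(n)}
    dist = {key(i, j): math.sqrt(js_divergence(T[i], T[j]))
            for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > n_macrostates:
        # Merge the closest pair of current clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        n_a, n_b = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        new_id, next_id = next_id, next_id + 1

        # Lance-Williams update for Ward linkage from each remaining cluster k.
        for k, members in clusters.items():
            n_k = len(members)
            d_ka, d_kb = dist[key(k, a)], dist[key(k, b)]
            sq = ((n_a + n_k) * d_ka ** 2 + (n_b + n_k) * d_kb ** 2
                  - n_k * d_ab ** 2) / (n_a + n_b + n_k)
            dist[key(k, new_id)] = math.sqrt(max(sq, 0.0))  # clamp: inputs are not Euclidean

        # Drop every entry that refers to the two clusters just merged.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        clusters[new_id] = merged

    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    # Hypothetical 4-microstate transition matrix: states 0/1 and 2/3 mix quickly.
    T = [
        [0.50, 0.40, 0.05, 0.05],
        [0.40, 0.50, 0.05, 0.05],
        [0.05, 0.05, 0.50, 0.40],
        [0.05, 0.05, 0.40, 0.50],
    ]
    print(ward_agglomerate(T, 2))  # → [[0, 1], [2, 3]]
```

In this toy matrix, microstates 0 and 1 have nearly identical outgoing transition probabilities, as do 2 and 3, so their rows have a small Jensen–Shannon divergence and are lumped first. This mirrors the idea in the abstract that states with similar dynamics belong to the same macrostate.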