Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection

Cite this: J. Chem. Inf. Model. 2019, 59, 5, 1919–1929
Publication Date (Web): February 6, 2019
https://doi.org/10.1021/acs.jcim.8b00734
Copyright © 2019 American Chemical Society

    Abstract

    Knowledge-based potentials generally perform better than physics-based scoring functions at detecting the native structure within a collection of decoy protein structures. By using a reference state, the pure interactions between atom/residue pairs can be obtained by removing the contributions of ideal-gas-state potentials. However, conventional knowledge-based potentials have difficulty assigning different importance factors to different atom/residue pairs. In this work, using the “comparison” concept, we trained Random Forest (RF) models on unbalanced data sets; these models assign different importance factors to the atom pair potentials, enhancing their ability to distinguish native proteins from decoy proteins. The performance of the RF models was tested on 12 decoy sets, both individually and combined. We find that the RF models improve the recognition of native structures without affecting their ability to identify the best decoy structures. To test whether the RF algorithm can build useful models from physically unrealistic input, we also created models using scrambled atom types, which yield physically unrealistic probability functions. In this test, we were unable to create models of quality comparable to those built from the unscrambled probability functions. Next, we created uniform probability functions in which the peak positions are the same as in the original functions but every interaction has the same peak height. Using these uniform potentials, we recovered models as good as those built from the full potentials, suggesting that the experimental peak positions are all that matters in these models. The KECSA2 potential, along with all code used in this work, is available at https://github.com/JunPei000/protein_folding-decoy-set.
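    The main ideas in the abstract can be illustrated with a short Python sketch: a pair potential derived against an ideal-gas reference state, a "uniform" variant that keeps only the peak positions, and per-atom-pair-type energy sums that a model such as RF could weight. This is a minimal illustration under stated assumptions, not the KECSA2 code from the repository above. The 0.5 Å binning, the kT value, the pseudocount, the pair-type label "C-O", the peak-flattening rule in `uniform_ratio`, and all function names are hypothetical. The RF training step itself and the paper's "comparison" construction are not shown.

```python
import math

# Illustrative binning (an assumption, not KECSA2's actual grid):
# 0.5 Å bins covering 0-10 Å.
BIN_WIDTH = 0.5
N_BINS = 20
KT = 0.593  # kT in kcal/mol at about 298 K


def bin_index(r):
    """Map a distance (Å) to its histogram bin, or None if out of range."""
    i = int(r / BIN_WIDTH)
    return i if 0 <= i < N_BINS else None


def ideal_gas_reference():
    """Ideal-gas reference state: pair counts in a spherical shell scale
    with the shell volume, i.e. roughly as r**2 at the bin center."""
    shell = [((i + 0.5) * BIN_WIDTH) ** 2 for i in range(N_BINS)]
    total = sum(shell)
    return [s / total for s in shell]


def ratio_to_reference(counts, reference, pseudo=1e-6):
    """g(r) = p_obs(r) / p_ref(r); the pseudocount avoids log(0) later."""
    total = sum(counts)
    return [(c / total + pseudo) / p for c, p in zip(counts, reference)]


def potential_from_ratio(g, kt=KT):
    """Inverse-Boltzmann pair potential E(r) = -kT ln g(r)."""
    return [-kt * math.log(x) for x in g]


def uniform_ratio(g, height=2.0):
    """Hypothetical 'uniform' variant: keep only the positions of the peaks
    of g(r) (local maxima above the reference level), give every peak the
    same height, and set all other bins to 1 so that their energy is 0."""
    out = [1.0] * len(g)
    for i in range(1, len(g) - 1):
        if g[i] > 1.0 and g[i] >= g[i - 1] and g[i] >= g[i + 1]:
            out[i] = height
    return out


def pair_type_features(contacts, potentials):
    """One feature per atom-pair type: the summed pair energy of a structure.
    Contacts whose type has no potential or whose distance is out of range
    are skipped."""
    features = {t: 0.0 for t in potentials}
    for pair_type, r in contacts:
        i = bin_index(r)
        if pair_type in potentials and i is not None:
            features[pair_type] += potentials[pair_type][i]
    return features


if __name__ == "__main__":
    ref = ideal_gas_reference()
    # Toy histogram for a hypothetical "C-O" pair type: ideal-gas-like
    # counts with an enriched contact peak in the 3.5-4.0 Å bin.
    counts = [1000 * p for p in ref]
    counts[7] *= 5
    g = ratio_to_reference(counts, ref)
    full = {"C-O": potential_from_ratio(g)}
    uniform = {"C-O": potential_from_ratio(uniform_ratio(g))}
    contacts = [("C-O", 3.6), ("C-O", 9.9), ("N-O", 3.0)]
    print(pair_type_features(contacts, full))
    print(pair_type_features(contacts, uniform))
```

    A conventional knowledge-based score simply adds the per-type energies, which amounts to giving every atom pair type the same importance factor. Supplying the per-type sums as features to an RF model instead lets the model learn unequal importances, which is the gap the abstract identifies.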

    Supporting Information

    The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.8b00734.

    • Description of the KECSA2 potential database, results of the best RF hyperparameter sets for each decoy set (Table s1), comparison of the native structure ranking using RF models with different numbers of features (Table s2), comparison of the RMSD of the first decoy structure using RF models with different numbers of features (Table s3), comparison of the first decoy structure’s TM-score using RF models with different numbers of features (Table s4), and feature importance analysis results for the overall decoy set (Figure s1). (PDF)
