Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning

Cite this: J. Chem. Inf. Model. 2020, 60, 8, 4098–4107
Publication Date (Web): July 8, 2020
Copyright © 2020 American Chemical Society

    Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method’s predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
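
The data imbalance described in the abstract can be illustrated with a minimal sketch of random oversampling for a regression target. This is not the authors' implementation and does not use the TOMER package API: the function name `random_oversample`, the `target_fraction` parameter, and the synthetic data are all hypothetical, and the 85 °C cutoff is borrowed from the abstract only as an example threshold. The sketch duplicates rare high-Topt samples until they make up a chosen share of the training set.

```python
import numpy as np


def random_oversample(X, y, threshold=85.0, target_fraction=0.5, seed=0):
    """Duplicate rare samples (y > threshold) until they form roughly
    `target_fraction` of the returned data set.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")

    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    rare = np.flatnonzero(y > threshold)      # indices of high-Topt samples
    common = np.flatnonzero(y <= threshold)   # indices of all other samples
    if rare.size == 0 or common.size == 0:
        # Nothing to rebalance.
        return X.copy(), y.copy()

    # Number of rare samples needed so that rare / (rare + common)
    # reaches target_fraction.
    n_rare_target = int(np.ceil(target_fraction * common.size
                                / (1.0 - target_fraction)))
    n_extra = max(n_rare_target - rare.size, 0)

    # Draw the additional rare samples with replacement (exact duplicates).
    extra = rng.choice(rare, size=n_extra, replace=True)
    idx = np.concatenate([common, rare, extra])
    return X[idx], y[idx]


# Synthetic example: 95 "common" values capped at 85 and 5 "rare" values at 95.
data_rng = np.random.default_rng(1)
y = np.concatenate([data_rng.normal(40, 10, 95).clip(0, 85), np.full(5, 95.0)])
X = data_rng.normal(size=(100, 3))

X_res, y_res = random_oversample(X, y)
print(X_res.shape, int((y_res > 85).sum()))
```

Exact duplication is the simplest of the resampling ideas; strategies that synthesize new samples or add noise would replace the `rng.choice` step with their own generation rule. A model would then be trained on `X_res` and `y_res` rather than on the original imbalanced data.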

    Supporting Information

    The Supporting Information is available free of charge at

    • Performance of resampling strategies for all hyperparameter combinations; algorithms (pseudocode) for the RO, SMOTER, GN, WERCS, WERCS-GN, and REBAGG resampling strategies (Figure S1) (PDF)

    Cited By

    This article is cited by 25 publications.

    1. Xiaotao Wang, Yuwei Zong, Xuanjie Zhou, Li Xu, Wei He, Shu Quan. Artificial Intelligence-Powered Construction of a Microbial Optimal Growth Temperature Database and Its Impact on Enzyme Optimal Temperature Prediction. The Journal of Physical Chemistry B 2024, 128 (10) , 2281-2292.
    2. Petr Kouba, Pavel Kohout, Faraneh Haddadi, Anton Bushuiev, Raman Samusevich, Jiri Sedlar, Jiri Damborsky, Tomas Pluskal, Josef Sivic, Stanislav Mazurenko. Machine Learning-Guided Protein Engineering. ACS Catalysis 2023, 13 (21) , 13863-13895.
    3. Md Al Mamunur Rashid, Seul Lee, Kwang Ho Kim, Jaeoh Kim, Keunhong Jeong. Machine Learning Approach for Predicting the Hole Mobility of the Perovskite Solar Cells. Advanced Theory and Simulations 2024, 2
    4. Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz. Resampling strategies for imbalanced regression: a survey and empirical analysis. Artificial Intelligence Review 2024, 57 (4)
    5. Kian Jalaleddini, Dejan Jakimovski, Anisha Keshavan, Shannon McCurdy, Kelly Leyden, Ferhan Qureshi, Atiyeh Ghoreyshi, Niels Bergsland, Michael G. Dwyer, Murali Ramanathan, Bianca Weinstock‐Guttman, Ralph HB Benedict, Robert Zivadinov. Proteomic signatures of physical, cognitive, and imaging outcomes in multiple sclerosis. Annals of Clinical and Translational Neurology 2024, 11 (3) , 729-743.
    6. Syed Khasim, Hritwik Ghosh, Irfan Sadiq Rahat, Kareemulla Shaik, Manava Yesubabu. Deciphering Microorganisms through Intelligent Image Recognition: Machine Learning and Deep Learning Approaches, Challenges, and Advancements. EAI Endorsed Transactions on Internet of Things 2023, 10
    7. Sizhe Qiu, Simiao Zhao, Aidong Yang. DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates. Briefings in Bioinformatics 2023, 25 (1)
    8. Philipp Wendering, Zoran Nikoloski. Model-driven insights into the effects of temperature on metabolism. Biotechnology Advances 2023, 67 , 108203.
    9. Michal Vasina, David Kovar, Jiri Damborsky, Yun Ding, Tianjin Yang, Andrew deMello, Stanislav Mazurenko, Stavros Stavrakis, Zbynek Prokop. In-depth analysis of biocatalysts by microfluidics: An emerging source of data for machine learning. Biotechnology Advances 2023, 66 , 108171.
    10. Tinghong Gao, Yong Ma, Yutao Liu, Qian Chen, Yongchao Liang, Quan Xie, Qingquan Xiao. Insights into metal glass forming ability based on data-driven analysis. Materials & Design 2023, 232 , 112129.
    11. Motoyasu Kanazawa, Tongtong Wang, Robert Skulstad, Guoyuan Li, Houxiang Zhang. Physics-data cooperative ship motion prediction with onboard wave radar for safe operations. 2023, 1-8.
    12. Yongfan Ming, Wenkang Wang, Rui Yin, Min Zeng, Li Tang, Shizhe Tang, Min Li. A review of enzyme design in catalytic stability by artificial intelligence. Briefings in Bioinformatics 2023, 24 (3)
    13. Felix Jung, Kevin Frey, David Zimmer, Timo Mühlhaus. DeepSTABp: A Deep Learning Approach for the Prediction of Thermal Protein Stability. International Journal of Molecular Sciences 2023, 24 (8) , 7444.
    14. Zhixin Dou, Yuqing Sun, Xukai Jiang, Xiuyun Wu, Yingjie Li, Bin Gong, Lushan Wang. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochimica et Biophysica Sinica 2023, 55 (3) , 343-355.
    15. Gang Li, Filip Buric, Jan Zrimec, Sandra Viknander, Jens Nielsen, Aleksej Zelezniak, Martin K. M. Engqvist. Learning deep representations of enzyme thermal adaptation. Protein Science 2022, 31 (12)
    16. Erika Erickson, Japheth E. Gado, Luisana Avilán, Felicia Bratti, Richard K. Brizendine, Paul A. Cox, Raj Gill, Rosie Graham, Dong-Jin Kim, Gerhard König, William E. Michener, Saroj Poudel, Kelsey J. Ramirez, Thomas J. Shakespeare, Michael Zahn, Eric S. Boyd, Christina M. Payne, Jennifer L. DuBois, Andrew R. Pickford, Gregg T. Beckham, John E. McGeehan. Sourcing thermotolerant poly(ethylene terephthalate) hydrolase scaffolds from natural diversity. Nature Communications 2022, 13 (1)
    17. Yan Zhang, Feifei Guan, Guoshun Xu, Xiaoqing Liu, Yuhong Zhang, Jilu Sun, Bin Yao, Huoqing Huang, Ningfeng Wu, Jian Tian. A novel thermophilic chitinase directly mined from the marine metagenome using the deep learning tool Preoptem. Bioresources and Bioprocessing 2022, 9 (1)
    18. Marie Bieber, Wim J. C. Verhagen. A Generic Framework for Prognostics of Complex Systems. Aerospace 2022, 9 (12) , 839.
    19. Jie Xiong, Tong-Yi Zhang. Data-driven glass-forming ability criterion for bulk amorphous metals with data augmentation. Journal of Materials Science & Technology 2022, 121 , 99-104.
    20. Jin-Rong Yang, Qiang Chen, Hao Wang, Xu-Yang Hu, Ya-Min Guo, Jian-Zhong Chen. Reliable CA-(Q)SAR generation based on entropy weight optimized by grid search and correction factors. Computers in Biology and Medicine 2022, 146 , 105573.
    21. Marie Bieber, Verhagen Wim, Bruno F. Santos. The Impact of Metrics on the Choice of Prognostic Methodologies. 2022
    22. Ye Tian, Dachuan Zhang, Pengli Cai, Huikang Lin, Hao Ying, Qian-Nan Hu, Aibo Wu. Elimination of Fusarium mycotoxin deoxynivalenol (DON) via microbial and enzymatic strategies: Current status and future perspectives. Trends in Food Science & Technology 2022, 124 , 96-107.
    23. Mehdi F. Shahraki, Fereshteh F. Atanaki, Shohreh Ariaeenejad, Mohammad R. Ghaffari, Mohammad H. Norouzi‐Beirami, Morteza Maleki, Ghasem H. Salekdeh, Kaveh Kavousi. A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: A case study of lipase identification. Biotechnology and Bioengineering 2022, 119 (4) , 1115-1128.
    24. Moritz Kohls, Babak Saremi, Ihsan Muchsin, Nicole Fischer, Paul Becher, Klaus Jung. A resampling strategy for studying robustness in virus detection pipelines. Computational Biology and Chemistry 2021, 94 , 107555.
    25. Junyan Li, Yuechao Wu, Yanjun Li, Ji Xiang, Bo Zheng. The Temperature Prediction of Hydro-generating Units based on Temporal Convolutional Network and Recurrent Neural Network. 2021, 8228-8233.
