Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning

    Journal of Chemical Information and Modeling

    Cite this: J. Chem. Inf. Model. 2020, 60, 8, 4098–4107
    https://doi.org/10.1021/acs.jcim.0c00489
    Published July 8, 2020
    Copyright © 2020 American Chemical Society

    Abstract


    Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method’s predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
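
As a rough illustration of what a resampling strategy for imbalanced regression can look like, the sketch below duplicates rare high-Topt samples at random until they make up a chosen share of the training set. This is a generic random-oversampling sketch, not the authors' TOMER code; the function name `random_oversample`, its parameters, the 85 °C cutoff as a default, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def random_oversample(X, y, threshold=85.0, target_fraction=0.5, seed=0):
    """Randomly duplicate rare samples (y > threshold).

    Hypothetical helper: after resampling, rare samples make up roughly
    `target_fraction` of the returned data set.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")

    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)

    rare = np.flatnonzero(y > threshold)     # e.g. thermostable enzymes
    common = np.flatnonzero(y <= threshold)  # the well-represented range
    if rare.size == 0 or common.size == 0:
        return X.copy(), y.copy()            # nothing to rebalance

    # Number of rare samples needed so that rare / (rare + common)
    # equals target_fraction.
    n_rare_needed = int(round(target_fraction * common.size / (1.0 - target_fraction)))
    n_extra = max(n_rare_needed - rare.size, 0)

    # Draw the extra rare samples with replacement (plain duplication).
    extra = rng.choice(rare, size=n_extra, replace=True)
    idx = np.concatenate([common, rare, extra])
    return X[idx], y[idx]


# Synthetic example: 4% of targets lie above 85 (assumed data, not from the paper).
demo_rng = np.random.default_rng(42)
y_demo = np.concatenate([demo_rng.uniform(20, 60, 960), demo_rng.uniform(86, 110, 40)])
X_demo = demo_rng.normal(size=(y_demo.size, 5))

X_res, y_res = random_oversample(X_demo, y_demo)
print("rare fraction before:", np.mean(y_demo > 85.0))
print("rare fraction after: ", np.mean(y_res > 85.0))
```

Only the training split should be resampled in a setup like this; evaluating on a resampled test set would distort error estimates for the high-Topt range.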

    Supporting Information

    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c00489.

    • Performance of resampling strategies for all hyperparameter combinations; algorithms (pseudocode) for the RO, SMOTER, GN, WERCS, WERCS-GN, and REBAGG resampling strategies (Figure S1) (PDF)
