Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning
- Japheth E. Gado, Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States; National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Gregg T. Beckham* (Email: [email protected]), National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Christina M. Payne* (Email: [email protected]), Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
Abstract

Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method’s predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
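To make the idea of resampling for an imbalanced regression target concrete, the sketch below shows one simple strategy, random oversampling of the rare high-Topt region. It is an illustration only and is not taken from the TOMER packages: the function name `oversample_rare`, the `rare_fraction` parameter, and the choice of plain duplication are assumptions made for this example, while the 85 °C threshold follows the cutoff quoted in the abstract.

```python
import random


def oversample_rare(X, y, threshold=85.0, rare_fraction=0.5, seed=0):
    """Hypothetical helper: randomly duplicate samples whose target exceeds
    `threshold` until they make up roughly `rare_fraction` of the data."""
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(y) if t > threshold]
    common = [i for i, t in enumerate(y) if t <= threshold]
    if not rare or not common:
        # Nothing to rebalance: return copies of the inputs unchanged.
        return list(X), list(y)

    # Number of rare samples needed so that
    # rare / (rare + common) is approximately rare_fraction.
    target = round(rare_fraction * len(common) / (1.0 - rare_fraction))

    # Draw the missing rare samples with replacement (plain duplication).
    extra = [rng.choice(rare) for _ in range(max(0, target - len(rare)))]

    idx = common + rare + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Toy data: 95 mesophilic-like targets and 5 thermostable-like targets.
    X = [[float(i)] for i in range(100)]
    y = [37.0] * 95 + [90.0] * 5
    X_res, y_res = oversample_rare(X, y)
    print(len(y_res), sum(t > 85.0 for t in y_res))
```

A regressor, or each member of an ensemble, would then be fit on the resampled training set, while evaluation stays on untouched held-out data so that the reported error is not inflated by duplicated samples.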