Sequence-Based Prediction of Protein–Carbohydrate Binding Sites Using Support Vector Machines
Abstract
Carbohydrate-binding proteins play significant roles in many diseases, including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent-accessible surface area leads to a reasonably accurate, robust, and predictive method. Without balancing binding and nonbinding residues, the area under the receiver operating characteristic curve (AUC) is 0.78 for 10-fold cross-validation and 0.77 for the independent test, and the corresponding Matthews correlation coefficients are 0.34 and 0.29. The quality of the method is further demonstrated by two observations: statistically significantly more binding residues are predicted for carbohydrate-binding proteins than for presumptive nonbinding proteins in the human proteome, and, for nonsynonymous mutations from the 1000 Genomes Project, rare alleles are biased toward predicted carbohydrate-binding sites. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
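The two metrics quoted in the abstract can be illustrated with a small sketch. The code below is not the SPRINT-CBH implementation; it only shows how the Matthews correlation coefficient and the AUC are computed for per-residue predictions when nonbinding residues outnumber binding ones. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random binding residue scores higher
    than a random nonbinding residue (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one residue of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 binding and 8 nonbinding residues (unbalanced).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05, 0.15, 0.25]
threshold = 0.5  # assumed cutoff for turning scores into binary calls
y_pred = [1 if s >= threshold else 0 for s in scores]

mcc = matthews_corrcoef(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"MCC = {mcc:.3f}, AUC = {auc:.4f}")
```

The Matthews correlation coefficient uses all four cells of the confusion matrix, so a predictor that labels every residue as nonbinding scores 0 even though its raw accuracy would look high on unbalanced data. The AUC, in contrast, does not depend on the chosen threshold.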