Predicting Enzyme Subclass by Functional Domain Composition and Pseudo Amino Acid Composition

Yu-Dong Cai§ and Kuo-Chen Chou*
Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Shanghai Centre for Bioinformatics Technology, 100 Qing-Zhou Road, Shanghai 200235, China; Biomolecular Sciences Department, University of Manchester Institute of Science & Technology, Manchester, M60 1QD, United Kingdom; and Gordon Life Science Institute, San Diego, California 92130
J. Proteome Res., 2005, 4 (3), pp 967–971
DOI: 10.1021/pr0500399
Publication Date (Web): April 8, 2005
Copyright © 2005 American Chemical Society

Chinese Academy of Sciences.
Shanghai Centre for Bioinformatics Technology.
§ University of Manchester Institute of Science & Technology.
Gordon Life Science Institute.
* To whom correspondence should be addressed. E-mail: kchou@san.rr.com.

Abstract

In a continuing effort to identify enzymatic function from sequence at a deeper level, the investigation is extended from the main enzyme classes (Protein Sci. 2004, 13, 2857−2863) to their subclasses, a step that is indispensable for understanding the molecular mechanism of an enzyme. For each of the 6 main enzyme classes (oxidoreductase, transferase, hydrolase, lyase, isomerase, and ligase), a subclass training dataset was constructed. To reduce homology bias, a stringent cutoff was imposed: every entry in the datasets has less than 40% sequence identity to any other entry. To capture the core features most closely related to biological function, each protein sample is represented by hybridizing its functional domain composition with its pseudo amino acid composition. The FunD-PseAA predictor is built on this hybrid representation. In jackknife cross-validation tests, the overall success rate in identifying the 21 subclasses of oxidoreductases is above 86%, and the corresponding rates for the subclasses of the other 5 main enzyme classes are 94−97%. These high success rates suggest that the FunD-PseAA predictor may become a useful tool in bioinformatics and proteomics in the post-genomic era.
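
The hybrid representation and the jackknife test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FunD-PseAA implementation, and it rests on several assumptions not stated in the abstract: the pseudo amino acid part uses a single property scale (Kyte-Doolittle hydropathy) with hypothetical parameters `lam` and `w`, the functional domain part is a binary hit vector over a hypothetical domain list, the two parts are simply concatenated, and a cosine nearest-neighbor rule stands in for the actual predictor.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumption: the paper may use other properties).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def _standardize(scale):
    """Rescale the property values to zero mean and unit variance over the 20 residues."""
    vals = [scale[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    sd = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


H = _standardize(KD)


def pse_aa(seq, lam=2, w=0.05):
    """Return a (20 + lam)-component pseudo amino acid composition vector.

    The first 20 components are residue frequencies; the last lam components
    are sequence-order correlation factors weighted by w.
    """
    residues = [r for r in seq.upper() if r in H]  # skip non-standard symbols
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freq = [residues.count(a) / n for a in AMINO_ACIDS]
    theta = [
        sum((H[residues[i]] - H[residues[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freq) + w * sum(theta)
    return [f / denom for f in freq] + [w * t / denom for t in theta]


def hybrid_vector(domain_hits, n_domains, seq, lam=2, w=0.05):
    """Concatenate a binary domain-hit vector with the PseAA vector (assumed scheme)."""
    fund = [1.0 if i in domain_hits else 0.0 for i in range(n_domains)]
    return fund + pse_aa(seq, lam, w)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jackknife_accuracy(vectors, labels):
    """Leave-one-out test: classify each sample by its most similar other sample."""
    correct = 0
    for i, query in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        best = max(others, key=lambda j: cosine(query, vectors[j]))
        correct += labels[best] == labels[i]
    return correct / len(vectors)
```

In the jackknife test each protein is removed in turn, predicted from all the remaining proteins, and then returned to the dataset, so every sample serves once as a test case without ever being part of its own training set.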

Keywords: ENZYME database • 40% cutoff • functional domain • pseudo amino acid composition • ISort predictor • FunD-PseAA predictor • bioinformatics • proteomics

History

  • Published In Issue June 13, 2005
  • Received February 17, 2005
