Robust Method for Proteome Analysis by MS/MS Using an Entire Translated Genome: Demonstration on the Ciliome of Tetrahymena thermophila

Department of Chemistry and Centre for Research in Mass Spectrometry, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3, Department of Botany, University of Toronto, Toronto, Ontario, Canada M5S 3B2, and Department of Biology, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3
Cite this: J. Proteome Res. 2005, 4, 3, 909–919
Publication Date (Web):April 9, 2005
Copyright © 2005 American Chemical Society

    To improve the utility of the increasingly large number of unannotated and initially poorly annotated genomic sequences for proteome analysis, we demonstrate that effective protein identification can be made on a large, unannotated genome. The strategy is to translate the unannotated genome sequence in all six reading frames into amino acid sequences of putative proteins, to identify peptides by tandem mass spectrometry (MS/MS), to localize them on the genome sequence, and to preliminarily annotate the protein via a BLAST similarity search. These tasks have been optimized and automated. Optimizing for multiple peptide matches in effect extends the searchable region and results in more robust protein identification. The viability of this strategy is demonstrated with the identification of 223 cilia proteins in the unicellular eukaryotic model organism Tetrahymena thermophila, whose initial genomic sequence draft was released in November 2003. To the best of our knowledge, this is the first demonstration of large-scale protein identification based on such a large, unannotated genome. Of the 223 cilia proteins, 84 have no similarity to proteins in NCBI's nonredundant (nr) database. This methodology makes it possible to locate the genes encoding these novel proteins, a necessary first step toward downstream functional genomic experimentation.
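
The first and third steps of the strategy above (six-frame translation, then mapping an identified peptide back to genome coordinates) can be illustrated with a minimal Python sketch. This is not the authors' software, which the abstract does not describe; all function names here (`six_frame_translation`, `locate_peptide`, and so on) are hypothetical. One assumption worth flagging: the Tetrahymena nuclear genetic code reads TAA and TAG as glutamine rather than stop (NCBI translation table 6), so the sketch offers a `CILIATE_CODE` table alongside the standard one. How the original pipeline handled this is not stated in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1 layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Assumption: ciliate nuclear code (NCBI table 6), where TAA and TAG encode Gln.
CILIATE_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna, code):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna, code=STANDARD_CODE):
    """Return a dict mapping frame labels '+1'..'+3', '-1'..'-3' to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:], code)
    return frames


def locate_peptide(peptide, frames, genome_length):
    """Find a peptide in the translated frames.

    Returns (frame, start, end) tuples, where start/end are 0-based,
    half-open nucleotide coordinates on the forward strand.
    """
    hits = []
    for frame, protein in frames.items():
        offset = int(frame[1]) - 1
        idx = protein.find(peptide)
        while idx != -1:
            start = offset + 3 * idx
            end = start + 3 * len(peptide)
            if frame[0] == "-":
                # Convert reverse-complement coordinates to forward-strand ones.
                start, end = genome_length - end, genome_length - start
            hits.append((frame, start, end))
            idx = protein.find(peptide, idx + 1)
        # Each frame is searched independently of the others.
    return hits
```

In practice, an MS/MS search engine would score spectra against the translated frames; the `locate_peptide` helper stands in only for the final bookkeeping step of placing a matched peptide on the genome. Several peptides mapping close together in the same frame and strand would then suggest a single underlying gene, which is the intuition behind the multiple-peptide-match criterion.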

    Keywords: shotgun proteomic analysis • entire unannotated genome-sequence translation • Tetrahymena thermophila cilia • BLAST analysis • 223 cilia proteins • multiple-peptide matches

    To whom correspondence should be addressed. Tel: (416) 650-8021. Fax: (416) 736-5936. E-mail: [email protected].

    Table 1S, complete analysis data on the 223 proteins in the T. thermophila ciliome. Table 2S, additional information from searching against yeast, EST, and PGP databases. This material is available free of charge via the Internet at


