Journal of Proteome Research
Meeting News

Katie Cottingham reports from the U.S. HUPO Third Annual Congress-Seattle

Identifying peptides without a database

Pavel Pevzner, graduate student Nuno Bandeira, and colleagues at the University of California San Diego advocate a radical method for peptide identification that doesn’t involve a database. With so-called spectral networks, researchers can obtain sequence information, decrease the amount of noise in spectra ~12-fold, and discover posttranslational modifications (PTMs) and new proteins that are not included in current databases. The new method is rapid and can be easily applied to high-throughput experiments.

[Figure: Related spectral pairs. Credit: Pavel Pevzner]
It’s all relative. Related spectral pairs can be combined into a spectral network to identify peptides. Red circles represent spectra of peptides that can be identified by traditional database searches. Orange and yellow circles represent spectra of modified peptides that cannot be identified by traditional methods but are identified by spectral networks analysis.

The idea for spectral networks came from bioinformatics approaches that were originally developed for genomics. “I think there is so much unexplored synergy,” says Pevzner. He points out that proteomics scientists are grappling with issues that are similar to those faced by genomics researchers a decade ago. “Proteomics has now reached a point when it has become a very sophisticated field computationally. It would be good for the proteomics and genomics communities to come closer together and start working together,” he says.

Currently, most proteomics researchers use algorithms such as Sequest or Mascot to identify peptides. These programs match MS/MS spectra against theoretical spectra generated from sequences stored in public databases. When researchers include PTMs in the search, these algorithms become very slow because they compare each experimental spectrum against spectra derived from all possible PTMs of all possible peptides in a database. Recently, several approaches have been proposed that speed up database searches by comparing experimental spectra with only a subset of the theoretical spectra, but these methods still rely on gene sequences in databases.

Spectral networks, however, eliminate the need for a sequence database entirely and allow researchers to rapidly identify modified peptides without searching against all possible combinations of PTMs. In this new approach, every experimental mass spectrum is compared with all of the other experimental mass spectra to find those that are similar. Pairs of similar spectra, called spectral pairs, are aligned and then connected by relatedness into a network.
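The all-against-all comparison described above can be illustrated with a small Python sketch. It is not the authors' program: the similarity measure (cosine similarity on binned peaks), the bin width, the threshold, and every function name are assumptions chosen for illustration, because the article does not say how similarity is scored.

```python
from itertools import combinations
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into bins of summed intensity."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (dicts: bin -> intensity)."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_spectral_pairs(spectra, threshold=0.6):
    """Compare every spectrum with every other one; keep the similar pairs.

    `spectra` maps a spectrum name to a list of (m/z, intensity) peaks.
    The threshold is an arbitrary illustrative value.
    """
    binned = {name: bin_spectrum(peaks) for name, peaks in spectra.items()}
    pairs = []
    for x, y in combinations(sorted(binned), 2):
        score = cosine_similarity(binned[x], binned[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Toy data: s1 and s2 share two of their three peaks, s3 shares none.
toy_spectra = {
    "s1": [(100.2, 1.0), (200.3, 2.0), (300.4, 1.0)],
    "s2": [(100.4, 1.0), (200.1, 2.0), (350.0, 1.0)],
    "s3": [(500.0, 1.0), (600.0, 1.0)],
}
pairs = find_spectral_pairs(toy_spectra)
```

A plain cosine score only rewards peaks at identical masses. Pairing a modified peptide with its unmodified counterpart also requires allowing for a mass shift, which is what the alignment step addresses.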

Just as their genomics counterparts compare different DNA sequences by aligning them base by base, proteomics scientists could compare spectra of different peptides by aligning them peak by peak, explains Pevzner. With spectral networks analysis, pairs of spectra that come from tryptic and nontryptic overlapping fragments or from modified and unmodified versions of the same peptide are singled out. These spectra are aligned by constructing an alignment matrix that is similar to the matrix used in the classical Smith-Waterman algorithm for DNA sequence alignment. A virtual composite spectrum is created to eliminate most of the noise observed in experimental spectra and distinguish b and y ions; these features allow researchers to accurately infer peptide sequences. Then, spectral pairs are linked into a network.
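To make the peak-by-peak alignment idea concrete, here is a deliberately simplified sketch for a modified and an unmodified version of the same peptide. It assumes a single modification of known mass `delta` and works on an idealized list of prefix (b-ion-like) masses: peaks before the modification site should match at the same mass, and peaks after it should match at a mass shifted by `delta`. The function names, the tolerance, and the split-point scoring are assumptions for illustration; the published method builds a full alignment matrix, which this sketch does not reproduce.

```python
def has_peak(masses, target, tol=0.5):
    """True if any mass lies within `tol` of `target`."""
    return any(abs(mass - target) <= tol for mass in masses)


def align_with_one_shift(unmodified, modified, delta, tol=0.5):
    """Score the best alignment that allows one mass shift.

    Peaks of `unmodified` below the split must match `modified` at the
    same mass; peaks at or above the split must match at mass + delta.
    Returns (best_score, split_index), where split_index counts how many
    of the sorted unmodified peaks fall before the modification site.
    """
    a = sorted(unmodified)
    same = [has_peak(modified, mass, tol) for mass in a]
    shifted = [has_peak(modified, mass + delta, tol) for mass in a]
    best_score, best_split = -1, 0
    for split in range(len(a) + 1):
        score = sum(same[:split]) + sum(shifted[split:])
        if score > best_score:
            best_score, best_split = score, split
    return best_score, best_split


# Toy example: a +16 modification sits after the second prefix mass.
unmodified_masses = [100.0, 250.0, 400.0, 550.0]
modified_masses = [100.0, 250.0, 416.0, 566.0]
score, split = align_with_one_shift(unmodified_masses, modified_masses, delta=16.0)
```

In this toy case all four peaks align once the shift is allowed, and the split index points to where the modification would sit along the peptide.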

Pevzner says that network construction is like drawing a map. “As soon as you find a pair of related spectra, you can think of them as two cities connected by a road,” he says. “Then, if you construct the similarity between all paths, you have a map in which one city is connected to another and maybe to two or three, and it kind of becomes a country, what we call a spectral network.” He explains that when a peptide is connected to three other peptides in this manner, researchers obtain three independent clues about the spectrum of interest instead of just one.
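The map analogy corresponds to a simple graph: spectra are nodes ("cities") and spectral pairs are edges ("roads"). The sketch below, with hypothetical function names and toy labels, links pairs into networks by finding connected components and shows how the number of neighbors gives the number of independent clues for a spectrum. It illustrates the data structure only and is not the authors' implementation.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected graph from (spectrum_a, spectrum_b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group spectra into networks ("countries") of linked spectra."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Toy pairs: spectrum A is paired with B, C, and D; E and F form their own network.
toy_pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
network = build_network(toy_pairs)
networks = connected_components(network)
clues_for_a = len(network["A"])  # number of related spectra linked to A
```

Here spectrum A has three neighbors, matching the case Pevzner describes in which a peptide connected to three others yields three independent clues.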

A key advantage of spectral networks is that they allow researchers to identify peptides that are not included in any existing sequence database. For example, the method has allowed Pevzner and colleagues to uncover previously unreported PTMs in proteins of the human lens. In collaboration with Karl Clauser at the Broad Institute, they have also applied the method to identify proteins in snake venom, for which no sequence database exists. In addition, alternative splice forms and short proteins that are not predicted by current gene prediction algorithms could be identified with the approach. Of special interest to Pevzner is the identification of fusion proteins in cancer. “Almost every cancer is associated with rearrangements and thus fusion proteins, but there is no tool to find these fusion proteins,” he says. Spectral networks, however, could aid researchers in the quest to find these and other proteins that exist but are not predicted from an organism’s genome.

The spectral networks program is freely available at www-cse.ucsd.edu/groups/bioinformatics/software.html.
