A powerful tool for PTM discovery
In vivo, most proteins undergo some form of posttranslational modification (PTM), such as phosphorylation, acetylation, or methylation. Even the addition of a tiny chemical group to a single amino acid residue can dramatically affect protein structure and function. However, the multitude of potential PTMs and modification sites greatly complicates MS proteomics analyses. Researchers need a method to filter spurious results from genuine posttranslationally modified peptides. In research published in JPR (2008, 1, 170–181), Stephen Tanner and co-workers at the University of California San Diego and the Oregon Health and Science University introduce PTMFinder, a tool for the automated discovery of PTM sites.
To identify posttranslationally modified peptides, scientists usually search large protein databases for sequences that correspond with peptide mass spectra. Before the search, researchers typically must specify a list of allowed PTMs. Although hundreds of PTMs are known, it’s likely that many more remain to be identified, so this “restrictive search” approach may overlook biologically important PTMs. In contrast, unrestrictive database searches permit the search algorithm to consider any PTM within a certain size limit.
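The difference between the two search modes can be illustrated with a minimal sketch. It is not taken from any of the tools discussed here: the function names, the 0.02 Da tolerance, and the three-entry PTM table are illustrative assumptions, and the 250 Da default simply mirrors the size limit used in the study described below. The listed mass shifts are the standard monoisotopic values for these modifications.

```python
# Illustrative sketch only: names, tolerance, and the short PTM table are assumptions.
KNOWN_PTMS = {
    "phosphorylation": 79.96633,  # monoisotopic mass shift in Da
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def restrictive_match(delta, tolerance=0.02):
    """Return the name of a pre-specified PTM whose mass matches delta, else None."""
    for name, mass in KNOWN_PTMS.items():
        if abs(delta - mass) <= tolerance:
            return name
    return None


def unrestrictive_match(delta, max_size=250.0, tolerance=0.02):
    """Accept any nonzero mass shift whose magnitude is within the size limit."""
    return tolerance < abs(delta) <= max_size
```

Here `delta` stands for the observed peptide mass minus the mass of the unmodified sequence. A shift that is absent from the pre-specified list is rejected by the restrictive check but can still be proposed by the unrestrictive one, which is how unknown modifications (and many false ones) enter the results.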
Although unrestrictive searches could greatly enhance PTM discovery, these analyses are limited by the massive amount of data generated and by the time required to manually validate each PTM proposed by the search algorithm. Unrestrictive searches with large proteomics data sets that contain millions of mass spectra often result in numerous false peptide identifications. In particular, many spectrum annotations are generated that identify the correct peptide but the incorrect PTM position or mass (“δ-correct annotations”).
Manual inspection of each annotation is an extremely time-consuming, painstaking task. So, Tanner and co-workers developed the PTMFinder algorithm as an automated tool to improve the accuracy of PTM identifications. “Once you have the unrestrictive database search results, you feed them into PTMFinder, which winnows [down] the annotations to find the most reliable ones,” says Tanner. “PTMFinder does a lot of the straightforward filtering so that you don’t have to look spectrum by spectrum; you can look modification by modification.”
Instead of focusing on the accuracies of individual spectra, PTMFinder integrates data from multiple spectra of the same peptide to construct a consensus spectrum. Each peptide annotation is compared with the consensus spectrum and assigned a score that reflects the probable accuracy of the annotation. Some PTM sites can be found in multiple, overlapping peptides, which provide independent confirmation of the PTM. PTMFinder filters out implausible PTMs and resolves most δ-correct annotations, greatly reducing the size and complexity of the unrestrictive database search results.
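One simple way to picture a consensus spectrum is sketched below. The article does not describe how PTMFinder builds its consensus, so this is a generic approach under stated assumptions: peaks are grouped into fixed-width m/z bins, a bin is kept only if it appears in a minimum fraction of the spectra, and its intensity is averaged. The function name, `bin_width`, and `min_fraction` are hypothetical.

```python
from collections import defaultdict


def consensus_spectrum(spectra, bin_width=0.5, min_fraction=0.5):
    """Merge several spectra of one peptide into a single peak list.

    spectra: list of spectra, each a list of (mz, intensity) pairs.
    Returns (bin_center_mz, mean_intensity) pairs for bins that occur
    in at least min_fraction of the input spectra.
    """
    if not spectra:
        return []
    totals = defaultdict(float)
    counts = defaultdict(int)
    for peaks in spectra:
        # Keep the strongest peak per bin so one spectrum counts once per bin.
        strongest = {}
        for mz, intensity in peaks:
            bin_index = round(mz / bin_width)
            strongest[bin_index] = max(strongest.get(bin_index, 0.0), intensity)
        for bin_index, intensity in strongest.items():
            totals[bin_index] += intensity
            counts[bin_index] += 1
    needed = min_fraction * len(spectra)
    return [
        (bin_index * bin_width, totals[bin_index] / counts[bin_index])
        for bin_index in sorted(counts)
        if counts[bin_index] >= needed
    ]
```

Peaks that recur across replicate spectra survive the merge, whereas noise peaks seen in only one spectrum are dropped, which is the intuition behind scoring an annotation against pooled evidence rather than a single spectrum.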
Another key feature of PTMFinder is that the program estimates a false-discovery rate, which is important for gauging the overall reliability of database search results. The false-discovery rate is computed by including a shuffled (decoy) protein database in the search. Any hits to the decoy database indicate spurious peptide identifications. PTMFinder estimates the false-discovery rate of each database search by computing the fraction of highest-scoring modified peptides that fall within the decoy database.
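The decoy-based estimate can be sketched in a few lines. This is a minimal illustration of the idea as described above, not PTMFinder's code: the `ModifiedPeptide` record, the `decoy_fraction` function, and the `top_n` cut-off that defines "highest-scoring" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModifiedPeptide:
    sequence: str
    score: float
    is_decoy: bool  # True if the match came from the shuffled (decoy) database


def decoy_fraction(hits, top_n):
    """Fraction of the top_n highest-scoring hits that match the decoy database."""
    ranked = sorted(hits, key=lambda hit: hit.score, reverse=True)[:top_n]
    if not ranked:
        return 0.0
    return sum(hit.is_decoy for hit in ranked) / len(ranked)
```

Because a shuffled database contains no real proteins, every decoy hit is known to be wrong, so the share of decoy hits among the best-scoring modified peptides gives a rough gauge of how many of the remaining hits are likely to be spurious.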
The researchers tested the PTMFinder tool on three data sets: 700,000 MS/MS spectra from human lens samples, 17 million spectra from a whole-cell extract of HEK293 human embryonic kidney cells, and 1.4 million spectra from the protist Dictyostelium discoideum. Tanner and co-workers used the unrestrictive MS-Alignment algorithm to search the spectra against databases. A single PTM with a maximum size of 250 Da per peptide was allowed. In a previous study, the researchers identified PTMs of lens proteins by manual validation. Here, PTMFinder automatically identified the same PTMs as well as PTMs that were uncharacterized previously.
PTM sites identified by PTMFinder from the HEK293 data set were compared with annotated sites in the Human Protein Reference Database (HPRD) and in the UniProt database. In addition to 933 PTM sites that were confirmed in one or both of the reference databases, PTMFinder identified hundreds of previously uncharacterized PTM sites from the HEK293 data set. Most of the PTMs were phosphorylations, acetylations, or methylations.
Intriguingly, the Dictyostelium database search revealed several PTM sites that were conserved between protein orthologs in the protist and humans. “It was very exciting to find common modifications in proteins from humans and Dictyostelium, two species that are separated by vast evolutionary distance,” says Tanner. “This result suggests that in addition to looking at phylogenies of protein sequences, you could look at phylogenies of posttranslational modifications.”
Tanner and co-workers’ findings indicate that our current knowledge of PTMs is a mere drop in the bucket compared with the deluge of modifications that can occur in cells. The function and evolutionary conservation of many of these PTMs remain mysterious. Furthermore, very little is known about how PTMs vary among different human tissues and disease states. By simplifying PTM discovery from large proteomics data sets, PTMFinder should prove a valuable tool in the quest to map the dynamic human proteome.