Efficient Marginalization to Compute Protein Posterior Probabilities from Shotgun Mass Spectrometry Data

Department of Genome Sciences, University of Washington, Seattle, Washington
Department of Genome Sciences, University of Washington, Seattle, Washington
Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, Washington
* To whom correspondence should be addressed. E-mail: [email protected]
Cite this: J. Proteome Res. 2010, 9, 10, 5346–5357
Publication Date (Web): August 16, 2010
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or estimated the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors.

