PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search
Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. Proteins carrying post-translational modifications, such as phosphorylation, are typically identified by allowing variable modifications in the searched sequences. Accounting for these variations increases the combinatorial search space exponentially, which leads to longer processing times and more false-positive identifications. The tool presented here, PhoStar, uses a supervised machine learning approach to identify spectra that originate from phosphorylated peptides before database search. The prediction model was trained and validated on a large set of high-confidence spectra collected from publicly available experimental data, reaching an accuracy of 97.6%. Its performance was further validated by predicting phosphorylation across the complete NIST human and mouse higher-energy collisional dissociation (HCD) spectral libraries, with accuracies of 98.2% and 97.9%, respectively. We demonstrate the application of PhoStar by using it to filter spectra before database search. In database searches of HeLa samples, the peptide search space was reduced by 27–66% while still recovering at least 97% of the peptide identifications (at 1% FDR) made by a standard workflow.
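The two ideas in the abstract, the combinatorial growth caused by variable phosphorylation and routing spectra before search, can be illustrated with a small sketch. It is not PhoStar's implementation: the abstract does not describe the model's features or classifier, so the predictor below is a hypothetical stand-in that flags a spectrum when it contains a peak at the precursor m/z minus one neutral loss of H3PO4 (97.976896 Da), a well-known but imperfect phosphopeptide signature. The names `Spectrum`, `count_phospho_isoforms`, `neutral_loss_predictor`, and `split_spectra` are all hypothetical, and S/T/Y are assumed to be the only variable sites.

```python
from dataclasses import dataclass
from math import comb
from typing import Callable, List, Tuple

# Monoisotopic mass of phosphoric acid (H3PO4), often lost as a neutral
# from phosphopeptide precursors during collisional fragmentation.
H3PO4_MASS = 97.976896
PHOSPHO_RESIDUES = frozenset("STY")  # assumed variable phosphorylation sites

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class Spectrum:
    spectrum_id: str
    precursor_mz: float
    charge: int
    peaks: List[Peak]


def count_phospho_isoforms(peptide: str, max_mods: int = 3) -> int:
    """Count sequence variants (including the unmodified form) when every
    S/T/Y is a variable site and at most max_mods sites are modified at once."""
    n_sites = sum(aa in PHOSPHO_RESIDUES for aa in peptide)
    return sum(comb(n_sites, k) for k in range(min(n_sites, max_mods) + 1))


def neutral_loss_predictor(spec: Spectrum, tol: float = 0.02) -> bool:
    """Hypothetical stand-in for a trained classifier: True if the spectrum has
    a peak within tol (m/z) of the precursor minus one H3PO4 neutral loss."""
    target = spec.precursor_mz - H3PO4_MASS / spec.charge
    return any(abs(mz - target) <= tol for mz, _ in spec.peaks)


def split_spectra(
    spectra: List[Spectrum],
    predict: Callable[[Spectrum], bool] = neutral_loss_predictor,
) -> Tuple[List[Spectrum], List[Spectrum]]:
    """Route spectra before database search: predicted phospho spectra go to a
    search with variable phosphorylation, the rest to an unmodified search."""
    phospho: List[Spectrum] = []
    other: List[Spectrum] = []
    for spec in spectra:
        (phospho if predict(spec) else other).append(spec)
    return phospho, other


if __name__ == "__main__":
    spectra = [
        Spectrum("scan_1", 800.0, 2, [(751.0116, 5.0e4), (400.2, 1.0e3)]),
        Spectrum("scan_2", 650.3, 2, [(500.1, 2.0e3)]),
    ]
    phospho, other = split_spectra(spectra)
    print([s.spectrum_id for s in phospho])  # ['scan_1']
    print(count_phospho_isoforms("SSTYK"))   # 15 (1 + 4 + 6 + 4)
```

With four candidate sites and up to three modifications, one peptide already expands to 15 search candidates; only spectra routed to the phospho list would need to be matched against that expanded space. Any real classifier with the same `Spectrum -> bool` signature could replace the neutral-loss rule.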