pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics
Wen-Jing Zhou, Hao Yang, Wen-Feng Zeng, Kun Zhang, Hao Chi*, and Si-Min He*

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049

*Corresponding authors (H.C. and S.-M.H.)
Abstract

As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach estimates the false discovery rate (FDR), derives a score threshold from that estimate, and filters out identifications beyond the threshold. However, it does not reveal which of the identifications within the threshold are incorrect, so further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and further proved that a validation method with lower FPR and FNR leads to identifications with higher sensitivity and precision. We then proposed a validation method named pValid, which uses machine learning to combine an open database search with theoretical spectrum prediction. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an average FPR of 0.03% and an average FNR of 1.79%, both lower than those of the four common validation methods. Tests on a synthetic peptide data set also showed that pValid achieved lower FPR and FNR than the synthetic peptide validation method. Tests on a large-scale human proteome data set showed that pValid flagged the highest number of incorrect identifications among all five methods. Given also its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
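The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not pValid's implementation, and it rests on several assumptions. Higher search-engine scores are taken to be better. FDR is estimated as #decoys / #targets above a score, which is one common target-decoy variant. For FPR and FNR, "positive" means an identification the validation method accepts as correct. The names `PSM`, `tda_threshold`, and `validation_rates` are hypothetical. The sketch also shows why the target-decoy threshold alone is not enough: the estimate says how many accepted target matches are likely wrong, not which ones.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class PSM:
    score: float    # search-engine score; higher is assumed to be better
    is_decoy: bool  # True if the spectrum matched a decoy sequence


def tda_threshold(psms: Sequence[PSM], fdr: float = 0.01) -> Optional[float]:
    """Lowest score s whose estimated FDR, #decoys / #targets among
    PSMs with score >= s, is <= fdr. Returns None if no score qualifies."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    threshold = None
    for i, p in enumerate(ranked):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM of a group of tied scores,
        # so every PSM at this score has been counted.
        last_of_group = i + 1 == len(ranked) or ranked[i + 1].score != p.score
        if last_of_group and targets > 0 and decoys / targets <= fdr:
            threshold = p.score
    return threshold


def validation_rates(is_correct: Sequence[bool],
                     flagged: Sequence[bool]) -> Tuple[float, float]:
    """FPR and FNR of a validation method, taking 'positive' to mean
    'accepted as correct' (an assumed convention).

    is_correct[i]: ground truth for identification i.
    flagged[i]:    the validation method flags identification i as incorrect.
    """
    pairs = list(zip(is_correct, flagged))
    fp = sum(1 for c, f in pairs if not c and not f)  # incorrect, but accepted
    tn = sum(1 for c, f in pairs if not c and f)      # incorrect and flagged
    fn = sum(1 for c, f in pairs if c and f)          # correct, but flagged
    tp = sum(1 for c, f in pairs if c and not f)      # correct and accepted
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


if __name__ == "__main__":
    psms = [PSM(10, False), PSM(9, False), PSM(8, False), PSM(7, True),
            PSM(6, False), PSM(5, False), PSM(4, True), PSM(3, False)]
    print(tda_threshold(psms, fdr=0.2))  # → 5
    print(validation_rates([True, True, True, False, False],
                           [False, False, True, True, False]))
    # → (0.5, 0.3333333333333333)
```

In the toy run, the threshold of 5 keeps five target matches with an estimated FDR of 20%. Nothing in that estimate identifies which of the five are wrong. Telling them apart is the job of a separate validation method, whose quality the sketch scores by FPR (incorrect identifications it fails to flag) and FNR (correct identifications it wrongly flags).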