pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics

  • Wen-Jing Zhou
  • Hao Yang
  • Wen-Feng Zeng
  • Kun Zhang
  • Hao Chi*
  • Si-Min He*

    Affiliations (all authors):
    Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190
    University of Chinese Academy of Sciences, Beijing, China 100049

    *H.C. e-mail: [email protected]
    *S.-M.H. e-mail: [email protected]
Cite this: J. Proteome Res. 2019, 18, 7, 2747–2758
Publication Date (Web): June 24, 2019
https://doi.org/10.1021/acs.jproteome.8b00993
Copyright © 2019 American Chemical Society


    Abstract


    As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a score threshold at which the estimated false discovery rate is acceptable and then filters out the identifications that fall outside this threshold. However, which identifications within the threshold are incorrect remains unknown, so further validation methods are needed. In this study, we characterized a framework for validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and further proved that a validation method with lower FPR and FNR leads to identifications with higher sensitivity and precision. We then proposed a validation method named pValid that combines an open database search and a theoretical spectrum prediction strategy via machine learning. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an average FPR of 0.03% and an average FNR of 1.79%, both better than those of the other four common validation methods. Tests on a synthetic peptide data set also indicated that pValid achieved better FPR and FNR than the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid flagged the highest number of incorrect identifications among all five methods. Considering also its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
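The quantities the abstract relies on (target-decoy filtering, and the FPR/FNR of a validation method applied afterwards) can be illustrated with a minimal Python sketch. This is not pValid or the authors' code, and several details are assumptions rather than statements from the article: higher search-engine scores are taken to be better; the FDR at a score cutoff is estimated as #decoys / #targets above it (one common target-decoy variant; others add 1 to the decoy count or use q-values); FPR is taken as the fraction of truly incorrect identifications that a validation method fails to flag, and FNR as the fraction of truly correct identifications that it flags. `PSM`, `tda_filter`, `validation_rates`, and `after_validation` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical minimal record)."""
    score: float                       # search-engine score; higher = better (assumed)
    is_decoy: bool                     # True if the peptide came from the decoy database
    is_correct: Optional[bool] = None  # ground truth, known only on benchmark data


def tda_filter(psms: Sequence[PSM], fdr: float = 0.01) -> List[PSM]:
    """Keep target PSMs above the lowest score cutoff whose estimated FDR <= fdr.

    The FDR at a cutoff is estimated as (#decoys above it) / (#targets above it).
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked PSMs that pass the threshold
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            keep = i
    return [p for p in ranked[:keep] if not p.is_decoy]


def validation_rates(is_correct: Sequence[bool],
                     flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (FPR, FNR) of a validation method against ground truth.

    Assumed convention: a false positive is an incorrect identification the
    method fails to flag; a false negative is a correct identification the
    method flags as suspicious.
    """
    fp = sum(1 for c, f in zip(is_correct, flagged) if not c and not f)
    tn = sum(1 for c, f in zip(is_correct, flagged) if not c and f)
    fn = sum(1 for c, f in zip(is_correct, flagged) if c and f)
    tp = sum(1 for c, f in zip(is_correct, flagged) if c and not f)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


def after_validation(n_correct: int, n_incorrect: int,
                     fpr: float, fnr: float) -> Tuple[float, float]:
    """Expected (recall, precision) of the identifications kept after validation.

    Correct PSMs survive with probability 1 - FNR and incorrect ones with
    probability FPR, so lowering either rate never decreases recall or precision.
    """
    kept_correct = n_correct * (1 - fnr)
    kept_incorrect = n_incorrect * fpr
    recall = kept_correct / n_correct if n_correct else 0.0
    total = kept_correct + kept_incorrect
    precision = kept_correct / total if total else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy data (score, is_decoy, is_correct); values are illustrative only.
    psms = [PSM(10, False, True), PSM(9, False, True), PSM(8, False, False),
            PSM(7, True), PSM(6, False, True), PSM(5, True),
            PSM(4, False, False), PSM(3, True)]
    kept = tda_filter(psms, fdr=0.25)
    # A hypothetical validator flags the PSMs scored 8 and 6 as suspicious.
    flagged = [p.score in (8, 6) for p in kept]
    print(validation_rates([p.is_correct for p in kept], flagged))
    print(after_validation(n_correct=90, n_incorrect=10, fpr=0.5, fnr=0.0))
```

On the toy data, the target-decoy step keeps the targets scored 10, 9, 8, and 6. The hypothetical validator catches the one incorrect PSM among them (FPR 0) but also flags one of the three correct ones (FNR 1/3), which is the kind of trade-off the FPR/FNR framing is meant to expose.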

    Supporting Information


    The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.8b00993.

    • Tables summarizing the four data sets used, the search engines used, the database search parameters, the four databases used for Kuster_PT_Training, the training and test sets, FPRs and FNRs (%) of five validation methods at the peptide level, FPR and FNR of the Synthetic-peptide validation of Kuster_PT at the peptide level, comparisons of using the top-1 peptide or top-3 peptides in three validation methods, of different implementations of the Trap-database, and of different implementations of Open-search; figures showing an example identification from pFind that both Open-search and pDeep validation flag as suspicious, ROC and PR curves of five validation methods on Olsen_Hela and on Mann_Hela, comparisons of true-positive and true-negative PSMs and peptides among the five methods on Kuster_PT, on Olsen_Hela, and on Mann_Hela, and the influence of different features and thresholds on pValid; and notes detailing the relationship between metrics of validation (i.e., FPR and FNR) and metrics of identification (i.e., recall and error rate), the workflow of calculating FPR and FNR of Synthetic-peptide validation, the similarity of synthetic-peptide spectra at the same or different collision energies, and the other mass deviations used to construct trap spectra (Supplementary Tables 1–12, Figures 1–8, and Notes 1–4) (PDF)

    • Supplementary data listing validation results on PSMs related to olfactory proteins identified from the Pandey_Draft data set and database search parameters for the Pandey_Draft data set (XLSX)


