
Research Article

Preprocessing Tandem Mass Spectra Using Genetic Programming for Peptide Identification

  • Samaneh Azari
    School of Engineering and Computer Science, Victoria University of Wellington, 6012, Wellington, Kelburn, New Zealand
    School of Engineering and Computer Science, Victoria University of Wellington, 6140, Wellington, New Zealand
  • Bing Xue
    School of Engineering and Computer Science, Victoria University of Wellington, 6012, Wellington, Kelburn, New Zealand
  • Mengjie Zhang
    School of Engineering and Computer Science, Victoria University of Wellington, 6012, Wellington, Kelburn, New Zealand
  • Lifeng Peng
    Centre for Biodiscovery and School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
Cite this: J. Am. Soc. Mass Spectrom. 2019, 30, 7, 1294–1307
Publication Date (Web): April 25, 2019
https://doi.org/10.1007/s13361-019-02196-5
Copyright © 2019 American Society for Mass Spectrometry


    Abstract


    One of the major challenges in proteomics is identifying peptides from tandem mass spectra that contain a high proportion of noise peaks and only a small number of signal (b-/y-ion) peaks. The accuracy and reliability of peptide identification on such highly imbalanced MS/MS data can be improved by a preprocessing step, applied before identification, that discriminates b-/y-ions from noise peaks in the spectra. In this study, we report a genetic programming (GP)–based preprocessing method for de-noising highly imbalanced and noisy CID MS/MS spectra. GP, which automatically evolves computer programs, has become a popular machine learning method. Here, GP preprocesses the noisy MS/MS spectra by classifying each peak as either a noise peak or a signal peak, a binary classification task. In addition, a set of spectral fragment features based on MS/MS fragmentation rules is extracted from the dataset, and GP is used to investigate their discriminating abilities. An MS/MS spectral dataset containing thousands of spectra is used to train the GP model. Because the GP tree-based representation performs implicit feature selection during the evolutionary process, the evolved GP model, using only its selected features, is compared with the best threshold-based method. The results show that the GP method improved the reliability of peptide identification and increased the identification rate of the de novo sequencing tool PEAKS to 99.4%, up from the 80.1% achieved by the best threshold-based method. Moreover, the peptide identification results of the database search tool SEQUEST on data preprocessed by the GP method were statistically significantly better than those obtained with the other methods.
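    The abstract frames de-noising as per-peak binary classification by an evolved GP program over fragmentation-rule features, compared against an intensity-threshold filter. The Python sketch below illustrates that framing only, under assumptions that are not taken from the paper: the three features (relative intensity, intensity rank, and presence of a complementary b/y partner), the tuple-based tree encoding, the "positive output means signal" decision rule, balanced accuracy as the imbalance-aware fitness, and the 0.3 relative-intensity cutoff are all hypothetical choices. The evolutionary search itself (selection, crossover, mutation) is omitted; the tree is hand-written and the spectrum is toy data.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

PROTON = 1.007276  # proton mass in Da


@dataclass
class Peak:
    mz: float
    intensity: float


def peak_features(peaks: Sequence[Peak], precursor_mh: float,
                  tol: float = 0.5) -> List[Dict[str, float]]:
    """Hypothetical per-peak features; not the feature set used in the paper."""
    max_int = max(p.intensity for p in peaks)
    order = sorted(range(len(peaks)), key=lambda i: -peaks[i].intensity)
    rank = {idx: r for r, idx in enumerate(order)}
    feats = []
    for i, p in enumerate(peaks):
        # Singly charged b/y ions from the same cleavage obey
        # m/z(b) + m/z(y) = [M+H]+ + proton mass.
        target = precursor_mh + PROTON - p.mz
        has_comp = any(abs(q.mz - target) <= tol
                       for j, q in enumerate(peaks) if j != i)
        feats.append({
            "rel_int": p.intensity / max_int,        # relative to base peak
            "rank": rank[i] / len(peaks),            # 0.0 = most intense peak
            "complement": 1.0 if has_comp else 0.0,  # complementary-ion evidence
        })
    return feats


# GP primitives; division is "protected" so a zero denominator returns 1.0.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def evaluate(tree: tuple, x: Dict[str, float]) -> float:
    """Recursively evaluate a tree encoded as ("feat", name),
    ("const", value) or (op, left_subtree, right_subtree)."""
    kind = tree[0]
    if kind == "feat":
        return x[tree[1]]
    if kind == "const":
        return tree[1]
    return OPS[kind](evaluate(tree[1], x), evaluate(tree[2], x))


def classify(tree: tuple, x: Dict[str, float]) -> int:
    """1 = signal (b/y ion) when the program output is positive, else 0 = noise."""
    return 1 if evaluate(tree, x) > 0 else 0


def balanced_accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Mean of per-class accuracies, so the rare signal class is not swamped."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    pos = sum(truth)
    neg = len(truth) - pos
    return 0.5 * ((tp / pos if pos else 0.0) + (tn / neg if neg else 0.0))


def threshold_baseline(feats: List[Dict[str, float]],
                       min_rel_int: float = 0.3) -> List[int]:
    """Intensity-threshold filter: keep peaks above a fraction of the base peak."""
    return [1 if f["rel_int"] >= min_rel_int else 0 for f in feats]


# Toy spectrum with [M+H]+ = 1000.0; peaks at 300.0 and 701.0 form a b/y pair.
peaks = [Peak(300.0, 40.0), Peak(450.0, 100.0), Peak(520.0, 10.0),
         Peak(610.0, 5.0), Peak(701.0, 20.0)]
truth = [1, 0, 0, 0, 1]
feats = peak_features(peaks, precursor_mh=1000.0)

# Hand-written individual: 2 * complement - rel_int (not an evolved result).
gp_tree = ("sub", ("mul", ("feat", "complement"), ("const", 2.0)),
           ("feat", "rel_int"))
gp_pred = [classify(gp_tree, f) for f in feats]
base_pred = threshold_baseline(feats)

print(gp_pred, balanced_accuracy(gp_pred, truth))  # [1, 0, 0, 0, 1] 1.0
print(base_pred, round(balanced_accuracy(base_pred, truth), 3))  # [1, 1, 0, 0, 0] 0.583
```

    On this toy spectrum the hand-written tree keeps the complementary pair and rejects the noise, while the intensity cutoff keeps the intense noise peak at m/z 450 and drops the weaker y ion at m/z 701. In an actual GP run, a tree like this would be one individual in a population, scored by its fitness on labelled training spectra.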



