J. Proteome Res., 4 (4), 1353-1360, 2005. 10.1021/pr0500509 S1535-3893(05)00050-3
Web Release Date: June 25, 2005

Copyright © 2005 American Chemical Society

Large Scale Analysis of MASCOT Results Using a Mass Accuracy-Based THreshold (MATH) Effectively Improves Data Interpretation

Paul A. Rudnick,* Yueju Wang, Erin Evans, Cheng S. Lee, and Brian M. Balgley

Calibrant Biosystems, 7507 Standish Pl., Rockville, Maryland 20855, and Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742

Received March 3, 2005

Abstract:

In this report, we take a heuristic approach to studying the effects of mass tolerance settings and database size on the sensitivity and specificity of MASCOT. We also examine the efficacy of the MASCOT Identity Threshold as a discriminator when applied to QqTOF data with an average mass accuracy of 10 ppm or better. As predicted, arbitrarily large mass tolerance settings negatively affect MASCOT's specificity and, to a lesser degree, its sensitivity. Increased mass tolerances also render the generation of a significance threshold less effective. To study these effects, we used Bayes' Law to calculate MASCOT's predictive values. With a relatively small search database (Human IPI), MASCOT had a mean positive predictive value of 0.993 when combined with MASCOT's Identity Threshold. However, the corresponding negative predictive value, or the probability that an ion was not present given no score or a score below threshold, was reduced as mass tolerances were tightened, and averaged 0.717. This value was improved by extrapolating an empirical threshold using a reversed database search and a new algorithm that rapidly identifies false positive identifications. Using the empirical threshold reduced false negative identifications by an average of 17% while limiting the false positive rate to below 5%; even larger reductions were obtained using mass tolerances approaching two times the actual error of the experimental data. A simple application of this strategy to the analysis of a microdissected glioblastoma multiforme sample analyzed by IEF/LC-MS/MS is reported, as is a description of the tools required to implement a large scale analysis using this alternative approach.
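
The two quantities named in the abstract, predictive values obtained from Bayes' Law and a score threshold derived from a reversed database search, can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The function names, the use of a prevalence term, and the definition of the false positive rate as the fraction of reversed-database hits scoring at or above the cutoff are all assumptions made for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem.

    prevalence is the assumed prior probability that a queried ion
    is truly present (an illustrative input, not a value from the paper).
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def empirical_threshold(decoy_scores, max_false_positive_rate=0.05):
    """Lowest score cutoff that keeps decoy hits below the allowed rate.

    Assumption: the false positive rate at a cutoff is estimated as the
    fraction of reversed-database (decoy) scores at or above that cutoff.
    Returns None when no observed score satisfies the limit.
    """
    scores = list(decoy_scores)
    if not scores:
        raise ValueError("decoy_scores must not be empty")
    total = len(scores)
    # Try each observed score as a candidate cutoff, lowest first.
    for cutoff in sorted(set(scores)):
        passing = sum(1 for s in scores if s >= cutoff)
        if passing / total < max_false_positive_rate:
            return cutoff
    return None
```

Under these assumptions, a call such as `empirical_threshold(reversed_hits)` would be run once per mass tolerance setting, so that each setting gets its own cutoff instead of a single fixed threshold.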

Keywords: bioinformatics; MASCOT; data analysis; search algorithms; statistics; IEF/LC-MS/MS; SEQUEST; data standards; biomarkers

