Online News
MAYU estimates false discovery rates for protein IDs in large data sets
A new method determines the reliabilities of protein IDs.
Although numerous strategies exist for estimating the false discovery rates (FDRs) of peptide–spectrum matches (PSMs), fewer methods have been developed for estimating protein-ID FDRs, and none of these seems to work well for large data sets. Thus, Ruedi Aebersold and colleagues at the University of Zurich, the Swiss Federal Institute of Technology Zurich, and the Institute for Systems Biology have developed MAYU, a generic approach that estimates the reliabilities of protein IDs for data sets of all sizes, especially those composed of ≥100 LC/MS/MS runs.
In a typical shotgun proteomics experiment, database search engines match mass spectra with peptide sequences. Then, protein IDs are inferred by matching these peptide sequences to known proteins. Because proteins are identified on the basis of many PSMs, figuring out FDRs at the protein level is a much more complicated process than doing the same at the peptide level. Small errors at the PSM step can translate into large errors for protein IDs.
To appropriately determine the FDR for protein IDs, MAYU extends the target decoy approach that is widely used for PSM FDRs. The input for MAYU is a list of inferred protein IDs. The approach then takes into account the numbers of protein IDs in the target and decoy databases, along with the sizes of those databases, to provide FDRs for the protein IDs.
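For orientation, the naïve target decoy estimate that MAYU builds on can be sketched in a few lines. This is not MAYU's own statistical model, which this article does not spell out; it is only the simple ratio of decoy to target protein IDs, scaled by the relative database sizes. The function name, parameter names, and example counts are hypothetical.

```python
def naive_protein_fdr(n_target_ids, n_decoy_ids, target_db_size, decoy_db_size):
    """Naive target decoy estimate of the protein-ID FDR.

    Assumption: false protein IDs hit target and decoy entries at rates
    proportional to the database sizes, so the decoy count (rescaled by the
    size ratio) approximates the number of false target IDs.
    This is NOT MAYU's model, only the baseline it extends.
    """
    if n_target_ids <= 0 or decoy_db_size <= 0:
        raise ValueError("need at least one target ID and a non-empty decoy database")
    # Expected number of false IDs among the target hits.
    expected_false_targets = n_decoy_ids * (target_db_size / decoy_db_size)
    # An FDR cannot exceed 1, so cap the ratio.
    return min(1.0, expected_false_targets / n_target_ids)


if __name__ == "__main__":
    # Hypothetical counts: 1,000 target IDs, 50 decoy IDs, equal-size databases.
    print(naive_protein_fdr(1000, 50, 20000, 20000))
```

With equal-size target and decoy databases, the estimate reduces to the number of decoy IDs divided by the number of target IDs.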
The researchers validated MAYU, then compared its performance with that of other protein-ID FDR strategies, such as ProteinProphet and the naïve target decoy approach. During the comparison, the scientists noticed that as larger data sets were analyzed, the discrepancies between MAYU and the other two approaches grew. To get a better handle on this size effect, Aebersold and colleagues applied MAYU to the protein IDs of 20 increasingly large subsets of the C. elegans proteome. They found that the sizes of the data sets and the PSM FDRs strongly affected the protein-ID FDR.
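One way to build intuition for this size effect is a toy model, sketched below under strong simplifying assumptions that are not taken from the study: correct PSMs land uniformly on a fixed set of proteins that are truly present, whereas incorrect PSMs land uniformly on the remaining database entries. As the data set grows, the true proteins saturate while false PSMs keep hitting new entries. All names and numbers are hypothetical.

```python
def expected_distinct(k, m):
    """Expected number of distinct bins hit when k items fall uniformly into m bins."""
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def toy_protein_fdr(n_psms, psm_fdr, n_present, db_size):
    """Toy protein-ID FDR for a data set of n_psms PSMs at a given PSM FDR.

    Assumptions (illustrative only, not MAYU's model):
      * n_present proteins are truly in the sample; true PSMs map only to them.
      * false PSMs map uniformly to the other db_size - n_present entries.
      * a single PSM is enough to call a protein identified.
    """
    n_false_psms = n_psms * psm_fdr
    n_true_psms = n_psms - n_false_psms
    true_proteins = expected_distinct(n_true_psms, n_present)
    false_proteins = expected_distinct(n_false_psms, db_size - n_present)
    return false_proteins / (true_proteins + false_proteins)


if __name__ == "__main__":
    # Hypothetical setup: 2,000 proteins present, 20,000 database entries, 1% PSM FDR.
    for n_psms in (10_000, 100_000, 1_000_000):
        fdr = toy_protein_fdr(n_psms, 0.01, 2_000, 20_000)
        print(f"{n_psms:>9} PSMs -> toy protein-ID FDR {fdr:.3f}")
```

In this toy setting, the protein-ID FDR rises with both the number of PSMs and the PSM FDR, even though the PSM-level error rate stays small. That is the qualitative behavior described above, not a reproduction of the published results.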
MAYU software is publicly available as an integrated part of the Trans-Proteomic Pipeline suite of tools and as a stand-alone package. (Mol. Cell. Proteomics 2009, DOI 10.1074/mcp.M900317-MCP200)