EDITORIAL
Developing and Disseminating Advances in Computational and Statistical Proteomics
Martin McIntosh
This publication is free to access through this site. Learn More
CURRENTS
Ovarian cancer proteins identified with microarrays | Potential biomarker for neural progenitor cells | Enhanced ion charge states for ETD | Calorimetry for proteomics | Quantitative analysis of protein complexes | Tea tasters replaced by NMR? | Percolator | The biochemical network database | Mass spectrum quality assessment | Getting the most out of MS2 and MS3 data
NEWS
Turning data graveyards into gold mines
Katie Cottingham
A powerful tool for PTM discovery
Laura Cassiday
Proteomics Projects: HAI brings antibody researchers together
Katie Cottingham
People: New Editorial Advisory Board members
CALENDAR
Meetings
PRODUCT NOTES
New Products
Perspectives
Assigning Significance to Peptides Identified by Tandem Mass Spectrometry Using Decoy Databases
Lukas Käll, John D. Storey, Michael J. MacCoss, and William Stafford Noble*
Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide−spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
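The decoy-based FDR idea in this abstract can be illustrated with a minimal sketch. It assumes that higher scores mean better matches, that the target and decoy databases are the same size, and that the estimated fraction of incorrect target matches is passed in as `pi0`. The function name, the tie-breaking rule, and the `pi0` parameter are illustrative assumptions, not the authors' implementation.

```python
def decoy_qvalues(target_scores, decoy_scores, pi0=1.0):
    """Estimate q-values for target PSM scores from a decoy score list.

    Assumptions (hypothetical, for illustration): higher score = better
    match, equal-sized target and decoy databases, and pi0 is an externally
    supplied estimate of the fraction of incorrect target matches.
    """
    # Label every score: True for target, False for decoy.
    labeled = [(s, True) for s in target_scores]
    labeled += [(s, False) for s in decoy_scores]
    # Sort best score first; on ties, decoys come first (conservative).
    labeled.sort(key=lambda item: (-item[0], item[1]))

    fdrs = []
    n_target = 0
    n_decoy = 0
    for score, is_target in labeled:
        if is_target:
            n_target += 1
            # Decoys scoring at least this well estimate the false targets.
            fdrs.append((score, min(1.0, pi0 * n_decoy / n_target)))
        else:
            n_decoy += 1

    # q-value: the smallest FDR at this threshold or any looser threshold.
    qvalues = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append((score, running_min))
    qvalues.reverse()
    return qvalues


if __name__ == "__main__":
    for score, q in decoy_qvalues([10, 9, 8, 7], [8.5, 6]):
        print(score, round(q, 3))
```

Setting `pi0` below 1 lowers every estimated FDR, which corresponds to the gain in statistical power that the abstract attributes to accounting for the percentage of incorrectly identified spectra.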
Modes of Inference for Evaluating the Confidence of Peptide Identifications
Matt Fitzgibbon, Qunhua Li, and Martin McIntosh*
Several modes of inference are currently used in practice to evaluate the confidence of putative peptide identifications resulting from database scoring algorithms such as Mascot, SEQUEST, or X!Tandem. The approaches include parametric methods, such as classic PeptideProphet, and distribution-free methods, such as those based on reverse or decoy databases. Because of its parametric nature, classic PeptideProphet, although highly robust, was not highly flexible and was difficult to apply to new search algorithms or classification scores. While commonly applied, the decoy approach has not yet been fully formalized and standardized. And although decoy methods are distribution-free, they, like other approaches, are not free of assumptions. Recent manuscripts by Käll et al., Choi and Nesvizhskii, and Choi et al. help advance these methods, specifically by formalizing an alternative formulation of decoy database approaches and extending the PeptideProphet methods to make explicit use of decoy databases, respectively. Taken together with standardized decoy database methods and expectation scores computed by search engines like Tandem, there exist at least four different modes of inference used to assign confidence levels to individual peptides or groups of peptides. We overview and compare the assumptions of each of these approaches and summarize some interpretation issues. We also discuss some suggestions that may make the use of decoy databases more computationally efficient in practice.
Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin
Lukas Käll, John D. Storey, Michael J. MacCoss, and William Stafford Noble*
A variety of methods have been described in the literature for assigning statistical significance to peptides identified via tandem mass spectrometry. Here, we explain how two types of scores, the q-value and the posterior error probability, are related and complementary to one another.
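One common way to see the relationship between the two scores: if each accepted match has a posterior error probability (PEP), the expected fraction of errors among the accepted matches is the mean of their PEPs. The sketch below, with a hypothetical function name, uses that fact to turn a list of PEPs into q-value estimates. It is an illustration of the relationship, not the procedure from the article.

```python
def qvalues_from_peps(peps):
    """Return (pep, q-value) pairs, most confident match first.

    The q-value at a given match is estimated as the mean PEP of that match
    and all more confident matches. Because the PEPs are sorted in ascending
    order, these running means never decrease.
    """
    pairs = []
    total = 0.0
    for rank, pep in enumerate(sorted(peps), start=1):
        total += pep
        pairs.append((pep, total / rank))
    return pairs


if __name__ == "__main__":
    for pep, q in qvalues_from_peps([0.5, 0.1, 0.3]):
        print(pep, round(q, 3))
```

The PEP describes a single match, while the q-value describes the whole set of matches accepted at that threshold, so a match with a fairly high PEP can still sit inside a set with a low q-value.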
What’s Driving False Discovery Rates?
David L. Tabb †
The “Paris Guidelines” have begun the process of standardizing reporting for proteomics. New bioinformatics tools have improved the process for estimating error rates of peptide identifications. This perspective seeks to consider these advances in the context of proteomics’ short history. As increasing numbers of proteomics papers come from biologists rather than technologists, developing consensus standards for estimating error will be increasingly necessary. Standardizing this assessment should be welcomed as a reflection of the growing impact of proteomic technologies.
False Discovery Rates and Related Statistical Concepts in Mass Spectrometry-Based Proteomics
Hyungwon Choi and Alexey I. Nesvizhskii*
Development of statistical methods for assessing the significance of peptide assignments to tandem mass spectra obtained using database searching remains an important problem. In the past several years, several different approaches have emerged, including the concept of expectation values, target-decoy strategy, and the probability mixture modeling approach of PeptideProphet. In this work, we provide a background on statistical significance analysis in the field of mass spectrometry-based proteomics, and present our perspective on the current and future developments in this area.
Reviews
An Assessment of Software Solutions for the Analysis of Mass Spectrometry Based Quantitative Proteomics Data
Lukas N. Mueller*, Mi-Youn Brusniak, D. R. Mani, and Ruedi Aebersold
Over the past decade, a series of experimental strategies for mass spectrometry based quantitative proteomics and corresponding computational methodology for the processing of the resulting data have been generated. We provide here an overview of the main quantification principles and available software solutions for the analysis of data generated by liquid chromatography coupled to mass spectrometry (LC-MS). Three conceptually different methods to perform quantitative LC-MS experiments have been introduced. In the first, quantification is achieved by spectral counting, in the second via differential stable isotopic labeling, and in the third by using the ion current in label-free LC-MS measurements. We discuss here advantages and challenges of each quantification approach and assess available software solutions with respect to their instrument compatibility and processing functionality. This review therefore serves as a starting point for researchers to choose an appropriate software solution for quantitative proteomic experiments based on their experimental and analytical requirements.
Article
High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning
Dongmei Yang, Kevin Ramkissoon, Eric Hamlett, and Morgan C. Giddings*
This publication is Open Access under the license indicated. Learn More
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide–ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model’s cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.
A Data-Mining Scheme for Identifying Peptide Structural Motifs Responsible for Different MS/MS Fragmentation Intensity Patterns
Yingying Huang, George C. Tseng, Shinsheng Yuan, Ljiljana Pasa-Tolic, Mary S. Lipton, Richard D. Smith, and Vicki H. Wysocki*
Although tandem mass spectrometry (MS/MS) has become an integral part of proteomics, intensity patterns in MS/MS spectra are rarely weighted heavily in most widely used algorithms because they are not yet fully understood. Here a knowledge mining approach is demonstrated to discover fragmentation intensity patterns and elucidate the chemical factors behind such patterns. Fragmentation intensity information from 28 330 ion trap peptide MS/MS spectra of different charge states and sequences went through unsupervised clustering using a penalized K-means algorithm. Without any prior chemistry assumptions, four clusters with distinctive fragmentation patterns were obtained. A decision tree was generated to investigate peptide sequence motif and charge state status that caused these fragmentation patterns. This data-mining scheme is generally applicable for any large data sets. It bypasses the common prior knowledge constraints and reports on the overall peptide fragmentation behavior. It improves the understanding of gas-phase peptide dissociation and provides a foundation for new or improved protein identification algorithms.
Whole Genome Searching with Shotgun Proteomic Data: Applications for Genome Annotation
Joel R. Sevinsky, Benjamin J. Cargile, Maureen K. Bunger, Fanyu Meng, Nathan A. Yates, Ronald C. Hendrickson, and James L. Stephenson, Jr.*
High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.
Algorithm for Identification of Fusion Proteins via Mass Spectrometry
Julio Ng* and Pavel A. Pevzner
Identification of fusion proteins has contributed significantly to our understanding of cancer progression, yielding important predictive markers and therapeutic targets. While fusion proteins can potentially be identified by mass spectrometry, all previously found fusion proteins were identified using genomic (rather than mass spectrometry) technologies. This lack of MS/MS applications in studies of fusion proteins is caused by the lack of computational tools that are able to interpret mass spectra from peptides covering unknown fusion breakpoints (fusion peptides). Indeed, the number of potential fusion peptides is so large that the existing MS/MS database search tools become impractical even in the case of small genomes. We explore computational approaches to identifying fusion peptides, propose an algorithm for solving the fusion peptide identification problem, and analyze the performance of this algorithm on simulated data. We further illustrate how this approach can be modified for human exon prediction.
The Standard Protein Mix Database: A Diverse Data Set To Assist in the Production of Improved Peptide and Protein Identification Software Tools
John Klimek, James S. Eddes, Laura Hohmann, Jennifer Jackson, Amelia Peterson, Simon Letarte, Philip R. Gafken, Jonathan E Katz, Parag Mallick, Hookeun Lee, Alexander Schmidt, Reto Ossola, Jimmy K. Eng, Ruedi Aebersold, and Daniel B Martin*
Tandem mass spectrometry (MS/MS) is frequently used in the identification of peptides and proteins. Typical proteomic experiments rely on algorithms such as SEQUEST and MASCOT to compare thousands of tandem mass spectra against the theoretical fragment ion spectra of peptides in a database. The probabilities that these spectrum-to-sequence assignments are correct can be determined by statistical software such as PeptideProphet or through estimations based on reverse or decoy databases. However, many of the software applications that assign probabilities for MS/MS spectra to sequence matches were developed using training data sets from 3D ion-trap mass spectrometers. Given the variety of types of mass spectrometers that have become commercially available over the last 5 years, we sought to generate a data set of reference data covering multiple instrumentation platforms to facilitate both the refinement of existing computational approaches and the development of novel software tools. We analyzed the proteolytic peptides in a mixture of tryptic digests of 18 proteins, named the “ISB standard protein mix”, using 8 different mass spectrometers. These include linear and 3D ion traps, two quadrupole time-of-flight platforms (qq-TOF), and two MALDI-TOF−TOF platforms. The resulting data set, which has been named the Standard Protein Mix Database, consists of over 1.1 million spectra in 150+ replicate runs on the mass spectrometers. The data were inspected for quality of separation and searched using SEQUEST. All data, including the native raw instrument and mzXML formats and the PeptideProphet validated peptide assignments, are available at http://regis-web.systemsbiology.net/PublicDatasets/.
Interactive Three-Dimensional Visualization and Contextual Analysis of Protein Interaction Networks
Edwin Ho, Richard Webber, and Marc R. Wilkins*
To understand the biology of the interactome, the covisualization of protein interactions and other protein-related data is required. In this study, we have adapted a 3-D network visualization platform, GEOMI, to allow the coanalysis of protein–protein interaction networks with proteomic parameters such as protein localization, abundance, physicochemical parameters, post-translational modifications, and gene ontology classification. Working with Saccharomyces cerevisiae data, we show that rich and interactive visualizations, constructed from multidimensional orthogonal data, provide insights on the complexity of the interactome and its role in biological processes and the architecture of the cell. We present the first organelle-specific interaction networks, which provide subinteractomes of high biological interest. We further present some of the first views of the interactome built from a new combination of yeast two-hybrid data and stable protein complexes, which are likely to approximate the true workings of stable and transient aspects of the interactome. The GEOMI tool and all interactome data are freely available by contacting the authors.
Clustering Millions of Tandem Mass Spectra
Ari M. Frank*, Nuno Bandeira, Zhouxin Shen, Stephen Tanner, Steven P. Briggs, Richard D. Smith, and Pavel A. Pevzner*
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases the number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
Improving Mass Spectrometry Peak Detection Using Multiple Peak Alignment Results
Weichaun Yu*, Zengyou He, Junfeng Liu, and Hongyu Zhao
Mass spectrometry data are often corrupted by noise. It is very difficult to simultaneously detect low-abundance peaks and reduce false-positive peak detection caused by noise. In this paper, we propose to improve peak detection using an additional constraint: the consistent appearance of similar true peaks across multiple spectra. We observe that false-positive peaks in general do not repeat themselves well across multiple spectra. When we align all the identified peaks (including false-positive ones) from multiple spectra together, those false-positive peaks are not as consistent as true peaks. Thus, we propose to use information from other spectra in order to reduce false-positive peaks. The new method improves the detection of peaks over the traditional single spectrum based peak detection methods. Consequently, the discovery of cancer biomarkers also benefits from this improvement. Source code and additional data are available at: http://www.ece.ust.hk/∼eeyu/mspeak.htm.
On the Value of Knowing a z• Ion for What It Is
Jian Liu, Xiaorong Liang, and Scott A. McLuckey*
Computer simulation of database searches of electron transfer dissociation (ETD) spectra using both “bottom up” and “top down” approaches was performed to evaluate the utility of knowing a priori which product ions contain the C-terminus (i.e., the z• ions). In this work, knowledge of the identities of the z• ions was used to exclude putative identifications that are based solely on the mass matching of undifferentiated product ions derived from an experiment with those derived from in silico fragmentation. The benefit from knowing which ions are z• ions was found to be heavily dependent on the quality of the ETD spectra, in terms of sequence coverage afforded by the product ions, the amount of noise in the spectra (i.e., extraneous peaks that do not directly reflect primary structure), and mass measurement accuracy. Under conditions in which the likelihood of misidentification is high without a priori knowledge of ion types (e.g., b-, y-, c-, or z-ions), a knowledge of which product ions are z• ions allows discrimination against false-positive identifications. Relatively little benefit from knowing which ions are z• ions was noted when product spectra reflected relatively high sequence coverage and when a low fraction of the product ions were due to extraneous peaks (i.e., spectra with relatively little noise). In all cases, specificity is higher with higher mass measurement accuracy, with a consequent reduction in the benefit from knowledge of which ions are z• ions.
Identification and Characterization of Disulfide Bonds in Proteins and Peptides from Tandem MS Data by Use of the MassMatrix MS/MS Search Engine
Hua Xu, Liwen Zhang, and Michael A. Freitas*
A new database search algorithm has been developed to identify disulfide-linked peptides in tandem MS data sets. The algorithm is included in the newly developed tandem MS database search program, MassMatrix. The algorithm exploits the probabilistic scoring model in MassMatrix to achieve identification of disulfide bonds in proteins and peptides. Proteins and peptides with disulfide bonds can be identified with high confidence without chemical reduction or other derivatization. The approach was tested on peptide and protein standards with known disulfide bonds. All disulfide bonds in the standard set were identified by MassMatrix. The algorithm was further tested on bovine pancreatic ribonuclease A (RNaseA). The 4 native disulfide bonds in RNaseA were detected by MassMatrix with multiple validated peptide matches for each disulfide bond with high statistical scores. Fifteen nonnative disulfide bonds were also observed in the protein digest under basic conditions (pH = 8.0) due to disulfide bond interchange. After minimizing the disulfide bond interchange (pH = 6.0) during digestion, only one nonnative disulfide bond was observed. The MassMatrix algorithm offers an additional approach for the discovery of disulfide bonds from tandem mass spectrometry data.
Local and Nonlocal Environments around Cis Peptides
Brent Wathen and Zongchao Jia*
Although the vast majority of peptide bonds in folded proteins are found in the trans conformation, a small percentage are found in the less energetically favorable cis conformation. Though the mechanism of cis peptide bond formation remains unknown, the role of local aromatics has been emphasized in the literature. This paper presents results from a comprehensive statistical analysis of both the local and nonlocal (i.e., tertiary) environment around cis peptides. In addition to an increased frequency of aromatic residues in the local environment around cis peptides, a number of nonlocal differences in protein secondary and tertiary structure between cis and trans peptides are found: (i) coil regions containing cis peptides are almost twice as long as those without cis peptides and include more Tyr and Pro residues; (ii) cis peptides occur with high frequencies in coil regions near large β-structures; (iii) there is a nonlocal enrichment of Cys, His, Tyr, and Ser in the tertiary environment surrounding cis peptides when compared to trans peptides; and (iv) on average, cis peptides make fewer medium-range and more long-range contacts than trans peptides do. On the basis of these observations, it is concluded that nonlocal factors play a significant role in cis peptide formation, which has not been fully appreciated previously. An autocatalytic model for cis peptide formation is discussed as are consequences for protein folding.
In Silico Tools for Predicting Peptides Binding to HLA-Class II Molecules: More Confusion than Conclusion
Uthaman Gowthaman and Javed N. Agrewala*
Identification of promiscuous peptides, which bind to human leukocyte antigen, is indispensable for global vaccination. However, the development of such vaccines is impaired due to the extensive polymorphism in human leukocyte antigens. The use of in silico tools for mining such peptides circumvents the expensive and laborious experimental screening methods. Nevertheless, the intrepid use of such tools warrants a rational assessment with respect to experimental findings. Here, we have adopted a “bottom up” approach, where we have used experimental data to assess the reliability of existing in silico methods. We have used a data set of 179 peptides from diverse antigens and have validated six commonly used in silico methods: ProPred, MHC2PRED, RANKPEP, SVMHC, MHCPred, and MHC-BPS. We observe that the prediction efficiency of the programs is not balanced for all the HLA-DR alleles and that there is an extremely high level of discrepancy in the prediction efficiency depending on the nature of the antigen. It has not escaped our notice that the in silico methods studied here are not very proficient in identifying promiscuous peptides. This places a considerable constraint on the intrepid use of such programs for human leukocyte antigen class II binding peptides. We conclude from this study that the in silico methods cannot be wholly relied upon for selecting crucial peptides for development of vaccines.
The Effects of Shared Peptides on Protein Quantitation in Label-Free Proteomics by LC/MS/MS
Shuangshuang Jin, Donald S. Daly, David L. Springer, and John H. Miller*
Assessment of differential protein abundance from the observed properties of detected peptides is an essential part of protein profiling based on shotgun proteomics. However, the abundance observed for shared peptides may be due to contributions from multiple proteins that are affected differently by a given treatment. Excluding shared peptides eliminates this ambiguity but may significantly decrease the number of proteins for which abundance estimates can be obtained. Peptide sharing within a family of biologically related proteins does not cause ambiguity if family members have a common response to treatment. On the basis of this concept, we have developed an approach for including shared peptides in the analysis of differential protein abundance in protein profiling. Data from a recent proteomics study of lung tissue from mice exposed to lipopolysaccharide, cigarette smoke, and a combination of these agents are used to illustrate our method. Starting from data where about half of the implicated database proteins involved shared peptides, 82% of the affected proteins were grouped into families, based on FASTA annotation, with closure on peptide sharing. In many cases, a common abundance relative to control was sufficient to explain ion-current peak areas for peptides, both unique and shared, that identified biologically related proteins in a peptide-sharing closure group. On the basis of these results, we propose that peptide-sharing closure groups provide a way to include abundance data for shared peptides in quantitative protein profiling by high-throughput mass spectrometry.
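A peptide-sharing closure group can be read as a connected component of the graph that links proteins through the peptides they share. The sketch below takes that reading and computes the groups with a small union-find structure. The function name and input layout are hypothetical, and the FASTA-annotation step mentioned in the abstract is not modeled.

```python
def closure_groups(peptide_to_proteins):
    """Group proteins that are linked, directly or transitively, by shared peptides.

    peptide_to_proteins maps a peptide identifier to the proteins it matches
    (an assumed input layout). Returns a sorted list of sorted protein groups.
    """
    parent = {}

    def find(protein):
        # Register unseen proteins as their own root, then walk to the root
        # with path halving.
        parent.setdefault(protein, protein)
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for protein in proteins:
            find(protein)
        # A shared peptide ties all of its proteins into one group.
        for other in proteins[1:]:
            union(proteins[0], other)

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    example = {"pepA": ["P1", "P2"], "pepB": ["P2", "P3"], "pepC": ["P4"]}
    print(closure_groups(example))
```

In this example P1 and P3 share no peptide directly, but both share one with P2, so all three land in the same group; P4, identified only by a unique peptide, forms a group of its own.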
Accurate Annotation of Peptide Modifications through Unrestrictive Database Search
Stephen Tanner*, Samuel H. Payne, Surendra Dasari, Zhouxin Shen, Phillip A. Wilmarth, Larry L. David, William F. Loomis, Steven P. Briggs, and Vineet Bafna
Proteins are extensively modified after translation due to cellular regulation, signal transduction, or chemical damage. Peptide tandem mass spectrometry can discover post-translational modifications, as well as sequence polymorphisms. Recent efforts have studied modifications at the proteomic scale. In this context, it becomes crucial to assess the accuracy of modification discovery. We discuss methods to quantify the false discovery rate from a search and demonstrate how several features can be used to distinguish valid modifications from search artifacts. We present a tool, PTMFinder, which implements these methods. We summarize the corpus of post-translational modifications identified on large data sets. Thousands of known and novel modification sites are identified, including site-specific modifications conserved over vast evolutionary distances.
Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing
Sebastian Klie*, Lennart Martens*, Juan Antonio Vizcaíno, Richard Côté, Phil Jones, Rolf Apweiler, Alexander Hinneburg, and Henning Hermjakob
Since the advent of public data repositories for proteomics data, readily accessible results from high-throughput experiments have been accumulating steadily. Several large-scale projects in particular have contributed substantially to the amount of identifications available to the community. Despite the considerable body of information amassed, very few successful analyses have been performed and published on this data, leveling off the ultimate value of these projects far below their potential. A prominent reason published proteomics data is seldom reanalyzed lies in the heterogeneous nature of the original sample collection and the subsequent data recording and processing. To illustrate that at least part of this heterogeneity can be compensated for, we here apply a latent semantic analysis to the data contributed by the Human Proteome Organization’s Plasma Proteome Project (HUPO PPP). Interestingly, despite the broad spectrum of instruments and methodologies applied in the HUPO PPP, our analysis reveals several obvious patterns that can be used to formulate concrete recommendations for optimizing proteomics project planning as well as the choice of technologies used in future experiments. It is clear from these results that the analysis of large bodies of publicly available proteomics data by noise-tolerant algorithms such as the latent semantic analysis holds great promise and is currently underexploited.
Data Augmentation Algorithms for Detecting Conserved Domains in Protein Sequences: A Comparative Study
Chengpeng Bi
Protein conserved domains are distinct units of molecular structure, usually associated with particular aspects of molecular function such as catalysis or binding. These conserved subsequences are often unobserved and thus in need of detection. Motif discovery methods can be used to find these unobserved domains given a set of sequences. This paper presents the data augmentation (DA) framework, which unifies a suite of motif-finding algorithms that maximize the same likelihood function by imputing the unobserved data. Data augmentation refers to those methods that formulate iterative optimization by exploiting the unobserved data. Two categories of maximum likelihood based motif-finding algorithms are illustrated under the DA framework. The first comprises deterministic algorithms that maximize the likelihood function by iteratively performing an optimal local search in the alignment space. The second comprises stochastic algorithms that iteratively draw motif location samples via Monte Carlo simulation and simultaneously keep track of the superior solution with the best likelihood. As a result, four DA motif discovery algorithms are described, evaluated, and compared by aligning real and simulated protein sequences.
Deriving the Probabilities of Water Loss and Ammonia Loss for Amino Acids from Tandem Mass Spectra
Shiwei Sun, Chungong Yu, Yantao Qiao, Yu Lin, Gongjin Dong, Changning Liu, Jingfen Zhang, Zhuo Zhang, Jinjin Cai, Hong Zhang, and Dongbo Bu*
In protein identification through tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. The widely used prediction models, such as SEQUEST and MASCOT, ignore the intensity of the ions with important neutral losses, including water loss and ammonia loss. However, ignoring these neutral losses results in a significant deviation between the predicted theoretical spectrum and its experimental counterpart. Here, based on the “one peak, multiple explanations” observation, we proposed an expectation−maximization (EM) method to automatically learn the probabilities of water loss and ammonia loss for each amino acid. Then we employed these probabilities to design an improved statistical model for theoretical spectrum prediction. We implemented these methods and tested them on practical data. On a training set containing 1803 spectra, the experimental results show good agreement with existing knowledge about neutral losses, such as the tendency of water loss from Asp, Glu, Ser, and Thr. Furthermore, on a testing set containing 941 spectra, the improved similarity between the experimental and predicted spectra demonstrates that this method can generate more reasonable predictions relative to the model that ignores neutral losses. As an application of the derived probabilities, we implemented a database searching method adopting the improved theoretical spectrum model with neutral loss ions estimated. Experimental results on Keller’s data set demonstrate that this method can identify peptides more accurately than SEQUEST. In another application to validate SEQUEST’s results, the reported peptide−spectrum pairs are reranked with respect to the similarity between experimental and predicted spectra. Experimental results on both LTQ and QSTAR data sets suggest that this reranking strategy can effectively distinguish the false negative predictions reported by SEQUEST.
Occurrence of Copper Proteins through the Three Domains of Life: A Bioinformatic Approach
Claudia Andreini - ,
Lucia Banci - ,
Ivano Bertini *- , and
Antonio Rosato
In high-throughput genome-level protein investigation efforts, such as Structural Genomics, the systematic experimental characterization of metal-binding properties (i.e., the investigation of the metalloproteome) is not always pursued and remains far from trivial. In the present work, we have applied a bioinformatic approach to investigate the occurrence of (putative) copper-binding proteins in 57 different organisms spanning the entire tree of life. We found that the size of the copper proteome is generally less than 1% of the total proteome of an organism, in both eukaryotes and prokaryotes. The occurrence of copper-binding proteins is relatively scarce when compared to that of zinc-binding proteins and of non-heme iron proteins. This may be due to both poorer bioavailability (in particular with respect to iron in the ancient world) and the complexity of copper chemistry and the risks associated with it, which may have adversely affected natural selection of copper-binding proteins. The present analysis shows that there is a strong relationship between the metal coordination sphere and protein function. A network involving proteins having roles in both copper transport and respiration was identified, parts or all of which are detected in the majority of the organisms examined.
Biomarker Discovery for Arsenic Exposure Using Functional Data. Analysis and Feature Learning of Mass Spectrometry Proteomic Data
Jaroslaw Harezlak *- ,
Michael C. Wu *- ,
Mike Wang - ,
Armin Schwartzman - ,
David C. Christiani - , and
Xihong Lin
Plasma biomarkers of exposure to environmental contaminants play an important role in early detection of disease. The emerging field of proteomics presents an attractive opportunity for candidate biomarker discovery, as it simultaneously measures and analyzes a large number of proteins. This article presents a case study for measuring arsenic concentrations in a population residing in an As-endemic region of Bangladesh using plasma protein expressions measured by SELDI-TOF mass spectrometry. We analyze the data using a unified statistical method based on functional learning to preprocess mass spectra and extract mass spectrometry (MS) features and to associate the selected MS features with arsenic exposure measurements. The task is challenging due to several factors: the high dimensionality of mass spectrometry data, complicated error structures, and a multiple comparison problem. We use nonparametric functional regression techniques for MS modeling, peak detection based on the significant zero-downcrossing method, and peak alignment using a warping algorithm. Our results show significant associations of arsenic exposure to either under- or overexpressions of 20 proteins.
Statistical Analysis of Relative Labeled Mass Spectrometry Data from Complex Samples Using ANOVA
Ann L. Oberg *- ,
Douglas W. Mahoney - ,
Jeanette E. Eckel-Passow - ,
Christopher J. Malone - ,
Russell D. Wolfinger - ,
Elizabeth G. Hill - ,
Leslie T. Cooper - ,
Oyere K. Onuma - ,
Craig Spiro - ,
Terry M. Therneau - , and
H. Robert Bergen, III
Statistical tools enable unified analysis of data from multiple global proteomic experiments, producing unbiased estimates of normalization terms despite the missing data problem inherent in these studies. The modeling approach, implementation, and useful visualization tools are demonstrated via a case study of complex biological samples assessed using the iTRAQ relative labeling protocol.
Protein Identification and Peptide Expression Resolver: Harmonizing Protein Identification with Protein Expression Data
Paul Kearney - ,
Heather Butler - ,
Kevin Eng - , and
Patrice Hugo
Proteomic discovery platforms generate both peptide expression information and protein identification information. Peptide expression data are used to determine which peptides are differentially expressed between study cohorts, and then these peptides are targeted for protein identification. In this paper, we demonstrate that peptide expression information is also a powerful tool for enhancing confidence in protein identification results. Specifically, we evaluate the following hypothesis: tryptic peptides originating from the same protein have similar expression profiles across samples in the discovery study. Evidence supporting this hypothesis is provided. This hypothesis is integrated into a protein identification tool, PIPER (Protein Identification and Peptide Expression Resolver), that reduces erroneous protein identifications below 5%. PIPER’s utility is illustrated by application to a 72-sample biomarker discovery study where it is demonstrated that false positive protein identifications can be reduced below 5%. Consequently, it is recommended that PIPER methodology be incorporated into proteomic studies where both protein expression and identification data are collected.
Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies
Brian C. Searle *- ,
Mark Turner - , and
Alexey I. Nesvizhskii
Database-searching programs generally identify only a fraction of the spectra acquired in a standard LC/MS/MS study of digested proteins. Subtle variations in database-searching algorithms for assigning peptides to MS/MS spectra have been known to provide different identification results. To leverage this variation, a probabilistic framework is developed for combining the results of multiple search engines. The scores for each search engine are first independently converted into peptide probabilities. These probabilities can then be readily combined across search engines using Bayesian rules and the expectation maximization learning algorithm. A significant gain in the number of peptides identified with high confidence with each additional search engine is demonstrated using several data sets of increasing complexity, from a control protein mixture to a human plasma sample, searched using SEQUEST, Mascot, and X! Tandem database-searching programs. The increased rate of peptide assignments also translates into a substantially larger number of protein identifications in LC/MS/MS studies compared to a typical analysis using a single database-search tool.
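As a rough illustration of how per-engine peptide probabilities might be merged, the sketch below multiplies odds under a naive conditional-independence assumption with a shared prior. This is a simplification chosen for illustration and is not the Bayesian/EM framework the abstract describes; the function name `combine_probabilities` and the `prior` parameter are hypothetical.

```python
def combine_probabilities(probs, prior=0.5):
    """Combine per-engine posterior probabilities for one peptide-spectrum match.

    Assumption (not from the paper): each engine's probability was computed
    with the same prior, and the engines are conditionally independent.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in probs:
        # Clamp to avoid division by zero at p == 0 or p == 1.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        # Each engine contributes its likelihood ratio (posterior odds / prior odds).
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Two engines that each moderately support the same peptide.
    print(combine_probabilities([0.8, 0.8]))
```

Under these assumptions, agreement between engines raises the combined probability above either individual value, while an uninformative engine (probability equal to the prior) leaves it unchanged.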
Semisupervised Model-Based Validation of Peptide Identifications in Mass Spectrometry-Based Proteomics
Hyungwon Choi - and
Alexey I. Nesvizhskii *
Development of robust statistical methods for validation of peptide assignments to tandem mass (MS/MS) spectra obtained using database searching remains an important problem. PeptideProphet is one of the commonly used computational tools available for that purpose. An alternative simple approach for validation of peptide assignments is based on addition of decoy (reversed, randomized, or shuffled) sequences to the searched protein sequence database. The probabilistic modeling approach of PeptideProphet and the decoy strategy can be combined within a single semisupervised framework, leading to improved robustness and higher accuracy of computed probabilities even in the case of most challenging data sets. We present a semisupervised expectation-maximization (EM) algorithm for constructing a Bayes classifier for peptide identification using the probability mixture model, extending PeptideProphet to incorporate decoy peptide matches. Using several data sets of varying complexity, from control protein mixtures to a human plasma sample, and using three commonly used database search programs, SEQUEST, MASCOT, and TANDEM/k-score, we illustrate that more accurate mixture estimation leads to an improved control of the false discovery rate in the classification of peptide assignments.
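The simple decoy strategy mentioned above can be sketched as a counting estimate: at a given score threshold, the number of decoy matches approximates the number of incorrect target matches. The sketch assumes separate target and decoy searches against databases of equal size; the function name `decoy_fdr` is hypothetical, and this is the plain decoy estimate, not the semisupervised EM model.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target matches scoring >= threshold.

    Assumes higher scores are better and that target and decoy databases
    are the same size, so decoy hits estimate incorrect target hits.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries
    return min(1.0, n_decoy / n_target)


if __name__ == "__main__":
    targets = [10, 9, 8, 7, 3]   # illustrative scores only
    decoys = [8, 2, 1]
    print(decoy_fdr(targets, decoys, threshold=7))
```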
Characterization of Global Yeast Quantitative Proteome Data Generated from the Wild-Type and Glucose Repression Saccharomyces cerevisiae Strains: The Comparison of Two Quantitative Methods
Renata Usaite - ,
James Wohlschlegel - ,
John D. Venable - ,
Sung K. Park - ,
Jens Nielsen - ,
Lisbeth Olsson - , and
John R. Yates III *
The quantitative proteomic analysis of complex protein mixtures is emerging as a technically challenging but viable systems-level approach for studying cellular function. This study presents a large-scale comparative analysis of protein abundances from yeast protein lysates derived from both wild-type yeast and yeast strains lacking key components of the Snf1 kinase complex. Four different strains were grown under well-controlled chemostat conditions. Multidimensional protein identification technology followed by quantitation using either spectral counting or stable isotope labeling approaches was used to identify relative changes in the protein expression levels between the strains. A total of 2388 proteins were relatively quantified, and more than 350 proteins were found to have significantly different expression levels between the two strains of comparison when using the stable isotope labeling strategy. The stable isotope labeling based quantitative approach was found to be highly reproducible among biological replicates when complex protein mixtures containing small expression changes were analyzed. Where poor correlation between stable isotope labeling and spectral counting was found, the major reason behind the discrepancy was the lack of reproducible sampling for proteins with low spectral counts. The functional categorization of the relative protein expression differences that occur in Snf1-deficient strains uncovers a wide range of biological processes regulated by this important cellular kinase.
Signal Detection in High-Resolution Mass Spectrometry Data
Dale F. McLerran - ,
Ziding Feng - ,
O. John Semmes - ,
Lisa Cazares - , and
Timothy W. Randolph *
Mass spectrometry data from high-resolution time-of-flight instruments often contain a vast number of noninformative background-ion peaks whose signal is similar to that of peptide peaks. Consequently, seeking peptide signal in these spectra based on a signal-to-noise ratio will remove signal peaks as well as noise. This work characterizes the background as a precursor to seeking peptide-related features. Robust-regression methods are used to estimate distributions for null (background) peak intensities and locations. Defining signal peaks as outliers with respect to these distributions leads to more precision in detecting the isotopic envelope of peaks from low-abundance peptides in high-resolution spectra.
Statistical Validation of Peptide Identifications in Large-Scale Proteomics Using the Target-Decoy Database Search Strategy and Flexible Mixture Modeling
Hyungwon Choi - ,
Debashis Ghosh - , and
Alexey I. Nesvizhskii *
Reliable statistical validation of peptide and protein identifications is a top priority in large-scale mass spectrometry based proteomics. PeptideProphet is one of the computational tools commonly used for assessing the statistical confidence in peptide assignments to tandem mass spectra obtained using database search programs such as SEQUEST, MASCOT, or X! TANDEM. We present two flexible methods, the variable component mixture model and the semiparametric mixture model, that remove the restrictive parametric assumptions in the mixture modeling approach of PeptideProphet. Using a control protein mixture data set generated on a linear ion trap Fourier transform (LTQ-FT) mass spectrometer, we demonstrate that both methods improve parametric models in terms of the accuracy of probability estimates and the power to detect correct identifications controlling the false discovery rate to the same degree. The statistical approaches presented here require that the data set contain a sufficient number of decoy (known to be incorrect) peptide identifications, which can be obtained using the target-decoy database search strategy.
Technical Note
X!!Tandem, an Improved Method for Running X!Tandem in Parallel on Collections of Commodity Computers
Robert D. Bjornson *- ,
Nicholas J. Carriero - ,
Christopher Colangelo - ,
Mark Shifman - ,
Kei-Hoi Cheung - ,
Perry L. Miller - , and
Kenneth Williams
The widespread use of mass spectrometry for protein identification has created a demand for computationally efficient methods of matching mass spectrometry data to protein databases. A search using X!Tandem, a popular and representative program, can require hours or days to complete, particularly when missed cleavages and post-translational modifications are considered. Existing techniques for accelerating X!Tandem by employing parallelism are unsatisfactory for a variety of reasons. The paper describes a parallelization of X!Tandem, called X!!Tandem, that shows excellent speedups on commodity hardware and produces the same results as the original program. Furthermore, the parallelization technique used is unusual and potentially useful for parallelizing other complex programs.
Does Trypsin Cut Before Proline?
Jesse Rodriguez - ,
Nitin Gupta *- ,
Richard D. Smith - , and
Pavel A. Pevzner
Trypsin is the most commonly used enzyme in mass spectrometry for protein digestion with high substrate specificity. Many peptide identification algorithms incorporate these specificity rules as filtering criteria. A generally accepted “Keil rule” is that trypsin cleaves next to arginine or lysine, but not before proline. Since this rule was derived two decades ago based on a small number of experimentally confirmed cleavages, we decided to re-examine it using 14.5 million tandem spectra (2 orders of magnitude increase in the number of observed tryptic cleavages). Our analysis revealed a surprisingly large number of cleavages before proline. We examine several hypotheses to explain these cleavages and argue that trypsin specificity rules used in peptide identification algorithms should be modified to “legitimatize” cleavages before proline. Our approach can be applied to analyze any protease, and we further argue that specificity rules for other enzymes should also be re-evaluated based on statistical evidence derived from large MS/MS data sets.
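To make the specificity rule concrete, the following minimal sketch performs an in-silico digest with and without the "no cleavage before proline" restriction. The function name `tryptic_peptides` and the example sequence are hypothetical, and the sketch ignores missed cleavages.

```python
def tryptic_peptides(sequence, proline_rule=True):
    """Cleave after every K or R; with proline_rule, skip sites followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            next_is_proline = i + 1 < len(sequence) and sequence[i + 1] == "P"
            if proline_rule and next_is_proline:
                continue  # Keil rule: no cleavage before proline
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


if __name__ == "__main__":
    seq = "AAKPGGRTT"  # made-up sequence for illustration
    print(tryptic_peptides(seq, proline_rule=True))
    print(tryptic_peptides(seq, proline_rule=False))
```

Relaxing the rule, as the authors argue for, simply adds the K|P and R|P sites to the set of allowed cleavage positions.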
Communication
Online Multidimensional Separation with Biphasic Monolithic Capillary Column for Shotgun Proteome Analysis
Fangjun Wang - ,
Jing Dong - ,
Mingliang Ye - ,
Xiaogang Jiang - ,
Ren’an Wu - , and
Hanfa Zou *
A biphasic monolithic capillary column with a 10 cm segment of strong-cation exchange monolith and a 65 cm segment of reversed-phase monolith was prepared within a single 100 µm i.d. capillary. Separation performance of this column was evaluated by a five-cycle online multidimensional separation of a 10 µg tryptic digest of yeast proteins using nanoflow liquid chromatography coupled with tandem mass spectrometry; the whole separation took 12 h at an operating pressure of only ∼900 psi. In total, 780 distinct proteins were positively identified through assignment of 2953 unique peptides at a false-positive rate of less than 1%. The good separation performance of this biphasic column was largely attributed to the good orthogonality of the strong-cation exchange monolith and reversed-phase monolith for multidimensional separation.
Article
Large-Scale Identification and Evolution Indexing of Tyrosine Phosphorylation Sites from Murine Brain
Bryan A. Ballif *- ,
G. Richard Carey - ,
Shamil R. Sunyaev - , and
Steven P. Gygi
Metazoans employ reversible tyrosine phosphorylation to regulate innumerable biological processes. Thus, the large-scale identification of tyrosine phosphorylation sites from primary tissues is an essential step toward a molecular systems understanding of dynamic regulation in vivo. The relative paucity of phosphotyrosine has greatly limited its identification in large-scale phosphoproteomic experiments. However, using antiphosphotyrosine peptide immunoprecipitations, we report the largest study to date of tyrosine phosphorylation sites from primary tissue, identifying 414 unique tyrosine phosphorylation sites from murine brain. To measure the conservation of phosphorylated tyrosines and their surrounding residues, we constructed a computational pipeline and identified patterns of conservation within the signature of phosphotyrosine.
Design of Recombinant Antibody Microarrays for Cell Surface Membrane Proteomics
Linda Dexlin - ,
Johan Ingvarsson - ,
Björn Frendéus - ,
Carl A. K. Borrebaeck - , and
Christer Wingren *
Generating proteomic maps of membrane proteins, common targets for therapeutic interventions and disease diagnostics, has turned out to be a major challenge. Antibody-based microarrays are among the novel rapidly evolving proteomic technologies that may enable global proteome analysis to be performed. Here, we have designed the first generation of a scaleable human recombinant scFv antibody microarray technology platform for cell surface membrane proteomics as well as glycomics targeting intact cells. The results showed that rapid and multiplexed profiling of the cell surface proteome (and glycome) could be performed in a highly specific and sensitive manner and that differential expression patterns due to external stimuli could be monitored.
A Quantitative Proteomic Analysis of Mitochondrial Participation in P19 Cell Neuronal Differentiation
Jermel Watkins - ,
Siddhartha Basu - , and
Daniel F. Bogenhagen *
A quantitative proteomic analysis of changes in protein expression accompanying the differentiation of P19 mouse embryonal carcinoma cells into neuron-like cells using isobaric tag technology coupled with LC−MS/MS revealed protein changes reflecting withdrawal from the cell cycle accompanied by a dynamic reorganization of the cytoskeleton and an up-regulation of mitochondrial biogenesis. Further study of quantitative changes in abundance of individual proteins in a purified mitochondrial fraction showed that most mitochondrial proteins increased significantly in abundance. A set of chaperone proteins did not participate in this increase, suggesting that neuron-like cells are relatively deficient in mitochondrial chaperones. We developed a procedure to account for differences in recovery of mitochondrial proteins during purification of organelles from distinct cell or tissue sources. Proteomic data supported by RT-PCR analysis suggests that enhanced mitochondrial biogenesis during neuronal differentiation may reflect a large increase in expression of PGC-1α combined with down-regulation of its negative regulator, p160 Mybbp1a.
A Proteome Resource of Ovarian Cancer Ascites: Integrated Proteomic and Bioinformatic Analyses To Identify Putative Biomarkers
Limor Gortzak-Uzan - ,
Alex Ignatchenko - ,
Andreas I. Evangelou - ,
Mahima Agochiya - ,
Kevin A. Brown - ,
Peter St.Onge - ,
Inga Kireeva - ,
Gerold Schmitt-Ulms - ,
Theodore J. Brown - ,
Joan Murphy - ,
Barry Rosen - ,
Patricia Shaw - ,
Igor Jurisica *- , and
Thomas Kislinger *
Epithelial ovarian cancer is the most lethal gynecological malignancy, and disease-specific biomarkers are urgently needed to improve diagnosis, prognosis, and to predict and monitor treatment efficiency. We present an in-depth proteomic analysis of selected biochemical fractions of human ovarian cancer ascites, resulting in the stringent and confident identification of over 2500 proteins. Rigorous filter schemes were applied to objectively minimize the number of false-positive identifications, and we only report proteins with substantial peptide evidence. Integrated computational analysis of the ascites proteome combined with several recently published proteomic data sets of human plasma, urine, 59 ovarian cancer related microarray data sets, and protein–protein interactions from the Interologous Interaction Database I2D (http://ophid.utoronto.ca/i2d) resulted in a short-list of 80 putative biomarkers. The presented proteomics analysis provides a significant resource for ovarian cancer research, and a framework for biomarker discovery.
Species Variation in the Fecal Metabolome Gives Insight into Differential Gastrointestinal Function
Jasmina Saric - ,
Yulan Wang - ,
Jia Li - ,
Muireann Coen - ,
Jürg Utzinger - ,
Julian R. Marchesi - ,
Jennifer Keiser - ,
Kirill Veselkov - ,
John C. Lindon - ,
Jeremy K. Nicholson - , and
Elaine Holmes *
The metabolic composition of fecal extracts provides a window for elucidating the complex metabolic interplay between mammals and their intestinal ecosystems, and these metabolite profiles can yield information on a range of gut diseases. Here, the metabolites present in aqueous fecal extracts of humans, mice and rats were characterized using high-resolution 1H NMR spectroscopy coupled with multivariate pattern recognition techniques. Additionally, the effects of sample storage and preparation methods were evaluated in order to assess the stability of fecal metabolite profiles, and to optimize information recovery from fecal samples. Finally, variations in metabolite profiles were investigated in healthy mice as a function of time. Interspecies variation was found to be greater than the variation due to either time or sample preparation. Although many fecal metabolites were common to the three species, such as short chain fatty acids and branched chain amino acids, each species generated a unique profile. Relatively higher levels of uracil, hypoxanthine, phenylacetic acid, glucose, glycine, and tyrosine amino acids were present in the rat, with β-alanine being unique to the rat, and glycerol and malonate being unique to the human. Human fecal extracts showed a greater interindividual variation than the two rodent species, reflecting the natural genetic and environmental diversity in human populations. Fecal composition in healthy mice was found to change over time, which might be explained by altered gut microbial presence or activity. The systematic characterization of fecal composition across humans, mice, and rats, together with the evaluation of inherent variation, provides a benchmark for future studies seeking to determine fecal biomarkers of disease and/or response to dietary or therapeutic interventions.
Real-Time Fluorescence Monitoring of Tryptic Digestion in Proteomics
Peter Karuso *- ,
Angela S. Crawford - ,
Duncan A. Veal - ,
Graham B. I. Scott - , and
Hung-Yoon Choi
Ensuring that proteolytic digestions are complete before submitting samples for downstream proteomic analyses is important, as failure or partial digestion can waste valuable instrument time and make results difficult to interpret. Conversely, overdigestion can also be problematic, such as when removing affinity tags from recombinant proteins or using nonspecific proteases. The techniques of HPLC, circular dichroism, SDS-PAGE, and MS have each been used to assess protein digestion. These techniques are slow, may require expensive instrumentation, can be inaccurate, and/or are unsuitable for real-time monitoring. Epicocconone is a natural fluorophore that reacts reversibly with proteins to form a highly fluorescent adduct and has previously been used to quantify proteins in 1D and 2D gels and in solution. Here, we describe a new method for the real-time monitoring of protein digestion based on epicocconone. This unique in situ fluorescent assay can tracelessly follow proteolysis of samples, at low microgram levels, destined for proteomics analysis or purification.
Quantitation by Isobaric Labeling: Applications to Glycomics
James A. Atwood III - ,
Lei Cheng - ,
Gerardo Alvarez-Manilla - ,
Nicole L. Warren - ,
William S. York - , and
Ron Orlando *
The study of glycosylation patterns (glycomics) in biological samples is an emerging field that can provide key insights into cell development and pathology. A current challenge in the field of glycomics is to determine how to quantify changes in glycan expression between different cells, tissues, or biological fluids. Here we describe a novel strategy, quantitation by isobaric labeling (QUIBL), to facilitate comparative glycomics. Permethylation of a glycan with 13CH3I or 12CH2DI generates a pair of isobaric derivatives, which have the same nominal mass. However, each methylation site introduces a mass difference of 0.002922 Da. As glycans have multiple methylation sites, the total mass difference for the isobaric pair allows separation and quantitation at a resolution of ∼30000 m/Δm. N-Linked oligosaccharides from a standard glycoprotein and human serum were used to demonstrate that QUIBL facilitates relative quantitation over a linear dynamic range of 2 orders of magnitude and permits the relative quantitation of isomeric glycans. We applied QUIBL to quantitate glycomic changes associated with the differentiation of murine embryonic stem cells to embryoid bodies.
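The per-site mass difference quoted above follows from standard isotope masses, and the resolving power needed to separate an isobaric pair scales inversely with the number of methylation sites. The sketch below works through that arithmetic; the function name `required_resolving_power` and the example mass and site count are hypothetical and assume a singly charged ion.

```python
# Monoisotopic masses in Da (standard reference values).
MASS_12C, MASS_13C = 12.0, 13.00335484
MASS_1H, MASS_2H = 1.00782503, 2.01410178

# 13CH3 versus 12CH2D: swapping 12C -> 13C adds less mass than swapping H -> D.
DELTA_PER_SITE = (MASS_2H - MASS_1H) - (MASS_13C - MASS_12C)


def required_resolving_power(mass, n_sites):
    """Return m/dm needed to separate the isobaric pair with n_sites methyl groups."""
    return mass / (n_sites * DELTA_PER_SITE)


if __name__ == "__main__":
    print(f"per-site difference: {DELTA_PER_SITE:.6f} Da")
    # Hypothetical permethylated glycan: mass 2000 Da with 20 methylation sites.
    print(f"required m/dm: {required_resolving_power(2000.0, 20):.0f}")
```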
Analysis of Host-Inducing Proteome Changes in Bifidobacterium longum NCC2705 Grown in Vivo
Jing Yuan - ,
Bin Wang - ,
Zhongke Sun - ,
Xin Bo - ,
Xitong Yuan - ,
Xiang He - ,
Hongqing Zhao - ,
Xinying Du - ,
Fang Wang - ,
Zheng Jiang - ,
Ling Zhang - ,
Leili Jia - ,
Yufei Wang - ,
KaiHua Wei - ,
Jie Wang - ,
Xuemin Zhang - ,
Yansong Sun - ,
Liuyu Huang *- , and
Ming Zeng *
To investigate the molecular mechanisms underlying the adaptation of Bifidobacterium longum to the intestinal tract, we utilized a new model for rabbit intestinal culture of B. longum and reported the changes in proteomic profiles after incubation in the in vivo environment. By 2D-PAGE coupled with matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and/or electrospray ionization tandem mass spectrometry (ESI-MS/MS) analyses, proteomic profiles of B. longum strain NCC2705 grown in the in vivo and in vitro environments were compared. Confirmed by semiquantitative RT-PCR, which exhibited at least a 3-fold change or greater, 19 up-regulated proteins, 14 down-regulated proteins, and 4 proteins with mobility changes were identified during intestinal growth. These identified proteins include key stress proteins, metabolism-related proteins, and proteins related to translation. Our results indicate that some useful proteins are expressed at higher levels in cells during intestinal growth. These proteins reflected the adaptation of B. longum NCC2705 to the intestine, such as EF-Tu which contributes to the retention or attachment as a Bifidobacterium adhesin-like factor, bile salt hydrolase (BSH) which might play an important role in the molecular mechanisms for the initial interaction of probiotic with the intestinal environment, and stress proteins which defend B. longum against the action of bile salts and other harmful ingredients of the gastrointestinal tract (GIT). The most striking fact of our observation was that four proteins GlnA1, PurC, LuxS, and Pgk exhibit clear post-translational modification. Western blot (WB) analysis and Pro-Q Diamond staining revealed that substances of the GIT trigger Pgk and LuxS phosphorylation at Ser/Thr residues for bacteria grown in vivo. These proteins were identified for the first time as bifidobacterial phosphoproteins. 
Our data suggest that the phosphorylated autoinducer-2 production protein LuxS of B. longum NCC2705 (LuxS-P) is the active form of LuxS and that LuxS-P may play a key role in the regulation of quorum sensing.
Integrated Analysis of the Cerebrospinal Fluid Peptidome and Proteome
Alexandre Zougman - ,
Bartosz Pilch - ,
Alexandre Podtelejnikov - ,
Michael Kiehntopf - ,
Claudia Schnabel - ,
Chanchal Kumar - , and
Matthias Mann *
Cerebrospinal fluid (CSF) is the only body fluid in direct contact with the brain and thus is a potential source of biomarkers. Furthermore, CSF serves as a medium of endocrine signaling and contains a multitude of regulatory peptides. A combined study of the peptidome and proteome of CSF or any other body fluid has not been reported previously. We report confident identification in CSF of 563 peptide products derived from 91 precursor proteins as well as a high confidence CSF proteome of 798 proteins. For the CSF peptidome, we use high accuracy mass spectrometry (MS) for MS and MS/MS modes, allowing unambiguous identification of neuropeptides. Combination of the peptidome and proteome data suggests that enzymatic processing of membrane proteins causes release of their extracellular parts into CSF. The CSF proteome has only partial overlap with the plasma proteome, thus it is produced locally rather than deriving from plasma. Our work offers insights into CSF composition and origin.
Metabolomic and Proteomic Analysis of a Clonal Insulin-Producing β-Cell Line (INS-1 832/13)
Céline Fernandez - ,
Ulrika Fransson - ,
Elna Hallgard - ,
Peter Spégel - ,
Cecilia Holm - ,
Morten Krogh - ,
Kristofer Wårell - ,
Peter James - , and
Hindrik Mulder *
Metabolites generated from fuel metabolism in pancreatic β-cells control exocytosis of insulin, a process which fails in type 2 diabetes. To identify and quantify these metabolites, global and unbiased analysis of cellular metabolism is required. To this end, polar metabolites, extracted from the clonal 832/13 β-cell line cultured at 2.8 and 16.7 mM glucose for 48 h, were derivatized followed by identification and quantification, using gas chromatography (GC) and mass spectrometry (MS). After culture at 16.7 mM glucose for 48 h, 832/13 β-cells exhibited a phenotype reminiscent of glucotoxicity with decreased content and secretion of insulin. The metabolomic analysis revealed alterations in the levels of 7 metabolites derived from glycolysis, the TCA cycle and pentose phosphate shunt, and 4 amino acids. Principal component analysis of the metabolite data showed two clusters, corresponding to the cells cultured at 2.8 and 16.7 mM glucose, respectively. Concurrent changes in protein expression were analyzed by 2-D gel electrophoresis followed by LC−MS/MS. The identities of 86 spots corresponding to 75 unique proteins that were significantly different in 832/13 β-cells cultured at 16.7 mM glucose were established. Only 5 of these were found to be metabolic enzymes that could be involved in the metabolomic alterations observed. Anticipated changes in metabolite levels in cells exposed to increased glucose were observed, while changes in enzyme levels were much less profound. This suggests that substrate availability, allosteric regulation, and/or post-translational modifications are more important determinants of metabolite levels than enzyme expression at the protein level.
Identification of Mouse Embryonic Stem Cell-Associated Proteins
Hossein Baharvand *- ,
Ali Fathi - ,
Hamid Gourabi - ,
Sepideh Mollamohammadi - , and
Ghasem Hosseini Salekdeh *
Over the past few years, there has been a growing interest in discovering the molecular mechanisms controlling embryonic stem cellsʼ (ESCs) proliferation and differentiation. Proteome analysis has proven to be an effective approach to comprehensively unravel the regulatory network of differentiation. We applied a two-dimensional electrophoresis based proteomic approach followed by mass spectrometry to analyze the proteome of two mouse ESC lines, Royan B1 and D3, at 0, 6, and 16 days after differentiation initiation. Out of 97 ESC-associated proteins commonly expressed in the two ESC lines, 72 proteins were identified using MALDI TOF-TOF mass spectrometry analysis. The expression patterns of four down-regulated proteins, Hspd1, Hspa8, β-Actin, and Tpt1, were further confirmed by Western blot and immunofluorescence analyses in Royan B1 and D3 as well as two other mouse ESC lines, Royan C1 and Royan C4. Differential mRNA expression analysis of 20 genes using quantitative real-time reverse transcription PCR revealed a low correlation between mRNA and protein levels during differentiation. We also observed that the mRNA level of Tpt1 increased significantly in differentiating cells, whereas its protein level decreased. Several novel ESC-associated proteins are presented in this study that warrant further investigation with respect to the etiology of stemness.
Antiviral Phagocytosis Is Regulated by a Novel Rab-Dependent Complex in Shrimp Penaeus japonicus
Wenlin Wu - ,
Rongrong Zong - ,
Jianyang Xu - , and
Xiaobo Zhang *
Rab GTPases are involved in phagosome formation and maturation. However, the role of Rab GTPases in phagocytosis against virus infection remains unknown. In this study, it was found that a Rab gene (PjRab) from marine shrimp was upregulated in virus-resistant shrimp, suggesting that Rab GTPase was involved in the innate response to virus. The RNAi and mRNA assays revealed that the PjRab protein could regulate shrimp hemocytic phagocytosis through a protein complex consisting of PjRab, β-actin, tropomyosin, and the envelope protein VP466 of shrimp white spot syndrome virus (WSSV). It was further demonstrated that silencing of the PjRab gene by RNAi caused an increase in the number of WSSV copies, indicating that PjRab might be an intracellular virus recognition protein employed by the host to increase phagocytic activity. Therefore, our study presents, for the first time in crustaceans, a novel Rab-dependent signaling complex in which the Rab GTPase might detect virus infection as an intracellular virus recognition protein and trigger downstream phagocytic defense against the virus. This discovery should improve our understanding of the still poorly understood molecular events involved in the innate immune response of invertebrates against virus infection.
Aqueous Polymer Two-Phase Systems for the Proteomic Analysis of Plasma Membranes from Minute Brain Samples
Jens Schindler *- ,
Urs Lewandrowski - ,
Albert Sickmann - , and
Eckhard Friauf
Comprehensive knowledge about the plasma membrane protein profile of a given brain region, at defined developmental stages, will greatly foster the understanding of brain function and dysfunction. Protocols are required that selectively enrich plasma membranes from small brain regions while giving high yields. Here, we present a suitable protocol that is based on aqueous polymer two-phase systems. It is material-saving, easy to perform, fast, and low-priced. Evidence for its effectiveness was obtained by marker enzyme assays, immunoblot analyses, and mass spectrometry. Plasma membranes from all parts of the cells (somata, dendrites, and axons) were enriched, whereas there was a reduction of mitochondria and endoplasmic reticulum. A total of 15.0% of the initial activity of the plasma membrane marker was recovered, while the activity of the mitochondrial marker and the marker for the endoplasmic reticulum was 0.2% of the initial activity. Mass spectrometric analyses of proteins purified from approximately one-fourth of a rat cerebellum (i.e., 80 mg of tissue) resulted in the identification of 525 different proteins, with 27.3% (gene ontology) or 38.2% (gene cards) being allocated to the plasma membrane. When 4.7% of the initial mitochondrial marker activity and 2.9% of the initial activity of the marker for the endoplasmic reticulum were accepted as contamination, the yield of the plasma membrane marker increased to 28.8%. Under these conditions, 586 different proteins were identified by mass spectrometry, 26.1–36.5% of which were plasma membrane proteins. Taken together, our protocol represents a powerful tool for the analysis of the plasma membrane subproteome of distinct brain regions.
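The trade-off between purity and yield described in this abstract can be illustrated with a short calculation. The sketch below is not part of the authors' protocol: the function name `relative_enrichment` and the idea of dividing the recovered plasma membrane marker activity by the recovered contaminant marker activity are assumptions made here for illustration. It also assumes that the quoted 0.2% applies to the mitochondrial and the endoplasmic reticulum marker individually.

```python
def relative_enrichment(target_pct: float, contaminant_pct: float) -> float:
    """Ratio of recovered target-marker activity to recovered contaminant-marker
    activity (both given as percent of initial activity). Hypothetical helper."""
    if contaminant_pct <= 0:
        raise ValueError("contaminant recovery must be positive")
    return target_pct / contaminant_pct


# Percent of initial marker activity recovered, as quoted in the abstract.
# Assumption: 0.2% holds for each contaminant marker separately.
stringent = {"plasma_membrane": 15.0, "mitochondria": 0.2, "endoplasmic_reticulum": 0.2}
relaxed = {"plasma_membrane": 28.8, "mitochondria": 4.7, "endoplasmic_reticulum": 2.9}


def summarize(condition: dict) -> dict:
    """Relative enrichment of the plasma membrane marker over each contaminant."""
    pm = condition["plasma_membrane"]
    return {
        name: round(relative_enrichment(pm, pct), 2)
        for name, pct in condition.items()
        if name != "plasma_membrane"
    }


if __name__ == "__main__":
    print("stringent:", summarize(stringent))
    print("relaxed:  ", summarize(relaxed))
```

Under these assumptions, the relaxed condition nearly doubles the plasma membrane yield (15.0% to 28.8%) but lowers the ratio of plasma membrane marker to contaminant marker considerably, which matches the abstract's framing of the higher yield as coming at the cost of accepted contamination.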
Technical Note
Proteomic Analysis of Human Follicular Fluid Using an Alternative Bottom-Up Approach
Jörg Hanrieder - ,
Adrien Nyakas - ,
Tord Naessén - , and
Jonas Bergquist *
Human follicular fluid (hFF) is the in vivo environment of oocytes during follicular maturation in the ovaries. It contains a huge variety of compounds, such as proteins, that might play an important role in follicular development and oocyte growth. Previous proteomic studies on follicular fluid have already isolated and identified a number of proteins. Nevertheless, only a small fraction of the proteins present in follicular fluid has been covered so far, and a large number have still not been identified. Therefore, the need for new approaches with higher resolution and sensitivity in proteome research is evident. We utilized a proteomic setup based on in-solution isoelectric focusing (IEF) and reversed-phase nanoliquid chromatography coupled to matrix-assisted laser desorption/ionization time-of-flight tandem mass spectrometry (nano-LC MALDI TOF/TOF MS) for in-depth protein analysis of human follicular fluid samples from patients undergoing controlled ovarian hyperstimulation (COH) for in vitro fertilization (IVF) therapy. This approach led to the significant identification of 69 proteins, of which 32 have not previously been reported in human follicular fluid by proteomic methods. Among these findings, at least two relevant compounds essentially involved in the regulation of hormone secretion during the folliculogenetic process were identified: sex hormone binding globulin (SHBG) and inhibin A (INHA). To confirm these results, both proteins were further validated by immunoassays.
Simultaneous Analysis of Circulating Human Cytokines Using a High-Sensitivity Cytokine Biochip Array
S. Pete FitzGerald - ,
R. Ivan McConnell *- , and
Allen Huxley
Biochip array technology allows the simultaneous measurement of multiple analytes per sample using a single analytical device. This study shows its applicability to the simultaneous measurement of 12 circulating human cytokines with high-sensitivity detection. This application increases their real-time detectability while maintaining a broad concentration range and without compromising precision. This methodology represents a useful tool in cytokine research when simultaneous determination of minute concentrations is of interest.
Additions & Corrections
Proteome of Murine Jejunal Brush Border Membrane Vesicles
Mark Donowitz - ,
Siddharth Singh - ,
Farah F. Salahuddin - ,
Boris M. Hogema - ,
Yueping Chen - ,
Marjan Gucek - ,
Robert N. Cole - ,
Amy Ham - ,
Nicholas C. Zachos - ,
Olga Kovbasnjuk - ,
Lynne A. Lapierre - ,
Nellie Broere - ,
James Goldenring - ,
Hugo deJonge - , and
Xuhang Li