Noncoding after All: Biases in Proteomics Data Do Not Explain Observed Absence of lncRNA Translation Products

VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium
‡Department of Biochemistry; §Center for Medical Genetics; ∥Department of Mathematical Modeling, Statistics and Bioinformatics; ⊥Cancer Research Institute Ghent (CRIG); and #Bioinformatics Institute Ghent (BIG N2N), Ghent University, Ghent 9000, Belgium
*Prof. Dr. Lennart Martens, A. Baertsoenkaai 3, B-9000 Gent, Belgium, [email protected], tel: +32 9 264 93 58, fax: +32 9 264 94 84.

Journal of Proteome Research

Cite this: J. Proteome Res. 2017, 16, 7, 2508–2515
https://doi.org/10.1021/acs.jproteome.7b00085
Published May 23, 2017

Copyright © 2017 American Chemical Society. This publication is licensed under these Terms of Use.

Abstract

Over the past decade, long noncoding RNAs (lncRNAs) have emerged as novel functional entities of the eukaryotic genome. However, the scientific community remains divided over the amount of true noncoding transcripts among the large number of unannotated transcripts identified by recent large scale and deep RNA-sequencing efforts. Here, we systematically exclude possible technical reasons underlying the absence of lncRNA-encoded proteins in mass spectrometry data sets, strongly suggesting that the large majority of lncRNAs is indeed not translated.

Introduction

Advances in sequencing technologies have uncovered pervasive transcription of the eukaryotic genome outside of annotated protein-coding loci. Most of these novel transcripts are long (>200 nucleotides) and lack large open reading frames (ORFs) and homology to annotated protein-coding genes. (1) Termed long noncoding RNAs (lncRNAs), these transcripts comprise a vast, diverse, and largely unexplored class of RNA, outnumbering any other class of genetic entities in the human genome. (2) Those that have been studied in detail play important roles in a wide range of cellular processes during normal development and in homeostasis and disease, including cancer. (3)
Similar to lncRNAs, short open reading frame (sORF)-encoded polypeptides (SEPs) or micropeptides have gained increased attention over the past few years. While classical bioactive peptides are enzymatically cleaved from longer protein precursors, micropeptides are small peptides (<100 amino acids) directly translated from single sORFs. So far, only a limited number of these micropeptides have been discovered and functionally characterized. (4)
The coding potential of newly discovered RNA transcripts is typically assessed by means of prediction algorithms. (5-7) While each algorithm has its own strengths and weaknesses, they are all biased to current annotations and may thus be unsuitable for the detection of small or nonconserved proteins, including micropeptides.
Although the advent of ribosome profiling (8) (sequencing of ribosome protected RNA fragments) promised to provide evidence for (the lack of) translation of expressed ORFs, much is still open to interpretation. Numerous studies report substantial ribosome occupancy of lncRNA transcripts. (9-12) The striking similarities in the pattern and size of ribosome protected fragments covering protein-coding transcripts and lncRNAs have led some researchers to conclude that up to 90% of the lncRNA transcriptome bears coding ORFs. (10) Other researchers report much more conservative numbers. (11-14) For instance, if the relative abundance of ribosomes before and after stop codons (termed ribosome release) is used to discriminate between protein-coding and noncoding transcripts, only a few novel coding ORFs are found. (11) When taking into account the phased movement of ribosomes across translated ORFs, only a small number of novel peptides arising from transcripts annotated as lncRNAs (13) are identified. Different research groups have thus developed different metrics and methodologies to detect coding ORFs in ribosome profiling data. Without a consensus, the true coding potential of lncRNA transcripts remains open to speculation.
Mass spectrometry is often considered the gold standard in the detection and characterization of proteins or peptides. So far, few studies have turned to mass spectrometry to study micropeptides and lncRNA-encoded proteins. In our previous work, (15) we reprocessed large quantities of tandem mass spectrometry data obtained from the PRoteomics IDEntifications (PRIDE) database. In brief, we reanalyzed raw data from 2,493 PRIDE experiments, containing 39,463,035 fragmentation mass spectra covering 68 human tissues, using a combined database consisting of Uniprot protein sequences and six reading frame translated LNCipedia lncRNAs. In these searches, less than 1% of the lncRNA genes in LNCipedia were covered by at least two unique peptide-to-spectrum matches (PSMs), compared to approximately 87% of Uniprot proteins (Volders et al., 2015; Tables S-1 and S-2). The results of these searches are publicly available through the LNCipedia portal.
Other groups have reported similar numbers, ranging from less than 100 up to 1,600 putative lncRNA-encoded proteins in human. (16-18) Compared to the more than 60,000 reported lncRNA genes, (2, 19) these numbers are fairly low and definitely much lower than those reported by various ribosome profiling studies.
This discrepancy in the reported amounts of potentially coding lncRNAs is the source of spirited discussion in the field. Indeed, a resolution of this conflict has direct relevance for further investigations into the biological roles of lncRNAs.
The most direct observation of coding lncRNAs is the actual detection by mass spectrometry-based proteomics of the encoded proteins. As such, the absence of large amounts of detected lncRNA-derived proteins strongly hints at a limited coding potential for lncRNAs. The main criticism of this approach, however, is that mass spectrometry-based proteomics is somehow biased against the detection of lncRNA products.
Here, we therefore examine the possible biases of mass spectrometry to detect and characterize lncRNA-encoded proteins based on a detailed yet exhaustive reprocessing of very large amounts of public proteomics data. Our findings clearly show that there are no obvious technical reasons why mass spectrometry would have largely missed (micro)peptides originating from noncoding RNA transcripts, thus eliminating the possibility that mass spectrometry would be biased against the detection of putative lncRNA-encoded proteins.

Influence of Protein Composition on Detectability by Mass Spectrometry

Mass spectrometry enables high-throughput protein identification in complex samples. However, there is some controversy regarding the limitations of this technique in terms of detectability of peptides and thus, by extension, proteins. Several potential causes have been proposed, including biases due to the size of the protein sequence, the amino acid composition, the abundance, and the half-life of proteins. (20-22) Here, we investigate these presumed issues and identify potential reasons as to why certain predicted ORF products evade detection. The applied strategy revolves around the reprocessing of publicly available data in PRIDE, (23) one of the world’s leading mass spectrometry repositories. (24) Sequence database searches were performed using an automated reprocessing pipeline, consisting of pride-asap (25) for the detection of data set specific parameters, SearchGUI (26) to match the fragmentation mass spectra against peptides derived from protein sequence databases, and PeptideShaker (27) to integrate the identifications and control these at a 1% false discovery rate at the peptide-to-spectrum match level.
Because the combination of known canonical human protein sequences and hypothetical lncRNA-derived sequences can hamper protein inference, the overlap between both data sets must be investigated. This was achieved by matching the full set of tryptic peptides originating from the six reading frame translated LNCipedia database (version 3.1) against the full set of human canonical proteins in UniProtKB/SwissProt. (28) Out of 8,645,916 hypothetical tryptic peptide sequences, only 277,412 had one or more identical matches in the protein sequence data set. The overlap is thus minimal (approximately 3.21%), which is about equal to the between-protein tryptic peptide overlap for the human complement of UniProtKB/SwissProt, which does not consider any splice isoforms. This indicates that uniquely identifying lncRNA polypeptides should be no more difficult than identifying unique human proteins.
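The overlap calculation described above can be illustrated with a minimal Python sketch. It assumes the common tryptic rule (cleavage after K or R unless followed by P); the function names and the toy sequences are hypothetical and are not taken from the original pipeline. The final line only reproduces the arithmetic behind the reported percentage.

```python
def tryptic_fragments(sequence):
    """Split a protein sequence after K or R, unless the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(sequence, missed_cleavages=0):
    """Return the set of peptides, allowing up to `missed_cleavages` skipped sites."""
    fragments = tryptic_fragments(sequence)
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


def overlap_fraction(query_peptides, reference_peptides):
    """Fraction of query peptides with an identical match in the reference set."""
    return len(query_peptides & reference_peptides) / len(query_peptides)


# Toy sequences (hypothetical): one translated lncRNA frame, one known protein.
lnc_peptides = tryptic_peptides("MKAARGGK")
protein_peptides = tryptic_peptides("TTKAARLLK")
print(overlap_fraction(lnc_peptides, protein_peptides))

# Arithmetic behind the percentage reported in the text.
print(f"{100 * 277412 / 8645916:.2f}%")
```

Storing the peptides as sets keeps the matching step a plain set intersection, which is one simple way to handle millions of sequences; the original analysis may have been implemented differently.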
A first potential factor that may contribute to a detection bias is the size of a protein. In order to analyze this, publicly available submissions of human projects to PRIDE were searched against the human complement of the UniProtKB/SwissProt (28) protein sequence database using our reprocessing pipeline. The resulting set of proteins was ranked according to sequence length. A simple spectral count over all PRIDE assays in which a protein was identified was used to indicate the number of times the protein was observed. Q8WZ42, the megadalton protein titin, represented by its canonical isoform of 34,350 residues, was identified 298 times in 183 assays. This indicates that large proteins are picked up despite their length, as is to be expected due to the relatively higher number of potential MS/MS-identifiable peptides following enzymatic cleavage of larger proteins. At the same time, short proteins are also frequently identified across a broad range of assays (Table 1). It is noteworthy that, out of 20,207 human entries in UniProtKB/SwissProt, only 36—(mainly) tissue or cell specific—proteins (0.18%) are smaller than the shortest reported protein sequences in Table 1. These numbers provide a strong indication that protein length is not likely a major determining factor in protein detectability by mass spectrometry using standard sampling protocols.
Table 1. Ten Shortest Human Proteins Identified by Reprocessing of the PRIDE Data
protein   gene name   length (AA)   average MW (Da)   spectral count   assay count
P62328    TMSB4X      44            4921.46           787              287
P63313    TMSB10      44            4894.48           366              229
Q8N4H5    TOMM5       51            6035.31           88               70
P62891    RPL39       51            6275.49           109              52
Q59GN2    RPL39P5     51            6322.59           107              51
Q5VTU8    ATP5EP2     51            5806.87           53               43
P56381    ATP5E       51            5648.57           53               43
Q96IX5    USMG5       58            6326.38           112              86
P62861    FAU         59            6647.86           248              141
P13640    MT1G        62            6647.86           71               47
A second feature that could impose a bias on protein detection using mass spectrometry is the amino acid sequence composition. The existence of such a potential bias was investigated by comparing the composition of peptides that have been identified at high confidence with the composition of in silico generated peptide sequences. A theoretical digest of the human UniProtKB/SwissProt database was therefore created using dbtoolkit (29) with tryptic cleavage rules, allowing for two missed cleavages. Both the empirical peptides from the reprocessing of the human data in PRIDE and the theoretical peptides from the in silico digest of UniProtKB/SwissProt were filtered to lengths between 5 and 30 amino acids, which is the common range of observed peptide lengths in practice. (30) The amino acid composition of both theoretical and observed peptides was then calculated by counting the occurrence rate of an amino acid per position in the sequence (Figure 1). There is a high positive correlation between both data sets (Spearman ρ = 0.952, p < 0.01), indicating that there is no reason to assume that the compositions of the peptides identified by the reprocessing of PRIDE and of those generated by in silico digestion are very different. The higher occurrence rates for R and K in the experimental data are most likely related to the fact that these are the residues that are targeted by the most common sample preparation procedure, which involves protein digestion by trypsin. This is indeed the confirmed case for the majority of PRIDE projects. In addition, these residues are strong bases and therefore strongly promote ionization. The slightly lower occurrence rate of S in the experimental data can be related to the fact that S can be phosphorylated in vivo, and to the somewhat lower efficiency in the detection of phosphorylated residues.
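The comparison above can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the original analysis: it computes one overall relative frequency per amino acid (rather than a per-position rate), and the function names and toy peptide lists are hypothetical. The Spearman coefficient is obtained as the Pearson correlation of average ranks.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(peptides, min_len=5, max_len=30):
    """Relative frequency of each standard amino acid in length-filtered peptides."""
    counts = Counter()
    for peptide in peptides:
        if min_len <= len(peptide) <= max_len:
            counts.update(peptide)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no peptides left after length filtering")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy peptide lists (hypothetical), standing in for theoretical and observed sets.
theoretical = ["ACDEFGHIK", "LMNPQR", "STVWYK", "AAGGLLR"]
observed = ["ACDEFGHIK", "STVWYK", "AAGGLLR", "LLEEAAK"]
print(spearman(composition(theoretical), composition(observed)))
```

With real data, the two input lists would be the in silico digest and the confidently identified peptides; a significance test for ρ is not included in this sketch.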

Figure 1. Comparison between theoretical (UniProtKB/SwissProt) and observed (reprocessed PRIDE data) peptide sequence amino acid composition for human data from PRIDE and UniProtKB/SwissProt.

Another important property that can affect detection by mass spectrometry is protein (and thus peptide) abundance in the sample. Although there are examples of successful enrichment protocols, (31) the detection of products of rare translation events is not straightforward. In order to investigate the influence of the abundance on the detectability of proteins by mass spectrometry, we first make use of the study by Anderson and Hunter (32) that reports empirically obtained protein quantification values in human blood plasma. Reprocessing of the subset of PRIDE data sets derived from human blood was carried out, and their estimated abundances were mapped to the values reported by Anderson and Hunter (Figure 2). While it is clear that the lowest abundant proteins are not detected, the abundance range of human plasma is quite extreme at 11 orders of magnitude, of which at least eight are covered reliably in the PRIDE data. This analysis thus shows that mass spectrometry-based proteomics is only biased against the very least abundant proteins.

Figure 2. Reprocessing results for PRIDE data sets derived from human blood plasma mapped onto the abundance values reported by Anderson and Hunter. (32) The size of a bubble corresponds to the number of PRIDE assays in which that protein was identified.

Another possibility for detection bias is provided by the half-life of a protein, as rapidly degraded proteins may escape detection as well. In order to assess a possible bias based on protein half-life, we make use of the study by Schwanhäusser et al., where half-life values for murine proteins are reported. (33) Because PRIDE also contains murine data, extensive reprocessing of these murine data sets against the mouse complement of the UniProtKB/SwissProt database was performed and the reprocessed identifications were mapped to the originally reported half-life data (Figure 3). This analysis reveals that the PRIDE data cover the entire half-life range, indicating no influence of protein half-life values on detectability.

Figure 3. Reprocessing results for all PRIDE murine data mapped onto the half-life values reported by Schwanhäusser et al. (33) The size of the bubble corresponds to the number of PRIDE assays in which the protein was identified.

In addition, we calculated the N-terminal instability index of human proteins as described by Guruprasad et al. (34) This metric is based on the dipeptide composition of a protein and provides a crude estimation of protein half-life when large-scale experimental data are lacking, as is the case for human proteins. The underlying assumption is that a protein’s half-life correlates negatively to its relative instability. We therefore compared the calculated instability indices for all proteins in the human complement of UniProtKB/SwissProt with those calculated for the identified proteins from the human data sets in PRIDE. Only a minor deviation is revealed between the instability index distributions of observed and theoretical proteins (Figure 4), providing additional proof that the degradation rate of a protein has little, if any, influence on its detectability.
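For reference, the instability index of Guruprasad et al. (34) is usually written as a length-normalized sum over all adjacent residue pairs. The notation below (L for the sequence length, x_i for the residue at position i, and DIWV for the tabulated dipeptide instability weight values) is introduced here for illustration and does not appear in the original text.

```latex
\mathrm{II} \;=\; \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\left(x_i\, x_{i+1}\right)
```

In the original formulation, proteins with an index below 40 are predicted to be stable, and higher values indicate a tendency toward instability.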

Figure 4. Instability index distributions of human UniProtKB/SwissProt proteins, and of identified proteins from reprocessed human data sets in PRIDE.

lncRNA Expression and Composition Show No Indication of Coding Potential

The expression profiles of lncRNAs differ extensively from those of protein-coding mRNAs (Figure 5a). LncRNAs are generally expressed at a lower level and are more abundant in the nucleus. While mRNAs are transported to the cytoplasm for ribosomal translation, several lncRNAs have a documented function in the nucleus. (35) As such, the nuclear enrichment of lncRNAs suggests a noncoding role for the majority of the lncRNA transcripts.

Figure 5. LncRNA and mRNA expression profile and detectability. (a) Two-dimensional kernel density plot of lncRNA and mRNA expression levels and subcellular localization. The enrichment of nuclear over cytosolic expression versus the expression in the whole-cell extract is shown. Selected lncRNA and protein-coding genes are depicted. Especially low abundant lncRNAs show nuclear enrichment compared to mRNAs (adapted from Djebali et al. (1)). (b) Whole-cell expression distribution for lncRNAs and mRNAs. Although lncRNAs are generally expressed at lower levels, a substantial overlap is observed. (c) Normalized spectral abundance factor (NSAF) of the detected protein as a function of its RNA expression level. While mRNA expression and NSAF are moderately correlated, the entire range of expression is clearly covered and thus detectable with mass spectrometry.

We have observed that very low protein abundance can hamper the detection by mass spectrometry (Figure 2) and lncRNAs are expressed at lower levels compared to mRNAs. Because expression level is a good predictor for protein concentration, (36) one might speculate that lncRNAs give rise to proteins at concentrations below the mass spectrometry detection limit. To examine this issue, we first compared lncRNA and mRNA expression levels in the GENCODE v7 data set (1) (see Supporting Information for details). While the average expression level of lncRNAs is below that of protein-coding genes, the expression range is very similar (Figure 5b). In addition, a substantial number of lncRNAs are expressed at levels similar to typical mRNA transcripts. To evaluate the protein detectability as a function of its mRNA expression, we compared mRNA expression levels to the normalized spectral abundance factor (NSAF+) (37) of the corresponding protein. The expression level is defined as the maximally observed RPKM (reads per kilobase per million mapped reads) for a particular mRNA across 11 cell lines in the GENCODE data set. The maximally observed NSAF+ for each protein from the 4,413 assays in PRIDE that originate from these cell lines is reported. The NSAF+ and RPKM show a low but significant correlation (Spearman ρ = 0.32, p-value <0.01), which is particularly apparent in the higher expression ranges (Figure 5c). Importantly, even though low abundant proteins are more difficult to detect, detected proteins cover the entire expression range. Thus, should lncRNAs give rise to proteins, their concentrations should be detectable by mass spectrometry.
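To make the abundance measure concrete, the sketch below computes the standard normalized spectral abundance factor for one assay and then takes the maximum per protein across assays, as described above. It is an assumption that the standard NSAF definition is an adequate stand-in here: the NSAF+ variant (37) used in the study may differ in detail, and all names and numbers in the example are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Standard NSAF: (SpC / length) per protein, normalized to sum to 1 in one assay."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def max_nsaf_across_assays(assays, lengths):
    """Highest NSAF observed for each protein over a list of per-assay count dicts."""
    best = {}
    for counts in assays:
        for protein, value in nsaf(counts, lengths).items():
            best[protein] = max(best.get(protein, 0.0), value)
    return best


# Hypothetical proteins and spectral counts for two assays.
lengths = {"protA": 100, "protB": 50, "protC": 200}
assays = [
    {"protA": 10, "protB": 10},
    {"protA": 5, "protC": 40},
]
print(max_nsaf_across_assays(assays, lengths))
```

Dividing by sequence length before normalizing compensates for longer proteins yielding more identifiable peptides, which is why spectral counts alone are not compared directly to RPKM values.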
The fact remains that most (if not all) lncRNAs contain canonical ORFs. While predictions classify these as noncoding (hence the annotation as lncRNA), it is conceivable that these ORFs represent recent evolutionary adaptations and are thus difficult to detect by in silico analyses. To evaluate if lncRNA ORFs are evolutionary retained or products of random nucleotide progression, we examined the relative size of these ORFs. By using the reverse complement of the sequence as a control, it is obvious that mRNA ORFs are much larger than random ORFs in the reverse complement sequence (see Supporting Information for details). In contrast, lncRNA ORFs do not differ in size from randomly occurring ORFs (Figure 6), suggesting that they are indeed the product of random nucleotide progression. In addition, it was previously shown that lncRNA ORFs do not show the within-species substitution patterns expected of recently evolved proteins. (11)
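The reverse-complement control can be sketched as follows. This is a minimal illustration under stated assumptions: a canonical ORF is taken to run from an ATG to the first in-frame stop codon, only the three frames of the given strand are scanned, and "relative size" is expressed as ORF length divided by transcript length. The actual definitions are given in the Supporting Information and may differ; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length (nt, start through stop codon) of the longest ATG-initiated ORF."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def relative_orf_sizes(seq):
    """Longest-ORF fraction of the transcript and of its reverse complement."""
    forward = longest_orf_length(seq) / len(seq)
    control = longest_orf_length(reverse_complement(seq)) / len(seq)
    return forward, control


# Toy transcript (hypothetical) with a short ORF on the forward strand.
print(relative_orf_sizes("CCATGAAACCCGGGTAGCC"))
```

Because the reverse complement has the same length and a closely related base composition, any ORF found in it serves as an estimate of what arises by chance in a sequence of that kind.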

Figure 6. Relative size of the largest canonical ORF in mRNA and lncRNA transcripts. Using the reverse complement sequence as a control, it is apparent that lncRNA (as opposed to mRNA) ORFs are not larger than what would be expected from random nucleotide progression.

Conclusions

Investigations into the proportion of coding lncRNAs have resulted in very different estimates. RNA-based analyses, including ribosome profiling, have led to very high estimates, while the more direct measurement of lncRNA-derived proteins via mass spectrometry has turned up only a small percentage of putatively coding lncRNAs. In order to help resolve this discrepancy, we here performed a detailed yet exhaustive analysis of the very large amounts of publicly available data for the human and murine proteomes to eliminate possible biases of mass spectrometry-based proteomics in detecting lncRNA-derived proteins. Our analyses reveal that the detection of proteins by mass spectrometry displays only limited bias, relating to proteins with very low abundance and/or very short sequence lengths (shorter than 44 amino acids). Nevertheless, it should be noted that specialized methods can circumvent the observed protein detection biases. Targeted sampling of less studied tissues may still reveal the existence of lncRNA-encoded, tissue specific (1) translation products. Short translation products can be picked up using peptidomics approaches, (38) and enrichment protocols (31) can boost as yet unseen (micro)peptides above the mass spectrometry detection threshold. Our analyses thus also delineate useful methods and protocols for comprehensive analysis strategies that are tailored toward finding as yet undetected putative protein products from lncRNAs.
Even though mass spectrometry has its limitations in the detection of very low abundant or very small proteins, we firmly demonstrate here that these limitations alone cannot explain the discrepancy between the observed number of lncRNA-encoded proteins and the number predicted by various ribosome profiling studies. In addition, we show that the putative protein products of lncRNA ORFs do not differ in protein sequence length or composition from currently well-detectable proteins. It is thus unlikely that the majority of the current lncRNA annotation consists of misclassified protein-coding genes. These findings confirm that ribosome association alone is insufficient to define novel coding ORFs, as was already suggested by some ribosome profiling studies.

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00085.

  • Supplementary methods; supplementary tables S-1 to S-4, showing an overview of the obtained coverage, the processed RNA-sequencing data sets, and an overview of the PRIDE projects (PDF)

Author Information

  • Corresponding Author
  • Authors
    • Kenneth Verheggen - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium
    • Pieter-Jan Volders - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium; ORCID: http://orcid.org/0000-0002-2685-2637
    • Pieter Mestdagh - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium; ‡Department of Biochemistry; §Center for Medical Genetics; ∥Department of Mathematical Modeling, Statistics and Bioinformatics; ⊥Cancer Research Institute Ghent (CRIG); and #Bioinformatics Institute Ghent (BIG N2N), Ghent University, Ghent 9000, Belgium
    • Gerben Menschaert - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium; ‡Department of Biochemistry; §Center for Medical Genetics; ∥Department of Mathematical Modeling, Statistics and Bioinformatics; ⊥Cancer Research Institute Ghent (CRIG); and #Bioinformatics Institute Ghent (BIG N2N), Ghent University, Ghent 9000, Belgium
    • Petra Van Damme - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium
    • Kris Gevaert - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium; ORCID: http://orcid.org/0000-0002-4237-0283
    • Jo Vandesompele - VIB-UGent Center for Medical Biotechnology, Ghent 9000, Belgium; ‡Department of Biochemistry; §Center for Medical Genetics; ∥Department of Mathematical Modeling, Statistics and Bioinformatics; ⊥Cancer Research Institute Ghent (CRIG); and #Bioinformatics Institute Ghent (BIG N2N), Ghent University, Ghent 9000, Belgium
  • Author Contributions

    K.V. and P.-J.V. contributed equally.

  • Notes
    The authors declare no competing financial interest.

Acknowledgment

This work was supported by the Multidisciplinary Research Partnership ‘Bioinformatics: From Nucleotides to Networks’ Project of Ghent University [01MR0310W to P.V.]; Fund for Scientific Research Flanders [FWO; to P.M., P.D., and G.M.]; SBO grant “InSPECtor” of Flanders Innovation & Entrepreneurship (VLAIO) [120025 to L.M.]; and Ghent University [to K.V. and J.V.].

References

This article references 38 other publications.

  1. Djebali, S.; Davis, C. A.; Merkel, A.; Dobin, A.; Lassmann, T.; Mortazavi, A. M.; Tanzer, A.; Lagarde, J.; Lin, W.; Schlesinger, F. Landscape of transcription in human cells. Nature 2012, 489 (7414), 101–108. DOI: 10.1038/nature11233
  2. Iyer, M. K.; Niknafs, Y. S.; Malik, R.; Singhal, U.; Sahu, A.; Hosono, Y.; Barrette, T. R.; Prensner, J. R.; Evans, J. R.; Zhao, S. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 2015, 47 (3), 199–208. DOI: 10.1038/ng.3192
  3. Mercer, T. R.; Dinger, M. E.; Mattick, J. S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009, 10 (3), 155–159. DOI: 10.1038/nrg2521
  4. Crappé, J.; Van Criekinge, W.; Menschaert, G. Little things make big things happen: A summary of micropeptide encoding genes. EuPa Open Proteomics 2014, 3, 128–137. DOI: 10.1016/j.euprot.2014.02.006
  5. Lin, M. F.; Jungreis, I.; Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 2011, 27 (13), I275–I282. DOI: 10.1093/bioinformatics/btr209
  6. Wang, L.; Park, H. J.; Dasari, S.; Wang, S.; Kocher, J.-P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41 (6), e74. DOI: 10.1093/nar/gkt006
  7. Kong, L.; Zhang, Y.; Ye, Z. Q.; Liu, X. Q.; Zhao, S. Q.; Wei, L.; Gao, G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35, W345–W349. DOI: 10.1093/nar/gkm391
  8. Ingolia, N. T. Genome-Wide Translational Profiling by Ribosome Footprinting. In Guide to Yeast Genetics: Functional Genomics, Proteomics, and Other Systems Analysis; Methods in Enzymology; Elsevier, 2010; Vol. 470, pp 119–142.
  9. Ingolia, N. T.; Lareau, L. F.; Weissman, J. S. Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity and Dynamics of Mammalian Proteomes. Cell 2011, 147 (4), 789–802. DOI: 10.1016/j.cell.2011.10.002
  10. Ingolia, N. T.; Brar, G. A.; Stern-Ginossar, N.; Harris, M. S.; Talhouarne, G. J. S.; Jackson, S. E.; Wills, M. R.; Weissman, J. S. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 2014, 8 (5), 1365–1379. DOI: 10.1016/j.celrep.2014.07.045
  11. Guttman, M.; Russell, P.; Ingolia, N. T.; Weissman, J. S.; Lander, E. S. Ribosome Profiling Provides Evidence that Large Noncoding RNAs Do Not Encode Proteins. Cell 2013, 154 (1), 240–251. DOI: 10.1016/j.cell.2013.06.009
  12. Chew, G.-L.; Pauli, A.; Rinn, J. L.; Regev, A.; Schier, A. F.; Valen, E. Ribosome profiling reveals resemblance between long non-coding RNAs and 5′ leaders of coding RNAs. Development 2013, 140 (13), 2828–2834. DOI: 10.1242/dev.098343
  13. Bazzini, A. A.; Johnstone, T. G.; Christiano, R.; Mackowiak, S. D.; Obermayer, B.; Fleming, E. S.; Vejnar, C. E.; Lee, M. T.; Rajewsky, N.; Walther, T. C. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014, 33 (9), 981–993. DOI: 10.1002/embj.201488411
  14. Lee, S.; Liu, B.; Lee, S.; Huang, S.-X.; Shen, B.; Qian, S.-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (37), E2424–E2432. DOI: 10.1073/pnas.1207846109
  15. Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
  16. Menschaert, G.; Van Criekinge, W.; Notelaers, T.; Koch, A.; Crappé, J.; Gevaert, K.; Van Damme, P. Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics 2013, 12 (7), 1780–1790. DOI: 10.1074/mcp.M113.027540
  17. Crappé, J.; Van Criekinge, W.; Trooskens, G.; Hayakawa, E.; Luyten, W.; Baggerman, G.; Menschaert, G. Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs. BMC Genomics 2013, 14 (1), 648. DOI: 10.1186/1471-2164-14-648
  18. Slavoff, S. A.; Mitchell, A. J.; Schwaid, A. G.; Cabili, M. N.; Ma, J.; Levin, J. Z.; Karger, A. D.; Budnik, B. A.; Rinn, J. L.; Saghatelian, A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 2012, 9 (1), 59–64. DOI: 10.1038/nchembio.1120
  19. Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
  20. Brewis, I. A.; Brennan, P. Proteomics technologies for the global identification and quantification of proteins. Adv. Protein Chem. Struct. Biol. 2010, 80, 1–44. DOI: 10.1016/B978-0-12-381264-3.00001-1
  21. Klie, S.; Martens, L.; Vizcaíno, J. A.; Côté, R.; Jones, P.; Apweiler, R.; Hinneburg, A.; Hermjakob, H. Analyzing large-scale proteomics projects with latent semantic indexing. J. Proteome Res. 2008, 7 (1), 182–191. DOI: 10.1021/pr070461k
  22. Leary, D. H.; Hervey, W. J.; Deschamps, J. R.; Kusterbeck, A. W.; Vora, G. J. Which metaproteome? The impact of protein extraction bias on metaproteomic analyses. Mol. Cell. Probes 2013, 27 (5–6), 193–199. DOI: 10.1016/j.mcp.2013.06.003
  23. Vizcaíno, J. A.; Côté, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013, 41 (Database issue), D1063–D1069. DOI: 10.1093/nar/gks1262
  24. Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32 (3), 223–226. DOI: 10.1038/nbt.2839
  25. Hulstaert, N.; Reisinger, F.; Rameseder, J.; Barsnes, H.; Vizcaíno, J. A.; Martens, L. Pride-asap: automatic fragment ion annotation of identified PRIDE spectra. J. Proteomics 2013, 95, 89–92. DOI: 10.1016/j.jprot.2013.04.011
  26. Vaudel, M.; Barsnes, H.; Berven, F. S.; Sickmann, A.; Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011, 11 (5), 996–999. DOI: 10.1002/pmic.201000595
  27. Vaudel, M.; Burkhart, J. M.; Zahedi, R. P.; Oveland, E.; Berven, F. S.; Sickmann, A.; Martens, L.; Barsnes, H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015, 33 (1), 22–24. DOI: 10.1038/nbt.3109
  28. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2014, 42 (Database issue), D191–D198. DOI: 10.1093/nar/gkt1140
  29. Martens, L.; Vandekerckhove, J.; Gevaert, K. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinformatics 2005, 21 (17), 3584–3585. DOI: 10.1093/bioinformatics/bti588
  30. Vandermarliere, E.; Mueller, M.; Martens, L. Getting intimate with trypsin, the leading protease in proteomics. Mass Spectrom. Rev. 2013, 32 (6), 453–465. DOI: 10.1002/mas.21376
  31. Mustafa, G. M.; Larry, D.; Petersen, J. R.; Elferink, C. J. Targeted proteomics for biomarker discovery and validation of hepatocellular carcinoma in hepatitis C infected patients. World J. Hepatol 2015, 7 (10), 1312–1324. DOI: 10.4254/wjh.v7.i10.1312
  32. Anderson, L.; Hunter, C. L. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 2005, 5 (4), 573–588. DOI: 10.1074/mcp.M500331-MCP200
  33. 33
    Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Corrigendum: Global quantification of mammalian gene expression control Nature 2013, 495 (7439) 126 127 DOI: 10.1038/nature11848
  34. 34
    Guruprasad, K.; Reddy, B. V.; Pandit, M. W. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence Protein Eng., Des. Sel. 1990, 4 (2) 155 161 DOI: 10.1093/protein/4.2.155
  35. 35
    Rinn, J. L.; Chang, H. Y. Genome regulation by long noncoding RNAs Annu. Rev. Biochem. 2012, 81, 145 166 DOI: 10.1146/annurev-biochem-051410-092902
  36. 36
    Wilhelm, M.; Schlegl, J.; Hahne, H.; Moghaddas Gholami, A.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H. Mass-spectrometry-based draft of the human proteome Nature 2014, 509 (7502) 582 587 DOI: 10.1038/nature13319
  37. 37
    Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae J. Proteome Res. 2006, 5 (9) 2339 2347 DOI: 10.1021/pr060161n
  38. 38
    Schulz-Knappe, P.; Schrader, M.; Zucht, H.-D. The peptidomics concept Comb. Chem. High Throughput Screening 2005, 8 (8) 697 704 DOI: 10.2174/138620705774962418



    Figure 1. Comparison between theoretical (UniProtKB/SwissProt) and observed (reprocessed PRIDE data) peptide sequence amino acid composition for human data from PRIDE and UniProtKB/SwissProt.

    Figure 2. Reprocessing results for PRIDE data sets derived from human blood plasma mapped onto the abundance values reported by Anderson and Hunter. (32) The size of a bubble corresponds to the number of PRIDE assays in which that protein was identified.

    Figure 3. Reprocessing results for all PRIDE murine data mapped onto the half-life values reported by Schwanhäusser et al. (33) The size of the bubble corresponds to the number of PRIDE assays in which the protein was identified.

    Figure 4. Instability index distributions of human UniProtKB/SwissProt proteins, and of identified proteins from reprocessed human data sets in PRIDE.

    Figure 5. LncRNA and mRNA expression profile and detectability. (a) Two-dimensional kernel density plot of lncRNA and mRNA expression levels and subcellular localization. The enrichment of nuclear over cytosolic expression versus the expression in the whole-cell extract is shown. Selected lncRNA and protein-coding genes are depicted. Especially low abundant lncRNAs show nuclear enrichment compared to mRNAs (adapted from Djebali et al. (1)). (b) Whole-cell expression distribution for lncRNAs and mRNAs. Although lncRNAs are generally expressed at lower levels, a substantial overlap is observed. (c) Normalized spectral abundance factor (NSAF) of the detected protein as a function of its RNA expression level. While mRNA expression and NSAF are moderately correlated, the entire range of expression is clearly covered and thus detectable with mass spectrometry.
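As a reading aid for panel (c), the following is a minimal sketch of how a normalized spectral abundance factor is commonly computed: each protein's spectral count is divided by its length, and the result is divided by the sum of these ratios over all proteins in the sample. The function name `nsaf` and the example counts and lengths are hypothetical and are not taken from the article or its data.

```python
def nsaf(spectral_counts, lengths):
    """Return the NSAF per protein.

    spectral_counts: dict mapping protein id -> number of identified spectra
    lengths:         dict mapping protein id -> protein length in residues
    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    # Spectral abundance factor: spectral count normalized by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    # Normalize so that the values of one sample sum to 1.
    return {p: value / total for p, value in saf.items()}


# Hypothetical example: two proteins of equal length, one with twice the spectra.
example = nsaf({"A": 10, "B": 5}, {"A": 100, "B": 100})
```

Because the values sum to 1 within a sample, NSAF is a relative measure and is best compared between proteins of the same run or between runs of similar depth.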

    Figure 6. Relative size of the largest canonical ORF in mRNA and lncRNA transcripts. Using the reverse complement sequence as a control, it is apparent that lncRNA (as opposed to mRNA) ORFs are not larger than what would be expected from random nucleotide progression.
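The comparison in Figure 6 can be illustrated with a small sketch. It assumes that a canonical ORF runs from an ATG start codon to the first in-frame stop codon, and that "relative size" means ORF length divided by transcript length; both are assumptions made for illustration, and the function names are hypothetical rather than the authors' implementation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (used here as the control)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in nucleotides (start through stop codon) of the longest
    ATG-to-stop ORF in any of the three forward reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def relative_orf_size(seq):
    """Longest ORF length as a fraction of the transcript length."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


# Hypothetical toy transcript: compare the forward strand with its control.
transcript = "CCATGAAATTTTAGCC"
forward = relative_orf_size(transcript)
control = relative_orf_size(reverse_complement(transcript))
```

Applied to many transcripts, the distribution of `forward` values can be compared with the distribution of `control` values; similar distributions would indicate ORFs no larger than expected by chance.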

    Supporting Information


    The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00085.

    • Supplementary methods; supplementary tables S-1 to S-4, showing an overview of the obtained coverage, the processed RNA-sequencing data sets, and an overview of the PRIDE projects (PDF)

