Noncoding after All: Biases in Proteomics Data Do Not Explain Observed Absence of lncRNA Translation Products
- Kenneth Verheggen
- Pieter-Jan Volders
- Pieter Mestdagh
- Gerben Menschaert
- Petra Van Damme
- Kris Gevaert
- Lennart Martens
- Jo Vandesompele
Abstract
Over the past decade, long noncoding RNAs (lncRNAs) have emerged as novel functional entities of the eukaryotic genome. However, the scientific community remains divided over the number of truly noncoding transcripts among the large number of unannotated transcripts identified by recent large-scale and deep RNA-sequencing efforts. Here, we systematically exclude possible technical reasons underlying the absence of lncRNA-encoded proteins in mass spectrometry data sets, strongly suggesting that the large majority of lncRNAs are indeed not translated.
Introduction
Influence of Protein Composition on Detectability by Mass Spectrometry
protein | gene name | length (AA) | average MW (Da) | spectral count | assay count |
---|---|---|---|---|---|
P62328 | TMSB4X | 44 | 4921.46 | 787 | 287 |
P63313 | TMSB10 | 44 | 4894.48 | 366 | 229 |
Q8N4H5 | TOMM5 | 51 | 6035.31 | 88 | 70 |
P62891 | RPL39 | 51 | 6275.49 | 109 | 52 |
Q59GN2 | RPL39P5 | 51 | 6322.59 | 107 | 51 |
Q5VTU8 | ATP5EP2 | 51 | 5806.87 | 53 | 43 |
P56381 | ATP5E | 51 | 5648.57 | 53 | 43 |
Q96IX5 | USMG5 | 58 | 6326.38 | 112 | 86 |
P62861 | FAU | 59 | 6647.86 | 248 | 141 |
P13640 | MT1G | 62 | 6647.86 | 71 | 47 |
Figure 1. Comparison of theoretical (UniProtKB/SwissProt) and observed (reprocessed PRIDE data) amino acid composition of human peptide sequences.
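A comparison of this kind can be sketched in a few lines of Python. The sketch below assumes that composition means the relative frequency of each of the 20 standard amino acids across a set of sequences; the function names and the toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequences):
    """Relative frequency of each standard amino acid over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore non-standard symbols such as X, B, Z or U.
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def composition_shift(theoretical, observed):
    """Observed minus theoretical frequency, per amino acid."""
    return {aa: observed[aa] - theoretical[aa] for aa in AMINO_ACIDS}


# Toy, made-up sequences standing in for a database and for identified peptides.
theoretical = aa_composition(["MKTAYIAKQR", "GSHMLEDPVK"])
observed = aa_composition(["TAYIAK", "LEDPVK"])
shift = composition_shift(theoretical, observed)
```

A positive value in `shift` marks a residue that is over-represented among the observed peptides relative to the database, and a negative value marks one that is under-represented.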
Figure 2. Reprocessing results for PRIDE data sets derived from human blood plasma mapped onto the abundance values reported by Anderson and Hunter. (32) The size of a bubble corresponds to the number of PRIDE assays in which that protein was identified.
Figure 3. Reprocessing results for all PRIDE murine data mapped onto the half-life values reported by Schwanhäusser et al. (33) The size of the bubble corresponds to the number of PRIDE assays in which the protein was identified.
Figure 4. Instability index distributions of human UniProtKB/SwissProt proteins, and of identified proteins from reprocessed human data sets in PRIDE.
lncRNA Expression and Composition Show No Indication of Coding Potential
Figure 5. LncRNA and mRNA expression profile and detectability. (a) Two-dimensional kernel density plot of lncRNA and mRNA expression levels and subcellular localization. The enrichment of nuclear over cytosolic expression is plotted against the expression in the whole-cell extract. Selected lncRNA and protein-coding genes are depicted. Low-abundance lncRNAs in particular show nuclear enrichment compared to mRNAs (adapted from Djebali et al. (1)). (b) Whole-cell expression distribution for lncRNAs and mRNAs. Although lncRNAs are generally expressed at lower levels, a substantial overlap is observed. (c) Normalized spectral abundance factor (NSAF) of the detected protein as a function of its RNA expression level. While mRNA expression and NSAF are only moderately correlated, the entire range of expression is clearly covered and thus detectable with mass spectrometry.
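For readers unfamiliar with the measure in panel (c), NSAF is conventionally computed by dividing each protein's spectral count by its length and then normalizing by the sum of these ratios over all proteins in the data set. The minimal Python sketch below assumes that conventional definition; the function name `nsaf` and the toy input are illustrative and do not come from the authors' pipeline.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    spectral_counts and lengths are dicts keyed by protein identifier.
    """
    if spectral_counts.keys() != lengths.keys():
        raise ValueError("both inputs must cover the same proteins")
    # Spectral abundance factor: spectral count corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("at least one protein needs a nonzero spectral count")
    return {p: value / total for p, value in saf.items()}


# Hypothetical proteins with made-up spectral counts and lengths (in residues).
example = nsaf(
    {"protA": 120, "protB": 30, "protC": 30},
    {"protA": 400, "protB": 50, "protC": 300},
)
```

Because of the length correction, the short `protB` receives a higher NSAF than `protC` although both have the same spectral count.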
Figure 6. Relative size of the largest canonical ORF in mRNA and lncRNA transcripts. Using the reverse complement sequence as a control, it is apparent that lncRNA (as opposed to mRNA) ORFs are not larger than what would be expected from random nucleotide progression.
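The comparison in Figure 6 can be illustrated with a short Python sketch. It assumes that a canonical ORF runs from an ATG to the first in-frame stop codon, that its size is counted in nucleotides including the stop codon, and that relative size means ORF length divided by transcript length. These definitions, like the function names, are assumptions made for illustration rather than the authors' stated procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA (or RNA, with U read as T) sequence."""
    return seq.upper().replace("U", "T").translate(_COMPLEMENT)[::-1]


def longest_orf_length(seq):
    """Length in nucleotides of the longest ATG-to-stop ORF on the given strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def relative_orf_size(seq):
    """Longest ORF length as a fraction of the transcript length."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


# Toy transcript: compare the sense strand with its reverse complement control.
transcript = "CCATGAAATAGCC"
sense = relative_orf_size(transcript)
control = relative_orf_size(reverse_complement(transcript))
```

Applied to many transcripts, the distribution of `sense` values can be compared with that of the `control` values; similar distributions would indicate ORFs no larger than expected by chance.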
Conclusions
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00085.
Supplementary methods; supplementary Tables S-1 to S-4, showing an overview of the obtained coverage, the processed RNA-sequencing data sets, and an overview of the PRIDE projects (PDF)
Acknowledgment
This work was supported by the Multidisciplinary Research Partnership ‘Bioinformatics: From Nucleotides to Networks’ Project of Ghent University [01MR0310W to P.V.]; Fund for Scientific Research Flanders [FWO; to P.M., P.D., and G.M.]; SBO grant “InSPECtor” of Flanders Innovation & Entrepreneurship (VLAIO) [120025 to L.M.]; and Ghent University [to K.V. and J.V.].
References
This article references 38 other publications.
- (1) Djebali, S.; Davis, C. A.; Merkel, A.; Dobin, A.; Lassmann, T.; Mortazavi, A. M.; Tanzer, A.; Lagarde, J.; Lin, W.; Schlesinger, F.; et al. Landscape of transcription in human cells. Nature 2012, 489 (7414), 101–108. DOI: 10.1038/nature11233
- (2) Iyer, M. K.; Niknafs, Y. S.; Malik, R.; Singhal, U.; Sahu, A.; Hosono, Y.; Barrette, T. R.; Prensner, J. R.; Evans, J. R.; Zhao, S.; et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 2015, 47 (3), 199–208. DOI: 10.1038/ng.3192
- (3) Mercer, T. R.; Dinger, M. E.; Mattick, J. S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009, 10 (3), 155–159. DOI: 10.1038/nrg2521
- (4) Crappé, J.; Van Criekinge, W.; Menschaert, G. Little things make big things happen: A summary of micropeptide encoding genes. EuPA Open Proteomics 2014, 3, 128–137. DOI: 10.1016/j.euprot.2014.02.006
- (5) Lin, M. F.; Jungreis, I.; Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 2011, 27 (13), i275–i282. DOI: 10.1093/bioinformatics/btr209
- (6) Wang, L.; Park, H. J.; Dasari, S.; Wang, S.; Kocher, J.-P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41 (6), e74. DOI: 10.1093/nar/gkt006
- (7) Kong, L.; Zhang, Y.; Ye, Z. Q.; Liu, X. Q.; Zhao, S. Q.; Wei, L.; Gao, G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35, W345–W349. DOI: 10.1093/nar/gkm391
- (8) Ingolia, N. T. Genome-Wide Translational Profiling by Ribosome Footprinting. In Guide to Yeast Genetics: Functional Genomics, Proteomics, and Other Systems Analysis; Methods in Enzymology; Elsevier, 2010; Vol. 470, pp 119–142.
- (9) Ingolia, N. T.; Lareau, L. F.; Weissman, J. S. Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity and Dynamics of Mammalian Proteomes. Cell 2011, 147 (4), 789–802. DOI: 10.1016/j.cell.2011.10.002
- (10) Ingolia, N. T.; Brar, G. A.; Stern-Ginossar, N.; Harris, M. S.; Talhouarne, G. J. S.; Jackson, S. E.; Wills, M. R.; Weissman, J. S. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 2014, 8 (5), 1365–1379. DOI: 10.1016/j.celrep.2014.07.045
- (11) Guttman, M.; Russell, P.; Ingolia, N. T.; Weissman, J. S.; Lander, E. S. Ribosome Profiling Provides Evidence that Large Noncoding RNAs Do Not Encode Proteins. Cell 2013, 154 (1), 240–251. DOI: 10.1016/j.cell.2013.06.009
- (12) Chew, G.-L.; Pauli, A.; Rinn, J. L.; Regev, A.; Schier, A. F.; Valen, E. Ribosome profiling reveals resemblance between long non-coding RNAs and 5′ leaders of coding RNAs. Development 2013, 140 (13), 2828–2834. DOI: 10.1242/dev.098343
- (13) Bazzini, A. A.; Johnstone, T. G.; Christiano, R.; Mackowiak, S. D.; Obermayer, B.; Fleming, E. S.; Vejnar, C. E.; Lee, M. T.; Rajewsky, N.; Walther, T. C. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014, 33 (9), 981–993. DOI: 10.1002/embj.201488411
- (14) Lee, S.; Liu, B.; Lee, S.; Huang, S.-X.; Shen, B.; Qian, S.-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (37), E2424–E2432. DOI: 10.1073/pnas.1207846109
- (15) Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
- (16) Menschaert, G.; Van Criekinge, W.; Notelaers, T.; Koch, A.; Crappé, J.; Gevaert, K.; Van Damme, P. Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics 2013, 12 (7), 1780–1790. DOI: 10.1074/mcp.M113.027540
- (17) Crappé, J.; Van Criekinge, W.; Trooskens, G.; Hayakawa, E.; Luyten, W.; Baggerman, G.; Menschaert, G. Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs. BMC Genomics 2013, 14 (1), 648. DOI: 10.1186/1471-2164-14-648
- (18) Slavoff, S. A.; Mitchell, A. J.; Schwaid, A. G.; Cabili, M. N.; Ma, J.; Levin, J. Z.; Karger, A. D.; Budnik, B. A.; Rinn, J. L.; Saghatelian, A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 2012, 9 (1), 59–64. DOI: 10.1038/nchembio.1120
- (19) Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
- (20) Brewis, I. A.; Brennan, P. Proteomics technologies for the global identification and quantification of proteins. Adv. Protein Chem. Struct. Biol. 2010, 80, 1–44. DOI: 10.1016/B978-0-12-381264-3.00001-1
- (21) Klie, S.; Martens, L.; Vizcaíno, J. A.; Côté, R.; Jones, P.; Apweiler, R.; Hinneburg, A.; Hermjakob, H. Analyzing large-scale proteomics projects with latent semantic indexing. J. Proteome Res. 2008, 7 (1), 182–191. DOI: 10.1021/pr070461k
- (22) Leary, D. H.; Hervey, W. J.; Deschamps, J. R.; Kusterbeck, A. W.; Vora, G. J. Which metaproteome? The impact of protein extraction bias on metaproteomic analyses. Mol. Cell. Probes 2013, 27 (5–6), 193–199. DOI: 10.1016/j.mcp.2013.06.003
- 23. Vizcaíno, J. A.; Côté, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013, 41 (Database issue), D1063–D1069. DOI: 10.1093/nar/gks1262
- 24. Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32 (3), 223–226. DOI: 10.1038/nbt.2839
- 25. Hulstaert, N.; Reisinger, F.; Rameseder, J.; Barsnes, H.; Vizcaíno, J. A.; Martens, L. Pride-asap: automatic fragment ion annotation of identified PRIDE spectra. J. Proteomics 2013, 95, 89–92. DOI: 10.1016/j.jprot.2013.04.011
- 26. Vaudel, M.; Barsnes, H.; Berven, F. S.; Sickmann, A.; Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011, 11 (5), 996–999. DOI: 10.1002/pmic.201000595
- 27. Vaudel, M.; Burkhart, J. M.; Zahedi, R. P.; Oveland, E.; Berven, F. S.; Sickmann, A.; Martens, L.; Barsnes, H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015, 33 (1), 22–24. DOI: 10.1038/nbt.3109
- 28. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2014, 42 (Database issue), D191–D198. DOI: 10.1093/nar/gkt1140
- 29. Martens, L.; Vandekerckhove, J.; Gevaert, K. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinformatics 2005, 21 (17), 3584–3585. DOI: 10.1093/bioinformatics/bti588
- 30. Vandermarliere, E.; Mueller, M.; Martens, L. Getting intimate with trypsin, the leading protease in proteomics. Mass Spectrom. Rev. 2013, 32 (6), 453–465. DOI: 10.1002/mas.21376
- 31. Mustafa, G. M.; Larry, D.; Petersen, J. R.; Elferink, C. J. Targeted proteomics for biomarker discovery and validation of hepatocellular carcinoma in hepatitis C infected patients. World J. Hepatol. 2015, 7 (10), 1312–1324. DOI: 10.4254/wjh.v7.i10.1312
- 32. Anderson, L.; Hunter, C. L. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 2005, 5 (4), 573–588. DOI: 10.1074/mcp.M500331-MCP200
- 33. Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Corrigendum: Global quantification of mammalian gene expression control. Nature 2013, 495 (7439), 126–127. DOI: 10.1038/nature11848
- 34. Guruprasad, K.; Reddy, B. V.; Pandit, M. W. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng., Des. Sel. 1990, 4 (2), 155–161. DOI: 10.1093/protein/4.2.155
- 35. Rinn, J. L.; Chang, H. Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 2012, 81, 145–166. DOI: 10.1146/annurev-biochem-051410-092902
- 36. Wilhelm, M.; Schlegl, J.; Hahne, H.; Moghaddas Gholami, A.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H. Mass-spectrometry-based draft of the human proteome. Nature 2014, 509 (7502), 582–587. DOI: 10.1038/nature13319
- 37. Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J. Proteome Res. 2006, 5 (9), 2339–2347. DOI: 10.1021/pr060161n
- 38. Schulz-Knappe, P.; Schrader, M.; Zucht, H.-D. The peptidomics concept. Comb. Chem. High Throughput Screening 2005, 8 (8), 697–704. DOI: 10.2174/138620705774962418
Cited By
This article is cited by 29 publications.
- Tine Claeys, Maxime Menu, Robbin Bouwmeester, Kris Gevaert, Lennart Martens. Machine Learning on Large-Scale Proteomics Data Identifies Tissue and Cell-Type Specific Proteins. Journal of Proteome Research 2023, 22 (4), 1181-1192. https://doi.org/10.1021/acs.jproteome.2c00644
- Pathmanaban Ramasamy, Demet Turan, Natalia Tichshenko, Niels Hulstaert, Elien Vandermarliere, Wim Vranken, Lennart Martens. Scop3P: A Comprehensive Resource of Human Phosphosites within Their Full Context. Journal of Proteome Research 2020, 19 (8), 3478-3486. https://doi.org/10.1021/acs.jproteome.0c00306
- Young-Ki Paik, Lydie Lane, Takeshi Kawamura, Yu-Ju Chen, Je-Yoel Cho, Joshua LaBaer, Jong Shin Yoo, Gilberto Domont, Fernando Corrales, Gilbert S. Omenn, Alexander Archakov, Sergio Encarnación-Guevara, Siqi Lui, Ghasem Hosseini Salekdeh, Jin-Young Cho, Chae-Yeon Kim, Christopher M. Overall. Launching the C-HPP neXt-CP50 Pilot Project for Functional Characterization of Identified Proteins with No Known Function. Journal of Proteome Research 2018, 17 (12), 4042-4050. https://doi.org/10.1021/acs.jproteome.8b00383
- Gilbert S. Omenn, Lydie Lane, Emma K. Lundberg, Christopher M. Overall, and Eric W. Deutsch. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. Journal of Proteome Research 2017, 16 (12), 4281-4287. https://doi.org/10.1021/acs.jproteome.7b00375
- Alexandre Luiz Korte de Azevedo, Talita Helen Bombardelli Gomig, Michel Batista, Jaqueline Carvalho de Oliveira, Iglenir João Cavalli, Daniela Fiori Gradia, Enilze Maria de Souza Fonseca Ribeiro. Peptidomics and Machine Learning–based Evaluation of Noncoding RNA–Derived Micropeptides in Breast Cancer: Expression Patterns and Functional/Therapeutic Insights. Laboratory Investigation 2024, 104 (12), 102150. https://doi.org/10.1016/j.labinv.2024.102150
- Joseph D. Valencia, David A. Hendrix. Improving deep models of protein-coding potential with a Fourier-transform architecture and machine translation task. PLOS Computational Biology 2023, 19 (10), e1011526. https://doi.org/10.1371/journal.pcbi.1011526
- John R. Prensner, Jennifer G. Abelin, Leron W. Kok, Karl R. Clauser, Jonathan M. Mudge, Jorge Ruiz-Orera, Michal Bassani-Sternberg, Robert L. Moritz, Eric W. Deutsch, Sebastiaan van Heesch. What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome?. Molecular & Cellular Proteomics 2023, 22 (9), 100631. https://doi.org/10.1016/j.mcpro.2023.100631
- Benjamin J.M. Tremblay, Julia I. Qüesta. Mechanisms of epigenetic regulation of transcription by lncRNAs in plants. IUBMB Life 2023, 75 (5), 427-439. https://doi.org/10.1002/iub.2681
- Viola Melone, Annamaria Salvati, Noemi Brusco, Elena Alexandrova, Ylenia D’Agostino, Domenico Palumbo, Luigi Palo, Ilaria Terenzi, Giovanni Nassa, Francesca Rizzo, Giorgio Giurato, Alessandro Weisz, Roberta Tarallo. Functional Relationships between Long Non-Coding RNAs and Estrogen Receptor Alpha: A New Frontier in Hormone-Responsive Breast Cancer Management. International Journal of Molecular Sciences 2023, 24 (2), 1145. https://doi.org/10.3390/ijms24021145
- Annelies Bogaert, Daria Fijalkowska, An Staes, Tessa Van de Steene, Hans Demol, Kris Gevaert. Limited Evidence for Protein Products of Noncoding Transcripts in the HEK293T Cellular Cytosol. Molecular & Cellular Proteomics 2022, 21 (8), 100264. https://doi.org/10.1016/j.mcpro.2022.100264
- Xiaotong Luo, Yuantai Huang, Huiqin Li, Yihai Luo, Zhixiang Zuo, Jian Ren, Yubin Xie. SPENCER: a comprehensive database for small peptides encoded by noncoding RNAs in cancer patients. Nucleic Acids Research 2022, 50 (D1), D1373-D1381. https://doi.org/10.1093/nar/gkab822
- Bhavesh S. Parmar, Marlies K. R. Peeters, Kurt Boonen, Ellie C. Clark, Geert Baggerman, Gerben Menschaert, Liesbet Temmerman. Identification of Non-Canonical Translation Products in C. elegans Using Tandem Mass Spectrometry. Frontiers in Genetics 2021, 12 https://doi.org/10.3389/fgene.2021.728900
- Sara Andjus, Antonin Morillon, Maxime Wery. From Yeast to Mammals, the Nonsense-Mediated mRNA Decay as a Master Regulator of Long Non-Coding RNAs Functional Trajectory. Non-Coding RNA 2021, 7 (3), 44. https://doi.org/10.3390/ncrna7030044
- Rui Vitorino, Sofia Guedes, Francisco Amado, Manuel Santos, Nobuyoshi Akimitsu. The role of micropeptides in biology. Cellular and Molecular Life Sciences 2021, 78 (7), 3285-3298. https://doi.org/10.1007/s00018-020-03740-3
- Qing Zhang, Erzhong Wu, Yiheng Tang, Tanxi Cai, Lili Zhang, Jifeng Wang, Yajing Hao, Bao Zhang, Yue Zhou, Xiaojing Guo, Jianjun Luo, Runsheng Chen, Fuquan Yang. Deeply Mining a Universe of Peptides Encoded by Long Noncoding RNAs. Molecular & Cellular Proteomics 2021, 20, 100109. https://doi.org/10.1016/j.mcpro.2021.100109
- Mor Varon, Tal Levy, Gal Mazor, Hila Ben David, Ran Marciano, Yakov Krelin, Manu Prasad, Moshe Elkabets, David Pauck, Ulvi Ahmadov, Daniel Picard, Nan Qin, Arndt Borkhardt, Guido Reifenberger, Gabriel Leprivier, Marc Remke, Barak Rotblat. The long noncoding RNA TP73‐AS1 promotes tumorigenicity of medulloblastoma cells. International Journal of Cancer 2019, 145 (12), 3402-3413. https://doi.org/10.1002/ijc.32400
- Sajib Chakraborty, Geoffroy Andrieux, A. M. Mahmudul Hasan, Musaddeque Ahmed, Md. Ismail Hosen, Tania Rahman, M. Anwar Hossain, Melanie Boerries. Harnessing the tissue and plasma lncRNA-peptidome to discover peptide-based cancer biomarkers. Scientific Reports 2019, 9 (1) https://doi.org/10.1038/s41598-019-48774-1
- Lucas F. Maciel, David A. Morales-Vicente, Gilbert O. Silveira, Raphael O. Ribeiro, Giovanna G. O. Olberg, David S. Pires, Murilo S. Amaral, Sergio Verjovski-Almeida. Weighted Gene Co-Expression Analyses Point to Long Non-Coding RNA Hub Genes at Different Schistosoma mansoni Life-Cycle Stages. Frontiers in Genetics 2019, 10 https://doi.org/10.3389/fgene.2019.00823
- Igor Fesenko, Ilya Kirov, Andrey Kniazev, Regina Khazigaleeva, Vassili Lazarev, Daria Kharlampieva, Ekaterina Grafskaia, Viktor Zgoda, Ivan Butenko, Georgy Arapidi, Anna Mamaeva, Vadim Ivanov, Vadim Govorun. Distinct types of short open reading frames are translated in plant cells. Genome Research 2019, 29 (9), 1464-1477. https://doi.org/10.1101/gr.253302.119
- Jing Li, Changning Liu. Coding or Noncoding, the Converging Concepts of RNAs. Frontiers in Genetics 2019, 10 https://doi.org/10.3389/fgene.2019.00496
- Lucía Lorenzi, Francisco Avila Cobos, Anneleen Decock, Celine Everaert, Hetty Helsmoortel, Steve Lefever, Karen Verboom, Pieter‐Jan Volders, Frank Speleman, Jo Vandesompele, Pieter Mestdagh. Long noncoding RNA expression profiling in cancer: Challenges and opportunities. Genes, Chromosomes and Cancer 2019, 58 (4), 191-199. https://doi.org/10.1002/gcc.22709
- Jorge Ruiz-Orera, M Mar Albà. Conserved regions in long non-coding RNAs contain abundant translation and protein–RNA interaction signatures. NAR Genomics and Bioinformatics 2019, 1 (1), e2-e2. https://doi.org/10.1093/nargab/lqz002
- Xinqiang Yin, Yuanyuan Jing, Hanmei Xu. Mining for missed sORF-encoded peptides. Expert Review of Proteomics 2019, 16 (3), 257-266. https://doi.org/10.1080/14789450.2019.1571919
- Soumasree De, Liron Levin, Barak Rotblat. lncRNA in worms – Time to meet the neighbors. Current Opinion in Systems Biology 2019, 13, 10-15. https://doi.org/10.1016/j.coisb.2018.08.007
- Roberto Giambruno, Marija Mihailovich, Tiziana Bonaldi. Mass Spectrometry-Based Proteomics to Unveil the Non-coding RNA World. Frontiers in Molecular Biosciences 2018, 5 https://doi.org/10.3389/fmolb.2018.00090
- Mingkun Yang, Xiaohuang Lin, Xin Liu, Jia Zhang, Feng Ge. Genome Annotation of a Model Diatom Phaeodactylum tricornutum Using an Integrated Proteogenomic Pipeline. Molecular Plant 2018, 11 (10), 1292-1307. https://doi.org/10.1016/j.molp.2018.08.005
- Barbara Uszczynska-Ratajczak, Julien Lagarde, Adam Frankish, Roderic Guigó, Rory Johnson. Towards a complete map of the human long non-coding RNA transcriptome. Nature Reviews Genetics 2018, 19 (9), 535-548. https://doi.org/10.1038/s41576-018-0017-y
- I. A. Fesenko, I. V. Kirov, A. A. Filippova. Impact of Noncoding Part of the Genome on the Proteome Plasticity of the Eukaryotic Cell. Russian Journal of Bioorganic Chemistry 2018, 44 (4), 397-402. https://doi.org/10.1134/S1068162018040076
- Young-Ki Paik, Gilbert S. Omenn, William S. Hancock, Lydie Lane, Christopher M. Overall. Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Review of Proteomics 2017, 14 (12), 1059-1071. https://doi.org/10.1080/14789450.2017.1394189
Abstract
Figure 4
Figure 4. Instability index distributions of human UniProtKB/SwissProt proteins, and of identified proteins from reprocessed human data sets in PRIDE.
Figure 5
Figure 5. LncRNA and mRNA expression profile and detectability. (a) Two-dimensional kernel density plot of lncRNA and mRNA expression levels and subcellular localization. The enrichment of nuclear over cytosolic expression versus the expression in the whole-cell extract is shown. Selected lncRNA and protein-coding genes are depicted. Low-abundance lncRNAs in particular show nuclear enrichment compared to mRNAs (adapted from Djebali et al. (1)). (b) Whole-cell expression distribution for lncRNAs and mRNAs. Although lncRNAs are generally expressed at lower levels, a substantial overlap is observed. (c) Normalized spectral abundance factor (NSAF) of the detected protein as a function of its RNA expression level. While mRNA expression and NSAF are moderately correlated, the entire range of expression is clearly covered and thus detectable with mass spectrometry.
Figure 6
Figure 6. Relative size of the largest canonical ORF in mRNA and lncRNA transcripts. Using the reverse complement sequence as a control, it is apparent that lncRNA (as opposed to mRNA) ORFs are not larger than what would be expected from random nucleotide progression.
References
This article references 38 other publications.
- 1. Djebali, S.; Davis, C. A.; Merkel, A.; Dobin, A.; Lassmann, T.; Mortazavi, A. M.; Tanzer, A.; Lagarde, J.; Lin, W.; Schlesinger, F. Landscape of transcription in human cells. Nature 2012, 489 (7414), 101–108. DOI: 10.1038/nature11233
- 2. Iyer, M. K.; Niknafs, Y. S.; Malik, R.; Singhal, U.; Sahu, A.; Hosono, Y.; Barrette, T. R.; Prensner, J. R.; Evans, J. R.; Zhao, S. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 2015, 47 (3), 199–208. DOI: 10.1038/ng.3192
- 3. Mercer, T. R.; Dinger, M. E.; Mattick, J. S. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 2009, 10 (3), 155–159. DOI: 10.1038/nrg2521
- 4. Crappé, J.; Van Criekinge, W.; Menschaert, G. Little things make big things happen: A summary of micropeptide encoding genes. EuPa Open Proteomics 2014, 3, 128–137. DOI: 10.1016/j.euprot.2014.02.006
- 5. Lin, M. F.; Jungreis, I.; Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 2011, 27 (13), I275–I282. DOI: 10.1093/bioinformatics/btr209
- 6. Wang, L.; Park, H. J.; Dasari, S.; Wang, S.; Kocher, J.-P.; Li, W. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 2013, 41 (6), e74. DOI: 10.1093/nar/gkt006
- 7. Kong, L.; Zhang, Y.; Ye, Z. Q.; Liu, X. Q.; Zhao, S. Q.; Wei, L.; Gao, G. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 2007, 35, W345–W349. DOI: 10.1093/nar/gkm391
- 8. Ingolia, N. T. Genome-Wide Translational Profiling by Ribosome Footprinting. In Guide to Yeast Genetics: Functional Genomics, Proteomics, and Other Systems Analysis; Methods in Enzymology; Elsevier, 2010; Vol. 470, pp 119–142.
- 9. Ingolia, N. T.; Lareau, L. F.; Weissman, J. S. Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity and Dynamics of Mammalian Proteomes. Cell 2011, 147 (4), 789–802. DOI: 10.1016/j.cell.2011.10.002
- 10. Ingolia, N. T.; Brar, G. A.; Stern-Ginossar, N.; Harris, M. S.; Talhouarne, G. J. S.; Jackson, S. E.; Wills, M. R.; Weissman, J. S. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 2014, 8 (5), 1365–1379. DOI: 10.1016/j.celrep.2014.07.045
- 11. Guttman, M.; Russell, P.; Ingolia, N. T.; Weissman, J. S.; Lander, E. S. Ribosome Profiling Provides Evidence that Large Noncoding RNAs Do Not Encode Proteins. Cell 2013, 154 (1), 240–251. DOI: 10.1016/j.cell.2013.06.009
- 12. Chew, G.-L.; Pauli, A.; Rinn, J. L.; Regev, A.; Schier, A. F.; Valen, E. Ribosome profiling reveals resemblance between long non-coding RNAs and 5′ leaders of coding RNAs. Development 2013, 140 (13), 2828–2834. DOI: 10.1242/dev.098343
- 13. Bazzini, A. A.; Johnstone, T. G.; Christiano, R.; Mackowiak, S. D.; Obermayer, B.; Fleming, E. S.; Vejnar, C. E.; Lee, M. T.; Rajewsky, N.; Walther, T. C. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 2014, 33 (9), 981–993. DOI: 10.1002/embj.201488411
- 14. Lee, S.; Liu, B.; Lee, S.; Huang, S.-X.; Shen, B.; Qian, S.-B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (37), E2424–E2432. DOI: 10.1073/pnas.1207846109
- 15. Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
- 16. Menschaert, G.; Van Criekinge, W.; Notelaers, T.; Koch, A.; Crappé, J.; Gevaert, K.; Van Damme, P. Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics 2013, 12 (7), 1780–1790. DOI: 10.1074/mcp.M113.027540
- 17. Crappé, J.; Van Criekinge, W.; Trooskens, G.; Hayakawa, E.; Luyten, W.; Baggerman, G.; Menschaert, G. Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs. BMC Genomics 2013, 14 (1), 648. DOI: 10.1186/1471-2164-14-648
- 18. Slavoff, S. A.; Mitchell, A. J.; Schwaid, A. G.; Cabili, M. N.; Ma, J.; Levin, J. Z.; Karger, A. D.; Budnik, B. A.; Rinn, J. L.; Saghatelian, A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 2012, 9 (1), 59–64. DOI: 10.1038/nchembio.1120
- 19. Volders, P.-J.; Verheggen, K.; Menschaert, G.; Vandepoele, K.; Martens, L.; Vandesompele, J.; Mestdagh, P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015, 43 (Database issue), D174–D180. DOI: 10.1093/nar/gku1060
- 20. Brewis, I. A.; Brennan, P. Proteomics technologies for the global identification and quantification of proteins. Adv. Protein Chem. Struct. Biol. 2010, 80, 1–44. DOI: 10.1016/B978-0-12-381264-3.00001-1
- 21. Klie, S.; Martens, L.; Vizcaíno, J. A.; Côté, R.; Jones, P.; Apweiler, R.; Hinneburg, A.; Hermjakob, H. Analyzing large-scale proteomics projects with latent semantic indexing. J. Proteome Res. 2008, 7 (1), 182–191. DOI: 10.1021/pr070461k
- 22. Leary, D. H.; Hervey, W. J.; Deschamps, J. R.; Kusterbeck, A. W.; Vora, G. J. Which metaproteome? The impact of protein extraction bias on metaproteomic analyses. Mol. Cell. Probes 2013, 27 (5–6), 193–199. DOI: 10.1016/j.mcp.2013.06.003
- 23. Vizcaíno, J. A.; Côté, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013, 41 (Database issue), D1063–D1069. DOI: 10.1093/nar/gks1262
- 24. Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 2014, 32 (3), 223–226. DOI: 10.1038/nbt.2839
- 25. Hulstaert, N.; Reisinger, F.; Rameseder, J.; Barsnes, H.; Vizcaíno, J. A.; Martens, L. Pride-asap: automatic fragment ion annotation of identified PRIDE spectra. J. Proteomics 2013, 95, 89–92. DOI: 10.1016/j.jprot.2013.04.011
- 26. Vaudel, M.; Barsnes, H.; Berven, F. S.; Sickmann, A.; Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011, 11 (5), 996–999. DOI: 10.1002/pmic.201000595
- 27. Vaudel, M.; Burkhart, J. M.; Zahedi, R. P.; Oveland, E.; Berven, F. S.; Sickmann, A.; Martens, L.; Barsnes, H. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 2015, 33 (1), 22–24. DOI: 10.1038/nbt.3109
- 28. UniProt Consortium. Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2014, 42 (Database issue), D191–D198. DOI: 10.1093/nar/gkt1140
- 29. Martens, L.; Vandekerckhove, J.; Gevaert, K. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinformatics 2005, 21 (17), 3584–3585. DOI: 10.1093/bioinformatics/bti588
- 30. Vandermarliere, E.; Mueller, M.; Martens, L. Getting intimate with trypsin, the leading protease in proteomics. Mass Spectrom. Rev. 2013, 32 (6), 453–465. DOI: 10.1002/mas.21376
- 31. Mustafa, G. M.; Larry, D.; Petersen, J. R.; Elferink, C. J. Targeted proteomics for biomarker discovery and validation of hepatocellular carcinoma in hepatitis C infected patients. World J. Hepatol. 2015, 7 (10), 1312–1324. DOI: 10.4254/wjh.v7.i10.1312
- 32. Anderson, L.; Hunter, C. L. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell. Proteomics 2005, 5 (4), 573–588. DOI: 10.1074/mcp.M500331-MCP200
- 33. Schwanhäusser, B.; Busse, D.; Li, N.; Dittmar, G.; Schuchhardt, J.; Wolf, J.; Chen, W.; Selbach, M. Corrigendum: Global quantification of mammalian gene expression control. Nature 2013, 495 (7439), 126–127. DOI: 10.1038/nature11848
- 34. Guruprasad, K.; Reddy, B. V.; Pandit, M. W. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng., Des. Sel. 1990, 4 (2), 155–161. DOI: 10.1093/protein/4.2.155
- 35. Rinn, J. L.; Chang, H. Y. Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 2012, 81, 145–166. DOI: 10.1146/annurev-biochem-051410-092902
- 36. Wilhelm, M.; Schlegl, J.; Hahne, H.; Moghaddas Gholami, A.; Lieberenz, M.; Savitski, M. M.; Ziegler, E.; Butzmann, L.; Gessulat, S.; Marx, H.; et al. Mass-spectrometry-based draft of the human proteome. Nature 2014, 509 (7502), 582–587. DOI: 10.1038/nature13319
- 37. Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J. Proteome Res. 2006, 5 (9), 2339–2347. DOI: 10.1021/pr060161n
- 38. Schulz-Knappe, P.; Schrader, M.; Zucht, H.-D. The peptidomics concept. Comb. Chem. High Throughput Screening 2005, 8 (8), 697–704. DOI: 10.2174/138620705774962418
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.7b00085.
Supplementary methods; supplementary Tables S-1 to S-4, showing an overview of the obtained coverage, the processed RNA-sequencing data sets, and an overview of the PRIDE projects (PDF)