Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work
- Eric W. Deutsch*, Institute for Systems Biology, Seattle, Washington 98109, United States. Email: [email protected]; Phone: 206-732-1200; Fax: 206-732-1299
- Juan Antonio Vizcaíno, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones*, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom. Email: [email protected]
- Pierre-Alain Binz, Clinical Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
- Henry Lam, Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China
- Joshua Klein, Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Wout Bittremieux, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States; Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb, SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7602, South Africa
- Mathias Walzer, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sylvie Ricard-Blum, Univ. Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
- Henning Hermjakob, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Steffen Neumann, Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany; German Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
- Tytus D. Mak, Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Shin Kawano, Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan; Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan; School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Luis Mendoza, Institute for Systems Biology, Seattle, Washington 98109, United States
- Tim Van Den Bossche, VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels, VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Nuno Bandeira, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States; Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Jeremy Carver, Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Benjamin Pullman, Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Zhi Sun
- Nils Hoffmann, Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Jim Shofstahl, Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Yunping Zhu, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Luana Licata, Fondazione Human Technopole, 20157 Milan, Italy; Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Quaglia, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy; Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Silvio C. E. Tosatto, Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Sandra E. Orchard, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed is described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
This publication is licensed under
License Summary*
You are free to share (copy and redistribute) this article in any medium or format and to adapt (remix, transform, and build upon) the material for any purpose, even commercially, within the parameters below:
Creative Commons (CC): This is a Creative Commons license.
Attribution (BY): Credit must be given to the creator.
*Disclaimer
This summary highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. Carefully review the actual license before using these materials.
SPECIAL ISSUE
This article is part of the
Introduction
| Working Groups | Guidelines | Version | Formats | Version | Controlled Vocabularies | Version |
|---|---|---|---|---|---|---|
| Molecular Interactions | MIMIx | 1.1.2 | PSI-MI XML | 2.5.4 | PSI-MI CV | 2.5.0 |
| | MIABE | 1.0.0 | PSI-MI XML | 3.0.0 | | |
| | MIAPAR | 1.0.0 | MITAB | 2.7, 2.8 | | |
| Mass Spectrometry | Mass spectrometry (MIAPE-MS) | 2.98 | mzML | 1.1.0 | PSI-MS | 4.0.15 |
| | | | TraML | 1.0.0 | XLMOD | 1.1.0 |
| Proteomics Informatics | Identification (MIAPE-MSI) | 1.1 | mzIdentML | 1.2.0 | | |
| | Mass spectrometry Quantification (MIAPE-Quant) | 1.0 | mzQuantML | 1.0.1 | | |
| | | | mzTab | 1.0.0 | | |
| | | | mzTab-M | 2.0.0 | | |
| | | | proBed | 1.0.0 | | |
| | | | proBAM | 1.0.0 | | |
| | | | PEFF | 1.0.0 | | |
| | | | USI | 1.0.0 | | |
| | | | ProXI (under development) | | | |
| | | | ProForma | 2.0 | | |
| | | | mzSpecLib (under development) | | | |
| Quality Control | | | mzQC (PSI spec. under development) | | | |
| Protein Modifications | | | | | PSI-MOD | 1.031.6 |
| Intrinsically Disordered Proteins | MIADE (under development) | | | | | |
Links to documentation about each standard can be obtained from https://www.psidev.info/.
Operation of the HUPO-PSI
2022 PSI Spring Workshop
Controlled Vocabularies and Ontologies
Guidelines
Existing Standard Data Formats
Figure 1. Overview of the formats developed by the Molecular Interactions Working Group and their relationships to other components in the community. Logo courtesy of IMEx.
Figure 2. Overview of the formats of the Mass Spectrometry Working Group and the Proteomics Informatics Working Group and their relationships to other components in the community. Logo courtesy of the Proteomics Standards Initiative and ProteomeXchange.
Molecular Interactions Working Group
PSI-MI XML
MITAB
MI-JSON
Java Software Library JAMI
Mass Spectrometry and Proteomics Informatics Working Groups
mzML
Ongoing Work: mzML Extension for DIA and IMS Data
mzIdentML
Ongoing Work: mzIdentML Extension for Glycopeptide and Cross-Linked Peptide Data
mzTab
proBAM and proBed
PEFF
ProForma 2.0
Universal Spectrum Identifier (USI)
Related Formats
ProteomeXchange XML (PX XML)
MAGE-TAB for Proteomics (SDRF-Proteomics and IDF)
Disused Formats
New Standards in Development
mzQC
mzSpecLib
mzPAF
ProXI
PTM Site Formats
MIADE
Future Work and Synergies with Other Organizations
Conclusion
Acknowledgments
E.W.D. acknowledges funding from the National Institutes of Health (NIH) grants R01 GM087221, R24 GM127667, and U19 AG023122, and from the National Science Foundation grants DBI-1933311 and IOS-1922871. J.A.V. acknowledges funding from BBSRC [BB/S01781X/1, BB/T019670/1, BB/N022440/1, BB/K01997X/1, BB/L024225/1, BB/V018779/1], Wellcome [208391/Z/17/Z, 223745/Z/21/Z], NIH [R24 GM127667-01], ELIXIR implementation studies, and EMBL core funding. A.R.J. acknowledges funding from BBSRC [BB/T019557/1, BB/S01781X/1, BB/R02216X/1, BB/L024128/1, BB/K01997X/1]. S.K. acknowledges funding from the JST NBDC grant [18063028] and JSPS KAKENHI [20H03245]. R.G. received funding from the Research Foundation Flanders (FWO) [1S50918N]. S.E.O. was supported by the National Human Genome Research Institute (NHGRI), Office of Director (OD/DPCPSI/ODSS), National Institute of Allergy and Infectious Diseases (NIAID), National Institute on Aging (NIA), National Institute of General Medical Sciences (NIGMS), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Eye Institute (NEI), National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health under Award Number [U24HG007822] (the content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health) and EMBL core funding. N.H. and S.N. acknowledge funding by the Bundesministerium für Bildung und Forschung (de.NBI/BMBF 031L0108A and de.NBI/BMBF 031L0107, respectively). S.C.E.T. received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 778247 as well as ELIXIR implementation studies. N.B. acknowledges funding from the National Institutes of Health (1R01LM013115) and National Science Foundation (ABI 1759980). Y.Z. acknowledges funding from the Chinese National Infrastructure for Protein Science (Beijing), and National Key Research and Development Program (2021YFA1301603).
References
This article references 78 other publications.
- 1Hebert, A. S.; Richards, A. L.; Bailey, D. J.; Ulbrich, A.; Coughlin, E. E.; Westphall, M. S.; Coon, J. J. The One Hour Yeast Proteome. Mol. Cell Proteomics 2014, 13 (1), 339– 347, DOI: 10.1074/mcp.M113.034769Google Scholar1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXitlSksg%253D%253D&md5=8821587fa3433e1979060578c4555eddThe One Hour Yeast ProteomeHebert, Alexander S.; Richards, Alicia L.; Bailey, Derek J.; Ulbrich, Arne; Coughlin, Emma E.; Westphall, Michael S.; Coon, Joshua J.Molecular & Cellular Proteomics (2014), 13 (1), 339-347CODEN: MCPOBS; ISSN:1535-9484. (American Society for Biochemistry and Molecular Biology)We describe the comprehensive anal. of the yeast proteome in just over one hour of optimized anal. We achieve this expedited proteome characterization with improved sample prepn., chromatog. sepns., and by using a new Orbitrap hybrid mass spectrometer equipped with a mass filter, a collision cell, a high-field Orbitrap analyzer, and, finally, a dual cell linear ion trap analyzer (Q-OT-qIT, Orbitrap Fusion). This system offers high MS2 acquisition speed of 20 Hz and detects up to 19 peptide sequences within a single second of operation. Over a 1.3 h chromatog. method, the Q-OT-qIT hybrid collected an av. of 13,447 MS1 and 80,460 MS2 scans (per run) to produce 43,400 (‾x) peptide spectral matches and 34,255 (‾x) peptides with unique amino acid sequences (1% false discovery rate (FDR)). On av., each one hour anal. achieved detection of 3,977 proteins (1% FDR). We conclude that further improvements in mass spectrometer scan rate could render comprehensive anal. of the human proteome within a few hours.
- 2Huttlin, E. L.; Ting, L.; Bruckner, R. J.; Gebreab, F.; Gygi, M. P.; Szpyt, J.; Tam, S.; Zarraga, G.; Colby, G.; Baltier, K.; Dong, R.; Guarani, V.; Vaites, L. P.; Ordureau, A.; Rad, R.; Erickson, B. K.; Wühr, M.; Chick, J.; Zhai, B.; Kolippakkam, D.; Mintseris, J.; Obar, R. A.; Harris, T.; Artavanis-Tsakonas, S.; Sowa, M. E.; De Camilli, P.; Paulo, J. A.; Harper, J. W.; Gygi, S. P. The BioPlex Network: A Systematic Exploration of the Human Interactome. Cell 2015, 162 (2), 425– 440, DOI: 10.1016/j.cell.2015.06.043Google Scholar2https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXht1KgtL3I&md5=79b8d96037646f6679baab3b966b9d47The BioPlex Network: A Systematic Exploration of the Human InteractomeHuttlin, Edward L.; Ting, Lily; Bruckner, Raphael J.; Gebreab, Fana; Gygi, Melanie P.; Szpyt, John; Tam, Stanley; Zarraga, Gabriela; Colby, Greg; Baltier, Kurt; Dong, Rui; Guarani, Virginia; Vaites, Laura Pontano; Ordureau, Alban; Rad, Ramin; Erickson, Brian K.; Wuhr, Martin; Chick, Joel; Zhai, Bo; Kolippakkam, Deepak; Mintseris, Julian; Obar, Robert A.; Harris, Tim; Artavanis-Tsakonas, Spyros; Sowa, Mathew E.; De Camilli, Pietro; Paulo, Joao A.; Harper, J. Wade; Gygi, Steven P.Cell (Cambridge, MA, United States) (2015), 162 (2), 425-440CODEN: CELLB5; ISSN:0092-8674. (Cell Press)Protein interactions form a network whose structure drives cellular function and whose organization informs biol. inquiry. Using high-throughput affinity-purifn. mass spectrometry, the authors identify interacting partners for 2594 human proteins in HEK293T cells. The resulting network (BioPlex) contains 23,744 interactions among 7668 proteins with 86% previously undocumented. BioPlex accurately depicts known complexes, attaining 80%-100% coverage for most CORUM complexes. The network readily subdivides into communities that correspond to complexes or clusters of functionally related proteins. 
More generally, network architecture reflects cellular localization, biol. process, and mol. function, enabling functional characterization of thousands of proteins. Network structure also reveals assocns. among thousands of protein domains, suggesting a basis for examg. structurally related proteins. Finally, BioPlex, in combination with other approaches, can be used to reveal interactions of biol. or clin. significance. For example, mutations in the membrane protein VAPB implicated in familial amyotrophic lateral sclerosis perturb a defined community of interactors.
- 3Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P.-A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H.-J.; Albar, J. P.; Martinez-Bartolomé, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. ProteomeXchange Provides Globally Coordinated Proteomics Data Submission and Dissemination. Nat. Biotechnol. 2014, 32 (3), 223– 226, DOI: 10.1038/nbt.2839Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXjvFyntrc%253D&md5=f173db74e09f40f829268af9dcc2c8a4ProteomeXchange provides globally coordinated proteomics data submission and disseminationVizcaino, Juan A.; Deutsch, Eric W.; Wang, Rui; Csordas, Attila; Reisinger, Florian; Rios, Daniel; Dianes, Jose A.; Sun, Zhi; Farrah, Terry; Bandeira, Nuno; Binz, Pierre-Alain; Xenarios, Ioannis; Eisenacher, Martin; Mayer, Gerhard; Gatto, Laurent; Campos, Alex; Chalkley, Robert J.; Kraus, Hans-Joachim; Albar, Juan Pablo; Martinez-Bartolome, Salvador; Apweiler, Rolf; Omenn, Gilbert S.; Martens, Lennart; Jones, Andrew R.; Hermjakob, HenningNature Biotechnology (2014), 32 (3), 223-226CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)ProteomeXchange provides an infrastructure for efficient and reliable public dissemination of proteomics data, supporting crucial validation, anal. and re-use.
- 4Deutsch, E. W.; Csordas, A.; Sun, Z.; Jarnuczak, A.; Perez-Riverol, Y.; Ternent, T.; Campbell, D. S.; Bernal-Llinares, M.; Okuda, S.; Kawano, S.; Moritz, R. L.; Carver, J. J.; Wang, M.; Ishihama, Y.; Bandeira, N.; Hermjakob, H.; Vizcaíno, J. A. The ProteomeXchange Consortium in 2017: Supporting the Cultural Change in Proteomics Public Data Deposition. Nucleic Acids Res. 2017, 45 (D1), D1100– D1106, DOI: 10.1093/nar/gkw936Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhslWhs7o%253D&md5=bc5fa349c3685fccc4626dcb11d86986The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data depositionDeutsch, Eric W.; Csordas, Attila; Sun, Zhi; Jarnuczak, Andrew; Perez-Riverol, Yasset; Ternent, Tobias; Campbell, David S.; Bernal-Llinares, Manuel; Okuda, Shujiro; Kawano, Shin; Moritz, Robert L.; Carver, Jeremy J.; Wang, Mingxun; Ishihama, Yasushi; Bandeira, Nuno; Hermjakob, Henning; Vizcaino, Juan AntonioNucleic Acids Research (2017), 45 (D1), D1100-D1106CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. 
As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted std., supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approx. half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
- 5Deutsch, E. W.; Bandeira, N.; Sharma, V.; Perez-Riverol, Y.; Carver, J. J.; Kundu, D. J.; García-Seisdedos, D.; Jarnuczak, A. F.; Hewapathirana, S.; Pullman, B. S.; Wertz, J.; Sun, Z.; Kawano, S.; Okuda, S.; Watanabe, Y.; Hermjakob, H.; MacLean, B.; MacCoss, M. J.; Zhu, Y.; Ishihama, Y.; Vizcaíno, J. A. The ProteomeXchange Consortium in 2020: Enabling “big Data” Approaches in Proteomics. Nucleic Acids Res. 2019, 48 (D1), D1145– D1152, DOI: 10.1093/nar/gkz984Google ScholarThere is no corresponding record for this reference.
- 6Porras, P.; Barrera, E.; Bridge, A.; Del-Toro, N.; Cesareni, G.; Duesbury, M.; Hermjakob, H.; Iannuccelli, M.; Jurisica, I.; Kotlyar, M.; Licata, L.; Lovering, R. C.; Lynn, D. J.; Meldal, B.; Nanduri, B.; Paneerselvam, K.; Panni, S.; Pastrello, C.; Pellegrini, M.; Perfetto, L.; Rahimzadeh, N.; Ratan, P.; Ricard-Blum, S.; Salwinski, L.; Shirodkar, G.; Shrivastava, A.; Orchard, S. Towards a Unified Open Access Dataset of Molecular Interactions. Nat. Commun. 2020, 11 (1), 6144, DOI: 10.1038/s41467-020-19942-zGoogle Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXisFWks7zM&md5=f3db054ef193afaa904cbe539a692d93Towards a unified open access dataset of molecular interactionsPorras, Pablo; Barrera, Elisabet; Bridge, Alan; del-Toro, Noemi; Cesareni, Gianni; Duesbury, Margaret; Hermjakob, Henning; Iannuccelli, Marta; Jurisica, Igor; Kotlyar, Max; Licata, Luana; Lovering, Ruth C.; Lynn, David J.; Meldal, Birgit; Nanduri, Bindu; Paneerselvam, Kalpana; Panni, Simona; Pastrello, Chiara; Pellegrini, Matteo; Perfetto, Livia; Rahimzadeh, Negin; Ratan, Prashansa; Ricard-Blum, Sylvie; Salwinski, Lukasz; Shirodkar, Gautam; Shrivastava, Anjalia; Orchard, SandraNature Communications (2020), 11 (1), 6144CODEN: NCAOBW; ISSN:2041-1723. (Nature Research)The International Mol. Exchange (IMEx) Consortium provides scientists with a single body of exptl. verified protein interactions curated in rich contextual detail to an internationally agreed std. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Addnl., we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.
- 7Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J. J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L. B.; Bourne, P. E.; Bouwman, J.; Brookes, A. J.; Clark, T.; Crosas, M.; Dillo, I.; Dumon, O.; Edmunds, S.; Evelo, C. T.; Finkers, R.; Gonzalez-Beltran, A.; Gray, A. J. G.; Groth, P.; Goble, C.; Grethe, J. S.; Heringa, J.; ’t Hoen, P. A. C.; Hooft, R.; Kuhn, T.; Kok, R.; Kok, J.; Lusher, S. J.; Martone, M. E.; Mons, A.; Packer, A. L.; Persson, B.; Rocca-Serra, P.; Roos, M.; van Schaik, R.; Sansone, S.-A.; Schultes, E.; Sengstag, T.; Slater, T.; Strawn, G.; Swertz, M. A.; Thompson, M.; van der Lei, J.; van Mulligen, E.; Velterop, J.; Waagmeester, A.; Wittenburg, P.; Wolstencroft, K.; Zhao, J.; Mons, B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018, DOI: 10.1038/sdata.2016.18Google Scholar7https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC28bjslyrtQ%253D%253D&md5=e4ce8cf366db2280e54eb0168940720bThe FAIR Guiding Principles for scientific data management and stewardshipWilkinson Mark D; Dumontier Michel; Aalbersberg I Jsbrand Jan; Appleton Gabrielle; Dumon Olivier; Groth Paul; Strawn George; Axton Myles; Baak Arie; Blomberg Niklas; Boiten Jan-Willem; da Silva Santos Luiz Bonino; Bourne Philip E; Bouwman Jildau; Brookes Anthony J; Clark Tim; Crosas Merce; Dillo Ingrid; Edmunds Scott; Evelo Chris T; Finkers Richard; Gonzalez-Beltran Alejandra; Rocca-Serra Philippe; Sansone Susanna-Assunta; Gray Alasdair J G; Goble Carole; Grethe Jeffrey S; Heringa Jaap; Kok Ruben; 't Hoen Peter A C; Hooft Rob; Kuhn Tobias; Kok Joost; Lusher Scott J; Mons Barend; Martone Maryann E; Mons Albert; Packer Abel L; Persson Bengt; Roos Marco; Thompson Mark; van Schaik Rene; Schultes Erik; Sengstag Thierry; Slater Ted; Swertz Morris A; van der Lei Johan; van Mulligen Erik; Mons Barend; Velterop Jan; Waagmeester Andra; Wittenburg Peter; Wolstencroft 
Katherine; Zhao Jun; Mons BarendScientific data (2016), 3 (), 160018 ISSN:.There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
- 8Wood-Charlson, E. M.; Crockett, Z.; Erdmann, C.; Arkin, A. P.; Robinson, C. B. Ten Simple Rules for Getting and Giving Credit for Data. PLoS Comput. Biol. 2022, 18 (9), e1010476 DOI: 10.1371/journal.pcbi.1010476Google Scholar8https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38Xis1SktLzJ&md5=2564f0e0ffb3245a83e99e07b18d3186Ten simple rules for getting and giving credit for dataWood-Charlson, Elisha M.; Crockett, Zachary; Erdmann, Chris; Arkin, Adam P.; Robinson, Carly B.PLoS Computational Biology (2022), 18 (9), e1010476CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)There is no expanded citation for this reference.
- 9Hanash, S.; Celis, J. E. The Human Proteome Organization: A Mission to Advance Proteome Knowledge. Mol. Cell Proteomics 2002, 1 (6), 413– 414, DOI: 10.1074/mcp.R200002-MCP200Google Scholar9https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XmsV2jt7s%253D&md5=1b5c07982bee9074e38412d0f85a22a9The Human Proteome Organization. A mission to advance proteome knowledgeHanash, Sam; Celis, Julio E.Molecular and Cellular Proteomics (2002), 1 (6), 413-414CODEN: MCPOBS; ISSN:1535-9476. (American Society for Biochemistry and Molecular Biology, Inc.)There is no expanded citation for this reference.
- 10Orchard, S.; Hermjakob, H.; Apweiler, R. The Proteomics Standards Initiative. Proteomics 2003, 3 (7), 1374– 1376, DOI: 10.1002/pmic.200300496Google Scholar10https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3sXmtF2rt70%253D&md5=e7a6e612d813e747dc8720d7093400b5The proteomics standards initiativeOrchard, Sandra; Hermjakob, Henning; Apweiler, RolfProteomics (2003), 3 (7), 1374-1376CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)A review. The Proteomics Stds. Initiative (PSI) aims to define community stds. for data representation in proteomics and to facilitate data comparison, exchange and verification. Progress has been made in the development of common stds. for data exchange in the fields of both mass spectrometry and protein-protein interaction. A proteomics-specific extension is being created for the emerging American Society for Tests and Measurements mass spectrometry std., which will be supported by manufacturers of both hardware and software. A data model for proteomics experimentation is under development and discussions on a public repository for published proteomics data are underway. The Protein-Protein Interactions group expects to publish the Level 1 PSI data exchange format for protein-protein interactions soon and discussions as to the content of Level 2 have been initiated.
- 11Deutsch, E. W.; Orchard, S.; Binz, P.-A.; Bittremieux, W.; Eisenacher, M.; Hermjakob, H.; Kawano, S.; Lam, H.; Mayer, G.; Menschaert, G.; Perez-Riverol, Y.; Salek, R. M.; Tabb, D. L.; Tenzer, S.; Vizcaíno, J. A.; Walzer, M.; Jones, A. R. Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. J. Proteome Res. 2017, 16 (12), 4288– 4298, DOI: 10.1021/acs.jproteome.7b00370Google Scholar11https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtl2gsrnM&md5=b1fb5a59898dfe02fa872db2d66238f5Proteomics Standards Initiative: Fifteen Years of Progress and Future WorkDeutsch, Eric W.; Orchard, Sandra; Binz, Pierre-Alain; Bittremieux, Wout; Eisenacher, Martin; Hermjakob, Henning; Kawano, Shin; Lam, Henry; Mayer, Gerhard; Menschaert, Gerben; Perez-Riverol, Yasset; Salek, Reza M.; Tabb, David L.; Tenzer, Stefan; Vizcaino, Juan Antonio; Walzer, Mathias; Jones, Andrew R.Journal of Proteome Research (2017), 16 (12), 4288-4298CODEN: JPROBS; ISSN:1535-3893. (American Chemical Society)A review. The Proteomics Stds. Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community stds. and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community stds. via special workshops and ongoing work. Among the existing, ratified stds., the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Min. Information About a Proteomics Expt.) guidelines with the advance of new technologies and techniques. Further, new stds. are currently either in the final stages of completion (proBed and proBAM for proteogenomics results, as well as PEFF) or in early stages of design (a spectral library std. 
format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PROXI) web services Application Programming Interface). The authors review the current status of all these aspects of the PSI, describe synergies with other efforts such as the ProteomeXchange Consortium, the Human Proteome Project, and the metabolomics community, and provide a look at future directions of the PSI.
- 12del-Toro, N.; Dumousseau, M.; Orchard, S.; Jimenez, R. C.; Galeota, E.; Launay, G.; Goll, J.; Breuer, K.; Ono, K.; Salwinski, L.; Hermjakob, H. A New Reference Implementation of the PSICQUIC Web Service. Nucleic Acids Res. 2013, 41 (Web Server issue), W601–W606, DOI: 10.1093/nar/gkt392
- 13Vizcaíno, J. A.; Martens, L.; Hermjakob, H.; Julian, R. K.; Paton, N. W. The PSI Formal Document Process and Its Implementation on the PSI Website. Proteomics 2007, 7 (14), 2355–2357, DOI: 10.1002/pmic.200700064
- 14Mayer, G.; Jones, A. R.; Binz, P.-A.; Deutsch, E. W.; Orchard, S.; Montecchi-Palazzi, L.; Vizcaíno, J. A.; Hermjakob, H.; Oveillero, D.; Julian, R.; Stephan, C.; Meyer, H. E.; Eisenacher, M. Controlled Vocabularies and Ontologies in Proteomics: Overview, Principles and Practice. Biochim. Biophys. Acta 2014, 1844 (1 Pt A), 98–107, DOI: 10.1016/j.bbapap.2013.02.017
- 15Hermjakob, H.; Montecchi-Palazzi, L.; Bader, G.; Wojcik, J.; Salwinski, L.; Ceol, A.; Moore, S.; Orchard, S.; Sarkans, U.; von Mering, C.; Roechert, B.; Poux, S.; Jung, E.; Mersch, H.; Kersey, P.; Lappe, M.; Li, Y.; Zeng, R.; Rana, D.; Nikolski, M.; Husi, H.; Brun, C.; Shanker, K.; Grant, S. G. N.; Sander, C.; Bork, P.; Zhu, W.; Pandey, A.; Brazma, A.; Jacq, B.; Vidal, M.; Sherman, D.; Legrain, P.; Cesareni, G.; Xenarios, I.; Eisenberg, D.; Steipe, B.; Hogue, C.; Apweiler, R. The HUPO PSI’s Molecular Interaction Format-a Community Standard for the Representation of Protein Interaction Data. Nat. Biotechnol. 2004, 22 (2), 177–183, DOI: 10.1038/nbt926
- 16Mayer, G.; Montecchi-Palazzi, L.; Ovelleiro, D.; Jones, A. R.; Binz, P.-A.; Deutsch, E. W.; Chambers, M.; Kallhardt, M.; Levander, F.; Shofstahl, J.; Orchard, S.; Vizcaíno, J. A.; Hermjakob, H.; Stephan, C.; Meyer, H. E.; Eisenacher, M. HUPO-PSI Group. The HUPO Proteomics Standards Initiative-Mass Spectrometry Controlled Vocabulary. Database (Oxford) 2013, 2013, bat009, DOI: 10.1093/database/bat009
- 17Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W. H.; Römpp, A.; Neumann, S.; Pizarro, A. D.; Montecchi-Palazzi, L.; Tasman, N.; Coleman, M.; Reisinger, F.; Souda, P.; Hermjakob, H.; Binz, P.-A.; Deutsch, E. W. mzML-a Community Standard for Mass Spectrometry Data. Mol. Cell Proteomics 2011, 10 (1), R110.000133, DOI: 10.1074/mcp.R110.000133
- 18Côté, R. G.; Jones, P.; Apweiler, R.; Hermjakob, H. The Ontology Lookup Service, a Lightweight Cross-Platform Tool for Controlled Vocabulary Queries. BMC Bioinformatics 2006, 7, 97, DOI: 10.1186/1471-2105-7-97
- 19Perez-Riverol, Y.; Ternent, T.; Koch, M.; Barsnes, H.; Vrousgou, O.; Jupp, S.; Vizcaíno, J. A. OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets. Proteomics 2017, 17 (19), 1700244, DOI: 10.1002/pmic.201700244
- 20Whetzel, P. L.; Noy, N. F.; Shah, N. H.; Alexander, P. R.; Nyulas, C.; Tudorache, T.; Musen, M. A. BioPortal: Enhanced Functionality via New Web Services from the National Center for Biomedical Ontology to Access and Use Ontologies in Software Applications. Nucleic Acids Res. 2011, 39 (Web Server issue), W541–W545, DOI: 10.1093/nar/gkr469
- 21Montecchi-Palazzi, L.; Beavis, R.; Binz, P.-A.; Chalkley, R. J.; Cottrell, J.; Creasy, D.; Shofstahl, J.; Seymour, S. L.; Garavelli, J. S. The PSI-MOD Community Standard for Representation of Protein Modification Data. Nat. Biotechnol. 2008, 26 (8), 864–866, DOI: 10.1038/nbt0808-864
- 22Garavelli, J. S. The RESID Database of Protein Modifications as a Resource and Annotation Tool. Proteomics 2004, 4 (6), 1527–1533, DOI: 10.1002/pmic.200300777
- 23Creasy, D. M.; Cottrell, J. S. Unimod: Protein Modifications for Mass Spectrometry. Proteomics 2004, 4 (6), 1534–1536, DOI: 10.1002/pmic.200300744
- 24Mayer, G. XLMOD: Cross-Linking and Chromatography Derivatization Reagents Ontology. arXiv 2020. DOI: 10.48550/ARXIV.2003.00329
- 25Orchard, S.; Salwinski, L.; Kerrien, S.; Montecchi-Palazzi, L.; Oesterheld, M.; Stümpflen, V.; Ceol, A.; Chatr-aryamontri, A.; Armstrong, J.; Woollard, P.; Salama, J. J.; Moore, S.; Wojcik, J.; Bader, G. D.; Vidal, M.; Cusick, M. E.; Gerstein, M.; Gavin, A.-C.; Superti-Furga, G.; Greenblatt, J.; Bader, J.; Uetz, P.; Tyers, M.; Legrain, P.; Fields, S.; Mulder, N.; Gilson, M.; Niepmann, M.; Burgoon, L.; De Las Rivas, J.; Prieto, C.; Perreau, V. M.; Hogue, C.; Mewes, H.-W.; Apweiler, R.; Xenarios, I.; Eisenberg, D.; Cesareni, G.; Hermjakob, H. The Minimum Information Required for Reporting a Molecular Interaction Experiment (MIMIx). Nat. Biotechnol. 2007, 25 (8), 894–898, DOI: 10.1038/nbt1324
- 26Taylor, C. F.; Paton, N. W.; Lilley, K. S.; Binz, P.-A.; Julian, R. K.; Jones, A. R.; Zhu, W.; Apweiler, R.; Aebersold, R.; Deutsch, E. W.; Dunn, M. J.; Heck, A. J. R.; Leitner, A.; Macht, M.; Mann, M.; Martens, L.; Neubert, T. A.; Patterson, S. D.; Ping, P.; Seymour, S. L.; Souda, P.; Tsugita, A.; Vandekerckhove, J.; Vondriska, T. M.; Whitelegge, J. P.; Wilkins, M. R.; Xenarios, I.; Yates, J. R.; Hermjakob, H. The Minimum Information about a Proteomics Experiment (MIAPE). Nat. Biotechnol. 2007, 25 (8), 887–893, DOI: 10.1038/nbt1329
- 27Brazma, A.; Hingamp, P.; Quackenbush, J.; Sherlock, G.; Spellman, P.; Stoeckert, C.; Aach, J.; Ansorge, W.; Ball, C. A.; Causton, H. C.; Gaasterland, T.; Glenisson, P.; Holstege, F. C.; Kim, I. F.; Markowitz, V.; Matese, J. C.; Parkinson, H.; Robinson, A.; Sarkans, U.; Schulze-Kremer, S.; Stewart, J.; Taylor, R.; Vilo, J.; Vingron, M. Minimum Information about a Microarray Experiment (MIAME)-toward Standards for Microarray Data. Nat. Genet. 2001, 29 (4), 365–371, DOI: 10.1038/ng1201-365
- 28Jones, A. R.; Carroll, K.; Knight, D.; Maclellan, K.; Domann, P. J.; Legido-Quigley, C.; Huang, L.; Smallshaw, L.; Mirzaei, H.; Shofstahl, J.; Paton, N. W. Minimum Information About a Proteomics Experiment (MIAPE). Guidelines for Reporting the Use of Column Chromatography in Proteomics. Nat. Biotechnol. 2010, 28 (7), 654, DOI: 10.1038/nbt0710-654a
- 29Taylor, C. F.; Binz, P.-A.; Aebersold, R.; Affolter, M.; Barkovich, R.; Deutsch, E. W.; Horn, D. M.; Hühmer, A.; Kussmann, M.; Lilley, K.; Macht, M.; Mann, M.; Müller, D.; Neubert, T. A.; Nickson, J.; Patterson, S. D.; Raso, R.; Resing, K.; Seymour, S. L.; Tsugita, A.; Xenarios, I.; Zeng, R.; Julian, R. K. Guidelines for Reporting the Use of Mass Spectrometry in Proteomics. Nat. Biotechnol. 2008, 26 (8), 860–861, DOI: 10.1038/nbt0808-860
- 30Binz, P.-A.; Barkovich, R.; Beavis, R. C.; Creasy, D.; Horn, D. M.; Julian, R. K.; Seymour, S. L.; Taylor, C. F.; Vandenbrouck, Y. Guidelines for Reporting the Use of Mass Spectrometry Informatics in Proteomics. Nat. Biotechnol. 2008, 26 (8), 862, DOI: 10.1038/nbt0808-862
- 31Martínez-Bartolomé, S.; Deutsch, E. W.; Binz, P.-A.; Jones, A. R.; Eisenacher, M.; Mayer, G.; Campos, A.; Canals, F.; Bech-Serra, J.-J.; Carrascal, M.; Gay, M.; Paradela, A.; Navajas, R.; Marcilla, M.; Hernáez, M. L.; Gutiérrez-Blázquez, M. D.; Velarde, L. F. C.; Aloria, K.; Beaskoetxea, J.; Medina-Aunon, J. A.; Albar, J. P. Guidelines for Reporting Quantitative Mass Spectrometry Based Experiments in Proteomics. J. Proteomics 2013, 95, 84–88, DOI: 10.1016/j.jprot.2013.02.026
- 32Medina-Aunon, J. A.; Martínez-Bartolomé, S.; López-García, M. A.; Salazar, E.; Navajas, R.; Jones, A. R.; Paradela, A.; Albar, J. P. The ProteoRed MIAPE Web Toolkit: A User-Friendly Framework to Connect and Share Proteomics Standards. Mol. Cell Proteomics 2011, 10 (10), M111.008334, DOI: 10.1074/mcp.M111.008334
- 33Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y.-K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J. Proteome Res. 2016, 15 (11), 3961–3970, DOI: 10.1021/acs.jproteome.6b00392
- 34Deutsch, E. W.; Lane, L.; Overall, C. M.; Bandeira, N.; Baker, M. S.; Pineau, C.; Moritz, R. L.; Corrales, F.; Orchard, S.; Van Eyk, J. E.; Paik, Y.-K.; Weintraub, S. T.; Vandenbrouck, Y.; Omenn, G. S. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0. J. Proteome Res. 2019, 18 (12), 4108–4116, DOI: 10.1021/acs.jproteome.9b00542
- 35. Kerrien, S.; Orchard, S.; Montecchi-Palazzi, L.; Aranda, B.; Quinn, A. F.; Vinod, N.; Bader, G. D.; Xenarios, I.; Wojcik, J.; Sherman, D.; Tyers, M.; Salama, J. J.; Moore, S.; Ceol, A.; Chatr-Aryamontri, A.; Oesterheld, M.; Stümpflen, V.; Salwinski, L.; Nerothin, J.; Cerami, E.; Cusick, M. E.; Vidal, M.; Gilson, M.; Armstrong, J.; Woollard, P.; Hogue, C.; Eisenberg, D.; Cesareni, G.; Apweiler, R.; Hermjakob, H. Broadening the Horizon - Level 2.5 of the HUPO-PSI Format for Molecular Interactions. BMC Biol. 2007, 5, 44, DOI: 10.1186/1741-7007-5-44
- 36. Sivade Dumousseau, M.; Alonso-López, D.; Ammari, M.; Bradley, G.; Campbell, N. H.; Ceol, A.; Cesareni, G.; Combe, C.; De Las Rivas, J.; Del-Toro, N.; Heimbach, J.; Hermjakob, H.; Jurisica, I.; Koch, M.; Licata, L.; Lovering, R. C.; Lynn, D. J.; Meldal, B. H. M.; Micklem, G.; Panni, S.; Porras, P.; Ricard-Blum, S.; Roechert, B.; Salwinski, L.; Shrivastava, A.; Sullivan, J.; Thierry-Mieg, N.; Yehudi, Y.; Van Roey, K.; Orchard, S. Encompassing New Use Cases - Level 3.0 of the HUPO-PSI Format for Molecular Interactions. BMC Bioinformatics 2018, 19 (1), 134, DOI: 10.1186/s12859-018-2118-1
- 37. Sivade Dumousseau, M.; Koch, M.; Shrivastava, A.; Alonso-López, D.; De Las Rivas, J.; Del-Toro, N.; Combe, C. W.; Meldal, B. H. M.; Heimbach, J.; Rappsilber, J.; Sullivan, J.; Yehudi, Y.; Orchard, S. JAMI: A Java Library for Molecular Interactions and Data Interoperability. BMC Bioinformatics 2018, 19 (1), 133, DOI: 10.1186/s12859-018-2119-0
- 38. Shah, A. R.; Davidson, J.; Monroe, M. E.; Mayampurath, A. M.; Danielson, W. F.; Shi, Y.; Robinson, A. C.; Clowers, B. H.; Belov, M. E.; Anderson, G. A.; Smith, R. D. An Efficient Data Format for Mass Spectrometry-Based Proteomics. J. Am. Soc. Mass Spectrom. 2010, 21 (10), 1784–1788, DOI: 10.1016/j.jasms.2010.06.014
- 39. Wilhelm, M.; Kirchner, M.; Steen, J. A. J.; Steen, H. mz5: Space- and Time-Efficient Storage of Mass Spectrometry Data Sets. Mol. Cell. Proteomics 2012, 11 (1), O111.011379, DOI: 10.1074/mcp.O111.011379
- 40. Bouyssié, D.; Dubois, M.; Nasso, S.; Gonzalez de Peredo, A.; Burlet-Schiltz, O.; Aebersold, R.; Monsarrat, B. mzDB: A File Format Using Multiple Indexing Strategies for the Efficient Analysis of Large LC-MS/MS and SWATH-MS Data Sets. Mol. Cell. Proteomics 2015, 14 (3), 771–781, DOI: 10.1074/mcp.O114.039115
- 41. Wang, J.; Lu, M.; Wang, R.; An, S.; Xie, C.; Yu, C. StackZDPD: A Novel Encoding Scheme for Mass Spectrometry Data Optimized for Speed and Compression Ratio. Sci. Rep. 2022, 12 (1), 5384, DOI: 10.1038/s41598-022-09432-1
- 42. Schramm, T.; Hester, Z.; Klinkert, I.; Both, J.-P.; Heeren, R. M. A.; Brunelle, A.; Laprévote, O.; Desbenoit, N.; Robbe, M.-F.; Stoeckli, M.; Spengler, B.; Römpp, A. imzML - a Common Data Format for the Flexible Exchange and Processing of Mass Spectrometry Imaging Data. J. Proteomics 2012, 75 (16), 5106–5110, DOI: 10.1016/j.jprot.2012.07.026
- 43. Bhamber, R. S.; Jankevics, A.; Deutsch, E. W.; Jones, A. R.; Dowsey, A. W. mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. J. Proteome Res. 2021, 20 (1), 172–183, DOI: 10.1021/acs.jproteome.0c00192
- 44. Jones, A. R.; Eisenacher, M.; Mayer, G.; Kohlbacher, O.; Siepen, J.; Hubbard, S. J.; Selley, J. N.; Searle, B. C.; Shofstahl, J.; Seymour, S. L.; Julian, R.; Binz, P.-A.; Deutsch, E. W.; Hermjakob, H.; Reisinger, F.; Griss, J.; Vizcaíno, J. A.; Chambers, M.; Pizarro, A.; Creasy, D. The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results. Mol. Cell. Proteomics 2012, 11 (7), M111.014381, DOI: 10.1074/mcp.M111.014381
- 45. Vizcaíno, J. A.; Mayer, G.; Perkins, S.; Barsnes, H.; Vaudel, M.; Perez-Riverol, Y.; Ternent, T.; Uszkoreit, J.; Eisenacher, M.; Fischer, L.; Rappsilber, J.; Netz, E.; Walzer, M.; Kohlbacher, O.; Leitner, A.; Chalkley, R. J.; Ghali, F.; Martínez-Bartolomé, S.; Deutsch, E. W.; Jones, A. R. The mzIdentML Data Standard Version 1.2, Supporting Advances in Proteome Informatics. Mol. Cell. Proteomics 2017, 16 (7), 1275–1285, DOI: 10.1074/mcp.M117.068429
- 46. Griss, J.; Jones, A. R.; Sachsenberg, T.; Walzer, M.; Gatto, L.; Hartler, J.; Thallinger, G. G.; Salek, R. M.; Steinbeck, C.; Neuhauser, N.; Cox, J.; Neumann, S.; Fan, J.; Reisinger, F.; Xu, Q.-W.; Del Toro, N.; Pérez-Riverol, Y.; Ghali, F.; Bandeira, N.; Xenarios, I.; Kohlbacher, O.; Vizcaíno, J. A.; Hermjakob, H. The mzTab Data Exchange Format: Communicating Mass-Spectrometry-Based Proteomics and Metabolomics Experimental Results to a Wider Audience. Mol. Cell. Proteomics 2014, 13 (10), 2765–2775, DOI: 10.1074/mcp.O113.036681
- 47. Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-Based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data. Electrophoresis 1999, 20 (18), 3551–3567, DOI: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
- 48. Röst, H. L.; Sachsenberg, T.; Aiche, S.; Bielow, C.; Weisser, H.; Aicheler, F.; Andreotti, S.; Ehrlich, H.-C.; Gutenbrunner, P.; Kenar, E.; Liang, X.; Nahnsen, S.; Nilse, L.; Pfeuffer, J.; Rosenberger, G.; Rurik, M.; Schmitt, U.; Veit, J.; Walzer, M.; Wojnar, D.; Wolski, W. E.; Schilling, O.; Choudhary, J. S.; Malmström, L.; Aebersold, R.; Reinert, K.; Kohlbacher, O. OpenMS: A Flexible Open-Source Software Platform for Mass Spectrometry Data Analysis. Nat. Methods 2016, 13 (9), 741–748, DOI: 10.1038/nmeth.3959
- 49. Tyanova, S.; Temu, T.; Cox, J. The MaxQuant Computational Platform for Mass Spectrometry-Based Shotgun Proteomics. Nat. Protoc. 2016, 11 (12), 2301–2319, DOI: 10.1038/nprot.2016.136
- 50. Hoffmann, N.; Rein, J.; Sachsenberg, T.; Hartler, J.; Haug, K.; Mayer, G.; Alka, O.; Dayalan, S.; Pearce, J. T. M.; Rocca-Serra, P.; Qi, D.; Eisenacher, M.; Perez-Riverol, Y.; Vizcaíno, J. A.; Salek, R. M.; Neumann, S.; Jones, A. R. mzTab-M: A Data Standard for Sharing Quantitative Results in Mass Spectrometry Metabolomics. Anal. Chem. 2019, 91 (5), 3302–3310, DOI: 10.1021/acs.analchem.8b04310
- 51. Menschaert, G.; Wang, X.; Jones, A. R.; Ghali, F.; Fenyö, D.; Olexiouk, V.; Zhang, B.; Deutsch, E. W.; Ternent, T.; Vizcaíno, J. A. The proBAM and proBed Standard Formats: Enabling a Seamless Integration of Genomics and Proteomics Data. Genome Biol. 2018, 19 (1), 12, DOI: 10.1186/s13059-017-1377-x
- 52. Binz, P.-A.; Shofstahl, J.; Vizcaíno, J. A.; Barsnes, H.; Chalkley, R. J.; Menschaert, G.; Alpi, E.; Clauser, K.; Eng, J. K.; Lane, L.; Seymour, S. L.; Sánchez, L. F. H.; Mayer, G.; Eisenacher, M.; Perez-Riverol, Y.; Kapp, E. A.; Mendoza, L.; Baker, P. R.; Collins, A.; Van Den Bossche, T.; Deutsch, E. W. Proteomics Standards Initiative Extended FASTA Format. J. Proteome Res. 2019, 18 (6), 2686–2692, DOI: 10.1021/acs.jproteome.9b00064
- 53. Eng, J. K.; Jahan, T. A.; Hoopmann, M. R. Comet: An Open-Source MS/MS Sequence Database Search Tool. Proteomics 2013, 13 (1), 22–24, DOI: 10.1002/pmic.201200439
- 54. Eng, J. K.; Deutsch, E. W. Extending Comet for Global Amino Acid Variant and Post-Translational Modification Analysis Using the PSI Extended FASTA Format. Proteomics 2020, 20 (21–22), e1900362, DOI: 10.1002/pmic.201900362
- 55. LeDuc, R. D.; Schwämmle, V.; Shortreed, M. R.; Cesnik, A. J.; Solntsev, S. K.; Shaw, J. B.; Martin, M. J.; Vizcaíno, J. A.; Alpi, E.; Danis, P.; Kelleher, N. L.; Smith, L. M.; Ge, Y.; Agar, J. N.; Chamot-Rooke, J.; Loo, J. A.; Pasa-Tolic, L.; Tsybin, Y. O. ProForma: A Standard Proteoform Notation. J. Proteome Res. 2018, 17 (3), 1321–1325, DOI: 10.1021/acs.jproteome.7b00851
- 56. LeDuc, R. D.; Deutsch, E. W.; Binz, P.-A.; Fellers, R. T.; Cesnik, A. J.; Klein, J. A.; Van Den Bossche, T.; Gabriels, R.; Yalavarthi, A.; Perez-Riverol, Y.; Carver, J.; Bittremieux, W.; Kawano, S.; Pullman, B.; Bandeira, N.; Kelleher, N. L.; Thomas, P. M.; Vizcaíno, J. A. Proteomics Standards Initiative’s ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms. J. Proteome Res. 2022, 21 (4), 1189–1195, DOI: 10.1021/acs.jproteome.1c00771
- 57. Deutsch, E. W.; Perez-Riverol, Y.; Carver, J.; Kawano, S.; Mendoza, L.; Van Den Bossche, T.; Gabriels, R.; Binz, P.-A.; Pullman, B.; Sun, Z.; Shofstahl, J.; Bittremieux, W.; Mak, T. D.; Klein, J.; Zhu, Y.; Lam, H.; Vizcaíno, J. A.; Bandeira, N. Universal Spectrum Identifier for Mass Spectra. Nat. Methods 2021, 18 (7), 768–770, DOI: 10.1038/s41592-021-01184-6
- 58. Bittremieux, W.; Chen, C.; Dorrestein, P. C.; Schymanski, E. L.; Schulze, T.; Neumann, S.; Meier, R.; Rogers, S.; Wang, M. Universal MS/MS Visualization and Retrieval with the Metabolomics Spectrum Resolver Web Service. bioRxiv; preprint; Bioinformatics, 2020. DOI: 10.1101/2020.05.09.086066
- 59. Dai, C.; Füllgrabe, A.; Pfeuffer, J.; Solovyeva, E. M.; Deng, J.; Moreno, P.; Kamatchinathan, S.; Kundu, D. J.; George, N.; Fexova, S.; Grüning, B.; Föll, M. C.; Griss, J.; Vaudel, M.; Audain, E.; Locard-Paulet, M.; Turewicz, M.; Eisenacher, M.; Uszkoreit, J.; Van Den Bossche, T.; Schwämmle, V.; Webel, H.; Schulze, S.; Bouyssié, D.; Jayaram, S.; Duggineni, V. K.; Samaras, P.; Wilhelm, M.; Choi, M.; Wang, M.; Kohlbacher, O.; Brazma, A.; Papatheodorou, I.; Bandeira, N.; Deutsch, E. W.; Vizcaíno, J. A.; Bai, M.; Sachsenberg, T.; Levitsky, L. I.; Perez-Riverol, Y. A Proteomics Sample Metadata Representation for Multiomics Integration and Big Data Analysis. Nat. Commun. 2021, 12 (1), 5854, DOI: 10.1038/s41467-021-26111-3
- 60. Rayner, T. F.; Rocca-Serra, P.; Spellman, P. T.; Causton, H. C.; Farne, A.; Holloway, E.; Irizarry, R. A.; Liu, J.; Maier, D. S.; Miller, M.; Petersen, K.; Quackenbush, J.; Sherlock, G.; Stoeckert, C. J.; White, J.; Whetzel, P. L.; Wymore, F.; Parkinson, H.; Sarkans, U.; Ball, C. A.; Brazma, A. A Simple Spreadsheet-Based, MIAME-Supportive Format for Microarray Data: MAGE-TAB. BMC Bioinformatics 2006, 7, 489, DOI: 10.1186/1471-2105-7-489
- 61. Gibson, F.; Hoogland, C.; Martinez-Bartolomé, S.; Medina-Aunon, J. A.; Albar, J. P.; Babnigg, G.; Wipat, A.; Hermjakob, H.; Almeida, J. S.; Stanislaus, R.; Paton, N. W.; Jones, A. R. The Gel Electrophoresis Markup Language (GelML) from the Proteomics Standards Initiative. Proteomics 2010, 10 (17), 3073–3081, DOI: 10.1002/pmic.201000120
- 62. Deutsch, E. W.; Chambers, M.; Neumann, S.; Levander, F.; Binz, P.-A.; Shofstahl, J.; Campbell, D. S.; Mendoza, L.; Ovelleiro, D.; Helsens, K.; Martens, L.; Aebersold, R.; Moritz, R. L.; Brusniak, M.-Y. TraML-a Standard Format for Exchange of Selected Reaction Monitoring Transition Lists. Mol. Cell. Proteomics 2012, 11 (4), R111.015040, DOI: 10.1074/mcp.R111.015040
- 63. Helsens, K.; Brusniak, M.-Y.; Deutsch, E.; Moritz, R. L.; Martens, L. jTraML: An Open Source Java API for TraML, the PSI Standard for Sharing SRM Transitions. J. Proteome Res. 2011, 10 (11), 5260–5263, DOI: 10.1021/pr200664h
- 64. MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: An Open Source Document Editor for Creating and Analyzing Targeted Proteomics Experiments. Bioinformatics 2010, 26 (7), 966–968, DOI: 10.1093/bioinformatics/btq054
- 65. Walzer, M.; Qi, D.; Mayer, G.; Uszkoreit, J.; Eisenacher, M.; Sachsenberg, T.; Gonzalez-Galarza, F. F.; Fan, J.; Bessant, C.; Deutsch, E. W.; Reisinger, F.; Vizcaíno, J. A.; Medina-Aunon, J. A.; Albar, J. P.; Kohlbacher, O.; Jones, A. R. The mzQuantML Data Standard for Mass Spectrometry-Based Quantitative Studies in Proteomics. Mol. Cell. Proteomics 2013, 12 (8), 2332–2340, DOI: 10.1074/mcp.O113.028506
- 66. Walzer, M.; Pernas, L. E.; Nasso, S.; Bittremieux, W.; Nahnsen, S.; Kelchtermans, P.; Pichler, P.; van den Toorn, H. W. P.; Staes, A.; Vandenbussche, J.; Mazanek, M.; Taus, T.; Scheltema, R. A.; Kelstrup, C. D.; Gatto, L.; van Breukelen, B.; Aiche, S.; Valkenborg, D.; Laukens, K.; Lilley, K. S.; Olsen, J. V.; Heck, A. J. R.; Mechtler, K.; Aebersold, R.; Gevaert, K.; Vizcaíno, J. A.; Hermjakob, H.; Kohlbacher, O.; Martens, L. qcML: An Exchange Format for Quality Control Metrics from Mass Spectrometry Experiments. Mol. Cell. Proteomics 2014, 13 (8), 1905–1913, DOI: 10.1074/mcp.M113.035907
- 67. Bittremieux, W.; Walzer, M.; Tenzer, S.; Zhu, W.; Salek, R. M.; Eisenacher, M.; Tabb, D. L. The Human Proteome Organization-Proteomics Standards Initiative Quality Control Working Group: Making Quality Control More Accessible for Biological Mass Spectrometry. Anal. Chem. 2017, 89 (8), 4474–4479, DOI: 10.1021/acs.analchem.6b04310
- 68. Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Development and Validation of a Spectral Library Searching Method for Peptide Identification from MS/MS. Proteomics 2007, 7 (5), 655–667, DOI: 10.1002/pmic.200600625
- 69. Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; Stein, S. E.; Aebersold, R. Building Consensus Spectral Libraries for Peptide Identification in Proteomics. Nat. Methods 2008, 5 (10), 873–875, DOI: 10.1038/nmeth.1254
- 70. Frewen, B.; MacCoss, M. J. Using BiblioSpec for Creating and Searching Tandem MS Peptide Libraries. Curr. Protoc. Bioinformatics 2007, Chapter 13, Unit 13.7, DOI: 10.1002/0471250953.bi1307s20
- 71. Deutsch, E. W.; Perez-Riverol, Y.; Chalkley, R. J.; Wilhelm, M.; Tate, S.; Sachsenberg, T.; Walzer, M.; Käll, L.; Delanghe, B.; Böcker, S.; Schymanski, E. L.; Wilmes, P.; Dorfer, V.; Kuster, B.; Volders, P.-J.; Jehmlich, N.; Vissers, J. P. C.; Wolan, D. W.; Wang, A. Y.; Mendoza, L.; Shofstahl, J.; Dowsey, A. W.; Griss, J.; Salek, R. M.; Neumann, S.; Binz, P.-A.; Lam, H.; Vizcaíno, J. A.; Bandeira, N.; Röst, H. Expanding the Use of Spectral Libraries in Proteomics. J. Proteome Res. 2018, 17 (12), 4051–4060, DOI: 10.1021/acs.jproteome.8b00485
- 72. Mészáros, B.; Hatos, A.; Palopoli, N.; Quaglia, F.; Salladini, E.; Van Roey, K.; Arthanari, H.; Dosztányi, Z.; Felli, I. C.; Fischer, P. D.; Hoch, J. C.; Jeffries, C. M.; Longhi, S.; Maiani, E.; Orchard, S.; Pancsa, R.; Papaleo, E.; Pierattelli, R.; Piovesan, D.; Pritisanac, I.; Viennet, T.; Tompa, P.; Vranken, W.; Tosatto, S. C.; Davey, N. E. MIADE Metadata Guidelines: Minimum Information About a Disorder Experiment; Scientific Communication and Education, 2022. DOI: 10.1101/2022.07.12.495092
- 73. Quaglia, F.; Mészáros, B.; Salladini, E.; Hatos, A.; Pancsa, R.; Chemes, L. B.; Pajkos, M.; Lazar, T.; Peña-Díaz, S.; Santos, J.; Ács, V.; Farahi, N.; Fichó, E.; Aspromonte, M. C.; Bassot, C.; Chasapi, A.; Davey, N. E.; Davidović, R.; Dobson, L.; Elofsson, A.; Erdos, G.; Gaudet, P.; Giglio, M.; Glavina, J.; Iserte, J.; Iglesias, V.; Kálmán, Z.; Lambrughi, M.; Leonardi, E.; Longhi, S.; Macedo-Ribeiro, S.; Maiani, E.; Marchetti, J.; Marino-Buslje, C.; Mészáros, A.; Monzon, A. M.; Minervini, G.; Nadendla, S.; Nilsson, J. F.; Novotný, M.; Ouzounis, C. A.; Palopoli, N.; Papaleo, E.; Pereira, P. J. B.; Pozzati, G.; Promponas, V. J.; Pujols, J.; Rocha, A. C. S.; Salas, M.; Sawicki, L. R.; Schad, E.; Shenoy, A.; Szaniszló, T.; Tsirigos, K. D.; Veljkovic, N.; Parisi, G.; Ventura, S.; Dosztányi, Z.; Tompa, P.; Tosatto, S. C. E.; Piovesan, D. DisProt in 2022: Improved Quality and Accessibility of Protein Intrinsic Disorder Annotation. Nucleic Acids Res. 2022, 50 (D1), D480–D487, DOI: 10.1093/nar/gkab1082
- 74. Bittremieux, W.; Bouyssié, D.; Dorfer, V.; Locard-Paulet, M.; Perez-Riverol, Y.; Schwämmle, V.; Uszkoreit, J.; Van Den Bossche, T. The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS): An Open Community for Bioinformatics Training and Research. Rapid Commun. Mass Spectrom. 2021, e9087, DOI: 10.1002/rcm.9087
- 75. Rehm, H. L.; Page, A. J. H.; Smith, L.; Adams, J. B.; Alterovitz, G.; Babb, L. J.; Barkley, M. P.; Baudis, M.; Beauvais, M. J. S.; Beck, T.; Beckmann, J. S.; Beltran, S.; Bernick, D.; Bernier, A.; Bonfield, J. K.; Boughtwood, T. F.; Bourque, G.; Bowers, S. R.; Brookes, A. J.; Brudno, M.; Brush, M. H.; Bujold, D.; Burdett, T.; Buske, O. J.; Cabili, M. N.; Cameron, D. L.; Carroll, R. J.; Casas-Silva, E.; Chakravarty, D.; Chaudhari, B. P.; Chen, S. H.; Cherry, J. M.; Chung, J.; Cline, M.; Clissold, H. L.; Cook-Deegan, R. M.; Courtot, M.; Cunningham, F.; Cupak, M.; Davies, R. M.; Denisko, D.; Doerr, M. J.; Dolman, L. I.; Dove, E. S.; Dursi, L. J.; Dyke, S. O. M.; Eddy, J. A.; Eilbeck, K.; Ellrott, K. P.; Fairley, S.; Fakhro, K. A.; Firth, H. V.; Fitzsimons, M. S.; Fiume, M.; Flicek, P.; Fore, I. M.; Freeberg, M. A.; Freimuth, R. R.; Fromont, L. A.; Fuerth, J.; Gaff, C. L.; Gan, W.; Ghanaim, E. M.; Glazer, D.; Green, R. C.; Griffith, M.; Griffith, O. L.; Grossman, R. L.; Groza, T.; Auvil, J. M. G.; Guigó, R.; Gupta, D.; Haendel, M. A.; Hamosh, A.; Hansen, D. P.; Hart, R. K.; Hartley, D. M.; Haussler, D.; Hendricks-Sturrup, R. M.; Ho, C. W. L.; Hobb, A. E.; Hoffman, M. M.; Hofmann, O. M.; Holub, P.; Hsu, J. S.; Hubaux, J.-P.; Hunt, S. E.; Husami, A.; Jacobsen, J. O.; Jamuar, S. S.; Janes, E. L.; Jeanson, F.; Jené, A.; Johns, A. L.; Joly, Y.; Jones, S. J. M.; Kanitz, A.; Kato, K.; Keane, T. M.; Kekesi-Lafrance, K.; Kelleher, J.; Kerry, G.; Khor, S.-S.; Knoppers, B. M.; Konopko, M. A.; Kosaki, K.; Kuba, M.; Lawson, J.; Leinonen, R.; Li, S.; Lin, M. F.; Linden, M.; Liu, X.; Udara Liyanage, I.; Lopez, J.; Lucassen, A. M.; Lukowski, M.; Mann, A. L.; Marshall, J.; Mattioni, M.; Metke-Jimenez, A.; Middleton, A.; Milne, R. J.; Molnár-Gábor, F.; Mulder, N.; Munoz-Torres, M. C.; Nag, R.; Nakagawa, H.; Nasir, J.; Navarro, A.; Nelson, T. H.; Niewielska, A.; Nisselle, A.; Niu, J.; Nyrönen, T. H.; O’Connor, B. D.; Oesterle, S.; Ogishima, S.; Wang, V. O.; Paglione, L. A. D.; Palumbo, E.; Parkinson, H. E.; Philippakis, A. A.; Pizarro, A. D.; Prlic, A.; Rambla, J.; Rendon, A.; Rider, R. A.; Robinson, P. N.; Rodarmer, K. W.; Rodriguez, L. L.; Rubin, A. F.; Rueda, M.; Rushton, G. A.; Ryan, R. S.; Saunders, G. I.; Schuilenburg, H.; Schwede, T.; Scollen, S.; Senf, A.; Sheffield, N. C.; Skantharajah, N.; Smith, A. V.; Sofia, H. J.; Spalding, D.; Spurdle, A. B.; Stark, Z.; Stein, L. D.; Suematsu, M.; Tan, P.; Tedds, J. A.; Thomson, A. A.; Thorogood, A.; Tickle, T. L.; Tokunaga, K.; Törnroos, J.; Torrents, D.; Upchurch, S.; Valencia, A.; Guimera, R. V.; Vamathevan, J.; Varma, S.; Vears, D. F.; Viner, C.; Voisin, C.; Wagner, A. H.; Wallace, S. E.; Walsh, B. P.; Williams, M. S.; Winkler, E. C.; Wold, B. J.; Wood, G. M.; Woolley, J. P.; Yamasaki, C.; Yates, A. D.; Yung, C. K.; Zass, L. J.; Zaytseva, K.; Zhang, J.; Goodhand, P.; North, K.; Birney, E. GA4GH: International Policies and Standards for Data Sharing across Genomic Research and Healthcare. Cell Genom. 2021, 1 (2), 100029, DOI: 10.1016/j.xgen.2021.100029
- 76. Keane, T. M.; O’Donovan, C.; Vizcaíno, J. A. The Growing Need for Controlled Data Access Models in Clinical Proteomics and Metabolomics. Nat. Commun. 2021, 12 (1), 5787, DOI: 10.1038/s41467-021-26110-4
- 77. Bandeira, N.; Deutsch, E. W.; Kohlbacher, O.; Martens, L.; Vizcaíno, J. A. Data Management of Sensitive Human Proteomics Data: Current Practices, Recommendations, and Perspectives for the Future. Mol. Cell Proteomics 2021, 20, 100071, DOI: 10.1016/j.mcpro.2021.100071
- 78. Jones, A. R.; Deutsch, E. W.; Vizcaíno, J. A. Is DIA Proteomics Data FAIR? Current Data Sharing Practices, Available Bioinformatics Infrastructure and Recommendations for the Future. Proteomics 2022, e2200014, DOI: 10.1002/pmic.202200014
Figure 1. Overview of the formats developed by the Molecular Interactions Working Group and their relationships to other components in the community. Logo courtesy of IMEx.
Figure 2. Overview of the formats of the Mass Spectrometry Working Group and the Proteome Informatics Working Group and their relationships to other components in the community. Logo courtesy of the Proteomics Standards Initiative and ProteomeXchange.
- 3Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P.-A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H.-J.; Albar, J. P.; Martinez-Bartolomé, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. ProteomeXchange Provides Globally Coordinated Proteomics Data Submission and Dissemination. Nat. Biotechnol. 2014, 32 (3), 223– 226, DOI: 10.1038/nbt.28393https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXjvFyntrc%253D&md5=f173db74e09f40f829268af9dcc2c8a4ProteomeXchange provides globally coordinated proteomics data submission and disseminationVizcaino, Juan A.; Deutsch, Eric W.; Wang, Rui; Csordas, Attila; Reisinger, Florian; Rios, Daniel; Dianes, Jose A.; Sun, Zhi; Farrah, Terry; Bandeira, Nuno; Binz, Pierre-Alain; Xenarios, Ioannis; Eisenacher, Martin; Mayer, Gerhard; Gatto, Laurent; Campos, Alex; Chalkley, Robert J.; Kraus, Hans-Joachim; Albar, Juan Pablo; Martinez-Bartolome, Salvador; Apweiler, Rolf; Omenn, Gilbert S.; Martens, Lennart; Jones, Andrew R.; Hermjakob, HenningNature Biotechnology (2014), 32 (3), 223-226CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)ProteomeXchange provides an infrastructure for efficient and reliable public dissemination of proteomics data, supporting crucial validation, anal. and re-use.
- 4Deutsch, E. W.; Csordas, A.; Sun, Z.; Jarnuczak, A.; Perez-Riverol, Y.; Ternent, T.; Campbell, D. S.; Bernal-Llinares, M.; Okuda, S.; Kawano, S.; Moritz, R. L.; Carver, J. J.; Wang, M.; Ishihama, Y.; Bandeira, N.; Hermjakob, H.; Vizcaíno, J. A. The ProteomeXchange Consortium in 2017: Supporting the Cultural Change in Proteomics Public Data Deposition. Nucleic Acids Res. 2017, 45 (D1), D1100– D1106, DOI: 10.1093/nar/gkw9364https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhslWhs7o%253D&md5=bc5fa349c3685fccc4626dcb11d86986The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data depositionDeutsch, Eric W.; Csordas, Attila; Sun, Zhi; Jarnuczak, Andrew; Perez-Riverol, Yasset; Ternent, Tobias; Campbell, David S.; Bernal-Llinares, Manuel; Okuda, Shujiro; Kawano, Shin; Moritz, Robert L.; Carver, Jeremy J.; Wang, Mingxun; Ishihama, Yasushi; Bandeira, Nuno; Hermjakob, Henning; Vizcaino, Juan AntonioNucleic Acids Research (2017), 45 (D1), D1100-D1106CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. 
As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted std., supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approx. half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
- 5Deutsch, E. W.; Bandeira, N.; Sharma, V.; Perez-Riverol, Y.; Carver, J. J.; Kundu, D. J.; García-Seisdedos, D.; Jarnuczak, A. F.; Hewapathirana, S.; Pullman, B. S.; Wertz, J.; Sun, Z.; Kawano, S.; Okuda, S.; Watanabe, Y.; Hermjakob, H.; MacLean, B.; MacCoss, M. J.; Zhu, Y.; Ishihama, Y.; Vizcaíno, J. A. The ProteomeXchange Consortium in 2020: Enabling “big Data” Approaches in Proteomics. Nucleic Acids Res. 2019, 48 (D1), D1145– D1152, DOI: 10.1093/nar/gkz984There is no corresponding record for this reference.
- 6Porras, P.; Barrera, E.; Bridge, A.; Del-Toro, N.; Cesareni, G.; Duesbury, M.; Hermjakob, H.; Iannuccelli, M.; Jurisica, I.; Kotlyar, M.; Licata, L.; Lovering, R. C.; Lynn, D. J.; Meldal, B.; Nanduri, B.; Paneerselvam, K.; Panni, S.; Pastrello, C.; Pellegrini, M.; Perfetto, L.; Rahimzadeh, N.; Ratan, P.; Ricard-Blum, S.; Salwinski, L.; Shirodkar, G.; Shrivastava, A.; Orchard, S. Towards a Unified Open Access Dataset of Molecular Interactions. Nat. Commun. 2020, 11 (1), 6144, DOI: 10.1038/s41467-020-19942-z6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB3cXisFWks7zM&md5=f3db054ef193afaa904cbe539a692d93Towards a unified open access dataset of molecular interactionsPorras, Pablo; Barrera, Elisabet; Bridge, Alan; del-Toro, Noemi; Cesareni, Gianni; Duesbury, Margaret; Hermjakob, Henning; Iannuccelli, Marta; Jurisica, Igor; Kotlyar, Max; Licata, Luana; Lovering, Ruth C.; Lynn, David J.; Meldal, Birgit; Nanduri, Bindu; Paneerselvam, Kalpana; Panni, Simona; Pastrello, Chiara; Pellegrini, Matteo; Perfetto, Livia; Rahimzadeh, Negin; Ratan, Prashansa; Ricard-Blum, Sylvie; Salwinski, Lukasz; Shirodkar, Gautam; Shrivastava, Anjalia; Orchard, SandraNature Communications (2020), 11 (1), 6144CODEN: NCAOBW; ISSN:2041-1723. (Nature Research)The International Mol. Exchange (IMEx) Consortium provides scientists with a single body of exptl. verified protein interactions curated in rich contextual detail to an internationally agreed std. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Addnl., we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.
- 7Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J. J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L. B.; Bourne, P. E.; Bouwman, J.; Brookes, A. J.; Clark, T.; Crosas, M.; Dillo, I.; Dumon, O.; Edmunds, S.; Evelo, C. T.; Finkers, R.; Gonzalez-Beltran, A.; Gray, A. J. G.; Groth, P.; Goble, C.; Grethe, J. S.; Heringa, J.; ’t Hoen, P. A. C.; Hooft, R.; Kuhn, T.; Kok, R.; Kok, J.; Lusher, S. J.; Martone, M. E.; Mons, A.; Packer, A. L.; Persson, B.; Rocca-Serra, P.; Roos, M.; van Schaik, R.; Sansone, S.-A.; Schultes, E.; Sengstag, T.; Slater, T.; Strawn, G.; Swertz, M. A.; Thompson, M.; van der Lei, J.; van Mulligen, E.; Velterop, J.; Waagmeester, A.; Wittenburg, P.; Wolstencroft, K.; Zhao, J.; Mons, B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018, DOI: 10.1038/sdata.2016.187https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC28bjslyrtQ%253D%253D&md5=e4ce8cf366db2280e54eb0168940720bThe FAIR Guiding Principles for scientific data management and stewardshipWilkinson Mark D; Dumontier Michel; Aalbersberg I Jsbrand Jan; Appleton Gabrielle; Dumon Olivier; Groth Paul; Strawn George; Axton Myles; Baak Arie; Blomberg Niklas; Boiten Jan-Willem; da Silva Santos Luiz Bonino; Bourne Philip E; Bouwman Jildau; Brookes Anthony J; Clark Tim; Crosas Merce; Dillo Ingrid; Edmunds Scott; Evelo Chris T; Finkers Richard; Gonzalez-Beltran Alejandra; Rocca-Serra Philippe; Sansone Susanna-Assunta; Gray Alasdair J G; Goble Carole; Grethe Jeffrey S; Heringa Jaap; Kok Ruben; 't Hoen Peter A C; Hooft Rob; Kuhn Tobias; Kok Joost; Lusher Scott J; Mons Barend; Martone Maryann E; Mons Albert; Packer Abel L; Persson Bengt; Roos Marco; Thompson Mark; van Schaik Rene; Schultes Erik; Sengstag Thierry; Slater Ted; Swertz Morris A; van der Lei Johan; van Mulligen Erik; Mons Barend; Velterop Jan; Waagmeester Andra; Wittenburg Peter; Wolstencroft Katherine; Zhao 
Jun; Mons BarendScientific data (2016), 3 (), 160018 ISSN:.There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
- 8Wood-Charlson, E. M.; Crockett, Z.; Erdmann, C.; Arkin, A. P.; Robinson, C. B. Ten Simple Rules for Getting and Giving Credit for Data. PLoS Comput. Biol. 2022, 18 (9), e1010476 DOI: 10.1371/journal.pcbi.10104768https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BB38Xis1SktLzJ&md5=2564f0e0ffb3245a83e99e07b18d3186Ten simple rules for getting and giving credit for dataWood-Charlson, Elisha M.; Crockett, Zachary; Erdmann, Chris; Arkin, Adam P.; Robinson, Carly B.PLoS Computational Biology (2022), 18 (9), e1010476CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)There is no expanded citation for this reference.
- 9Hanash, S.; Celis, J. E. The Human Proteome Organization: A Mission to Advance Proteome Knowledge. Mol. Cell Proteomics 2002, 1 (6), 413– 414, DOI: 10.1074/mcp.R200002-MCP2009https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XmsV2jt7s%253D&md5=1b5c07982bee9074e38412d0f85a22a9The Human Proteome Organization. A mission to advance proteome knowledgeHanash, Sam; Celis, Julio E.Molecular and Cellular Proteomics (2002), 1 (6), 413-414CODEN: MCPOBS; ISSN:1535-9476. (American Society for Biochemistry and Molecular Biology, Inc.)There is no expanded citation for this reference.
- 10Orchard, S.; Hermjakob, H.; Apweiler, R. The Proteomics Standards Initiative. Proteomics 2003, 3 (7), 1374– 1376, DOI: 10.1002/pmic.20030049610https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3sXmtF2rt70%253D&md5=e7a6e612d813e747dc8720d7093400b5The proteomics standards initiativeOrchard, Sandra; Hermjakob, Henning; Apweiler, RolfProteomics (2003), 3 (7), 1374-1376CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)A review. The Proteomics Stds. Initiative (PSI) aims to define community stds. for data representation in proteomics and to facilitate data comparison, exchange and verification. Progress has been made in the development of common stds. for data exchange in the fields of both mass spectrometry and protein-protein interaction. A proteomics-specific extension is being created for the emerging American Society for Tests and Measurements mass spectrometry std., which will be supported by manufacturers of both hardware and software. A data model for proteomics experimentation is under development and discussions on a public repository for published proteomics data are underway. The Protein-Protein Interactions group expects to publish the Level 1 PSI data exchange format for protein-protein interactions soon and discussions as to the content of Level 2 have been initiated.
- 11Deutsch, E. W.; Orchard, S.; Binz, P.-A.; Bittremieux, W.; Eisenacher, M.; Hermjakob, H.; Kawano, S.; Lam, H.; Mayer, G.; Menschaert, G.; Perez-Riverol, Y.; Salek, R. M.; Tabb, D. L.; Tenzer, S.; Vizcaíno, J. A.; Walzer, M.; Jones, A. R. Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. J. Proteome Res. 2017, 16 (12), 4288– 4298, DOI: 10.1021/acs.jproteome.7b0037011https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtl2gsrnM&md5=b1fb5a59898dfe02fa872db2d66238f5Proteomics Standards Initiative: Fifteen Years of Progress and Future WorkDeutsch, Eric W.; Orchard, Sandra; Binz, Pierre-Alain; Bittremieux, Wout; Eisenacher, Martin; Hermjakob, Henning; Kawano, Shin; Lam, Henry; Mayer, Gerhard; Menschaert, Gerben; Perez-Riverol, Yasset; Salek, Reza M.; Tabb, David L.; Tenzer, Stefan; Vizcaino, Juan Antonio; Walzer, Mathias; Jones, Andrew R.Journal of Proteome Research (2017), 16 (12), 4288-4298CODEN: JPROBS; ISSN:1535-3893. (American Chemical Society)A review. The Proteomics Stds. Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community stds. and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community stds. via special workshops and ongoing work. Among the existing, ratified stds., the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Min. Information About a Proteomics Expt.) guidelines with the advance of new technologies and techniques. Further, new stds. are currently either in the final stages of completion (proBed and proBAM for proteogenomics results, as well as PEFF) or in early stages of design (a spectral library std. 
format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PROXI) web services Application Programming Interface). The authors review the current status of all these aspects of the PSI, describe synergies with other efforts such as the ProteomeXchange Consortium, the Human Proteome Project, and the metabolomics community, and provide a look at future directions of the PSI.
- 12del-Toro, N.; Dumousseau, M.; Orchard, S.; Jimenez, R. C.; Galeota, E.; Launay, G.; Goll, J.; Breuer, K.; Ono, K.; Salwinski, L.; Hermjakob, H. A New Reference Implementation of the PSICQUIC Web Service. Nucleic Acids Res. 2013, 41 (Web Server issue), W601– W606, DOI: 10.1093/nar/gkt392There is no corresponding record for this reference.
- 13Vizcaíno, J. A.; Martens, L.; Hermjakob, H.; Julian, R. K.; Paton, N. W. The PSI Formal Document Process and Its Implementation on the PSI Website. Proteomics 2007, 7 (14), 2355– 2357, DOI: 10.1002/pmic.20070006413https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXos1GhtLs%253D&md5=ae51e625281a0f4ff396728d26f44433The PSI formal document process and its implementation on the PSI websiteVizcaino, Juan Antonio; Martens, Lennart; Hermjakob, Henning; Julian, Randall K.; Paton, Norman W.Proteomics (2007), 7 (14), 2355-2357CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)The Human Proteome Organization's Proteomics Stds. Initiative (HUPO-PSI) has recently developed formal document processes for reviewing MIAPE documents, specifications, community practice and informational documents. These document work flows rely on community participation as well as more traditional expert review. We here present the web interface used to support these document processes, and explain briefly how interested parties can participate in the review process.
- 14Mayer, G.; Jones, A. R.; Binz, P.-A.; Deutsch, E. W.; Orchard, S.; Montecchi-Palazzi, L.; Vizcaíno, J. A.; Hermjakob, H.; Oveillero, D.; Julian, R.; Stephan, C.; Meyer, H. E.; Eisenacher, M. Controlled Vocabularies and Ontologies in Proteomics: Overview, Principles and Practice. Biochim. Biophys. Acta 2014, 1844 (1 Pt A), 98– 107, DOI: 10.1016/j.bbapap.2013.02.01714https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXmslensL8%253D&md5=188884b8b78cda6ceeca90915bc08065Controlled vocabularies and ontologies in proteomics: Overview, principles and practiceMayer, Gerhard; Jones, Andrew R.; Binz, Pierre-Alain; Deutsch, Eric W.; Orchard, Sandra; Montecchi-Palazzi, Luisa; Vizcaino, Juan Antonio; Hermjakob, Henning; Oveillero, David; Julian, Randall; Stephan, Christian; Meyer, Helmut E.; Eisenacher, MartinBiochimica et Biophysica Acta, Proteins and Proteomics (2014), 1844 (1PA), 98-107CODEN: BBAPBW; ISSN:1570-9639. (Elsevier B. V.)A review. This paper focuses on the use of controlled vocabularies (CVs) and ontologies esp. in the area of proteomics, primarily related to the work of the Proteomics Stds. Initiative (PSI). It describes the relevant proteomics std. formats and the ontologies used within them. Software and tools for working with these ontol. files are also discussed. The article also examines the "mapping files" used to ensure correct controlled vocabulary terms that are placed within PSI stds. and the fulfillment of the MIAPE (Min. Information about a Proteomics Expt.) requirements.
- 15Hermjakob, H.; Montecchi-Palazzi, L.; Bader, G.; Wojcik, J.; Salwinski, L.; Ceol, A.; Moore, S.; Orchard, S.; Sarkans, U.; von Mering, C.; Roechert, B.; Poux, S.; Jung, E.; Mersch, H.; Kersey, P.; Lappe, M.; Li, Y.; Zeng, R.; Rana, D.; Nikolski, M.; Husi, H.; Brun, C.; Shanker, K.; Grant, S. G. N.; Sander, C.; Bork, P.; Zhu, W.; Pandey, A.; Brazma, A.; Jacq, B.; Vidal, M.; Sherman, D.; Legrain, P.; Cesareni, G.; Xenarios, I.; Eisenberg, D.; Steipe, B.; Hogue, C.; Apweiler, R. The HUPO PSI’s Molecular Interaction Format-a Community Standard for the Representation of Protein Interaction Data. Nat. Biotechnol. 2004, 22 (2), 177– 183, DOI: 10.1038/nbt92615https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXnvFeisA%253D%253D&md5=9957d59263f92eb212d93ba90bb7827bThe HUPO PSI's Molecular Interaction format-a community standard for the representation of protein interaction dataHermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jerome; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K.; Grant, Seth G. N.; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, RolfNature Biotechnology (2004), 22 (2), 177-183CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)A major goal of proteomics is the complete description of the protein interaction network underlying cell physiol. A large no. of small scale and, more recently, large-scale expts. have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across expts. 
is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community std. data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Stds. Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomol. Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Ref. Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Mol. Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
- 16Mayer, G.; Montecchi-Palazzi, L.; Ovelleiro, D.; Jones, A. R.; Binz, P.-A.; Deutsch, E. W.; Chambers, M.; Kallhardt, M.; Levander, F.; Shofstahl, J.; Orchard, S.; Vizcaíno, J. A.; Hermjakob, H.; Stephan, C.; Meyer, H. E.; Eisenacher, M. HUPO-PSI Group. The HUPO Proteomics Standards Initiative- Mass Spectrometry Controlled Vocabulary. Database (Oxford) 2013, 2013, bat009, DOI: 10.1093/database/bat009There is no corresponding record for this reference.
- 17Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W. H.; Römpp, A.; Neumann, S.; Pizarro, A. D.; Montecchi-Palazzi, L.; Tasman, N.; Coleman, M.; Reisinger, F.; Souda, P.; Hermjakob, H.; Binz, P.-A.; Deutsch, E. W. MzML-a Community Standard for Mass Spectrometry Data. Mol. Cell Proteomics 2011, 10 (1), R110.000133, DOI: 10.1074/mcp.R110.000133There is no corresponding record for this reference.
- 18Côté, R. G.; Jones, P.; Apweiler, R.; Hermjakob, H. The Ontology Lookup Service, a Lightweight Cross-Platform Tool for Controlled Vocabulary Queries. BMC Bioinformatics 2006, 7, 97, DOI: 10.1186/1471-2105-7-9718https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD287nslGqtw%253D%253D&md5=1cbf39736a2e527498098988be7ddb17The Ontology Lookup Service, a lightweight cross-platform tool for controlled vocabulary queriesCote Richard G; Jones Philip; Apweiler Rolf; Hermjakob HenningBMC bioinformatics (2006), 7 (), 97 ISSN:.BACKGROUND: With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important to annotate units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology. The interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight web-accessible user interface as well as a consistent, unified SOAP interface for automated queries. RESULTS: The Ontology Lookup Service (OLS) was created to integrate publicly available biomedical ontologies into a single database. All modified ontologies are updated daily. A list of currently loaded ontologies is available online. The database can be queried to obtain information on a single term or to browse a complete ontology using AJAX. Auto-completion provides a user-friendly search mechanism. An AJAX-based ontology viewer is available to browse a complete ontology or subsets of it. A programmatic interface is available to query the webservice using SOAP. The service is described by a WSDL descriptor file available online. A sample Java client to connect to the webservice using SOAP is available for download from SourceForge. 
All OLS source code is publicly available under the open source Apache Licence. CONCLUSION: The OLS provides a user-friendly single entry point for publicly available ontologies in the Open Biomedical Ontology (OBO) format. It can be accessed interactively or programmatically at http://www.ebi.ac.uk/ontology-lookup/.
- 19Perez-Riverol, Y.; Ternent, T.; Koch, M.; Barsnes, H.; Vrousgou, O.; Jupp, S.; Vizcaíno, J. A. OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets. Proteomics 2017, 17 (19), 1700244, DOI: 10.1002/pmic.201700244There is no corresponding record for this reference.
- 20Whetzel, P. L.; Noy, N. F.; Shah, N. H.; Alexander, P. R.; Nyulas, C.; Tudorache, T.; Musen, M. A. BioPortal: Enhanced Functionality via New Web Services from the National Center for Biomedical Ontology to Access and Use Ontologies in Software Applications. Nucleic Acids Res. 2011, 39 (Web Server issue), W541– 545, DOI: 10.1093/nar/gkr46920https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXosVOmsLw%253D&md5=a2f753b47f8210b77bc121583f223f62BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applicationsWhetzel, Patricia L.; Noy, Natalya F.; Shah, Nigam H.; Alexander, Paul R.; Nyulas, Csongor; Tudorache, Tania; Musen, Mark A.Nucleic Acids Research (2011), 39 (Web Server), W541-W545CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)The National Center for Biomedical Ontol. (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontol.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontol. content by providing features to add mappings between terms, to add comments linked to specific ontol. terms and to provide ontol. reviews. The NCBO Web services (http://www.bioontol.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontol. Language (OWL) and Open Biol. and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontol. content, from getting all terms in an ontol. to retrieving metadata about a term. 
Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.
- 21Montecchi-Palazzi, L.; Beavis, R.; Binz, P.-A.; Chalkley, R. J.; Cottrell, J.; Creasy, D.; Shofstahl, J.; Seymour, S. L.; Garavelli, J. S. The PSI-MOD Community Standard for Representation of Protein Modification Data. Nat. Biotechnol. 2008, 26 (8), 864– 866, DOI: 10.1038/nbt0808-86421https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXps1Wmtr8%253D&md5=7e24cce0f34148ae3ad6a8f424fd2af9The PSI-MOD community standard for representation of protein modification dataMontecchi-Palazzi, Luisa; Beavis, Ron; Binz, Pierre-Alain; Chalkley, Robert J.; Cottrell, John; Creasy, David; Shofstahl, Jim; Seymour, Sean L.; Garavelli, John S.Nature Biotechnology (2008), 26 (8), 864-866CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)There is no expanded citation for this reference.
- 22Garavelli, J. S. The RESID Database of Protein Modifications as a Resource and Annotation Tool. Proteomics 2004, 4 (6), 1527– 1533, DOI: 10.1002/pmic.20030077722https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXkvFGit7s%253D&md5=37574d54b9c617573637ce1b9fb20f53The RESID database of protein modifications as a resource and annotation toolGaravelli, John S.Proteomics (2004), 4 (6), 1527-1533CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)The RESID Database of Protein Modifications is a comprehensive collection of annotations and structures for protein modifications and cross-links including pre-, co-, and post-translational modifications. The database provides: systematic and alternate names, at. formulas and masses, enzymic activities that generate the modifications, keywords, literature citations, Gene Ontol. (GO) cross-refs., protein sequence database feature table annotations, structure diagrams, and mol. models. This database is freely accessible on the Internet through resources provided by the European Bioinformatics Institute (http://www.ebi.ac.uk/RESID), and by the National Cancer Institute - Frederick Advanced Biomedical Computing Center (http://www.ncifcrf.gov/RESID). Each RESID Database entry presents a chem. unique modification and shows how that modification is currently annotated in the protein sequence databases, Swiss-Prot and the Protein Information Resource (PIR). The RESID Database provides a table of corresponding equiv. feature annotations that is used in the UniProt project, an international effort to combine the resources of the Swiss-Prot, TrEMBL and PIR. As an annotation tool, the RESID Database is used in standardizing and enhancing modification descriptions in the feature tables of Swiss-Prot entries. 
As an Internet resource, the RESID Database assists researchers in high-throughput proteomics to search monoisotopic masses and mass differences and identify known and predicted protein modifications.
- 23Creasy, D. M.; Cottrell, J. S. Unimod: Protein Modifications for Mass Spectrometry. PROTEOMICS 2004, 4 (6), 1534– 1536, DOI: 10.1002/pmic.20030074423https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXkvFGit7g%253D&md5=d6644c36aed9728f520d688c41e7786eUnimod: Protein modifications for mass spectrometryCreasy, David M.; Cottrell, John S.Proteomics (2004), 4 (6), 1534-1536CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)Unimod is a database of protein modifications for use in mass spectrometry applications, esp. protein identification and de novo sequencing. It contains accurate and verifiable values, derived from elemental compns., for the mass differences introduced by both natural and artificial modifications.
- 24 Mayer, G. XLMOD: Cross-Linking and Chromatography Derivatization Reagents Ontology. arXiv 2020. DOI: 10.48550/ARXIV.2003.00329.
- 25 Orchard, S.; Salwinski, L.; Kerrien, S.; Montecchi-Palazzi, L.; Oesterheld, M.; Stümpflen, V.; Ceol, A.; Chatr-aryamontri, A.; Armstrong, J.; Woollard, P.; Salama, J. J.; Moore, S.; Wojcik, J.; Bader, G. D.; Vidal, M.; Cusick, M. E.; Gerstein, M.; Gavin, A.-C.; Superti-Furga, G.; Greenblatt, J.; Bader, J.; Uetz, P.; Tyers, M.; Legrain, P.; Fields, S.; Mulder, N.; Gilson, M.; Niepmann, M.; Burgoon, L.; De Las Rivas, J.; Prieto, C.; Perreau, V. M.; Hogue, C.; Mewes, H.-W.; Apweiler, R.; Xenarios, I.; Eisenberg, D.; Cesareni, G.; Hermjakob, H. The Minimum Information Required for Reporting a Molecular Interaction Experiment (MIMIx). Nat. Biotechnol. 2007, 25 (8), 894–898, DOI: 10.1038/nbt1324. A wealth of mol. interaction data is available in the literature, ranging from large-scale datasets to a single interaction confirmed by several different techniques.
These data are all too often reported either as free text or in tables of variable format, and are often missing key pieces of information essential for a full understanding of the expt. Here we propose MIMIx, the min. information required for reporting a mol. interaction expt. Adherence to these reporting guidelines will result in publications of increased clarity and usefulness to the scientific community and will support the rapid, systematic capture of mol. interaction data in public databases, thereby improving access to valuable interaction data.
- 26 Taylor, C. F.; Paton, N. W.; Lilley, K. S.; Binz, P.-A.; Julian, R. K.; Jones, A. R.; Zhu, W.; Apweiler, R.; Aebersold, R.; Deutsch, E. W.; Dunn, M. J.; Heck, A. J. R.; Leitner, A.; Macht, M.; Mann, M.; Martens, L.; Neubert, T. A.; Patterson, S. D.; Ping, P.; Seymour, S. L.; Souda, P.; Tsugita, A.; Vandekerckhove, J.; Vondriska, T. M.; Whitelegge, J. P.; Wilkins, M. R.; Xenarios, I.; Yates, J. R.; Hermjakob, H. The Minimum Information about a Proteomics Experiment (MIAPE). Nat. Biotechnol. 2007, 25 (8), 887–893, DOI: 10.1038/nbt1329. Both the generation and the anal. of proteomics data are now widespread, and high-throughput approaches are commonplace. Protocols continue to increase in complexity as methods and technologies evolve and diversify. To encourage the standardized collection, integration, storage and dissemination of proteomics data, the Human Proteome Organization's Proteomics Stds. Initiative develops guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry.
This paper describes the processes and principles underpinning the development of these modules; discusses the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector; addresses the issue of overlap with other reporting guidelines; and highlights the criticality of appropriate tools and resources in enabling 'MIAPE-compliant' reporting.
- 27 Brazma, A.; Hingamp, P.; Quackenbush, J.; Sherlock, G.; Spellman, P.; Stoeckert, C.; Aach, J.; Ansorge, W.; Ball, C. A.; Causton, H. C.; Gaasterland, T.; Glenisson, P.; Holstege, F. C.; Kim, I. F.; Markowitz, V.; Matese, J. C.; Parkinson, H.; Robinson, A.; Sarkans, U.; Schulze-Kremer, S.; Stewart, J.; Taylor, R.; Vilo, J.; Vingron, M. Minimum Information about a Microarray Experiment (MIAME)-toward Standards for Microarray Data. Nat. Genet. 2001, 29 (4), 365–371, DOI: 10.1038/ng1201-365. Microarray anal. has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of stds. for presenting and exchanging such data. Here we present a proposal, the Min. Information About a Microarray Expt. (MIAME), that describes the min. information required to ensure that microarray data can be easily interpreted and that results derived from its anal. can be independently verified. The ultimate goal of this work is to establish a std.
for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data anal. tools. With respect to MIAME, we conc. on defining the content and structure of the necessary information rather than the tech. format for capturing it.
- 28 Jones, A. R.; Carroll, K.; Knight, D.; Maclellan, K.; Domann, P. J.; Legido-Quigley, C.; Huang, L.; Smallshaw, L.; Mirzaei, H.; Shofstahl, J.; Paton, N. W. Minimum Information About a Proteomics Experiment (MIAPE). Guidelines for Reporting the Use of Column Chromatography in Proteomics. Nat. Biotechnol. 2010, 28 (7), 654, DOI: 10.1038/nbt0710-654a.
- 29 Taylor, C. F.; Binz, P.-A.; Aebersold, R.; Affolter, M.; Barkovich, R.; Deutsch, E. W.; Horn, D. M.; Hühmer, A.; Kussmann, M.; Lilley, K.; Macht, M.; Mann, M.; Müller, D.; Neubert, T. A.; Nickson, J.; Patterson, S. D.; Raso, R.; Resing, K.; Seymour, S. L.; Tsugita, A.; Xenarios, I.; Zeng, R.; Julian, R. K. Guidelines for Reporting the Use of Mass Spectrometry in Proteomics. Nat. Biotechnol. 2008, 26 (8), 860–861, DOI: 10.1038/nbt0808-860.
- 30 Binz, P.-A.; Barkovich, R.; Beavis, R. C.; Creasy, D.; Horn, D. M.; Julian, R. K.; Seymour, S. L.; Taylor, C. F.; Vandenbrouck, Y. Guidelines for Reporting the Use of Mass Spectrometry Informatics in Proteomics. Nat. Biotechnol. 2008, 26 (8), 862, DOI: 10.1038/nbt0808-862.
- 31 Martínez-Bartolomé, S.; Deutsch, E. W.; Binz, P.-A.; Jones, A. R.; Eisenacher, M.; Mayer, G.; Campos, A.; Canals, F.; Bech-Serra, J.-J.; Carrascal, M.; Gay, M.; Paradela, A.; Navajas, R.; Marcilla, M.; Hernáez, M. L.; Gutiérrez-Blázquez, M. D.; Velarde, L. F. C.; Aloria, K.; Beaskoetxea, J.; Medina-Aunon, J. A.; Albar, J. P. Guidelines for Reporting Quantitative Mass Spectrometry Based Experiments in Proteomics. J. Proteomics 2013, 95, 84–88, DOI: 10.1016/j.jprot.2013.02.026. A review. Mass spectrometry is already a well-established protein identification tool and recent methodol. and technol. developments have also made possible the extn. of quant. data of protein abundance in large-scale studies. Several strategies for abs. and relative quant. proteomics and the statistical assessment of quantifications are possible, each having specific measurements and therefore, different data anal. workflows. The guidelines for Mass Spectrometry Quantification allow the description of a wide range of quant. approaches, including labeled and label-free techniques and also targeted approaches such as Selected Reaction Monitoring (SRM). The HUPO Proteomics Stds.
Initiative (HUPO-PSI) has invested considerable efforts to improve the standardization of proteomics data handling, representation and sharing through the development of data stds., reporting guidelines, controlled vocabularies and tooling. In this manuscript, we describe a key output from the HUPO-PSI-namely the MIAPE Quant guidelines, which have developed in parallel with the corresponding data exchange format mzQuantML [1]. The MIAPE Quant guidelines describe the HUPO-PSI proposal concerning the min. information to be reported when a quant. data set, derived from mass spectrometry (MS), is submitted to a database or as supplementary information to a journal. The guidelines have been developed with input from a broad spectrum of stakeholders in the proteomics field to represent a true consensus view of the most important data types and metadata, required for a quant. expt. to be analyzed critically or a data anal. pipeline to be reproduced. It is anticipated that they will influence or be directly adopted as part of journal guidelines for publication and by public proteomics databases and thus may have an impact on proteomics labs. across the world. This article is part of a Special Issue entitled: Standardization and Quality Control.
- 32 Medina-Aunon, J. A.; Martínez-Bartolomé, S.; López-García, M. A.; Salazar, E.; Navajas, R.; Jones, A. R.; Paradela, A.; Albar, J. P. The ProteoRed MIAPE Web Toolkit: A User-Friendly Framework to Connect and Share Proteomics Standards. Mol. Cell. Proteomics 2011, 10 (10), M111.008334, DOI: 10.1074/mcp.M111.008334.
- 33 Deutsch, E. W.; Overall, C. M.; Van Eyk, J. E.; Baker, M. S.; Paik, Y.-K.; Weintraub, S. T.; Lane, L.; Martens, L.; Vandenbrouck, Y.; Kusebauch, U.; Hancock, W. S.; Hermjakob, H.; Aebersold, R.; Moritz, R. L.; Omenn, G. S. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J. Proteome Res. 2016, 15 (11), 3961–3970, DOI: 10.1021/acs.jproteome.6b00392. Every data-rich community research effort requires a clear plan for ensuring the quality of the data interpretation and comparability of analyses. To address this need within the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO), we have developed through broad consultation a set of mass spectrometry data interpretation guidelines that should be applied to all HPP data contributions. For submission of manuscripts reporting HPP protein identification results, the guidelines are presented as a one-page checklist contg. fifteen essential points followed by two pages of expanded description of each. Here, we present an overview of the guidelines and provide an in-depth description of each of the fifteen elements to facilitate understanding of the intentions and rationale behind the guidelines, both for authors and for reviewers.
Broadly, these guidelines provide specific directions regarding how HPP data are to be submitted to mass spectrometry data repositories, how error anal. should be presented, and how detection of novel proteins should be supported with addnl. confirmatory evidence. These guidelines, developed by the HPP community, are presented to the broader scientific community for further discussion.
- 34 Deutsch, E. W.; Lane, L.; Overall, C. M.; Bandeira, N.; Baker, M. S.; Pineau, C.; Moritz, R. L.; Corrales, F.; Orchard, S.; Van Eyk, J. E.; Paik, Y.-K.; Weintraub, S. T.; Vandenbrouck, Y.; Omenn, G. S. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0. J. Proteome Res. 2019, 18 (12), 4108–4116, DOI: 10.1021/acs.jproteome.9b00542. The Human Proteome Organization's (HUPO) Human Proteome Project (HPP) developed Mass Spectrometry (MS) Data Interpretation Guidelines that have been applied since 2016. These guidelines have helped ensure that the emerging draft of the complete human proteome is highly accurate and with low numbers of false-positive protein identifications. Here, we describe an update to these guidelines based on consensus-reaching discussions with the wider HPP community over the past year. The revised 3.0 guidelines address several major and minor identified gaps. We have added guidelines for emerging data independent acquisition (DIA) MS workflows and for use of the new Universal Spectrum Identifier (USI) system being developed by the HUPO Proteomics Standards Initiative (PSI). In addition, we discuss updates to the standard HPP pipeline for collecting MS evidence for all proteins in the HPP, including refinements to minimum evidence.
We present a new plan for incorporating MassIVE-KB into the HPP pipeline for the next (HPP 2020) cycle in order to obtain more comprehensive coverage of public MS data sets. The main checklist has been reorganized under headings and subitems, and related guidelines have been grouped. In sum, Version 2.1 of the HPP MS Data Interpretation Guidelines has served well, and this timely update to version 3.0 will aid the HPP as it approaches its goal of collecting and curating MS evidence of translation and expression for all predicted ∼20 000 human proteins encoded by the human genome.
- 35 Kerrien, S.; Orchard, S.; Montecchi-Palazzi, L.; Aranda, B.; Quinn, A. F.; Vinod, N.; Bader, G. D.; Xenarios, I.; Wojcik, J.; Sherman, D.; Tyers, M.; Salama, J. J.; Moore, S.; Ceol, A.; Chatr-Aryamontri, A.; Oesterheld, M.; Stümpflen, V.; Salwinski, L.; Nerothin, J.; Cerami, E.; Cusick, M. E.; Vidal, M.; Gilson, M.; Armstrong, J.; Woollard, P.; Hogue, C.; Eisenberg, D.; Cesareni, G.; Apweiler, R.; Hermjakob, H. Broadening the Horizon-Level 2.5 of the HUPO-PSI Format for Molecular Interactions. BMC Biol. 2007, 5, 44, DOI: 10.1186/1741-7007-5-44. BACKGROUND: Molecular interaction Information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to this very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions.
RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
- 36 Sivade Dumousseau, M.; Alonso-López, D.; Ammari, M.; Bradley, G.; Campbell, N. H.; Ceol, A.; Cesareni, G.; Combe, C.; De Las Rivas, J.; Del-Toro, N.; Heimbach, J.; Hermjakob, H.; Jurisica, I.; Koch, M.; Licata, L.; Lovering, R. C.; Lynn, D. J.; Meldal, B. H. M.; Micklem, G.; Panni, S.; Porras, P.; Ricard-Blum, S.; Roechert, B.; Salwinski, L.; Shrivastava, A.; Sullivan, J.; Thierry-Mieg, N.; Yehudi, Y.; Van Roey, K.; Orchard, S. Encompassing New Use Cases - Level 3.0 of the HUPO-PSI Format for Molecular Interactions. BMC Bioinformatics 2018, 19 (1), 134, DOI: 10.1186/s12859-018-2118-1. BACKGROUND: Systems biologists study interaction data to understand the behaviour of whole cell systems, and their environment, at a molecular level. In order to effectively achieve this goal, it is critical that researchers have high quality interaction datasets available to them, in a standard data format, and also a suite of tools with which to analyse such data and form experimentally testable hypotheses from them. The PSI-MI XML standard interchange format was initially published in 2004, and expanded in 2007 to enable the download and interchange of molecular interaction data.
PSI-XML2.5 was designed to describe experimental data and to date has fulfilled this basic requirement. However, new use cases have arisen that the format cannot properly accommodate. These include data abstracted from more than one publication such as allosteric/cooperative interactions and protein complexes, dynamic interactions and the need to link kinetic and affinity data to specific mutational changes. RESULTS: The Molecular Interaction workgroup of the HUPO-PSI has extended the existing, well-used XML interchange format for molecular interaction data to meet new use cases and enable the capture of new data types, following extensive community consultation. PSI-MI XML3.0 expands the capabilities of the format beyond simple experimental data, with a concomitant update of the tool suite which serves this format. The format has been implemented by key data producers such as the International Molecular Exchange (IMEx) Consortium of protein interaction databases and the Complex Portal. CONCLUSIONS: PSI-MI XML3.0 has been developed by the data producers, data users, tool developers and database providers who constitute the PSI-MI workgroup. This group now actively supports PSI-MI XML2.5 as the main interchange format for experimental data, PSI-MI XML3.0 which additionally handles more complex data types, and the simpler, tab-delimited MITAB2.5, 2.6 and 2.7 for rapid parsing and download.
- 37 Sivade Dumousseau, M.; Koch, M.; Shrivastava, A.; Alonso-López, D.; De Las Rivas, J.; Del-Toro, N.; Combe, C. W.; Meldal, B. H. M.; Heimbach, J.; Rappsilber, J.; Sullivan, J.; Yehudi, Y.; Orchard, S. JAMI: A Java Library for Molecular Interactions and Data Interoperability. BMC Bioinformatics 2018, 19 (1), 133, DOI: 10.1186/s12859-018-2119-0. BACKGROUND: A number of different molecular interactions data download formats now exist, designed to allow access to these valuable data by diverse user groups. These formats include the PSI-XML and MITAB standard interchange formats developed by Molecular Interaction workgroup of the HUPO-PSI in addition to other, use-specific downloads produced by other resources. The onus is currently on the user to ensure that a piece of software is capable of read/writing all necessary versions of each format. This problem may increase, as data providers strive to meet ever more sophisticated user demands and data types. RESULTS: A collaboration between EMBL-EBI and the University of Cambridge has produced JAMI, a single library to unify standard molecular interaction data formats such as PSI-MI XML and PSI-MITAB. The JAMI free, open-source library enables the development of molecular interaction computational tools and pipelines without the need to produce different versions of software to read different versions of the data formats.
CONCLUSION: Software and tools developed on top of the JAMI framework are able to integrate and support both PSI-MI XML and PSI-MITAB. The use of JAMI avoids the requirement to chain conversions between formats in order to reach a desired output format and prevents code and unit test duplication as the code becomes more modular. JAMI's model interfaces are abstracted from the underlying format, hiding the complexity and requirements of each data format from developers using JAMI as a library.
- 38 Shah, A. R.; Davidson, J.; Monroe, M. E.; Mayampurath, A. M.; Danielson, W. F.; Shi, Y.; Robinson, A. C.; Clowers, B. H.; Belov, M. E.; Anderson, G. A.; Smith, R. D. An Efficient Data Format for Mass Spectrometry-Based Proteomics. J. Am. Soc. Mass Spectrom. 2010, 21 (10), 1784–1788, DOI: 10.1016/j.jasms.2010.06.014. The diverse range of mass spectrometry (MS) instrumentation along with corresponding proprietary and nonproprietary data formats has generated a proteomics community driven call for a standardized format to facilitate management, processing, storing, visualization, and exchange of both exptl. and processed data. To date, significant efforts have been extended towards standardizing XML-based formats for mass spectrometry data representation, despite the recognized inefficiencies assocd. with storing large numeric datasets in XML. The proteomics community has periodically entertained alternate strategies for data exchange, e.g., using a common application programming interface or a database-derived format. However, these efforts have yet to gain significant attention, mostly because they have not demonstrated significant performance benefits over existing stds., but also due to issues such as extensibility to multidimensional sepn. systems, robustness of operation, and incomplete or mismatched vocabulary. Here, the authors describe a format based on std.
database principles that offers multiple benefits over existing formats in terms of storage size, ease of processing, data retrieval times, and extensibility to accommodate multidimensional sepn. systems.
- 39 Wilhelm, M.; Kirchner, M.; Steen, J. A. J.; Steen, H. Mz5: Space- and Time-Efficient Storage of Mass Spectrometry Data Sets. Mol. Cell. Proteomics 2012, 11 (1), O111.011379, DOI: 10.1074/mcp.O111.011379.
- 40 Bouyssié, D.; Dubois, M.; Nasso, S.; Gonzalez de Peredo, A.; Burlet-Schiltz, O.; Aebersold, R.; Monsarrat, B. mzDB: A File Format Using Multiple Indexing Strategies for the Efficient Analysis of Large LC-MS/MS and SWATH-MS Data Sets. Mol. Cell. Proteomics 2015, 14 (3), 771–781, DOI: 10.1074/mcp.O114.039115. In comparison with XML formats, mzDB saves ∼25% of storage space and improves access times by a factor of twofold up to even 2000-fold, depending on the particular data access. Similarly, mzDB shows also slightly to significantly lower access times in comparison with other formats like mz5. Both C++ and Java implementations, converting raw or XML formats to mzDB and providing access methods, will be released under permissive license. mzDB can be easily accessed by the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB described here can boost existing mass spectrometry data anal. pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.
- 41 Wang, J.; Lu, M.; Wang, R.; An, S.; Xie, C.; Yu, C. StackZDPD: A Novel Encoding Scheme for Mass Spectrometry Data Optimized for Speed and Compression Ratio. Sci. Rep. 2022, 12 (1), 5384, DOI: 10.1038/s41598-022-09432-1. As the pervasive, standardized format for interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, text-based mzML is inefficiently utilized on various anal. platforms due to its sheer vol. of samples and limited read/write speed. Most research on compression algorithms rarely provides flexible random file reading scheme. Database-developed soln. guarantees the efficiency of random file reading, but nevertheless the efforts in compression and third-party software support are insufficient. Under the premise of ensuring the efficiency of decompression, we propose an encoding scheme "Stack-ZDPD" that is optimized for storage of raw MS data, designed for the format "Aird", a computation-oriented format with fast accessing and decoding time, where the core compression algorithm is "ZDPD". Stack-ZDPD reduces the vol. of data stored in mzML format by around 80% or more, depending on the data acquisition pattern, and the compression ratio is approx. 30% compared to ZDPD for data generated using Time of Flight technol. Our approach is available on AirdPro, for file conversion and the Java-API Aird-SDK, for data parsing.
- 42 Schramm, T.; Hester, Z.; Klinkert, I.; Both, J.-P.; Heeren, R. M. A.; Brunelle, A.; Laprévote, O.; Desbenoit, N.; Robbe, M.-F.; Stoeckli, M.; Spengler, B.; Römpp, A. imzML - A Common Data Format for the Flexible Exchange and Processing of Mass Spectrometry Imaging Data. J. Proteomics 2012, 75 (16), 5106–5110, DOI: 10.1016/j.jprot.2012.07.026
- 43 Bhamber, R. S.; Jankevics, A.; Deutsch, E. W.; Jones, A. R.; Dowsey, A. W. mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. J. Proteome Res. 2021, 20 (1), 172–183, DOI: 10.1021/acs.jproteome.0c00192
- 44 Jones, A. R.; Eisenacher, M.; Mayer, G.; Kohlbacher, O.; Siepen, J.; Hubbard, S. J.; Selley, J. N.; Searle, B. C.; Shofstahl, J.; Seymour, S. L.; Julian, R.; Binz, P.-A.; Deutsch, E. W.; Hermjakob, H.; Reisinger, F.; Griss, J.; Vizcaíno, J. A.; Chambers, M.; Pizarro, A.; Creasy, D. The mzIdentML Data Standard for Mass Spectrometry-Based Proteomics Results. Mol. Cell. Proteomics 2012, 11 (7), M111.014381, DOI: 10.1074/mcp.M111.014381
- 45 Vizcaíno, J. A.; Mayer, G.; Perkins, S.; Barsnes, H.; Vaudel, M.; Perez-Riverol, Y.; Ternent, T.; Uszkoreit, J.; Eisenacher, M.; Fischer, L.; Rappsilber, J.; Netz, E.; Walzer, M.; Kohlbacher, O.; Leitner, A.; Chalkley, R. J.; Ghali, F.; Martínez-Bartolomé, S.; Deutsch, E. W.; Jones, A. R. The mzIdentML Data Standard Version 1.2, Supporting Advances in Proteome Informatics. Mol. Cell. Proteomics 2017, 16 (7), 1275–1285, DOI: 10.1074/mcp.M117.068429
- 46 Griss, J.; Jones, A. R.; Sachsenberg, T.; Walzer, M.; Gatto, L.; Hartler, J.; Thallinger, G. G.; Salek, R. M.; Steinbeck, C.; Neuhauser, N.; Cox, J.; Neumann, S.; Fan, J.; Reisinger, F.; Xu, Q.-W.; Del Toro, N.; Pérez-Riverol, Y.; Ghali, F.; Bandeira, N.; Xenarios, I.; Kohlbacher, O.; Vizcaíno, J. A.; Hermjakob, H. The mzTab Data Exchange Format: Communicating Mass-Spectrometry-Based Proteomics and Metabolomics Experimental Results to a Wider Audience. Mol. Cell. Proteomics 2014, 13 (10), 2765–2775, DOI: 10.1074/mcp.O113.036681
- 47 Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Probability-Based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data. Electrophoresis 1999, 20 (18), 3551–3567, DOI: 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
- 48 Röst, H. L.; Sachsenberg, T.; Aiche, S.; Bielow, C.; Weisser, H.; Aicheler, F.; Andreotti, S.; Ehrlich, H.-C.; Gutenbrunner, P.; Kenar, E.; Liang, X.; Nahnsen, S.; Nilse, L.; Pfeuffer, J.; Rosenberger, G.; Rurik, M.; Schmitt, U.; Veit, J.; Walzer, M.; Wojnar, D.; Wolski, W. E.; Schilling, O.; Choudhary, J. S.; Malmström, L.; Aebersold, R.; Reinert, K.; Kohlbacher, O. OpenMS: A Flexible Open-Source Software Platform for Mass Spectrometry Data Analysis. Nat. Methods 2016, 13 (9), 741–748, DOI: 10.1038/nmeth.3959
- 49 Tyanova, S.; Temu, T.; Cox, J. The MaxQuant Computational Platform for Mass Spectrometry-Based Shotgun Proteomics. Nat. Protoc. 2016, 11 (12), 2301–2319, DOI: 10.1038/nprot.2016.136
- 50 Hoffmann, N.; Rein, J.; Sachsenberg, T.; Hartler, J.; Haug, K.; Mayer, G.; Alka, O.; Dayalan, S.; Pearce, J. T. M.; Rocca-Serra, P.; Qi, D.; Eisenacher, M.; Perez-Riverol, Y.; Vizcaíno, J. A.; Salek, R. M.; Neumann, S.; Jones, A. R. mzTab-M: A Data Standard for Sharing Quantitative Results in Mass Spectrometry Metabolomics. Anal. Chem. 2019, 91 (5), 3302–3310, DOI: 10.1021/acs.analchem.8b04310
- 51 Menschaert, G.; Wang, X.; Jones, A. R.; Ghali, F.; Fenyö, D.; Olexiouk, V.; Zhang, B.; Deutsch, E. W.; Ternent, T.; Vizcaíno, J. A. The proBAM and proBed Standard Formats: Enabling a Seamless Integration of Genomics and Proteomics Data. Genome Biol. 2018, 19 (1), 12, DOI: 10.1186/s13059-017-1377-x
- 52 Binz, P.-A.; Shofstahl, J.; Vizcaíno, J. A.; Barsnes, H.; Chalkley, R. J.; Menschaert, G.; Alpi, E.; Clauser, K.; Eng, J. K.; Lane, L.; Seymour, S. L.; Sánchez, L. F. H.; Mayer, G.; Eisenacher, M.; Perez-Riverol, Y.; Kapp, E. A.; Mendoza, L.; Baker, P. R.; Collins, A.; Van Den Bossche, T.; Deutsch, E. W. Proteomics Standards Initiative Extended FASTA Format. J. Proteome Res. 2019, 18 (6), 2686–2692, DOI: 10.1021/acs.jproteome.9b00064
- 53 Eng, J. K.; Jahan, T. A.; Hoopmann, M. R. Comet: An Open-Source MS/MS Sequence Database Search Tool. Proteomics 2013, 13 (1), 22–24, DOI: 10.1002/pmic.201200439
- 54 Eng, J. K.; Deutsch, E. W. Extending Comet for Global Amino Acid Variant and Post-Translational Modification Analysis Using the PSI Extended FASTA Format. Proteomics 2020, 20 (21–22), e1900362, DOI: 10.1002/pmic.201900362
- 55 LeDuc, R. D.; Schwämmle, V.; Shortreed, M. R.; Cesnik, A. J.; Solntsev, S. K.; Shaw, J. B.; Martin, M. J.; Vizcaíno, J. A.; Alpi, E.; Danis, P.; Kelleher, N. L.; Smith, L. M.; Ge, Y.; Agar, J. N.; Chamot-Rooke, J.; Loo, J. A.; Pasa-Tolic, L.; Tsybin, Y. O. ProForma: A Standard Proteoform Notation. J. Proteome Res. 2018, 17 (3), 1321–1325, DOI: 10.1021/acs.jproteome.7b00851
- 56 LeDuc, R. D.; Deutsch, E. W.; Binz, P.-A.; Fellers, R. T.; Cesnik, A. J.; Klein, J. A.; Van Den Bossche, T.; Gabriels, R.; Yalavarthi, A.; Perez-Riverol, Y.; Carver, J.; Bittremieux, W.; Kawano, S.; Pullman, B.; Bandeira, N.; Kelleher, N. L.; Thomas, P. M.; Vizcaíno, J. A. Proteomics Standards Initiative’s ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms. J. Proteome Res. 2022, 21 (4), 1189–1195, DOI: 10.1021/acs.jproteome.1c00771
- 57 Deutsch, E. W.; Perez-Riverol, Y.; Carver, J.; Kawano, S.; Mendoza, L.; Van Den Bossche, T.; Gabriels, R.; Binz, P.-A.; Pullman, B.; Sun, Z.; Shofstahl, J.; Bittremieux, W.; Mak, T. D.; Klein, J.; Zhu, Y.; Lam, H.; Vizcaíno, J. A.; Bandeira, N. Universal Spectrum Identifier for Mass Spectra. Nat. Methods 2021, 18 (7), 768–770, DOI: 10.1038/s41592-021-01184-6
- 58 Bittremieux, W.; Chen, C.; Dorrestein, P. C.; Schymanski, E. L.; Schulze, T.; Neumann, S.; Meier, R.; Rogers, S.; Wang, M. Universal MS/MS Visualization and Retrieval with the Metabolomics Spectrum Resolver Web Service. bioRxiv, preprint (Bioinformatics), 2020, DOI: 10.1101/2020.05.09.086066
- 59 Dai, C.; Füllgrabe, A.; Pfeuffer, J.; Solovyeva, E. M.; Deng, J.; Moreno, P.; Kamatchinathan, S.; Kundu, D. J.; George, N.; Fexova, S.; Grüning, B.; Föll, M. C.; Griss, J.; Vaudel, M.; Audain, E.; Locard-Paulet, M.; Turewicz, M.; Eisenacher, M.; Uszkoreit, J.; Van Den Bossche, T.; Schwämmle, V.; Webel, H.; Schulze, S.; Bouyssié, D.; Jayaram, S.; Duggineni, V. K.; Samaras, P.; Wilhelm, M.; Choi, M.; Wang, M.; Kohlbacher, O.; Brazma, A.; Papatheodorou, I.; Bandeira, N.; Deutsch, E. W.; Vizcaíno, J. A.; Bai, M.; Sachsenberg, T.; Levitsky, L. I.; Perez-Riverol, Y. A Proteomics Sample Metadata Representation for Multiomics Integration and Big Data Analysis. Nat. Commun. 2021, 12 (1), 5854, DOI: 10.1038/s41467-021-26111-3
- 60. Rayner, T. F.; Rocca-Serra, P.; Spellman, P. T.; Causton, H. C.; Farne, A.; Holloway, E.; Irizarry, R. A.; Liu, J.; Maier, D. S.; Miller, M.; Petersen, K.; Quackenbush, J.; Sherlock, G.; Stoeckert, C. J.; White, J.; Whetzel, P. L.; Wymore, F.; Parkinson, H.; Sarkans, U.; Ball, C. A.; Brazma, A. A Simple Spreadsheet-Based, MIAME-Supportive Format for Microarray Data: MAGE-TAB. BMC Bioinformatics 2006, 7, 489, DOI: 10.1186/1471-2105-7-489
- 61. Gibson, F.; Hoogland, C.; Martinez-Bartolomé, S.; Medina-Aunon, J. A.; Albar, J. P.; Babnigg, G.; Wipat, A.; Hermjakob, H.; Almeida, J. S.; Stanislaus, R.; Paton, N. W.; Jones, A. R. The Gel Electrophoresis Markup Language (GelML) from the Proteomics Standards Initiative. Proteomics 2010, 10 (17), 3073–3081, DOI: 10.1002/pmic.201000120
- 62. Deutsch, E. W.; Chambers, M.; Neumann, S.; Levander, F.; Binz, P.-A.; Shofstahl, J.; Campbell, D. S.; Mendoza, L.; Ovelleiro, D.; Helsens, K.; Martens, L.; Aebersold, R.; Moritz, R. L.; Brusniak, M.-Y. TraML-a Standard Format for Exchange of Selected Reaction Monitoring Transition Lists. Mol. Cell Proteomics 2012, 11 (4), R111.015040, DOI: 10.1074/mcp.R111.015040
- 63. Helsens, K.; Brusniak, M.-Y.; Deutsch, E.; Moritz, R. L.; Martens, L. JTraML: An Open Source Java API for TraML, the PSI Standard for Sharing SRM Transitions. J. Proteome Res. 2011, 10 (11), 5260–5263, DOI: 10.1021/pr200664h
- 64. MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: An Open Source Document Editor for Creating and Analyzing Targeted Proteomics Experiments. Bioinformatics 2010, 26 (7), 966–968, DOI: 10.1093/bioinformatics/btq054
- 65. Walzer, M.; Qi, D.; Mayer, G.; Uszkoreit, J.; Eisenacher, M.; Sachsenberg, T.; Gonzalez-Galarza, F. F.; Fan, J.; Bessant, C.; Deutsch, E. W.; Reisinger, F.; Vizcaíno, J. A.; Medina-Aunon, J. A.; Albar, J. P.; Kohlbacher, O.; Jones, A. R. The MzQuantML Data Standard for Mass Spectrometry-Based Quantitative Studies in Proteomics. Mol. Cell Proteomics 2013, 12 (8), 2332–2340, DOI: 10.1074/mcp.O113.028506
- 66. Walzer, M.; Pernas, L. E.; Nasso, S.; Bittremieux, W.; Nahnsen, S.; Kelchtermans, P.; Pichler, P.; van den Toorn, H. W. P.; Staes, A.; Vandenbussche, J.; Mazanek, M.; Taus, T.; Scheltema, R. A.; Kelstrup, C. D.; Gatto, L.; van Breukelen, B.; Aiche, S.; Valkenborg, D.; Laukens, K.; Lilley, K. S.; Olsen, J. V.; Heck, A. J. R.; Mechtler, K.; Aebersold, R.; Gevaert, K.; Vizcaíno, J. A.; Hermjakob, H.; Kohlbacher, O.; Martens, L. QcML: An Exchange Format for Quality Control Metrics from Mass Spectrometry Experiments. Mol. Cell Proteomics 2014, 13 (8), 1905–1913, DOI: 10.1074/mcp.M113.035907
- 67. Bittremieux, W.; Walzer, M.; Tenzer, S.; Zhu, W.; Salek, R. M.; Eisenacher, M.; Tabb, D. L. The Human Proteome Organization-Proteomics Standards Initiative Quality Control Working Group: Making Quality Control More Accessible for Biological Mass Spectrometry. Anal. Chem. 2017, 89 (8), 4474–4479, DOI: 10.1021/acs.analchem.6b04310
- 68. Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; King, N.; Stein, S. E.; Aebersold, R. Development and Validation of a Spectral Library Searching Method for Peptide Identification from MS/MS. Proteomics 2007, 7 (5), 655–667, DOI: 10.1002/pmic.200600625
- 69. Lam, H.; Deutsch, E. W.; Eddes, J. S.; Eng, J. K.; Stein, S. E.; Aebersold, R. Building Consensus Spectral Libraries for Peptide Identification in Proteomics. Nat. Methods 2008, 5 (10), 873–875, DOI: 10.1038/nmeth.1254
- 70. Frewen, B.; MacCoss, M. J. Using BiblioSpec for Creating and Searching Tandem MS Peptide Libraries. Curr. Protoc Bioinformatics 2007; Chapter 13, Unit 13.7. DOI: 10.1002/0471250953.bi1307s20.
- 71. Deutsch, E. W.; Perez-Riverol, Y.; Chalkley, R. J.; Wilhelm, M.; Tate, S.; Sachsenberg, T.; Walzer, M.; Käll, L.; Delanghe, B.; Böcker, S.; Schymanski, E. L.; Wilmes, P.; Dorfer, V.; Kuster, B.; Volders, P.-J.; Jehmlich, N.; Vissers, J. P. C.; Wolan, D. W.; Wang, A. Y.; Mendoza, L.; Shofstahl, J.; Dowsey, A. W.; Griss, J.; Salek, R. M.; Neumann, S.; Binz, P.-A.; Lam, H.; Vizcaíno, J. A.; Bandeira, N.; Röst, H. Expanding the Use of Spectral Libraries in Proteomics. J. Proteome Res. 2018, 17 (12), 4051–4060, DOI: 10.1021/acs.jproteome.8b00485
- 72. Mészáros, B.; Hatos, A.; Palopoli, N.; Quaglia, F.; Salladini, E.; Van Roey, K.; Arthanari, H.; Dosztányi, Z.; Felli, I. C.; Fischer, P. D.; Hoch, J. C.; Jeffries, C. M.; Longhi, S.; Maiani, E.; Orchard, S.; Pancsa, R.; Papaleo, E.; Pierattelli, R.; Piovesan, D.; Pritisanac, I.; Viennet, T.; Tompa, P.; Vranken, W.; Tosatto, S. C.; Davey, N. E. MIADE Metadata Guidelines: Minimum Information About a Disorder Experiment; Scientific Communication and Education, 2022. DOI: 10.1101/2022.07.12.495092.
- 73. Quaglia, F.; Mészáros, B.; Salladini, E.; Hatos, A.; Pancsa, R.; Chemes, L. B.; Pajkos, M.; Lazar, T.; Peña-Díaz, S.; Santos, J.; Ács, V.; Farahi, N.; Fichó, E.; Aspromonte, M. C.; Bassot, C.; Chasapi, A.; Davey, N. E.; Davidović, R.; Dobson, L.; Elofsson, A.; Erdos, G.; Gaudet, P.; Giglio, M.; Glavina, J.; Iserte, J.; Iglesias, V.; Kálmán, Z.; Lambrughi, M.; Leonardi, E.; Longhi, S.; Macedo-Ribeiro, S.; Maiani, E.; Marchetti, J.; Marino-Buslje, C.; Mészáros, A.; Monzon, A. M.; Minervini, G.; Nadendla, S.; Nilsson, J. F.; Novotný, M.; Ouzounis, C. A.; Palopoli, N.; Papaleo, E.; Pereira, P. J. B.; Pozzati, G.; Promponas, V. J.; Pujols, J.; Rocha, A. C. S.; Salas, M.; Sawicki, L. R.; Schad, E.; Shenoy, A.; Szaniszló, T.; Tsirigos, K. D.; Veljkovic, N.; Parisi, G.; Ventura, S.; Dosztányi, Z.; Tompa, P.; Tosatto, S. C. E.; Piovesan, D. DisProt in 2022: Improved Quality and Accessibility of Protein Intrinsic Disorder Annotation. Nucleic Acids Res. 2022, 50 (D1), D480–D487, DOI: 10.1093/nar/gkab1082
- 74. Bittremieux, W.; Bouyssié, D.; Dorfer, V.; Locard-Paulet, M.;