PeptideShaker Online: A User-Friendly Web-Based Framework for the Identification of Mass Spectrometry-Based Proteomics Data

Mass spectrometry-based proteomics is a high-throughput technology generating ever-larger amounts of data per project. However, storing, processing, and interpreting these data can be a challenge. A key element in simplifying this process is the development of interactive frameworks focusing on visualization that can greatly simplify both the interpretation of data and the generation of new knowledge. Here we present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the resulting data. Storage and processing of the data are performed via the versatile Galaxy platform (through SearchGUI, PeptideShaker, and moFF), while the interaction with the results happens via a locally installed web server, thus enabling researchers to process and interpret their own data without requiring advanced bioinformatics skills or direct access to compute-intensive infrastructures. The source code, additional documentation, and a fully functional demo are available at https://github.com/barsnes-group/peptide-shaker-online.


■ INTRODUCTION
Mass spectrometry-based proteomics generates large amounts of data, 1 and it is essential that the data can be processed and analyzed in such a way that the researcher generating the data can interpret its biological meaning correctly. In addition to biological knowledge, this often requires direct access to significant computational resources and advanced computational skills. The overall challenge can be split into three main categories: (i) access to computational resources; (ii) availability of user-friendly bioinformatics software; and (iii) having the biological understanding to translate the data into useful knowledge.
The first category can be addressed by high-performance computing environments that provide the required resources through powerful servers instead of the more limited personal computers, 2 while at the same time making the stored data more portable and accessible; i.e., there is no need to download or move the data. 3 Adding interactive visualization to such setups can help with the second category, the need for user-friendly bioinformatics software, and can also play a key role in the data processing and simplify the interpretation of the results. 4 Interactive visualization can furthermore greatly reduce the complexity of interpretation by providing direct interaction with the data and by dividing it into distinct levels, thus enabling the biological researcher to focus on interpreting the data and extracting biological knowledge. 5 One way to port bioinformatic pipelines to remote servers is to take advantage of Galaxy, a web-based scientific analysis platform including more than 5500 specialized tools, in addition to workflow support and data storage management, thus providing the required infrastructure for large-scale proteomics data analysis. 6 Galaxy is, however, limited when it comes to advanced interactive visualization of the results and may not be straightforward to use for nonprogrammers.
As a response to these challenges, we here present PeptideShaker Online, a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the search results.

■ METHODS
PeptideShaker Online consists of two main components: a Galaxy-based backend where the data is stored and the search and data processing are performed, 7 and a locally installed web-based frontend supporting SearchGUI 8 search and interactive visualization of PeptideShaker 9 projects. The data processing is done via the Galaxy platform using (i) ThermoRawFileParser 10 for converting Thermo raw files into mzML 11 or mgf; (ii) SearchGUI for protein identification based on ten proteomics search and de novo engines, namely OMSSA, 12 X! Tandem, 13 MyriMatch, 14 MS Amanda, 15 MS-GF+, 16 Comet, 17 Tide, 18 MetaMorpheus, 19 DirectTag, 20 and Novor; 21 (iii) PeptideShaker for interpretation of the peptide identification data from SearchGUI; and finally (iv) moFF for extracting MS1 intensities from the mass spectra. 22 Spectrum input is supported as either mgf or mzML for identification, and Thermo raw files for both identification and quantification.
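The order of the four processing steps and the input-format rules described above can be summarized in a minimal sketch. The function name `plan_pipeline` and its parameters are hypothetical and chosen only for illustration; they are not part of PeptideShaker Online or Galaxy. Only the tool names and the format constraints are taken from the text.

```python
from typing import List

# Spectrum formats named in the text as supported for identification.
IDENTIFICATION_FORMATS = {"mgf", "mzml", "raw"}


def plan_pipeline(input_format: str, quantify: bool = False) -> List[str]:
    """Return the ordered tool steps for a spectrum input format.

    Hypothetical helper for illustration only; it mirrors the textual
    description and does not call Galaxy or any of the tools.
    """
    fmt = input_format.lower()
    if fmt not in IDENTIFICATION_FORMATS:
        raise ValueError(f"unsupported spectrum format: {input_format}")
    if quantify and fmt != "raw":
        # Per the text, quantification is supported for Thermo raw files only.
        raise ValueError("quantification requires Thermo raw input")

    steps: List[str] = []
    if fmt == "raw":
        steps.append("ThermoRawFileParser")  # raw -> mzML or mgf
    steps.append("SearchGUI")       # search and de novo engines
    steps.append("PeptideShaker")   # interpretation of the identifications
    if quantify:
        steps.append("moFF")        # MS1 intensity extraction
    return steps
```

For example, `plan_pipeline("raw", quantify=True)` lists all four tools in order, while `plan_pipeline("mzML")` skips the conversion and quantification steps.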
Vaadin 7.26 (https://vaadin.com) and Java 8 are used for the frontend implementation, and Tomcat server version 9 (https://tomcat.apache.org) is used to host the demo web application. Reactome 23 is used for the proteoform network data, LiteMol 24 for the protein 3D structures, compomics-utilities 5.0.15 25 to produce the main spectrum charts, and Jersey 2.34 (https://eclipse-ee4j.github.io/jersey) for managing the connections between Galaxy and PeptideShaker Online.
For a complete list of libraries, the full source code, additional documentation, and step-by-step instructions on how to deploy PeptideShaker Online on your own web server, please see https://github.com/barsnes-group/peptide-shaker-online.

■ RESULTS
As a web-based interactive proteomics framework, PeptideShaker Online can be deployed in proteomics laboratories and facilities. It aims to simplify mass spectrometry-based proteomics data identification by providing users with an intuitive, easy-to-use interface, thus removing the need for
Next is the Protein Overview (Figure 3), showing the details for the currently selected protein, including: (i) a protein-peptide network (with related protein groups and peptides) or a proteoform network for the related proteoform interactions; (ii) the protein 3D structure; and (iii) the protein coverage. For all three visualizations, the peptides are color-coded according to a user-selected property, e.g., the number of peptide-spectrum matches or the post-translational modification type.
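To make the protein coverage view concrete, the following minimal sketch computes sequence coverage as the fraction of residues covered by at least one identified peptide. It assumes exact string matching of unmodified peptide sequences and ignores, for example, isoleucine/leucine ambiguity and modifications; it illustrates the general concept and is not the computation used by PeptideShaker. The function name is hypothetical.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Fraction of protein residues covered by at least one peptide.

    Illustrative assumption: peptides are located by exact substring match.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # an empty string would match at every position
        start = protein.find(peptide)
        while start != -1:
            # Mark every occurrence, including overlapping ones.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)
```

A per-residue boolean mask like `covered` is also a natural starting point for the color-coding mentioned above, since it can be replaced by a per-residue property such as the number of matching spectra.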
Finally, there is the Peptide-Spectrum Matches level (Figure 4), including the annotated spectrum matches for the selected peptide. Notably, the spectrum viewer is fully interactive and supports both manual and automatic de novo sequencing in addition to customizable peak annotation. This level also includes a peptide-spectrum matches table with sequence fragmentation charts and mass error plots.
Journal of Proteome Research, pubs.acs.org/jpr, Technical Note
Furthermore, PeptideShaker Online makes it possible for users to share their own processed results using project-specific links. Besides saving time, this feature makes the data more secure and portable given that there is no transfer of the underlying data files between the users. The users can also export the data directly as either an Excel spreadsheet or images. Finally, PeptideShaker Online also supports the uploading and visualization of locally processed data files in tab-delimited file formats, making it possible to use the framework without having to reprocess the data with the full pipeline, for example, when the desired spectrum files are not available.
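As background for the mass error plots mentioned for the Peptide-Spectrum Matches level, the sketch below shows the standard definition of a relative mass error in parts per million (ppm). This is a general formula and not code from PeptideShaker Online; the function name is hypothetical, and which error unit the plots use is not stated in the text.

```python
def mass_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Relative m/z deviation in parts per million (ppm).

    ppm = (observed - theoretical) / theoretical * 1e6
    """
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6
```

Positive values indicate a measured m/z above the theoretical value, negative values a measured m/z below it.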
To test the framework and its main features, a fully functional demo is available, which includes two processed data sets (Table 1) and supports new searches based on the example data provided. Due to local resource limitations, the maximum number of concurrent users for the demo is set to five. Note that the demo uses a public Galaxy user key by default. When installing their own version of PeptideShaker Online, users should instead use personal Galaxy API keys to control access to the data. For information about how to set up your own PeptideShaker Online web server, please visit the project's GitHub page: https://github.com/barsnes-group/peptide-shaker-online.
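The role of a personal Galaxy API key can be illustrated with a minimal sketch of an authenticated request to the Galaxy REST API. It assumes the key is passed in the `x-api-key` request header and uses the `api/histories` endpoint as an example. The server URL, the function name, and the key value are placeholders, and this is not how PeptideShaker Online itself (which uses Jersey in Java) builds its requests. The sketch only constructs the request and does not send it.

```python
from urllib.parse import urljoin
from urllib.request import Request


def build_galaxy_request(galaxy_url: str, api_key: str,
                         endpoint: str = "api/histories") -> Request:
    """Build (but do not send) an authenticated Galaxy API request.

    Hypothetical helper: the URL and key are supplied by the caller.
    """
    if not api_key:
        raise ValueError("a personal Galaxy API key is required")
    base = galaxy_url.rstrip("/") + "/"  # keep any server sub-path intact
    # The API key travels in a header rather than in the URL.
    return Request(urljoin(base, endpoint), headers={"x-api-key": api_key})
```

Passing the key in a header rather than as a URL parameter keeps it out of the URL itself, which matters when project links are shared between users.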

■ CONCLUSION
In summary, PeptideShaker Online is a user-friendly web-based framework for the identification of mass spectrometry-based proteomics data, from raw file conversion to interactive visualization of the results. The framework is easily expandable by either including additional tools from the Galaxy platform or introducing new data visualization levels in the web-based frontend. PeptideShaker Online makes the identification of proteomics data more accessible to researchers lacking advanced computational skills, thus moving the data interpretation closer to the biologists in charge of generating the data. Furthermore, the coordinated interactive visualizations, combined with splitting the data into distinct levels, allow for intuitive data exploration and thus contribute to a better understanding of proteomics data and its inherent complexity.