Evaluation of Parameters for Confident Phosphorylation Site Localization Using an Orbitrap Fusion Tribrid Mass Spectrometer

Confident identification of sites of protein phosphorylation by mass spectrometry (MS) is essential to advance understanding of phosphorylation-mediated signaling events. However, the development of novel instrumentation requires that methods for MS data acquisition and its interrogation be evaluated and optimized for high-throughput phosphoproteomics. Here we compare and contrast eight MS acquisition methods on the novel tribrid Orbitrap Fusion MS platform using both a synthetic phosphopeptide library and a complex phosphopeptide-enriched cell lysate. In addition to evaluating multiple fragmentation regimes (HCD, EThcD, and neutral-loss-triggered ET(ca/hc)D) and analyzers for MS/MS (orbitrap (OT) versus ion trap (IT)), we also compare two commonly used bioinformatics platforms, Andromeda with PTM-score, and MASCOT with ptmRS for confident phosphopeptide identification and, crucially, phosphosite localization. Our findings demonstrate that optimal phosphosite identification is achieved using HCD fragmentation and high-resolution orbitrap-based MS/MS analysis, employing MASCOT/ptmRS for data interrogation. Although EThcD is optimal for confident site localization for a given PSM, the increased duty cycle compared with HCD compromises the numbers of phosphosites identified. Finally, our data highlight that a charge-state-dependent fragmentation regime and a multiple algorithm search strategy are likely to be of benefit for confident large-scale phosphosite localization.


Introduction
Protein phosphorylation is an essential, rapidly reversible, post-translational modification (PTM), with critical roles in nearly all biological processes. Defining these dynamic phosphorylation events is key to understanding their functional significance and gaining insight into the complex biology that they regulate. The ability to comprehensively and confidently decipher the phosphoproteome, the entire cellular phosphorylation state under a given set of conditions, thus yields indispensable information. Significant advances in mass spectrometry (MS) over the last decade have allowed for in-depth, although arguably incomplete, analysis of phosphoproteomes in a wide variety of complex biological systems [1][2][3][4][5][6][7][8][9][10][11][12] . The continual development of more sophisticated ways of generating and analyzing MS data is undoubtedly aiding phosphopeptide identification. However, from a mechanistic biological perspective, it is insufficient to have confidence in phosphopeptide identity if there is ambiguity regarding the site of modification within that peptide; consequently, it is of equal importance to define confidence in both phosphopeptide sequence and site of modification. Reviewers and users of such data generally understand this importance and publication guidelines now typically require researchers to assess site localization confidence 13 .
MS-based analysis of (phospho)peptides relies on the acquisition of tandem MS (MS/MS) data from phosphopeptide-enriched samples, typically following proteolysis of complex cell extracts with proteases such as trypsin or LysC. The complexity and rapid regulation of the phosphoproteome means that significant numbers of samples often need to be analyzed. Maximizing the acquisition of information-rich MS/MS data, and its optimal interrogation to derive confident sequence information, is essential for both phosphopeptide and phosphosite identity. In particular, confident phosphosite localization is critical if this key underpinning technology is to be of optimal benefit for the advancement of bioscience and interrogation of cell signaling mechanisms. High-throughput phosphoproteomics studies are currently sub-optimal, with a recent isobaric labelling study demonstrating that only ~30% of phosphopeptides in an enriched complex mixture could be identified using conventional ion trap collision-induced dissociation (CID) 14 . The efficiency of (phospho)peptide identification can undoubtedly be improved by using high resolution mass analyzers for tandem MS. Additionally, confidence in phosphopeptide and phosphosite characterization can be enhanced by increasing the number of site-determining product ions, which we and others have shown can be aided by the exploitation of multiple complementary fragmentation modes [15][16][17][18][19][20] . The recently developed tribrid mass spectrometers of the Orbitrap Fusion series, which combine three mass analyzers (quadrupole, ion trap, orbitrap), are potentially of significant utility for such studies.
The benefit of being able to perform both high and low energy collision-induced dissociation (HCD and CID respectively), as well as electron transfer dissociation (ETD), with product ion analysis being performed in either the ion trap or the orbitrap [21][22][23] , means that these instruments should be of great benefit in the quest for improved phosphoproteome analysis and unambiguous phosphosite identification. Although collisional dissociation is frequently implemented in proteomics workflows, there are limitations when used in phosphoproteomics pipelines due to preferential cleavage of the phosphoester bond. Such MS/MS spectra exhibit predominant neutral loss of phosphoric acid/phosphate (Δ98/Δ80) from the phosphorylated precursor ion and few informative product ions. Not only does this neutral loss impede peptide identification, but once lost, it is often difficult to pinpoint the original site of modification. Higher energy collisional dissociation (HCD) 24 can overcome limited peptide backbone fragmentation, as the elevated energy applied and the additional kinetic energy of the ions means that they undergo further collisions leading to richer, more informative, fragment ion spectra [24][25][26] . However, neutral loss at the expense of peptide backbone fragmentation can still arise with an HCD fragmentation regime 26 , compromising confident phosphosite localization. In contrast, the nature of phosphopeptide ion fragmentation by ETD means that the phosphate group is retained on the modified residue, often allowing the site of modification to be identified with greater confidence. Application of ETD has typically been limited for large scale studies, in part due to its availability only on selected MS platforms, but also due to inherent limitations. ETD requires longer reaction times, and fragmentation is generally much less efficient than collision-mediated dissociation, in particular for low charge states (where z = 2).
The development of EThcD 18 , a dual fragmentation strategy which combines ETD and HCD resulting in MS/MS spectra containing b/y and c/z ions, is reported to enhance localization of various PTMs on peptides and proteins, including phosphorylation 19,[27][28][29] .
The number of potential phosphopeptide MS acquisition strategies, particularly with the new generation of versatile tribrid Orbitrap instruments, means that it can be extremely complicated and time-consuming to establish an 'optimal' phosphoproteomics pipeline. There are numerous challenges associated with optimizing instrument settings to maximize phosphopeptide identification and crucially, confident site localization. The added capability of the Orbitrap Fusion instruments to parallelize acquisition of MS 1 in the high resolution orbitrap, while acquiring at a faster rate, lower resolution MS 2 in the ion trap (if required), means that there can be significant advantages for high throughput proteomics using this type of tribrid instrument. The number of possible strategies for MS(/MS) data acquisition (orbitrap versus ion trap), as well as potential fragmentation regimes (CID, HCD, ETcaD, EThcD, with or without neutral loss considerations that may be used for triggering additional MS 2 /MS 3 acquisition, or multistage acquisition (MSA)) means that the combinatorial options for MS data acquisition are vast.
Here, we systematically evaluate eight acquisition modes on the tribrid Orbitrap Fusion MS platform, using a library of synthetic phosphopeptide standards, and a complex phosphopeptide-enriched cell lysate preparation. We define optimal MS acquisition settings for both phosphopeptide identification and phosphosite localization, interrogating these datasets using two commonly used phosphoproteomics bioinformatics platforms: Proteome Discoverer (PD) with MASCOT and phosphoRS (ptmRS), and MaxQuant with Andromeda and PTM-score, comparing the benefits of each for confident peptide identification and phosphosite localization. We also evaluate the effect of charge state, and the number of putative phosphorylatable residues on site localization confidence. Although previous experience had suggested that optimal phosphosite localization would require electron transfer-mediated fragmentation, this was not observed for the vast majority of phosphopeptides.
We anticipate that this data, and the analysis thereof, will serve as an ideal starting point for laboratories worldwide looking to establish high-throughput phosphoproteomics using this next generation of tribrid MS instrumentation.

Reagents
All chemicals were purchased from Sigma Aldrich unless otherwise stated. The synthetic phosphopeptide library was purchased from Intavis.

Cell Culture and Lysis
U2OS T-Rex Flp-in cells were maintained in DMEM supplemented with 10% (v/v) fetal bovine serum, penicillin (100 U/mL), streptomycin (100 U/mL) and L-glutamine (2 mM), at 37 °C, 5% CO 2 . Once 80 % confluence was reached, cells were washed with PBS and released with trypsin (0.05 % (v/v)). Cells were centrifuged at 220 x g and lysed with 500 μL 0.25 % (w/v) RapiGest SF (Waters, UK) in 50 mM ammonium bicarbonate with 1x PhosSTOP phosphatase inhibitor cocktail tablet (Roche). The lysate was sonicated briefly and centrifuged at maximum speed for 20 minutes. Protein concentration was determined using the Bradford assay and 4 mg was set aside for protein digestion.

Sample Preparation
Disulfide bonds were reduced by addition of 3 mM DTT in 50 mM ammonium bicarbonate and heated at 60°C for 15 minutes. The resulting free cysteine residues were alkylated with 14 mM iodoacetamide (dark, room temperature, 45 minutes) and excess iodoacetamide quenched by addition of DTT to a final concentration of 7 mM. Proteins were digested overnight with trypsin (2% (w/w); Promega) at 37 °C. RapiGest SF hydrolysis was induced by addition of trifluoroacetic acid (TFA) to 1 % (v/v) and incubated at 37 °C for up to 2 h, 400 rpm. Insoluble hydrolysis product was removed by centrifugation (13,000 x g, 15 min, 4°C) 15 . Peptides were desalted using C18 macro columns (Harvard Apparatus, Cambridge, UK). Briefly, columns were conditioned with 100 % methanol and washed with H 2 O and 1% (v/v) TFA. Peptides were loaded on to the column and centrifuged for 1 minute at 110 x g. The flow-through was re-applied a total of 5 times and peptides were eluted with 80% (v/v) MeCN and 1% (v/v) TFA and dried to completion by vacuum centrifugation.

Liquid Chromatography-Mass Spectrometry
Reversed-phase capillary HPLC separations were performed using an UltiMate 3000 nano system (Dionex) coupled in-line with a Thermo Orbitrap Fusion tribrid mass spectrometer (Thermo Scientific, Bremen, Germany). Synthetic phosphopeptide standards (~10 pmol, split into 5 pools to separate phosphoisomers and thus ensure confidence in phosphosite localization) and 6 µL enriched phosphopeptides (equivalent to 100 μg of digested cell lysate) were loaded onto the trapping column (PepMap100, C18, 300 μm x 5 mm), using partial loop injection, for seven minutes at a flow rate of 9 μL/min with 2% (v/v) MeCN, 0.1% (v/v) TFA and then resolved on an analytical column (Easy-Spray C18, 75 µm x 500 mm, 2 µm bead diameter) using a gradient of 96.2% A (0.1% (v/v) formic acid (FA)): 3.8% B (80% (v/v) MeCN, 0.1% (v/v) FA) to 50% B over 97 minutes at a flow rate of 300 nL/min. MS(/MS) data were acquired on an Orbitrap Fusion as follows: all MS 1 spectra were acquired over m/z 350-2000 in the orbitrap (120K resolution at 200 m/z for high-low strategies and 60K resolution at 200 m/z for high-high strategies); automatic gain control (AGC) was set to accumulate 2E5 ions, with a maximum injection time of 50 ms. Data-dependent tandem MS analysis was performed using a top speed approach (cycle time of 3 s) with multiple fragmentation methods tested (see Table 1 for summary of parameters). The normalized collision energy was optimized at 32% for HCD. MS 2 spectra were acquired with a fixed first m/z of 100. The intensity threshold for fragmentation was set to 50,000 for orbitrap methods and 5000 for ion trap methods and included charge states 2+ to 5+. A dynamic exclusion of 60 seconds was applied with a mass tolerance of 10 ppm.
For neutral loss triggered ETcaD/EThcD methods, fragmentation was enabled for all precursor ions exhibiting neutral loss of mass 97.9763 Da or 80 Da with a mass tolerance of 20 ppm for orbitrap data and 0.5 m/z for ion trap data, where the neutral loss ion was one of the top 10 most intense MS 2 ions. ETD calibrated parameters were applied. AGC was set to 10,000 with a maximum injection time set at 50 ms for IT and 70 ms for OT; ETD reaction time was charge-dependent.
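The neutral loss trigger described above can be illustrated with a minimal Python sketch. It is not the instrument control logic: the function name `triggers_etd`, the peak representation as (m/z, intensity) pairs, and the conversion of the neutral loss mass to an m/z offset by dividing by the precursor charge are assumptions made for illustration. The masses, tolerances and the top 10 criterion are taken from the text.

```python
# Neutral-loss masses in Da, as quoted in the text
# (97.9763 Da for H3PO4 and a nominal 80 Da for HPO3).
NEUTRAL_LOSS_MASSES = (97.9763, 80.0)


def triggers_etd(precursor_mz, charge, peaks, analyzer="OT", top_n=10):
    """Return True if a precursor neutral-loss ion is among the top_n peaks.

    peaks is a list of (mz, intensity) tuples from the HCD MS2 spectrum.
    Assumption: the neutral-loss ion keeps the precursor charge, so the
    expected m/z shift is loss_mass / charge.
    """
    top_peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    for loss in NEUTRAL_LOSS_MASSES:
        expected_mz = precursor_mz - loss / charge
        # 20 ppm for orbitrap data, 0.5 m/z for ion trap data (from the text).
        tolerance = expected_mz * 20e-6 if analyzer == "OT" else 0.5
        if any(abs(mz - expected_mz) <= tolerance for mz, _ in top_peaks):
            return True
    return False
```

For a doubly charged precursor at m/z 700.0, the hypothetical call `triggers_etd(700.0, 2, peaks, "OT")` looks for a peak near m/z 651.012 (loss of 97.9763 Da) or m/z 660.0 (loss of 80 Da) among the ten most intense product ions.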

Data Analysis
Data was processed using either Thermo Proteome Discoverer (v. 1.4) in conjunction with MASCOT (v 2.6), or with Andromeda integrated in MaxQuant (version 1.5.8.0) using default settings unless otherwise specified. To address the requirement of MASCOT for centroided data, raw data files were converted to mzML format in order to perform MS 2 de-isotoping prior to processing with MASCOT through the Proteome Discoverer (PD) pipeline. Peak lists were searched against a database containing either the synthetic phosphopeptide sequences or the human UniProt database (201512; 20187 sequences). Parameters were set as follows: MS 1 tolerance of 10 ppm; MS 2 mass tolerance of 0.01 Da for orbitrap detection, 0.6 Da for ion trap detection; enzyme specificity was set as trypsin with 2 missed cleavages allowed; no enzyme was defined for the phosphopeptide library processing; carbamidomethylation of cysteine was set as a fixed modification; phosphorylation of serine, threonine and tyrosine, and oxidation of methionine were set as variable modifications. Nonfragment filtering was applied to ETD scans to remove the precursor peak within a 4 Da window and remove charged reduced precursor and neutral loss ions from charged reduced precursor ions within a 2 Da window. ptmRS was run in PhosphoRS mode using diagnostic fragment ions and analyzer specific fragment ion tolerances as previously defined in the search. For EThcD data, 'Treat all spectra as EThcD' option was set to 'True'. Data was filtered to a 1% false discovery rate (FDR) on PSMs using automatic decoy searching with Mascot and a target-decoy search with Andromeda.
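As an illustration of the target-decoy principle behind the 1% FDR filter, the following Python sketch finds the lowest score cut-off at which the ratio of decoy to target PSMs does not exceed a chosen FDR. This is a simplified estimate (decoys divided by targets) and is not the exact procedure implemented in MASCOT or MaxQuant; the function name and the (score, is_decoy) input format are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cut-off with estimated FDR <= max_fdr.

    psms is an iterable of (score, is_decoy) tuples. The FDR among PSMs
    scoring at or above a cut-off is estimated as decoys / targets.
    Returns None if no cut-off satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate once all PSMs sharing this score have been counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

PSMs scoring at or above the returned cut-off would then be retained for site localization analysis.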

Comparison of fragmentation methods and MS 2 resolution settings for identification and site localization of phosphopeptide standards
To evaluate the advanced capabilities of the Orbitrap Fusion tribrid mass spectrometer for site-specific phosphopeptide identification, we designed a series of MS acquisition methods to assess the benefits of using either the high resolution orbitrap or the lower resolution ion trap mass analyzers. In the first instance we analyzed a commercially available synthetic library of phosphopeptides 30 that comprised tryptic peptides previously observed in multiple large-scale phosphopeptide studies. The library was designed such that the typical composition and length observed in bottom-up proteomics is represented, with a natural occurrence of unmodified and phosphorylated serine, threonine and tyrosine residues.
As well as differing in resolving power, there are significant differences in speed and sensitivity between the orbitrap (OT) and ion trap (IT) mass analyzers. HCD, EThcD and neutral loss (NL)-triggered ETD-mediated fragmentation strategies were also compared; in the NL-triggered strategies, ETD-mediated fragmentation was triggered for ions exhibiting precursor neutral loss of 98 amu (arising due to the characteristic loss of H 3 PO 4 from phosphorylated peptide ions 16,17,31 ) or 80 amu (arising due to loss of HPO 3 ) following HCD (Table 1; Table S1).

The phosphopeptide library, containing 175 unique phosphopeptides (191 phosphorylation sites), was divided into five pools for LC-MS/MS analysis. Isomeric phosphopeptides (where the same peptide sequence is modified on a different residue) were allocated to different analytical pools to ensure that site localization could be defined absolutely. The five pools of synthetic phosphopeptide standards were each analyzed in duplicate using the eight MS acquisition methods, assessing both phosphopeptide identification and phosphosite localization (Table 2, Supp. Fig 1).

As an extension of previously published studies 30,32 we also assessed the ability of two commonly used phosphoproteomics data analysis platforms, MASCOT integrated into Proteome Discoverer (PD) using ptmRS (a slightly modified version of phosphoRS 33 ) for phosphosite localization, and Andromeda with MaxQuant and PTM-score 34 , to identify the synthetic phosphopeptides from all eight datasets (Table 2). Implementation of either the Andromeda or MASCOT search algorithms resulted in notably fewer PSMs using EThcD compared to HCD, independent of whether MS 2 was performed in the orbitrap or the ion trap (Table 2). This result can be explained by the increase in duty cycle for this mixed mode fragmentation regime. Consequently fewer phosphopeptides were identified with EThcD OT compared with the analogous HCD OT, and likewise for EThcD IT compared with HCD IT (Table 2). However, the higher percentage of PSMs with correctly localized phosphosites following EThcD IT (94% compared with 87% for Andromeda/PTM-score; 92% compared with 83% for MASCOT/ptmRS for EThcD IT or HCD IT respectively) translated to the same or higher numbers of correctly site localized phosphosites being characterized overall with EThcD IT than HCD IT (Table 2; Figure 1A; Supp. Figure 2). These findings are in agreement with previous observations on different instrument platforms, which highlight the benefit of mixed mode fragmentation for improved phosphosite localization 19 . For the high resolution OT data, there was a notable difference in the performance of the two search engines. While phosphosite localization confidence increased with EThcD compared with HCD (resulting in the same numbers of correctly localized phosphosites) using Andromeda/PTM-score, this was not the case with MASCOT/ptmRS. 172 phosphosites were correctly identified with HCD OT, whereas EThcD OT yielded only 154 correctly localized phosphosites.
The benefits of high resolution MS 2 acquisition therefore appear to outweigh the increased duty cycle associated with EThcD when using MASCOT/ptmRS for this phosphopeptide library. When considering HCD fragmentation, with or without NL-triggered ET(hc/ca)D, phosphosite localization with both bioinformatics platforms was optimal (higher percentage) with high resolution orbitrap MS 2 analysis, likely due to the improved confidence afforded by the enhanced mass accuracy as compared with low resolution ion trap MS 2 measurements (Table 2; Figure 1A; Supp. Figure 1). Interestingly, Andromeda/PTM-score yielded lower numbers overall, both of unique phosphopeptides and correctly localized phosphosites, compared with MASCOT/ptmRS, irrespective of MS method. A maximum of 159 unique phosphopeptides (155 correctly localized phosphosites) were identified from the pool of 175 synthetic phosphopeptides with Andromeda/PTM-score, compared with 168 phosphopeptides (172 correctly localized phosphosites) when the same data were interrogated using MASCOT/ptmRS.
With both search algorithms, HCD IT was optimal for both PSMs and the numbers of unique phosphopeptides identified, as might be expected given the possibility for parallelization of MS 1 data acquisition in the orbitrap and concurrent MS 2 analysis in the ion trap. However, site localization confidence, the critical parameter from the point of view of biological inference, was either optimal (ptmRS) or of equal performance (PTM-score) using the HCD OT method.
Upon further examination of the workflows exploiting neutral loss-triggered ETcaD, the vast majority (89-93%) of correctly site localized phosphopeptides were derived from the HCD spectra rather than the ETcaD spectra triggered following precursor neutral loss. The additional incorporation of ETcaD in this regime thus appeared to offer no benefit for either phosphopeptide identification or site localization over that achieved with HCD alone. Indeed, the number of PSMs was compromised due to the increase in duty cycle for the ETcaD component of this multi-stage acquisition method. The HCD IT/OT nl ETcaD IT methods are therefore not discussed in subsequent analytical comparisons.
A significant advantage of using synthetic phosphopeptides of known sequence is the ability to define false localization rates (FLRs) specific to the MS acquisition method employed, by counting the numbers of correct and incorrectly site localized PSMs 30 (Figure 1 B-E). The distribution of site localization scores for each of the four unique fragmentation modes, HCD OT, HCD IT, EThcD OT, EThcD IT, with each of the two informatics pipelines is presented in Figure 1. Akin to previous observations on different MS platforms with both synthetic phosphopeptides 30 and a complex phosphopeptide enriched cell lysate 32 , both site localization tools require MS acquisition method specific scores to yield a 1% FLR, (Figure 1 B, C). It is of interest to note that although fewer phosphosites were incorrectly localized overall with HCD OT compared to HCD IT with both search engines, this does not correlate with a lower site localization score. ptmRS exhibits a bimodal distribution for high-resolution MS 2 data, with clustering of values around ptmRS = 100 and ptmRS = 50, indicating either 'certainty', or lack of discriminatory evidence between two possible sites, respectively. In contrast, PTM-score values are more evenly distributed (red plots in Fig. 1D and 1E). This difference is likely due to how the two algorithms were developed; while phosphoRS was optimized with both high and low resolution data 33 , PTM-score was originally developed for phosphosite localization using low mass accuracy ion-trap generated CID data 35 . Unlike phosphoRS, PTM-score treats all observed MS 2 peaks as integer masses 33,36 , meaning that there is limited benefit using PTM-score when high resolution data has been acquired. 
Furthermore, while PTM-score searches the "n" most intense peaks within a bin of 100 m/z to identify site-determining product ions, ptmRS considers the total number of extracted peaks across the full mass range of the MS 2 spectrum, overcoming potential issues of uneven peak distribution in individual m/z bins 33,36 , and is thus better suited for data generated with high resolution mass analyzers.
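The binned peak selection attributed to PTM-score above can be sketched as follows. This Python example only illustrates the idea of retaining the n most intense peaks per 100 m/z bin; it is not the PTM-score implementation, and the function name, the input format and the way bin boundaries are assigned (integer division of m/z by the bin width) are assumptions.

```python
from collections import defaultdict


def top_peaks_per_bin(peaks, n, bin_width=100.0):
    """Keep the n most intense peaks within each m/z bin.

    peaks is a list of (mz, intensity) tuples; the result is returned
    in order of increasing m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)].append((mz, intensity))
    kept = []
    for index in sorted(bins):
        strongest = sorted(bins[index], key=lambda p: p[1], reverse=True)[:n]
        kept.extend(sorted(strongest))
    return kept
```

A sparsely populated bin retains all of its peaks, including weak ones, whereas a crowded bin discards all but its n strongest, which is one way in which uneven peak distribution across bins can influence the ions considered for scoring.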
Both localization tools underestimated the true FLR for EThcD IT data (Fig. 1B and C), demonstrating the additional benefit of generating site determining c/z as well as b/y ions within a single spectrum. A 1% FLR could not be computed for the EThcD OT dataset, as insufficient incorrectly localized phosphopeptides were identified from the library. Instead, the scores defined for this fragmentation mode (PTM-Score = 0.9; ptmRS = 99.4, Figure 1B, C) represent an FLR of 0.8%. The other PTM-score and ptmRS values computed for phosphosite localization at a 1% FLR are broadly in agreement with those previously defined for a larger synthetic phosphopeptide library using a different orbitrap-based MS platform, demonstrating that the MS acquisition methods and the associated bioinformatics platforms are largely transferable between similar platforms 30 .
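The principle of deriving a method-specific score threshold from a library of known phosphosites can be illustrated with a short Python sketch. It assumes that the FLR at a given threshold is the fraction of incorrectly localized PSMs among all PSMs scoring at or above that threshold; the function names and the (score, is_correct) input format are hypothetical, and the exact procedure used to generate Figure 1 may differ.

```python
def flr_at_threshold(psms, threshold):
    """False localization rate among PSMs scoring at or above threshold.

    psms is a list of (score, is_correct) tuples, where is_correct is True
    when the assigned site matches the known site in the synthetic peptide.
    Returns None if no PSM passes the threshold.
    """
    passing = [is_correct for score, is_correct in psms if score >= threshold]
    if not passing:
        return None
    return passing.count(False) / len(passing)


def min_score_for_flr(psms, max_flr=0.01):
    """Lowest observed score at which the FLR does not exceed max_flr."""
    for threshold in sorted({score for score, _ in psms}):
        flr = flr_at_threshold(psms, threshold)
        if flr is not None and flr <= max_flr:
            return threshold
    return None
```

If too few incorrectly localized PSMs are present, as reported above for EThcD OT, the FLR at the chosen threshold can only be stated approximately rather than fixed at exactly 1%.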
In addition to the 1% FDR filtering, 'default settings' in Andromeda apply a score cut-off of 40 for post-translationally modified peptides. To investigate whether this artificially reduced the numbers of phosphopeptides identified from our library, all eight datasets were searched again with Andromeda, having removed the requirement for scores to exceed 40 (Supplementary Table 2). An analogous threshold for comparison with MASCOT could not be set since there is not a perfect linear relationship between the two scoring algorithms 35 . Upon removal of this score filter in Andromeda, the numbers of confidently identified phosphorylation sites were broadly similar, with the exception of the high resolution HCD OT and HCD OT nl EThcD datasets, where an additional seven and six phosphosites were identified respectively. The resultant minimal change in confidently assigned phosphosites (max. 4% with HCD OT; 2% decrease with EThcD IT) meant that amendment of the default settings in Andromeda did not warrant further investigation. Default settings for both search engines were thus used in subsequent investigations, these also being the parameters that most end-users will typically apply.

Phosphopeptide identification from a phosphopeptide enriched complex human cell lysate
Having evaluated the eight MS acquisition methods using the phosphopeptide library, we were able to define six methods for this tribrid MS platform worthy of further investigation based on the numbers of correctly site localized phosphopeptides. Performance of these six MS acquisition strategies for phosphopeptide identification and phosphosite localization was then evaluated using a larger dataset derived from a more complex, biologically relevant sample. Phosphopeptides were enriched from a U2OS cell lysate using TiO 2 , and aliquots (6 µL, equivalent to 100 µg from 4 mg digested cell lysate) of the same phosphopeptide enriched sample were analyzed in duplicate by LC-MS/MS using HCD OT, HCD IT, EThcD OT, EThcD IT, HCD OT nl EThcD IT or HCD IT nl EThcD IT (Table S1). The number and overlap of unique phosphopeptide identifications using either Andromeda/PTM-score or MASCOT/ptmRS is presented for each of the MS acquisition methods (Table 3, Figure 2, Figures S2, S3). Of the six methods assessed, HCD IT exhibited the least overlap between technical replicates, with up to 44% of phosphopeptides being unique to a single LC-MS/MS run. Other methods exhibited between ~20% (HCD OT nl EThcD IT) and 25% (HCD OT) overlap (Figure S1).
The highest total number of unique phosphopeptides from the enriched U2OS cell lysate (6877 phosphopeptides at a 1% FDR) was identified using HCD IT and Andromeda (Table 3, Figure 2, Figures S4, S5, S6). This regime capitalizes on the capability of the Orbitrap Fusion to parallelize high resolution MS 1 acquisition in the orbitrap whilst simultaneously acquiring MS 2 data in the ion trap. Interestingly, there was little difference in the numbers of unique phosphopeptides identified using MASCOT when MS 2 was performed in the OT versus the IT; 4957 phosphopeptides were confidently identified for HCD OT compared with 4920 phosphopeptides using HCD IT (Table 3). This is almost certainly due to the enhanced confidence in phosphopeptide identification that results when MS data is acquired with higher mass accuracy, as is the case with HCD OT. However, it is particularly interesting to note how Andromeda and MASCOT differentially handle high resolution and low resolution MS 2 data (discussed in more detail below).

Figure 2. Comparison of method-dependent phosphorylation site localization. Confidently localized phosphorylation sites (FLR ≤1%, green) or ambiguous phosphosite assignments (white, grey) from a TiO 2 -enriched U2OS cell lysate, using either (A, B) Andromeda/PTM-score, or (C, D) MASCOT/ptmRS for each of the six Orbitrap Fusion MS acquisition methods. Phosphosites assigned by virtue of neutral loss (NL)-triggered EThcD are also presented. Number (A, C) or percentage (B, D) of phosphosites identified is indicated for each condition.
An important reason for undertaking this study was to evaluate confidence in phosphosite localization. Under the conditions examined, phosphosite localization was optimal when utilizing HCD OT and MASCOT/ptmRS searching. Of the 5733 phosphosites identified, 76% (4337) were confidently site localized under these conditions (Table 3, Figure 2, Figures S4, S5, S6). For the same dataset, 4808 phosphosites were defined using Andromeda/PTM-score, of which 50% failed to meet the 1% FLR cut-off for confident site localization using the previously defined PTM-score of 0.994. Although the proportion of confidently site localized phosphopeptides is optimal overall with the EThcD regimes (both OT and IT), as we observed with the phosphopeptide library dataset, the number of phosphosites was compromised compared with either the equivalent HCD method, or the neutral-loss driven strategies. Even considering that the site localization scores applied to the EThcD OT data were slightly more conservative (equating to 0.7% FLR, rather than 1% FLR), the distribution of phosphosite localization scores demonstrates that the total number of phosphosites is still significantly lower with this MS 2 method, irrespective of search engine (Fig. S4). Not surprisingly, site localization confidence generally decreased as the number of phosphorylation sites per peptide increased, irrespective of the search algorithm employed (Figures S5, S6). The exception was EThcD OT: ~76% of phosphosites were confidently localized with PTM-score independent of the number of phosphate groups; doubly phosphorylated peptides yielded a higher proportion of confidently localized phosphosites on average (93%) with ptmRS than singly (86%) or triply (83%) phosphorylated peptides.
The performance of Andromeda/PTM-score was uniformly weaker across all datasets compared with MASCOT/ptmRS. The exception was the EThcD IT data for singly phosphorylated peptides, where the percentage of confidently localized sites was more comparable for the two search algorithms (78% for Andromeda/PTM-score, 83% for MASCOT/ptmRS).
Although the trend in confident phosphosite identification is similar to that observed for the phosphopeptide library, the proportion of incorrect or ambiguous assignments is much higher in the lysate-derived peptides, possibly due to the greater diversity of peptide size, and the true/false nature of the manner that the phosphopeptide library was used to define correct/incorrect site localization. In contrast, Andromeda/PTM-score performed much better than MASCOT/ptmRS with EThcD IT (but not EThcD OT) data, identifying 12.5% more phosphopeptides, and ~7% more phosphosites with confidence (Table 3, Figure 2).
For both the HCD OT and HCD IT regimes where nl EThcD IT is triggered, the percentage of confidently assigned phosphosites increases with Andromeda/PTM-score compared to HCD alone, particularly for HCD IT. This reflects the high performance of Andromeda/PTM-score with EThcD IT data. However, the total numbers of phosphosites identified with HCD IT are much lower when neutral loss EThcD is triggered due to the increased time required for ETD. Interestingly, although 42% of HCD IT spectra contained precursor neutral loss product ions (either 98 or 80 amu, at ≥10% base peak signal), a significant number of these were not within the top 10 ions that triggered EThcD, and only 16% of HCD IT spectra precipitated the acquisition of EThcD.
The high proportion of confidently localized phosphosites with EThcD IT (76% and 83% from Andromeda/PTM-score and MASCOT/ptmRS respectively), combined with the fact that the two data analysis platforms yielded a high proportion of algorithm-unique identifications (Figure 3), suggests that this mixed mode fragmentation regime would likely benefit from data interrogation using multiple informatics pipelines: 31% of Andromeda/PTM-score identifications were unique, while 23% were unique to MASCOT/ptmRS. Perhaps not unexpectedly, the utility of EThcD OT for high-throughput phosphosite identification was severely compromised due to the additional time required for both ETD and OT-based product ion analysis, resulting in much slower overall acquisition speeds for this high resolution mixed mode fragmentation method. Consequently, there was a ~40-50% decrease in the numbers of confidently localized phosphosites using EThcD OT compared to HCD OT.
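The algorithm-unique proportions described above amount to a set comparison between the identifications returned by the two pipelines. The sketch below shows one way such an overlap could be computed; the identifier strings are invented, the function name is hypothetical, and the choice of denominator (each pipeline's own identifications) is an assumption, since the text does not state how the percentages were normalized.

```python
def unique_fraction(own_ids, other_ids):
    """Fraction of one pipeline's identifications that the other did not report.

    Assumption: the denominator is the pipeline's own identification count.
    """
    own, other = set(own_ids), set(other_ids)
    if not own:
        raise ValueError("no identifications supplied")
    return len(own - other) / len(own)


# Invented identifiers standing in for phosphopeptide/site assignments.
andromeda_ids = {"pepA", "pepB", "pepC", "pepD"}
mascot_ids = {"pepB", "pepC", "pepD", "pepE", "pepF"}

print(unique_fraction(andromeda_ids, mascot_ids))  # 0.25
print(unique_fraction(mascot_ids, andromeda_ids))  # 0.4

# Combining both searches yields the union of the two result sets.
print(len(andromeda_ids | mascot_ids))  # 6
```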
The difference in site localization confidence for HCD IT versus HCD OT data for the two algorithms becomes much more apparent for the complex cell lysate-derived phosphopeptide sample compared to the synthetic phosphopeptide library, with site localization confidence decreasing from 76% to 60% for MASCOT/ptmRS and 50% to 37% for Andromeda/PTM-score (Figure 2B, D), again emphasizing the benefits of high resolution MS2 over the reduction in duty cycle afforded by analysis in the ion trap. Evaluation of the distribution of site localization scores for all phosphopeptides facilitates a better understanding of how the two site localization algorithms handle the different fragmentation modes for this complex phosphopeptide sample (Figure S4). Scoring of EThcD IT data, particularly with ptmRS, yields a much shallower distribution of scores than those for HCD IT. Consequently, large changes in score result in relatively small changes in the number of confidently localized phosphosites. The distribution of scores for HCD OT data is notably distinct between the two algorithms. The elevated mass accuracy of the orbitrap allows ptmRS to maximize its ability to pinpoint the correct site of modification, with ~4000 phosphosites having a ptmRS score of 100. In contrast, PTM-score consistently scores low resolution ion trap data higher, where the increased ion current and enhanced duty cycle likely yield benefits that are not compensated by the inability of this scoring system to handle high resolution data.

Confident phosphosite localization is dependent on the number of potential sites of phosphorylation
To avoid potential confusion when examining the effect of multiple potential sites of phosphorylation (Ser, Thr or Tyr) within a single peptide on site localization confidence, only singly phosphorylated peptides were considered for investigation (Figure 4; Figures S7, S8). Unsurprisingly, as the number of Ser/Thr/Tyr residues increases, i.e. the number of potential sites of modification increases, the number of confidently site localized phosphopeptides decreases with both ptmRS and PTM-score. For HCD OT generated tandem mass spectra, this decrease in confident phosphosite localization is much more apparent with PTM-score than with ptmRS. For those phosphopeptides containing two Ser/Thr/Tyr residues, the phosphosite is confidently localized in 92% of cases using ptmRS, while only 72% are correctly localized with PTM-score. This decreases to 39% for PTM-score when a peptide contains four Ser/Thr/Tyr residues, but only to 73% for the same cohort when searched using ptmRS. The trend is consistent for HCD OT incorporating neutral loss triggered EThcD, with 80% of the peptides containing four Ser/Thr/Tyr residues from the ptmRS search having confident site localization, but only 49% being confidently localized by PTM-score (Figure 4; Figures S7, S8). For both scoring algorithms, the number of confidently assigned sites with HCD OT nl EThcD IT was intermediate between those observed with HCD OT and EThcD IT, showing the potential benefit of the dual fragmentation approach when considering peptides with multiple possible sites of phosphorylation.
Under all tandem MS conditions examined, MASCOT/ptmRS performed equal to, or better than, Andromeda/PTM-score for confident site localization, irrespective of the number of putative sites of phosphorylation (Figure 4, Figures S7, S8).

Effect of charge state on phosphosite assignment
It is known that the efficiency of ETD is dependent on charge density and is thus optimal for tryptic peptides where the charge state is ≥3 [37]. Given that EThcD is a dual fragmentation mechanism, generating both b/y (HCD) and c/z ions (ETD), the total number of ions generated using this fragmentation regime will thus be dependent on charge state, impacting the number of site-determining product ions. We therefore evaluated the effect of charge state on phosphosite localization confidence (Figure 5, Figures S9, S10). Unsurprisingly, the ability to pinpoint the site of modification was notably improved with EThcD IT compared with HCD IT alone for precursor ions where z=3, with either 84% (MASCOT/ptmRS) or 75% (Andromeda/PTM-score) of phosphosites being defined by EThcD, compared with 53% or 29% respectively for HCD IT. The same is true for EThcD OT compared with HCD OT, with 77% (EThcD OT) or 42% (HCD OT) of 3+ peptide ions being correctly site localized with PTM-score, c.f. 87% (EThcD OT) and 66% (HCD OT) with ptmRS (Figs. S9, S10). EThcD IT also outperformed both HCD OT and HCD IT for confident site localization for ions of charge states 2+ and 4+, albeit with significantly fewer phosphosites being identified in total with EThcD IT than with either HCD method for 2+ ions (Figure 5, Figs. S9, S10).
Both of the MS acquisition strategies invoking EThcD as a consequence of precursor neutral loss (HCD IT nl EThcD; HCD OT nl EThcD) were compromised in terms of the efficiency and total number of phosphosites identified for 3+ and 4+ ions, with no apparent benefit.
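The charge-state dependence described in this section can be read as a simple routing rule in which precursors with z ≥ 3 are sent to EThcD and all others to HCD. The following is a minimal sketch of such a rule under that assumption; the function name and return labels are hypothetical and do not correspond to any instrument method-editor setting or vendor API.

```python
def choose_fragmentation(charge: int) -> str:
    """Pick a fragmentation regime from the precursor charge state.

    Assumed rule for illustration: EThcD for z >= 3 (where ETD is reported
    to be most efficient), HCD otherwise.
    """
    if charge < 1:
        raise ValueError("charge state must be a positive integer")
    return "EThcD" if charge >= 3 else "HCD"


for z in (2, 3, 4):
    print(z, choose_fragmentation(z))
```

Such a rule trades acquisition speed for localization confidence only on the subset of precursors where ETD is expected to fragment efficiently.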

Conclusions
In this investigation, we have systematically evaluated eight MS acquisition strategies on the Orbitrap Fusion mass spectrometer, a versatile tribrid MS platform, for their ability to confidently identify and, crucially, to pinpoint sites of modification on phosphopeptides. We have also examined the relative efficiency of two of the most widely used phosphoproteomics data analysis platforms for optimal phosphosite identification: MASCOT integrated into Proteome Discoverer using ptmRS, and Andromeda with PTM-score.
Using a synthetic phosphopeptide library, we initially defined MS method-specific scores for Andromeda/PTM-score and MASCOT/ptmRS that yielded a 1% FLR. When applied to a complex biologically-derived phosphopeptide mixture, even small changes in the applied scores may yield significant changes in the numbers of phosphosites identified for HCD-mediated fragmentation, and the marked difference in site confidence for the different MS methods at any given value cannot be ignored.
Our findings are largely in agreement with previous observations made using other orbitrap-based MS platforms, which demonstrate that phosphosite localization confidence is optimal with EThcD where a dual ion series is generated [19]. However, the total number of unique phosphopeptides identified, as well as the number of confidently localized phosphosites, is optimal when employing high resolution analysis of HCD fragment ions for MS2. MS acquisition strategies invoking neutral loss-mediated ETD-based fragmentation are hampered by both the additional time taken to perform this type of fragmentation in a second round of MS2, as well as the surprisingly few phosphopeptide ions that generate neutral loss product ions and thereby invoke this second round of MS2 analysis.
Differences in the ways that the two bioinformatics platforms handle distinct types of tandem MS data, and in the number of unique phosphopeptides identified, mean that there is likely to be benefit in searching data acquired using a single acquisition strategy with both data analysis pipelines. This is particularly apparent with EThcD, where 31% and 23% of phosphopeptides are unique to Andromeda/PTM-score and MASCOT/ptmRS, respectively. The relatively few unique phosphopeptide identifications with Andromeda for HCD OT data, and the overall reduction in confident site localization using Andromeda/PTM-score for regimes exploiting fragmentation strategies other than EThcD, mean that multi-algorithm searching may not be of significant benefit with other types of data.
We conclude that optimal phosphoproteomics analysis on the Orbitrap Fusion Tribrid platform is achieved in the first instance using HCD OT and interrogation with MASCOT/ptmRS. Indeed, based on the settings used and amount of sample analyzed in these studies, we suggest that the benefits of acquiring high resolution orbitrap data are largely negated when using Andromeda/PTM-score. Our data also highlight that there are likely to be additional benefits, in terms of increased numbers of confidently localized phosphosites, from implementing EThcD for ions with a charge state of ≥3+ and employing a multiple algorithm search strategy. Moreover, the 'high-definition ETD' (ETD HD) permissible with the Orbitrap Fusion Lumos, which is reported to facilitate ETD on larger precursor ion populations, will likely result in even greater benefits when applied to such a charge-state mediated data acquisition strategy for phosphoproteomics.

Associated Content
Additional supporting information as noted in the text has been provided.

SUPPORTING INFORMATION:
The following files are available free of charge at the ACS website: Table S1. Orbitrap Fusion Tribrid MS acquisition parameters for the eight methods assessed. Table S2. Evaluation of Andromeda score cut-off using synthetic phosphopeptides. Figure S1. Acquisition method-specific phosphosite localization. Figure S2. Overlap between technical replicates processed using Andromeda. Figure S3. Overlap between technical replicates processed using Mascot. Figure S4. Distribution of phosphosite localization scores for either PTM-score (A) or ptmRS (B) from cell lysate-derived phosphopeptides. Figure S5. Phosphosite localization confidence with Andromeda/PTM-score. Figure S6. Phosphosite localization confidence with MASCOT/ptmRS. Figure S7. Phosphosite localization confidence as determined using Andromeda/PTM-score, as a function of prevalence of common putative phosphorylated residues. Figure S8. Phosphosite localization confidence as determined using MASCOT/ptmRS, as a function of prevalence of common putative phosphorylated residues. Figure S9. Phosphosite localization determined using Andromeda/PTM-score, as a function of peptide ion charge state. Figure S10. Phosphosite localization determined using MASCOT/ptmRS, as a function of peptide ion charge state.

Author Information
Corresponding author: *E-mail: Claire.Eyers@Liverpool.ac.uk; Phone: +44 151 795 4424
Author contributions: The manuscript was written with contributions from all authors. All authors have given approval to the final manuscript.