Proteomics Using Protease Alternatives to Trypsin Benefits from Sequential Digestion with Trypsin

Trypsin is the most widely used enzyme in proteomics. Nevertheless, proteases with complementary cleavage specificity have been applied in special circumstances. In this work, we analyzed the characteristics of five protease alternatives to trypsin for protein identification and sequence coverage when applied to S. pombe whole cell lysates. The specificity of the protease heavily impacted the number of proteins identified. Proteases with higher specificity led to the identification of more proteins than proteases with lower specificity. However, AspN, GluC, chymotrypsin, and proteinase K largely benefited from being paired with trypsin in sequential digestion, as we had previously shown for elastase. In the most extreme case, predigesting with trypsin improved the number of identified proteins for proteinase K by 731%. Trypsin predigestion also improved the protein identifications of the other proteases: AspN (+62%), GluC (+80%), and chymotrypsin (+21%). Interestingly, the sequential digest with trypsin and AspN yielded an even higher number of protein identifications than digesting with trypsin alone.

Trypsin is the protease of choice for mass spectrometry (MS)-based proteomics. It cleaves C-terminal to Arg and Lys residues, resulting in a positive charge at the peptide C-terminus, which is advantageous for MS analysis. 1,2 Nevertheless, other proteases are frequently used to obtain complementary data. 3,4 Among these, AspN and GluC target acidic amino acid residues (Figure 1a). Both enzymes generate peptide mixtures of comparable complexity to that of trypsin and have been successfully used in many studies. 4−7 Chymotrypsin, which targets primarily aromatic residues, has also been used. 7−9 In contrast, broad-specificity proteases are much less widely used in proteomics. This is likely due to the high complexity of the peptide mixtures that they generate. To our knowledge, their application has been limited to prefractionated samples. Proteinase K, for example, was used to "shave" surface-exposed loops from proteins in membrane vesicles. 10,11 Our group has previously shown that the number of identified peptides, when using alternatives to trypsin, could be greatly improved by a sequential combination with trypsin. This includes AspN, GluC, chymotrypsin, and elastase for the detection of cross-link sites 12−15 and elastase applied to S. pombe whole cell lysates. 14 The sequential digestion increased the number of identified cross-links up to 19-fold for the Taf4−12 complex compared to digesting with elastase alone. 14 Introducing positively charged C-termini through trypsin improves the detection of previously nontryptic peptides. Importantly, smaller peptides are protected from the second protease. 12,14 Thus, the use of two proteases does not lead to the very small peptides that in silico digestion would predict. As a consequence, using elastase after trypsin does not lead to the same peptide complexity as using elastase alone.
In this study, we analyzed whether the introduction of trypsin in a sequential digest might improve the application of AspN, GluC, chymotrypsin, and proteinase K on unfractionated S. pombe lysate.
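The difference between a naive in silico sequential digest and one in which short peptides escape the second protease can be illustrated with a minimal sketch. The cleavage rules below are deliberately simplified (for example, the proline rule for trypsin is ignored), and the toy sequence, the function names, and the `protect_below` length threshold are assumptions made for illustration only; the text does not state a length below which peptides are protected.

```python
# Simplified cleavage rules (illustrative assumptions, not a full specificity model):
# "C" = cleave after the listed residues, "N" = cleave before them.
RULES = {
    "trypsin": ("KR", "C"),        # C-terminal to Lys/Arg; proline rule ignored
    "AspN": ("D", "N"),            # N-terminal to Asp
    "GluC": ("E", "C"),            # C-terminal to Glu
    "chymotrypsin": ("FWY", "C"),  # C-terminal to aromatic residues
}


def digest(sequence, protease):
    """Fully digest a sequence with one protease (no missed cleavages)."""
    residues, side = RULES[protease]
    cuts = {0, len(sequence)}
    for i, aa in enumerate(sequence):
        if aa in residues:
            pos = i + 1 if side == "C" else i
            if 0 < pos < len(sequence):
                cuts.add(pos)
    ordered = sorted(cuts)
    return [sequence[a:b] for a, b in zip(ordered, ordered[1:])]


def sequential_digest(sequence, first, second, protect_below=0):
    """Digest with `first`, then with `second`.

    Peptides shorter than `protect_below` (a hypothetical threshold) are
    treated as protected from the second protease and passed on unchanged.
    """
    peptides = []
    for pep in digest(sequence, first):
        if len(pep) < protect_below:
            peptides.append(pep)
        else:
            peptides.extend(digest(pep, second))
    return peptides


toy = "MKDAELRGDEK"  # hypothetical sequence, not taken from the data set
print(digest(toy, "trypsin"))
print(sequential_digest(toy, "trypsin", "AspN"))                   # naive prediction
print(sequential_digest(toy, "trypsin", "AspN", protect_below=5))  # short peptides protected
```

In the naive prediction the tryptic peptide GDEK is split further into G and DEK, whereas with protection it survives intact, which is the qualitative behavior described above.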

■ METHODS
Public Data Sets. The data on trypsin, elastase, trypsin−elastase, and elastase−trypsin were taken from our previous work 14 and retrieved from PRIDE with the data set identifier PXD011459.
Sample Preparation. One gram of frozen and ground S. pombe cells was resuspended in 2 mL of RIPA (Sigma-Aldrich, St. Louis, MO) supplemented with the protease inhibitor cocktail cOmplete according to the manufacturer's instructions (Roche, Basel). To remove the cell debris, the samples were centrifuged at 1200g for 15 min. The lysates were subjected to gel electrophoresis on a 4%−12% Bis-Tris gel (Life Technologies, Carlsbad, CA) for 5 min and stained using Imperial Protein Stain (Thermo Fisher Scientific, Rockford, IL). After excising the stained gel area as a single fraction, the proteins were first reduced with dithiothreitol and then alkylated with iodoacetamide.
We used a standardized protocol to desalt and concentrate the peptides on C18 StageTips for subsequent analysis. 16,17 For each condition, the equivalent of 1 μg protein starting material was used.
LC-MS/MS. All samples were analyzed on a linear ion trap−orbitrap mass spectrometer (Orbitrap Elite, Thermo Fisher Scientific, Rockford, IL) coupled online to a liquid chromatograph (Ultimate 3000 RSLCnano Systems, Dionex, Thermo Fisher Scientific, UK) with a C18 column (EASY-Spray LC Column, Thermo Fisher Scientific, Rockford, IL). The flow rate was 0.2 μL/min using 98% mobile phase A (0.1% formic acid) and 2% mobile phase B (80% acetonitrile in 0.1% formic acid). To elute the peptides, the percentage of mobile phase B was first increased to 40% over a time course of 110 min, followed by a linear increase to 95% in 11 min. Full MS scans were recorded in the orbitrap at a resolution of 120,000 for MS1 with a scan range of 300−1700 m/z. The 20 most intense ions (precursor charge ≥2) were selected for fragmentation by collision-induced dissociation, and MS2 spectra were recorded in the ion trap (20,000 ions as a minimal required signal, 35 normalized collision energy, dynamic exclusion for 40 s).
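The elution gradient can be written as a piecewise function of time. The following is a minimal sketch that assumes the gradient starts at t = 0 from 2% mobile phase B and that the first ramp to 40% is linear (the text only states linearity explicitly for the second ramp); the function names are hypothetical.

```python
def percent_b(t_min):
    """Percentage of mobile phase B at time t_min (minutes).

    Assumes: 2% B at t = 0, linear ramp to 40% B over 110 min,
    then linear ramp to 95% B over a further 11 min.
    """
    start_b, mid_b, end_b = 2.0, 40.0, 95.0
    ramp1, ramp2 = 110.0, 11.0
    if t_min <= 0:
        return start_b
    if t_min <= ramp1:
        return start_b + (mid_b - start_b) * t_min / ramp1
    if t_min <= ramp1 + ramp2:
        return mid_b + (end_b - mid_b) * (t_min - ramp1) / ramp2
    return end_b


def percent_acetonitrile(t_min):
    """Acetonitrile content of the eluent; mobile phase B is 80% acetonitrile."""
    return 0.8 * percent_b(t_min)


for t in (0, 55, 110, 121):
    print(t, percent_b(t), percent_acetonitrile(t))
```

Under these assumptions, 40% B at 110 min corresponds to 32% acetonitrile in the eluent.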
Data Analysis. MaxQuant software 18 (version 1.5.2.8) employing the Andromeda search engine 19 in combination with the PomBase database 20 was used to analyze the samples. The following parameters were used for the search: carbamidomethylation of cysteine as a fixed modification, oxidation of methionine as a variable modification, MS accuracy of 4.5 ppm, and MS/MS tolerance of 0.5 Da. Up to six miscleavages were allowed for digests involving trypsin, AspN, GluC, or chymotrypsin and up to 10 miscleavages for digests containing elastase or proteinase K. Frequencies of amino acids were taken from the statistics of the UniProtKB/TrEMBL protein database release 2019_11 (https://www.ebi.ac.uk/uniprot/TrEMBLstats).
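To put the two mass tolerances on a common scale, a relative tolerance in ppm can be converted to an absolute m/z window. This is a generic sketch of the standard ppm definition, not a description of how MaxQuant applies its tolerances internally; the function names and the example m/z value are hypothetical.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error of an observation in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


low, high = ppm_window(1000.0, 4.5)  # hypothetical precursor at m/z 1000
print(low, high)
```

At m/z 1000, a 4.5 ppm tolerance corresponds to a window of ±0.0045, roughly two orders of magnitude tighter than the 0.5 Da tolerance used for the ion trap MS/MS spectra.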

■ RESULTS AND DISCUSSION
Lysate from S. pombe was digested either with trypsin, AspN, GluC, chymotrypsin, elastase, or proteinase K. We also combined each of the proteases other than trypsin in a sequential digest with trypsin as either the first or second protease.
The biggest impact of sequential digestion with trypsin was seen on the performance of proteinase K. Using proteinase K alone led to very few identifications of proteins (proteinase K = 78 ± 33) and peptides (proteinase K = 527 ± 179). This might be due to very short peptides being generated by proteinase K, which cleaves C-terminal to half of all the amino acids. Alternatively, or in addition, the high complexity of the peptide mixture generated by proteinase K might reduce identification rates. Surprisingly, adding trypsin to the proteinase K digest increased the number of identifications for proteins (proteinase K−trypsin = 461 ± 17) and peptides (proteinase K−trypsin = 3169 ± 194). Using trypsin prior to proteinase K further improved on these results, as this led to the identification of 8 times more proteins (646 ± 36) and 8 times more peptides (4279 ± 530) compared to proteinase K alone.
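The fold changes quoted above can be checked directly from the reported mean values. The sketch below uses only the means given in the text; the dictionary keys and function names are labels chosen for illustration, with "trypsin-proteinase K" denoting the trypsin-first digest.

```python
def fold_change(after, before):
    """Ratio of identifications after versus before a protocol change."""
    return after / before


def percent_increase(after, before):
    """Relative gain in percent."""
    return (after - before) / before * 100.0


# Mean identifications as reported in the text
proteins = {"proteinase K": 78, "proteinase K-trypsin": 461, "trypsin-proteinase K": 646}
peptides = {"proteinase K": 527, "proteinase K-trypsin": 3169, "trypsin-proteinase K": 4279}

for label, counts in (("proteins", proteins), ("peptides", peptides)):
    base = counts["proteinase K"]
    for condition in ("proteinase K-trypsin", "trypsin-proteinase K"):
        print(label, condition,
              round(fold_change(counts[condition], base), 2),
              round(percent_increase(counts[condition], base), 1))
```

From the rounded means, the trypsin-first digest gives about 8.3-fold more proteins (an increase of roughly 728%) and about 8.1-fold more peptides. The 731% stated in the abstract differs slightly, presumably because of rounding or the way replicates were averaged; this cannot be determined from the text alone.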
In summary, of the five tested proteases, AspN, GluC, and proteinase K profited most from the addition of trypsin. The underlying reasons for the observed gains are likely different. AspN and GluC have few available cleavage sites and therefore generate relatively long peptides. Many of these will be unfavorably long for mass spectrometric detection. In addition, they are missing a terminal positive charge. Adding trypsin introduces such a C-terminal charge and shortens very long peptides, both of which enhance peptide detection in MS analysis.
AspN and GluC are highly efficient (Figure 2a), while for chymotrypsin and especially elastase and proteinase K many miscleavages were detected. Although we cannot exclude that undigested protein from the first digest may be the source for the additional identification of peptides and proteins, the high efficiency of GluC and AspN makes it unlikely to be the case for these enzymes. Also, the LC-MS data did not indicate the presence of a large quantity of semidigested proteins, as judged from the absence of a late eluting and highly charged cluster of ions (data not shown).
To analyze possible reasons for the low identification rates of more promiscuous cutters, we looked at the submitted and identified MS/MS spectra (Figure 2b, c). Only proteinase K had a reduced number of submitted MS/MS spectra. This might be due to the complexity of the peptide mixture resulting from proteinase K. However, the main problem was the low identification success of these MS/MS spectra. The same applied to the spectra from other less specific proteases. One of the reasons might be cofragmentation of several peptides as the mixture is more complex than for specific proteases. This is supported by the fact that AspN and GluC showed similar identification rates to trypsin. Another reason might be the increase in the database size and the problems associated with it for identification.
While AspN and GluC are very specific proteases, over 50% of the residues are potential cleavage sites for proteinase K. The problem for proteinase K is therefore not a lack of cleavage sites. Adding trypsin to proteinase K increased identifications and thus ruled out the possibility that peptides generated by proteinase K alone, at least under standard conditions, are generally too short for proteomics. If the complexity of a proteinase K digest is therefore the reason for the low identification yields of proteinase K, then the addition of trypsin must reduce this complexity. Adding trypsin might unify "ragged" proteinase K peptides that share either the N- or C-terminus but have different lengths (Figure 3a). In this way, trypsin leads to a concentration increase of peptides by reducing sample complexity. At least when trypsin is used first, an additional mechanism must be considered that was previously described for sequential digestion. 12,14 The second enzyme does not cleave shorter peptides with high efficiency, effectively leading to short tryptic peptides being protected from proteinase K. In either case, the complexity that is normally introduced through proteinase K is reduced by the tryptic treatment. End trimming and short peptide protection are likely not the sole explanations. We observed previously that among all observed mixed-protease action peptides, i.e., those peptides that were generated by trypsin action on one end and another protease at the other end, there is an imbalance: tryptic C-termini are more prevalent than N-termini generated by trypsin (semitryptic peptides with tryptic N-terminus = 652 ± 42, semitryptic peptides with tryptic C-terminus = 763 ± 15) (Figure 3a). This means that the improved observability of peptides with a basic C-terminal residue also contributes to the observed effect of sequential digestion on identification rates.
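The proposed unification of "ragged" peptides can be illustrated with a toy example. In the sketch below, three hypothetical peptides share an N-terminus but differ in length at the C-terminus; a simplified tryptic cut (after Lys/Arg, proline rule ignored) collapses them into a single species. The sequences, the function name, and the minimum length used to decide which products are counted are all assumptions made for illustration.

```python
from collections import Counter


def trypsin_cut(peptide):
    """Split a peptide after every internal Lys/Arg (simplified rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(peptide):
        if aa in "KR" and i + 1 < len(peptide):
            pieces.append(peptide[start:i + 1])
            start = i + 1
    pieces.append(peptide[start:])
    return pieces


# Hypothetical ragged peptides: same N-terminus, different C-terminal lengths
ragged = ["AVLGKS", "AVLGKSF", "AVLGKSFD"]

products = Counter(piece for pep in ragged for piece in trypsin_cut(pep))

MIN_LENGTH = 5  # hypothetical cutoff for peptides long enough to be identified
detectable = {pep: n for pep, n in products.items() if len(pep) >= MIN_LENGTH}

print(len(set(ragged)), "species before trypsin")
print(detectable)
```

In this toy case, three distinct species are reduced to one (AVLGK), which now pools the signal of all three precursors and carries a basic C-terminal residue.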
As an example, we analyzed the 60S acidic ribosomal protein P1-alpha 1 (Figure 3b). There are no trypsin cleavage sites between residues 56 and 90, so this region is not covered when trypsin is used alone. Digesting with proteinase K alone did not improve the coverage for this region, although or possibly because every other residue is a potential cleavage site for proteinase K. Peptides from this region could only be identified when proteinase K and trypsin were used in a sequential digest.
We then wondered to what extent the proteins and peptides that were observed with the different proteases, alone or in combination with trypsin, covered different sequence space. We measured this as the number of unique residues. As one would expect, this followed the same trends seen for protein and peptide identifications. For AspN and GluC, the largest number of residues was covered when trypsin was used following the other protease (Figure S-1a, b). For chymotrypsin, elastase, and proteinase K, the inverse order, i.e., trypsin first, yielded the larger coverage (Figure S-1c−e). Nonetheless, the different conditions yielded substantial nonoverlap. When combining the results of two digestion conditions, one would combine the data obtained by the protease alone with that of a trypsin-first sequential digest. Their overlap is substantially smaller (4 ± 2% to 38 ± 1%) than what we observed here for trypsin replicas (83 ± 2%).
Next, we compared the gain of residues on top of the trypsin digest that was observed for each digestion protocol (Figure 4a). For AspN, GluC, and proteinase K, there was a significant increase of additional identified residues if trypsin was added prior to the digest. Curiously, the highest gain in residues for AspN was achieved with a sequential AspN−trypsin digest. For elastase and chymotrypsin, adding trypsin prior to their usage did not significantly increase the number of identified residues. Reversing the order in the sequential digest even decreased the gain in residues.
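Residue-level coverage and the overlap between two conditions can be computed with simple set operations. The sketch below is an assumption-laden illustration: the text does not state how overlap was defined, so intersection over union is used here as one plausible choice, and the protein identifier, sequence, and peptides are hypothetical.

```python
def covered_residues(protein_id, sequence, peptides):
    """Return the set of (protein_id, position) pairs covered by the peptides."""
    covered = set()
    for pep in peptides:
        start = sequence.find(pep)
        while start != -1:  # include every occurrence of the peptide
            covered.update((protein_id, i) for i in range(start, start + len(pep)))
            start = sequence.find(pep, start + 1)
    return covered


def overlap_percent(a, b):
    """Overlap of two residue sets as intersection over union, in percent."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical protein and peptide identifications from two conditions
seq = "MKDAELRGDEK"
condition_a = covered_residues("P1", seq, ["DAELR"])
condition_b = covered_residues("P1", seq, ["ELRGD"])

print(len(condition_a | condition_b), "unique residues in the combined data")
print(overlap_percent(condition_a, condition_b))
```

Keying each residue by protein and position keeps residues from different proteins distinct when the sets of several proteins are merged.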
Finally, we analyzed the gain in identified proteins and residues when using different combinations of digestion conditions (Figure 4b, c). We combined the results of either five replicas of trypsin, trypsin followed by each of the five other proteases, or each of the five other proteases followed by trypsin. An initial trypsin digest served as the reference, in which 1484 proteins and 202,556 residues were identified. This followed the rationale that one would always use trypsin for an initial analysis, although trypsin followed by AspN in a sequential digest here consistently gave higher protein and peptide identifications. The highest numbers of complementary proteins (344) and residues (119,763) were identified

■ CONCLUSION
In this study, we investigated the impact of adding trypsin to other proteases in proteomics. Sequential digestion has been used before, 5,6,21 and we here add a systematic evaluation of different protease combinations. Protein and peptide identifications improved when combining any of the tested proteases with trypsin. This is in line with previous studies on cross-link identification, which benefited from the sequential digest with trypsin. 12,14 In the most extreme case, the sequential digest with trypsin and AspN outperformed results obtained by trypsin alone. This effect is relatively small, and due to cost considerations, trypsin will remain the protease of first choice in proteomics. However, situations where alternative proteases are currently used could in the future benefit from adding a sequential digestion step with trypsin. As trypsin is compatible with the buffer conditions of the tested proteases, this requires no additional step other than adding trypsin.