Sequential Digestion with Trypsin and Elastase in Cross-Linking Mass Spectrometry

Cross-linking mass spectrometry has become an important approach for studying protein structures and protein–protein interactions. The amino acid compositions of some protein regions impede the detection of cross-linked residues, although it would yield invaluable information for protein modeling. Here, we report on a sequential-digestion strategy with trypsin and elastase to penetrate regions with a low density of trypsin-cleavage sites. We exploited intrinsic substrate-recognition properties of elastase to specifically target larger tryptic peptides. Our application of this protocol to the TAF4–12 complex allowed us to identify cross-links in previously inaccessible regions.

T rypsin is the enzyme of choice in mass-spectrometry (MS)-based proteomics. The favorable behavior of tryptic peptides during mass-spectrometric analysis is one of the main reasons for this. 1,2 However, as trypsin cleaves after Arg and Lys, peptides generated in protein regions with a low Arg and Lys densities are very long, making them potentially nonobservable by mass spectrometry and thus resulting in poor coverage of those regions.
This problem is particularly relevant for cross-linking mass spectrometry (CLMS). In CLMS, protein proximities and conformations are preserved through the introduction of covalent bonds. The detection of these bonds as cross-linked peptide pairs by MS is translated into distance constraints. 3−7 Currently, the most frequently used cross-linkers are Nhydroxysuccinimide (NHS) esters that primarily target the εamino groups of Lys residues and to a lesser extent also react with the hydroxyl groups of Ser, Thr, and Tyr. 8,9 CLMS critically depends on good sequence coverage to reveal structural information for the whole protein. Cross-linking naturally involves two peptides, which aggravates the problem of overly large peptides. This is exacerbated when the crosslinker reacts with two lysine residues as two potential cleavage sites for trypsin are removed. The stabilization of proteins through cross-linking and the destruction of potential trypsincleavage sites may further affect the efficiency of trypsin digestion and lead to an increase in missed cleavages.
Alternative proteases to trypsin target basic (LysC and ArgC), acidic (AspN and GluC), or aromatic (chymotrypsin) amino acids. 10,11 Although usage of these proteases improves sequence coverage 12,13 and has been used in parallel to usage of trypsin to enhance identification of cross-linked residues, 14,15 some protein regions, such as the N-terminal region of TAF4, a subunit of transcriptional factor II D, still cannot be accessed ( Figure 1). The existence of TAF4b, a cell-typespecific TAF4 paralogue with a unique N-terminal region, points to important functions and interactions mediated through the N-terminal region of TAF4. 16 Cross-linked residues in this region would therefore provide important information on the function and structure of TAF4.
Proteases with broader specificity, like elastase, proteinase K, and pepsin, have the potential of complementing more selective proteases and indeed have been used in membrane proteomics 17,18 to detect phosphorylation sites 19 and even in CLMS 20 with varying levels of success. Elastase, for example, cleaves after Ala, Val, Leu, Ile, Ser, and Thr 17 and should be able to cleave the N-terminal region of TAF4 at many sites ( Figure 1). This broad specificity leads to two potential problems. If cleavage is complete, the resulting peptides would be very small. However, cleavage of elastase is not expected to be complete. Analyses of elastase efficiency on peptides with varying length revealed elastase to have an extended substratebinding region. 21,22 Short peptides that do not cover the complete substrate-binding region are less efficiently cleaved by elastase. Although this addresses the potential problem of peptide length, it leads to another problem. Having many missed cleavage sites leads elastase to generate very complex mixtures with often overlapping peptides. 17 This problem is exacerbated in cross-linked peptides, as two peptides are involved, resulting in a combinatorial increase of complexity. Recent work by our lab revealed a strong substrate preference of proteases for longer peptides. 15 As a consequence, in a sequential digest, preferentially long peptides are cleaved by the second protease, whereas short peptides remain undigested. The complexity of an elastase digest might therefore be vastly reduced by first using trypsin. Only long tryptic peptides should be good substrates for elastase, whereas any sequence range of a protein that is covered by short tryptic peptides would remain unaffected by elastase treatment.
In this study, we investigate the hypothesis whether the sequential digestion of trypsin and elastase might positively impact the use of elastase in CLMS. As a result, we detected cross-links in the difficult-to-digest N-terminal region of TAF4.

■ MATERIALS AND METHODS
Sample Preparation. The human TAF4−TAF12 complex was coexpressed in Sf21 insect cells using the MultiBac system. 23 DNA encoding an N-terminal hexa-histidine tag and a protease-cleavage site for tobacco etch virus (TEV) NIa protease was added to the 5′ end of the TAF4 open reading frame and cloned into pFL plasmid along with TAF12. Baculovirus generation, cell-culture infection, and protein production were carried out following published protocols. 24 Cell pellets were resuspended in Talon Buffer A (25 mM Tris pH 8.0, 150 mM NaCl, 5 mM imidazole, and complete protease inhibitor; Roche Molecular Biochemicals). Cells were lysed by freeze−thawing (twice), followed by centrifugation at 40 000g in a Ti70 rotor for 60 min to clear the lysate. The TAF4−TAF12 complex was first bound to talon resin and preequilibrated with Talon Buffer A; this was followed by washes with Talon Buffer A, then with Talon Buffer HS (25 mM Tris pH 8.0, 1 M NaCl, 5 mM imidazole, and complete protease inhibitor), and then again with Talon Buffer A. The TAF4− TAF12 complex was eluted using Talon Buffer B (25 mM Tris pH 8.0, 150 mM NaCl, 200 mM imidazole, and complete protease inhibitor). Fractions containing the TAF4−TAF12 complex were dialyzed overnight against MonoQ Buffer A (25 mM HEPES, pH 7.5, 100 mM NaCl, 1 mM DTT, and complete protease inhibitor). The complex was further purified using ion-exchange chromatography (IEX) with a MonoQ column pre-equilibrated with MonoQ Buffer A. After binding, the column was washed with MonoQ Buffer A, and TAF4− TAF12 was eluted using a continuous, linear gradient of MonoQ Buffer B (25 mM HEPES pH 7.5, 1000 mM NaCl, 1 mM DTT, and complete protease inhibitor) from 0 to 100%. The complex was further purified by size-exclusion chromatography (SEC) with a SuperoseS6 10/300 column in SEC buffer (25 mM Tris pH 8.0, 150 mM NaCl, 1 mM DTT, and complete protease inhibitor).
TAF4−TAF12 (1.5 μg per condition) complexes were crosslinked with bis(sulfosuccinimidyl)suberate (BS3) (Thermo Fisher Scientific) at ratio of 1:1 (w/w) and incubated on ice for 2 h. The mixture was incubated on ice for 1 h after saturated bicarbonate (50 molar excess) was added to quench the reaction. Frozen Schizosaccharomyces pombe cells were ground, and 1 g of yeast powder was resuspended in 2 mL of RIPA (Sigma-Aldrich) supplemented with cOmplete (Roche). Cell debris was moved via centrifugation at 1200g for 15 min. All samples were separated by SDS-PAGE on a 4−12% Bis Tris gel (Life Technologies) and stained using Imperial Protein Stain (Thermo Fisher Scientific). Appropriate bands were cut and proteins were first reduced with DTT and then alkylated with iodoacetamide. Samples were incubated with trypsin (13 ng μL −1 ; Pierce, Thermo Fisher Scientific) or elastase (15 ng μL −1 , Promega) at 37°C for 16 h. In the case of the elastase−trypsin digestion, trypsin (13 ng μL −1 ) was added to the overnight elastase digest and incubated for 4 h at room temperature. For the sequential trypsin−elastase digestion, elastase (15 ng μL −1 ) was added to the trypsin digest and incubated at 37°C for 30 min. A comparison between 30 min and 4 h of digestion with elastase showed no significant difference (data not shown). Following the digestion, peptides were purified on C18 StageTips using standardized protocols. 25 LC-MS/MS. TAF4−TAF12 complexes were analyzed on an Orbitrap Fusion Lumos Tribrid (Thermo Fisher Scientific) and yeast lysates were analyzed on an Orbitrap Elite (Thermo Fisher Scientific). Both were coupled online to an Ultimate 3000 RSLCnano Systems (Dionex, Thermo Fisher Scientific). Peptides were loaded onto an EASY-Spray LC Column (Thermo Fisher Scientific) at a flow rate of 0.300 μL min −1 using 98% mobile phase A (0.1% formic acid) and 2% mobile phase B (80% acetonitrile in 0.1% formic acid). To elute the peptides, the percentage of mobile phase B was first increased to 40% over a time course of 110 min followed by a linear increase to 95% in 11 min.
Full MS scans for yeast lysates were recorded in the orbitrap at 120 000 resolution for MS1 with a scan range of 300−1700 m/z. The 20 most intense ions (precursor charge ≥2) were selected for fragmentation by collision-induced disassociation, and MS2 spectra were recorded in the ion trap (2.0 × 10 4 ions as a minimal required signal, 35 normalized collision energy, dynamic exclusion for 40 s). Both MS1 and MS2 were recorded in the orbitrap for the TAF4−TAF12 complexes (120 000 mass resolution for MS1 and 15 000 mass resolution for MS2, scan range 300−1700 m/z, dynamic exclusion for 60 s, precursor charge ≥3). Higher-energy collision dissociation was used to fragment peptides (30% collision energy, 5.0 × 10 4 AGC target, 60 ms maximum injection time).
Data Analysis. MaxQuant software 26 (version 1.5.2.8) employing the Andromeda search engine 27 was used to analyze the whole-cell lysates of S. Pombe. We used the PombeBase database 28 with carbamidomethylation of cysteine as a fixed modification and oxidation of methionine as a variable modification. MS accuracy was set to 4.5 ppm and MS/MS tolerance to 20 ppm. For trypsin, we allowed up to two miscleavages. Specificity of elastase was defined as cleavage after Ala, Val, Leu, Ile, Ser, and Thr but not before Pro. Specificity for the trypsin and elastase digest was set to cleavage after Ala, Val, Leu, Ile, Ser, Thr, Arg, and Lys except if it was followed by Pro. For all digests containing elastase, up to 10 miscleavages were allowed. The peaklist for identification of cross-linked peptides was generated using MS convert (ProteoWizard) for tryptic digests with the following settings: peakPicking: 2−, msLevel: 2−, MS2Denoise: 20 100, false or Maxquant version 1. 6 were extracted, and peak areas were quantified using Skyline. 32 In-Silico Digest. TheTAF4−TAF12 complex was digested in-silico using trypsin, elastase or a combination of trypsin and elastase. We assumed a complete digestion with no miscleavages for the in-silico digestion and a minimum length of five amino acids.
Random Distribution. Random links were generated between cross-linkable residues. The links were fitted into the crystal structure of monomeric HSA (PDB: 1AO6) to calculate the distances.

■ RESULTS AND DISCUSSION
Four different types of digestion, trypsin, elastase, sequential trypsin−elastase, and sequential elastase−trypsin, were applied to a cross-linked TAF4−TAF12 complex. We compared these digests to assess the influence of substrate length on the cleavage behavior of elastase. The peptide lengths of the observed tryptic peptides without any modifications ranged from 5 to 65 amino acids, which are very similar to the theoretical peptide-length distribution (Figure 2a). However, the median lengths of the peptides digested by elastase, trypsin−elastase, and elastase−trypsin were considerably higher (12 amino acids) than the predicted medians of 6 amino acids (Figure 2b−d). The majority of elastase-derived peptides are larger than 5 amino acids, and the maximal length of 29 amino acids is considerably shorter than that of trypsin (65). These observations are in line with previous studies on short peptides reporting that the activity of elastase is dependent on the substrate length N-terminal of the cleavage site. 21,22 Our results suggest that a similar restriction might also apply to longer peptides and to the substrate length C-terminal from the cleavage site.
Notably fewer peptide-spectrum matches (546 ± 74 PSM) were identified following elastase digestion compared with those from peptides digested by trypsin alone (869 ± 170 PSM), trypsin−elastase (1116 ± 164 PSM), or elastase− trypsin (1111 ± 100 PSM, Figure 2e). One possible explanation for the reduced identification of elastase peptides could be the missing positive C-terminal charge that tryptic peptides have. To this end, we analyzed semitryptic peptides, as they should distribute equally among those featuring a tryptic C-terminus and a tryptic N-terminus. Semitryptic peptides with tryptic C-termini were identified more often (70 Analytical Chemistry Article ± 2%), in agreement with a positive C-terminal charge being beneficial for detection (Figure 2f). Another disadvantage of elastase is that it frequently misses cleavage sites and overlapping peptides lead to complex peptide mixtures. By digesting proteins with trypsin first, the amount of available cleavage sites for elastase might be reduced, as short tryptic peptides should be protected from elastase because of their size. The products of digested small peptides may not be identified, because of the identification bias against small peptides that occurs in LC-MS/MS. To avoid this, we quantified tryptic peptides before and after the addition of elastase. Peptides that had been reduced by more than half were labeled as "reduced". The remaining peptides were categorized as "unchanged". Peptides were divided into three groups depending on their lengths (≤12, 13−20, and >20). The number of reduced peptides, 19, 70, or 113 peptides, respectively, increased with peptide size (Figure 2g). The labelfree quantification confirms that elastase preferentially digests long peptides; thus, initial trypsin digestion reduces elastaseinduced complexity. Indeed, analyzing the 404−420 region in TAF4 revealed that trypsin treatment protected this region from elastase digestion. Consequently, the complexity within the sequential trypsin−elastase digest was reduced compared with that of the digest from elastase alone or elastase−trypsin (Figure 2h).
In the next step, we analyzed the effect of our sequential trypsin−elastase on the detection of cross-linked peptides. As a proof of principle, we first analyzed BS3 cross-linked humanserum albumin (HSA). As expected, the increase in complexity and the missing C-terminal charge associated with elastase digestion adversely affected the identification of cross-linked peptides. Although digestion of the BS3 cross-linked complexes with trypsin allowed us to identify 152 cross-links, digestion with elastase only led to the identification of 42 cross-links (Figure 3a). This might be due to the increase in the database, which is quite significant because of the n 2 behavior of the search space from cross-linked peptides. However, adding trypsin to the elastase digest improved the detection of cross-links, despite the increase in database size. On the one hand, adding trypsin to the elastase-digested peptides increased the complexity of the samples, but on the other hand, it produced semitryptic and tryptic peptides Figure 2. continued lengths (<12, 13−20, and >20 residues). Peptides with a 2-fold reduction or lower were labeled "decreased". (h) TAF4 (residues 827−880) with the observed cleavage patterns of trypsin, elastase, and the sequential trypsin−elastase treatment. Identified peptides are represented as lines below the protein sequence. Data shown are the means ± standard deviations (SD) of three independent digestions. Trypsin, dark blue; elastase, light blue; sequential trypsin−elastase, light green; sequential elastase−trypsin digest, dark green. featuring peptides with a positive C-terminal charge. As 111 cross-links were identified with the elastase−trypsin digest, the new peptides with tryptic C-termini seemed to have a bigger impact than the increase in complexity. Our analysis so far indicated that reversing the order to trypsin followed by elastase reduces the complexity of the sample and should therefore improve detection of cross-linked peptides. Indeed, more cross-links (131) were identified with the trypsin− elastase digest compared with the elastase−trypsin digest. As a final test of our workflow, we fitted the detected cross-links into the crystal structure of HSA (PDB: 1AO6, monomeric; Figure 3b). As expected from the link-level FDR (5%), 5% of the links derived from the trypsin−elastase digest were overly long (Figure 3c).
Having successfully tested our workflow on HSA, we moved on to analyze the TAF4−TAF12 complex. Sequence coverage of both TAF4 and TAF12 decreased from 74 and 75% to 73 and 65%, respectively, when trypsin was replaced by elastase (Figure 3d,e). In marked contrast, adding elastase to tryptic peptides increased sequence coverage of TAF4 to 93% and TAF12 to 92%. Reversing the order of digestion to elastase− trypsin still increased the sequence coverage of TAF4 and TAF12 to 95 and 85%. Our previous observation on the impact of complexity and positively charged C-termini on cross-link identification held true for the analysis of the TAF4− 12 complex. We identified 162 cross-links with trypsin alone, followed by 115 and 42 cross-links for trypsin−elastase and elastase−trypsin digestion, respectively. A very small number of cross-links (six) was detected when only elastase was used. The cross-links determined from the sequential trypsin− elastase digest were highly complementary to the trypsin data, with only 59 shared out of 218 identified cross-links ( Figure  3f). Importantly, the additional digestion of tryptic peptides by elastase provided data in the previously undetected N-terminal region of TAF4 (Figure 3g).
To test the behavior of elastase in complex mixtures, we applied the protocol to whole-cell lysate of S. pombe. Three characteristics of the identified peptides (peptide length, peptide type, and amount of miscleavages) indicate that the length of the substrate influences the activity of elastase. Although digesting tryptic peptides with elastase reduced the number of longer peptides, the median of all digests containing elastase was 13 amino acids ( Figure 4a). If the activity of elastase were not influenced by the substrate size, the distribution of tryptic, semitryptic, and elastase peptides would be independent of the digestion order. However, sequential trypsin−elastase digestion led to more tryptic peptides than elastase−trypsin digestion (Figure 4b). Shorter peptides were very poor substrates for the second enzyme. Consequently, the first enzyme produced the majority of the peptides. Because of the favorable mass-spectrometric behavior of tryptic peptides, trypsin should always be the first enzyme used in a sequential digestion. Unsurprisingly, a higher number of miscleavages occurred when elastase was used in all enzyme combinations (Figure 4c). Interestingly, treating the sample with trypsin prior to elastase digestion increased miscleavages. This suggests that the treatment with trypsin reduced the available elastase sites, presumably through their "protection" of being in smaller peptides.
The highest number, with 1377 ± 62 protein identifications, was achieved using only trypsin (Figure 4d). Subsequent elastase digestion reduced the numbers to 874 ± 39. Using only elastase yielded 593 ± 6 protein identifications. The least amount of proteins, 492 ± 115 proteins, was identified with sequential elastase−trypsin. More of the identified semitryptic peptides from the sequential digests had tryptic C-termini (55 ± 2%). Although the difference is not as pronounced as for the TAF4−12 complex, it is consistent with a favorable property of a positively charged residue at the C-terminus for MS/MS analysis. This may also explain in part the reduced identification numbers in the sequential digestion compared with trypsin alone. Peptides with a nontryptic C-terminus should be as frequent in the mixture as with a nontryptic Nterminus. A reduced number of identifications paired with a similar number of MS2 spectra between trypsin and sequential digestion suggests that the challenge is in MS2. Indeed, although the number of MS2 for the trypsin (35 203 ± 3088) and the sequential trypsin−elastase digests (35 160 ± 3322) were quite similar, only 31 ± 3% of the MS2 spectra from the trypsin−elastase digest could be matched, compared with 54 ± 2% from the tryptic digest (Figure 4e). In terms of sequence coverage, the larger number of peptides identified in the trypsin digest also resulted in more residues being covered

Analytical Chemistry
Article (190 261 ± 8215) than with the sequential digest (100 570 ± 10 660). However, peptides derived from the sequential digest covered some different regions than the tryptic peptides, expanding the covered sequence space of trypsin by 11% (Figure 4f).

■ CONCLUSION
Elastase digestion leading to complementary data compared with trypsin digestion has been shown before and could be confirmed in our study. 17,33 Although sequential digests have been used before, 34−36 to our knowledge elastase has not been exploited in sequential digestion of tryptic peptides or for CLMS. Elastase generates very complex peptide mixtures, with many peptides having sequence overlap. Conceptually, CLMS does not combine well with the use of elastase or other proteases with broad cleavage specificity. The increase of complexity and loss in sensitivity normally accompanying the occurrence of overlapping peptides is exacerbated in crosslinking. Every cross-linked peptide consists of two peptides and consequently is subject to combinatorial loss. This meets an already low abundance of cross-linked peptides. The cleavage by trypsin prior to elastase digestion reduces the complexity that is typically associated with elastase digestion. The cleavage of tryptic peptides by elastase is biased toward long peptides and increases their detection. As a result, we detect cross-links in the difficult-to-digest N-terminal region of TAF4. Our protocol of using trypsin prior to a complementary enzyme to enhance the performance of the complementary enzyme is also applicable to other enzymes, as was shown recently for AspN, GluC, and chymotrypsin. 15 These gains, at least for elastase, transfer to proteomics at large as shown by our results in S. pombe lysate. We anticipate that our protocol will therefore also be useful for other proteomic applications. This includes the detection of post-translational modifications and the analysis of transmembrane domains. The former benefits from increased sequence observation, whereas trypsin leads to systematically large peptides in the latter.

■ ASSOCIATED CONTENT Accession Codes
The mass-spectrometry proteomics data have been deposited to the ProteomeXchange Consortium 37 via the PRIDE partner repository with the data set identifier PXD011459.