Cleavable Cross-Linkers and Mass Spectrometry for the Ultimate Task of Profiling Protein–Protein Interaction Networks in Vivo

Cross-linking mass spectrometry (XL-MS) has matured into a potent tool to identify protein–protein interactions or to uncover protein structures in living cells, tissues, or organelles. The unique ability to investigate the interplay of proteins within their native environment delivers valuable complementary information to other advanced structural biology techniques. This Review gives a comprehensive overview of the current possible applications as well as the remaining limitations of the technique, focusing on cross-linking in highly complex biological systems like cells, organelles, or tissues. Thanks to the commercial availability of most reagents and advances in user-friendly data analysis, validation, and visualization tools, XL-MS can, in theory, now also be used by nonexpert laboratories.


■ INTRODUCTION
In the last few decades, cross-linking mass spectrometry (XL-MS) has evolved into a widely accepted tool in structural biology. Every single cell contains millions of protein molecules, 1 which are part of a highly complex and dynamic interaction network. Furthermore, most proteins are organized in multiprotein complexes with a tightly regulated structure that has a significant impact on their functions. To understand the interplay of those proteins as well as the regulation of biochemical pathways on a molecular level is one of the ultimate goals in life science.
For investigations of protein interaction topologies, approaches such as yeast two-hybrid (Y2H) systems, 2 proximity-enhanced biotin labeling strategies (e.g., BioID 3 or APEX 4), immunoprecipitation, and affinity purification coupled to mass spectrometry are already commonly used and are reviewed elsewhere. 5,6 To study protein structures, techniques such as cryo-electron microscopy, nuclear magnetic resonance (NMR) spectroscopy, and X-ray crystallography are well known (as reviewed in detail elsewhere 7). All of these techniques have been proven to produce reliable and high-quality results, but they suffer from several limitations: To give some examples, the Y2H system relies on time-consuming genetic modifications of bait and prey proteins, affinity purification might lose transient interaction partners during washing, X-ray crystallography works only on crystallized proteins, and NMR works only on small and highly purified proteins. To conclude, all of these methods investigate their analytes in a rather artificial environment or cannot give comprehensive information on their own. XL-MS techniques try to fill this gap by providing complementary information on interaction topologies as well as providing low-resolution information on the tertiary structure of proteins within their native environment.
Whereas most pioneering XL-MS studies were limited to less complex systems like purified protein complexes, the development of MS-cleavable cross-linkers, beginning in 2005 (e.g., PIR, 8 DSSO, 9 DSBU, 10 DEST, 11 CBDPS, 12 and DC4 13), together with technical improvements in acquisition and data analysis advanced the technique to system-wide studies. Independent of the cross-linker type used, studies on purified protein complexes are often a good way to start with cross-linking studies because they are far less challenging compared with studies in living cells, isolated organelles, or tissues. All such studies, where the cross-linker is applied to a complex system within its native environment, will be called "in vivo" within this Review for simplicity. We refer the interested reader to other excellent reviews on structural investigations of protein complexes 14−16 or the general design of cross-linkers 17,18 and will focus on the workflows, acquisition techniques, and challenges in data analysis in system-wide studies using MS-cleavable cross-linkers.
To perform system-wide studies, tremendous efforts have led to the development of highly sensitive XL-MS workflows that allow us to freeze and visualize the interactome or structural changes of distinct protein complexes within living cells. This is achieved by the formation of covalent bonds between amino acid residues in close proximity. The cross-link positions, types, and number are highly dependent on the reactive sites and the spacer length of the cross-linker molecule. In general, longer spacer arms make the linker molecule more flexible, and more cross-links can potentially be formed. Although this results in higher density data, it also increases the noise, as spatial resolution is decreased. Most available cross-linkers specifically target lysine residues or a few other amino acids, which facilitates data analysis, as this limits the number of potential connection combinations. On the downside, the amount of information that can be generated is reduced, and nonspecific (e.g., photoreactive 15,19−21) linker types were developed to increase coverage.
In summary, the choice of the linker type strongly influences the outcome of an experiment, which is why we aim to help choose the right linker and a corresponding processing workflow in the following sections of this Review.

■ APPLICATIONS AND BOTTLENECKS
XL-MS was originally developed to investigate dynamic protein structures in solution with low resolution. The technique was then expanded to other applications (Figure 1), including the investigation of protein complex topologies and quantitative techniques to report conformational changes of protein assemblies; finally, it was extended to proteome-wide studies.
Investigating the whole interactome (e.g., of a cell) with a high proteome coverage is still one of the most challenging tasks. Much effort in the development of cross-linker molecules, enrichment methods, and data acquisition strategies has pushed XL-MS closer than ever to this ultimate goal.
Although many studies employing XL-MS methods were very successful, there are still three main bottlenecks to overcome: First, the miscleavage rate during enzymatic digestion is increased because the cleavage sites are often blocked by the cross-linker. This results in an increased peptide size. Additionally, the cross-linked peptide size is increased even further because two peptides are connected to each other. This makes them quite bulky, leading to an altered ionization behavior as well as a complex MS/MS fragmentation, both impeding the data analysis. In some cases, this issue can be alleviated by combining two or more proteases with different specificities in a sequential digest. 22,23 Second, the abundance of cross-linked peptides versus linear (not cross-linked) peptides is very low, and the high dynamic range of protein abundances within the proteome leads to the exclusive detection of cross-linked peptides from highly abundant proteins (e.g., refs 24−27). In particular, for in vivo studies, the fraction of intact cross-linker that connects two amino acids in the correct proximity and of close enough distance is lower still. This is due to the time needed for a linker reagent to diffuse through a membrane and reach its target (i.e., cytosolic or inner organelle) proteins. Most linkers used are based on N-hydroxysuccinimide (NHS) esters, and they are partly hydrolyzed during that time in an aqueous environment. 28−30 To tackle this issue, cross-linked peptides are often enriched prior to measurement (see Step 2: Sample Preparation and Enrichment).
Figure 1. Applications of XL-MS. A cross-linker consists of three main elements: First, the reactive group either targets specific amino acid residues or nonspecifically reacts with any amino acid. Second, the spacer arm might contain one or more labile sites for MS cleavability. Shorter spacers provide higher resolution structural data but will lead to fewer cross-links. Third, some reagents bear an enrichment handle for the selective capture of cross-linked peptides. The linker molecule can be applied either to single proteins/protein complexes (shown in green) or in vivo (shown in blue). After MS/MS acquisition and data analysis, the obtained cross-links can give valuable information on the protein structure, complex topologies, conformational changes, specific interaction sites, or (proteome-wide) protein−protein interaction (PPI) networks.
Third, one of the main issues in proteome-wide studies is the search space. Because the cross-links consist of two peptides instead of individual ones, the number of theoretical peptide−peptide combinations increases quadratically with the size of the database. This so-called n² problem further increases the chance of random hits and therefore has a disadvantageous impact on the confidence in assigned cross-links. 31 Because of the explosion of needed search space, this bottleneck has made studies of complex samples impossible for primary XL-MS methods.
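The quadratic growth described above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the count assumes that every peptide in the database can pair with every other peptide and with a second copy of itself.

```python
def linear_search_space(n_peptides: int) -> int:
    """Candidates when each spectrum is matched to one linear peptide."""
    return n_peptides


def crosslink_search_space(n_peptides: int) -> int:
    """Unordered peptide pairs, including a peptide paired with itself."""
    return n_peptides * (n_peptides + 1) // 2


# Compare how both search spaces grow with the size of the peptide database.
for n in (1_000, 10_000, 100_000):
    pairs = crosslink_search_space(n)
    print(f"{n:>7} peptides -> {pairs:>13,} candidate pairs")
```

A 10-fold larger database thus gives roughly 100-fold more candidate pairs, which is why random matches become more likely as the database grows.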
To circumvent this issue, some software algorithms have recently implemented specialized search strategies. For example, the cross-link search software pLink2 32 implemented a two-stage strategy, where a fragment index is first created from the in silico digested peptides to identify the first of the two cross-linked peptides (with an unknown and large modification). In the second step, only the top coarse-scored hits are used to identify the possible candidates for the second peptide of the cross-link. Only this small number of final candidate cross-links is fine-scored to reduce the overall search time.
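The general idea of such a two-stage search can be sketched as follows. This is a strongly simplified toy version and not the pLink2 implementation: all function names, the binning scheme, the tolerances, and the example masses are assumptions made for illustration, and no fine scoring is shown.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_fragment_index(fragments_by_peptide, bin_width=0.02):
    """Map binned fragment masses to the peptides that can produce them."""
    index = defaultdict(set)
    for peptide, fragments in fragments_by_peptide.items():
        for mass in fragments:
            index[round(mass / bin_width)].add(peptide)
    return index


def coarse_candidates(index, spectrum_peaks, top_k, bin_width=0.02):
    """Stage 1: rank peptides by the number of matched peaks, keep the top k."""
    counts = defaultdict(int)
    for peak in spectrum_peaks:
        for peptide in index.get(round(peak / bin_width), ()):
            counts[peptide] += 1
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:top_k]


def partner_candidates(alpha, precursor_mass, linker_mass, peptide_masses, tol=0.01):
    """Stage 2: find peptides whose mass completes the precursor mass."""
    target = precursor_mass - linker_mass - peptide_masses[alpha]
    by_mass = sorted((m, p) for p, m in peptide_masses.items())
    masses = [m for m, _ in by_mass]
    lo = bisect_left(masses, target - tol)
    hi = bisect_right(masses, target + tol)
    return [p for _, p in by_mass[lo:hi]]


# Toy data (hypothetical masses, not real peptides).
peptide_masses = {"PEPA": 1000.50, "PEPB": 1200.60, "PEPC": 1500.70}
fragments = {
    "PEPA": [200.10, 300.15, 400.20],
    "PEPB": [250.12, 350.17],
    "PEPC": [275.13, 375.18],
}
index = build_fragment_index(fragments)
alphas = coarse_candidates(index, [200.10, 300.15, 400.20, 250.12], top_k=1)
partners = partner_candidates(alphas[0], 2359.10, 158.00, peptide_masses)
print(alphas, partners)
```

Only the few pairs that survive both stages would then be passed on to a slower, more detailed scoring step, so the full set of pairwise combinations never has to be scored.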
Aiming to further increase the confidence in cross-link matches as well as to minimize the search time needed, cleavable cross-linkers were developed. This compound class bears a labile functionality, which is cleaved upon collisional (collision-induced dissociation (CID)/higher-energy collisional dissociation (HCD)) or electron-transfer dissociation (ETD) fragmentation. The dissociation of the cross-linker preferably occurs at potentials lower than (e.g., sulfoxides 9) or comparable to (e.g., urea functionality 10) those needed for peptide backbone fragmentation. In this way, characteristic cross-link ions are formed (Figure 2). This circumvents the n² problem because ideally each individual peptide mass plus the known mass of an attached linker fragment can be detected.
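How such characteristic ions reduce the search to individual peptides can be illustrated with a minimal sketch. It assumes a DSSO-like linker whose cleavage leaves either an alkene or a thiol remnant on each peptide. The remnant masses below are commonly quoted values for DSSO and are stated here as assumptions, not taken from this Review; the function names and the example masses are hypothetical.

```python
from itertools import combinations

# Assumed remnant masses (Da) for a DSSO-like linker; verify before use.
ALKENE = 54.01056
THIOL = 85.98264
LINKER = 158.00376  # mass added by the intact linker between two peptides
DOUBLET_DELTA = THIOL - ALKENE


def find_doublets(neutral_masses, tol=0.01):
    """Infer bare peptide masses from alkene/thiol fragment doublets."""
    masses = sorted(neutral_masses)
    peptides = []
    for light, heavy in combinations(masses, 2):
        if abs((heavy - light) - DOUBLET_DELTA) <= tol:
            peptides.append(light - ALKENE)
    return peptides


def pair_peptides(peptide_masses, precursor_mass, tol=0.01):
    """Keep peptide pairs whose masses plus the linker explain the precursor."""
    return [
        (a, b)
        for a, b in combinations(sorted(peptide_masses), 2)
        if abs(a + b + LINKER - precursor_mass) <= tol
    ]


# Toy spectrum: two doublets (peptides of 1000 and 1500 Da) plus one noise peak.
fragments = [1054.01056, 1085.98264, 1554.01056, 1585.98264, 777.70000]
peptides = find_doublets(fragments)
pairs = pair_peptides(peptides, precursor_mass=2658.00376)
print(peptides, pairs)
```

Because each peptide mass is recovered on its own, each peptide can be searched against the database like a linear peptide carrying a known modification. For brevity, this sketch does not handle two identical peptides linked to each other.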
By using MS-cleavable cross-linkers, several impressive proteome-wide studies have already been performed. For example, more than 7400 unique cross-link sites were confidently (1% false discovery rate (FDR)) identified in Drosophila melanogaster embryo extracts using DSBU (linker details and structure; see Step 1: Cross-Linking Reaction). Of those, up to 4000 linked sites were identified in a single replicate. 25 Comparable numbers of more than 1000 cross-link sites were also seen using the DSSO linker, for example, in human lung adenocarcinoma cell lysates. 24 A recent study obtained their data with DSSO in human immortalized myelogenous leukemia cell lysates (K562 cells) and analyzed it with a novel algorithm called MaxLinker. They even boosted their numbers close to 10 000 unique cross-linked sites at 1% FDR. 33 All of these studies were performed on cell lysates. Only a few studies were reported where the cross-linking reagent was applied directly onto living cells, presumably due to an even higher sample complexity and a reduced cross-link yield caused by the longer time available for hydrolysis while the linker diffuses through the cell membrane. Those in vivo studies usually employ cleavable cross-linkers that additionally bear an enrichment handle. The Bruce group pioneered in vivo cross-link studies by developing so-called protein interaction reporter (PIR) linkers, 8,27,34,35 which are membrane-permeable, selectively enrichable, and MS-cleavable. Upon fragmentation, a reporter ion of a specific mass is formed to identify and select cross-linked ions for further fragmentation (see Figure 2). Using this linker class, more than 3300 unique cross-link sites were found after the application of the linker to living HeLa cells, protein extraction, and final enrichment using the biotin handle of the PIR linker. 
With this data, they were able to take a glance at the interactome of those cells; however, it mainly contained interaction sites of the abundant HSP90 protein complex. 36 38 investigating membrane proteins, and to isolated intact mitochondria, 39 investigating the mechanistic details and the interactome of the synthetic peptide SS-31, which improves mitochondrial function. The Bruce lab also already demonstrated the use of their cross-linker in tissues. More than 2000 lysine−lysine cross-links were identified after cross-linking minced mouse heart tissue, isolating mitochondria, and enriching for cross-links afterward. 40 This impressively shows the rapidly expanding scope of cross-linking in combination with mass spectrometry from an in vitro to an in vivo application. However, to the best of our knowledge, all proteome-wide in vivo studies exclusively covered highly abundant proteins. In conclusion, the improvement of cross-link enrichment using a selective handle seems to be a logical strategy. Whereas selective enrichment is often done via biotin (as previously mentioned for the PIR linkers), other linkers contain alkyne (e.g., Leiker linker, 41 cliXlink 42), azide (e.g., DSBSO, 26 Azide-DSG 43), or phosphonate tags (PhoX 44), allowing for a more effective enrichment. In particular, for azide-tagged linkers, our lab recently streamlined the enrichment protocol. That is, we circumvented the need for biotin as an intermediate step. 45 Of note, affinity-based enrichment methods usually do not differentiate between monolinked peptides (= type 0 or dead-end cross-links; one side of the cross-linker is hydrolyzed and not connected to any amino acid) and cross-linked (= type 2) peptides. Because those monolinked peptides are formed in excess, the more informative cross-linked peptides are still a minority within the total peptide population. 
Therefore, the combination of affinity-based enrichment strategies with more conventional techniques like size exclusion chromatography (SEC) might be beneficial to reduce the noise of monolinked peptides that are of lower average size. Recently, ion mobility mass spectrometry was also shown to reduce the background of monolinked and linear peptides. 46 The technique adds the collisional cross-section of ions as another separation dimension and thereby selectively accumulates and releases ions based on size, shape, and charge. 47 In theory, a higher proteome coverage can also be obtained by combining linkers with orthogonal reactivities to amino acids, as more theoretical protein positions will be connected. Such an approach was recently successfully shown for the investigation of carbonic anhydrase protein complexes, 48 but in vivo data is still lacking. The downside of such high-density cross-linking experiments is an impeded enzymatic digestion and a complex fragmentation behavior complicating the data analysis.
Another study suggests that the enrichment of cross-links might be of limited use to increase proteome coverage, as cross-links are predominantly formed on the highly abundant proteins. The usage of cross-linker reagents in very high excess ratios was shown to alleviate this issue, as more cross-links are also formed on lower abundant proteins. 49 This would be highly interesting for many potential in vivo studies. However, such (on purpose) over-cross-linked systems bear an increased risk of finding false positives or cross-links that no longer report native conformations.
In conclusion, recent advances have led to a wide expansion of possible applications from uncovering protein structures to system-wide applications aiming to capture whole interactomes or interactome changes in vivo. As recently estimated by O'Reilly and Rappsilber, 14 the theoretical number of cross-links from and to the 4000 most abundant proteins in a human cell, formed with a commonly used NHS-ester-based reagent, can be estimated to be >200 000. With a maximal number of ∼10 000 cross-links generated in lysates, which is even lower when applied in vivo, there is still some way to go in our aim to get a more comprehensive map of the human interactome.

■ WORKFLOW AND EXPERIMENT DESIGN
Because the number of available cross-linking reagents, enrichment techniques, data acquisition strategies, and data analysis tools has expanded to a vast array, the following section aims to give an overview of the available XL-MS workflows.
Step 1: Cross-Linking Reaction
As discussed in detail in another review, 50 since 2016, the number of studies employing MS-cleavable cross-linkers has been increasing, as they facilitate data analysis, especially for large and complex samples. Their cleavage is usually induced upon CID/HCD or, more rarely, via ETD fragmentation, producing specific product ions. Common labile groups are urea (DSBU, DAU, CDI), sulfoxide (DSSO, BMSO, DHSO, DSBSO), quaternary amines (DC4), or the aspartic acid to proline peptide bond (PIR linkers). A selection of cleavable cross-linker agents and their properties is shown in Table 1.
The majority of the reagents used so far are NHS-ester-based, and they were reported to target primary amines (lysine residues). The vast majority of studies exclusively focus on the search for lysine−lysine cross-links to simplify data analysis. NHS esters are furthermore popular because lysines are evenly distributed and of relatively high abundance on the surface of proteins. However, NHS esters are also reactive, to a lower extent, toward other nucleophiles, such as serine, threonine, and tyrosine residues. The reactivity is highly controlled by neighboring amino acids as well as the pH value during the cross-linking reaction. 51 The biased approach of targeting only lysine residues leads to ineffective coverage for lysine-deficient regions and hampers cleavage by trypsin, which is commonly used during digestion. To complement data from NHS-based studies, a few cross-linkers targeting other amino acids have recently been developed, such as the noncleavable SufFEx 52 (heterobifunctional: NHS ester + less reactive sulfonyl fluoride targeting all nucleophilic amino acids) or ArGOs 53 (homobifunctional, targets arginine), hydrazine-based acidic cross-linkers, 54 or cleavable linkers (shown in Table 1) such as DAU 55 (homobifunctional, targets cysteines), DHSO 56 (homobifunctional, targets acidic amino acids), and SDAD 57 (heterobifunctional: NHS ester + diazirine reacting in an unspecific manner). Unspecific cross-linkers promise to provide an unbiased analysis of distance constraints within protein complexes, but their analysis is complicated because more mixed spectra with the cross-linker simultaneously attached to many different positions will occur. This issue is partly addressed by the usage of heterobifunctional linkers, such as the aforementioned SufFEx. Here one side of the linker contains a selective NHS ester, and the other side can react with histidine, serine, threonine, tyrosine, or lysine. 
However, for the already complicated data analysis of proteome-wide studies, the use of an unspecific linker might be a tough choice, which is likely the reason why this has not been done so far.
Because some linker substances lack membrane permeability or are already partly hydrolyzed after entering the cell, significantly reducing the reactivity, the stabilization of transient interactions with formaldehyde 58 or glutaraldehyde 59 as mild pre-cross-linking agents was reported as a workaround. Because of its small size, formaldehyde has excellent membrane permeability and shows high reactivity toward DNA and amino acids. 60 For example, changes in the interaction of the 19S to the 20S subcomplex of the 26S proteasome upon treatment with hydrogen peroxide were shown by first freezing this interaction within its native environment using formaldehyde followed by DSSO application on beads in a later step to identify cross-link sites by mass spectrometry. 58 With this, the actual detected cross-linker DSSO can be applied on the already concentrated
Journal of Proteome Research pubs.acs.org/jpr Reviews
To directly cross-link complex samples in one step, selective enrichment handles attached to the cross-linker reagent are a promising strategy. Biotin tags are most commonly used. They profit from many commercially available tools for an effective enrichment via the strong interaction with streptavidin. On the downside, biotin is relatively bulky, which might hinder the reagent from reaching reactive sites on proteins. Furthermore, endogenous biotinylations might potentially interfere with the selective enrichment of cross-linked peptides. More recently, reagents bearing an alkyne tag for a click-chemistry-based enrichment or a phosphonate tag were developed.
The recently published PhoX 44 linker takes advantage of being enrichable via immobilized metal affinity chromatography (IMAC). This technique was originally developed for the enrichment of phospho-peptides 66 and is already established in many proteomic laboratories. It has reached very high enrichment reproducibilities and specificities of >95%. 67 To preclude the coenrichment of phospho-peptides, samples can be treated with a phosphatase, cleaving off phosphate groups but keeping the more stable phosphonate tag on the PhoX linker intact. By applying PhoX to a human cell lysate, more than 1100 cross-linked sites were successfully identified in a single measurement after IMAC enrichment. Although this shows that the phosphonate group is also highly applicable for an effective enrichment from complex samples, the data obtained with PhoX was searched against a reduced FASTA file containing only the most abundant proteins to tackle the n² problem. 44
In conclusion, an ideal cross-linker for in vivo studies not only is selectively enrichable but also contains an MS-cleavable group to facilitate the data analysis. Different types of such linkers are shown in Table 2.
Most of those selected reagents are cleavable upon CID fragmentation (CBDPS, DSBSO, pBVS, PIR), whereas, for example, the DEB linker does not bear a labile group in its spacer arm per se but forms diagnostic ions through the cleavage of its connection to an amino acid upon ETD fragmentation (as schematically illustrated in Figure 2). As already mentioned, the class of PIR linkers pioneered the field of cleavable cross-linker molecules by capitalizing on a weak aspartic acid−proline peptide bond.
In contrast, CBDPS 12 bears a thio-functionality as a CID cleavable site. It is further available in different isotopically coded versions, which generates a distinct isotopic signature in the resulting MS/MS spectra. This facilitates data analysis while increasing confidence in cross-linked spectra. For an easy and selective enrichment, the linker has a biotin handle.
The bulky biotin groups of the PIR linkers as well as CBDPS might, however, lead to steric hindrance for reaching the reactive site on a protein surface.
This issue, among others, is addressed by the recently published and rather compact pBVS 68 linker. It is furthermore the first linker reagent containing vinyl sulfones as a reactive group. Whereas most other available options exclusively target primary amines and are therefore biased for lysine residues, vinyl sulfones were reported to be reactive toward cysteine, lysine, and histidine residues. 69,70 MS cleavability is enabled via a retro-Michael addition at higher collisional energies. In addition, a phospho-tag can be selectively enriched via IMAC. The tag is labile, and upon MS fragmentation, an additional diagnostic ion is formed. A likely disadvantage of pBVS is the coenrichment of endogenous phospho-peptides. In contrast with the aforementioned PhoX linker, a selective dephosphorylation of only the peptides is not possible in this case.
Another well-performing cross-linker fulfilling all important criteria for in vivo studies is DSBSO. 26 It was shown to be membrane-permeable, and it has a sulfoxide group as a labile site and an azide tag for enrichment. This tag undergoes a bioorthogonal click reaction with alkynes and thereby can be connected to biotin. After that, it can easily be enriched from complex cellular environments using streptavidin. Recently, we additionally developed a streamlined enrichment workflow with improved performance by directly coupling the linker to dibenzocyclooctyne (DBCO)-functionalized beads. 45
A remaining bottleneck of most novel linkers, especially for nonchemists, is their challenging multistep synthesis. To the best of our knowledge, of all of the potentially highly effective reagents shown in Table 2, only CBDPS and, very recently, DSBSO are commercially available. This limits their usage by the broader scientific community.
In a nutshell: How does one start with an in vivo XL-MS experiment?
• When using NHS-based linkers, note their sensitivity to humidity. Consider storing the linker dry and always prepare fresh stock solutions (e.g., in dry DMSO) for each experiment.
• Select an appropriate cross-linker for the proposed (in vivo) study: Consider the need for MS cleavability, the availability of an enrichment handle, the membrane permeability, and, if in-house synthesis is not feasible, the commercial availability.
• For proteome-wide studies, more flexible linkers with longer spacer arms might be beneficial; for modeling protein structures or interaction sites, shorter spacer arms will give higher resolution results.
• Start with an NHS-reactive linker; however, the combination of several linkers targeting different amino acids might increase the coverage, if needed.
• Useful and detailed protocols using DSBU, CDI, 82 DSSO, 24 or the enrichable PIR 137 linker for proteome-wide approaches can be found in the literature.
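The selection criteria listed above can be written down as a small lookup, shown here as a hedged sketch. The property values are limited to what is stated in the text of this Review for four enrichable linkers; `None` marks a property that the text does not state, and such entries are treated as not fulfilling a requirement. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Linker:
    name: str
    ms_cleavable: Optional[bool]
    enrichment_handle: Optional[str]
    membrane_permeable: Optional[bool]
    commercially_available: Optional[bool]


# Values as described in the text; None = not stated there.
LINKERS = [
    Linker("PIR", True, "biotin", True, False),
    Linker("CBDPS", True, "biotin", None, True),
    Linker("DSBSO", True, "azide", True, True),
    Linker("pBVS", True, "phospho-tag", None, False),
]


def shortlist(linkers, need_cleavable=True, need_handle=True,
              need_permeable=True, need_commercial=False):
    """Return the names of linkers that are known to meet every requirement."""
    picked = []
    for lk in linkers:
        if need_cleavable and lk.ms_cleavable is not True:
            continue
        if need_handle and lk.enrichment_handle is None:
            continue
        if need_permeable and lk.membrane_permeable is not True:
            continue
        if need_commercial and lk.commercially_available is not True:
            continue
        picked.append(lk.name)
    return sorted(picked)


print(shortlist(LINKERS))                        # in vivo use, own synthesis possible
print(shortlist(LINKERS, need_commercial=True))  # in vivo use, purchase only
```

Treating unknown properties as unmet is a deliberately conservative choice; a laboratory with its own permeability data would fill in those fields instead.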
Step 2: Sample Preparation and Enrichment
Enrichment of cross-linked species is crucial due to their low abundance compared with non-cross-linked peptides. The cross-linking reaction efficiency is controlled by steric factors (surface accessibility and proximity of amino acids), the general linker reactivity, and the protein concentration. The formation of monolinked peptides (see Applications and Bottlenecks section) and loop-linked, cross-linked, and higher order linked peptides further increases the sample heterogeneity, an issue that becomes even worse in the already complex protein mixtures of proteome-wide studies.
To prepare samples for data acquisition, which is usually done via a bottom-up approach, a proteolytic digestion of all proteins is performed. To do that, trypsin is the most commonly used protease. It cleaves peptide bonds after lysine and arginine residues. Those amino acids show a relatively even distribution in most proteins, which gives peptides of suitable and homogeneous size. Furthermore, by using trypsin, each peptide presents a terminal amino group and bears an amino group of one lysine or arginine. Under acidic conditions, this usually produces peptide ions of double-positive charge.
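The cleavage rule described above, and the effect of a lysine that is blocked by a cross-linker (as noted under Applications and Bottlenecks), can be sketched with a minimal in silico digest. The function name and the example sequence are hypothetical, and the sketch deliberately ignores refinements such as suppressed cleavage before proline or additional random missed cleavages.

```python
def tryptic_digest(sequence, blocked=frozenset()):
    """Cleave after K or R; residues at 0-based positions in `blocked` are skipped."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and i not in blocked:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


sequence = "AAKGGRTTKLL"
print(tryptic_digest(sequence))               # unmodified protein
print(tryptic_digest(sequence, blocked={2}))  # lysine at position 2 carries a linker
```

With the lysine at position 2 blocked, the first two peptides merge into one longer peptide, which illustrates why cross-linked samples tend to contain larger peptides after digestion.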
When an amine-reactive cross-linker (as most cross-linkers are) is attached to the lysine residue, this likely leads to a miscleavage site and an elongated average peptide size after digestion. Furthermore, two peptides are connected to each other, which further increases their size as well as the average charge. These properties can be capitalized on for cross-link enrichment via SEC 73 or strong cation exchange (SCX). 74 Both methods were already successfully applied to analyze complex samples. For lower complexity samples, the usage of mixed-mode Stage-Tips 75 appears to be the most convenient way for separation, omitting the need for larger or expensive chromatographic systems. For higher complexity samples and for loading amounts >100 μg, fractionation on a high-performance liquid chromatography (HPLC) system leads to better coverage. 24 The final coverage of cross-links can be further boosted by sequential digestion. 22
Affinity-based enrichment strategies selectively target a tag on the cross-linked peptides (see the enrichable cross-linkers in Tables 1 and 2). Most commonly, cross-linked peptides are bound to beads (e.g., via the strong biotin−streptavidin interaction) followed by stringent washing to remove nonlinked peptides and other undesired components of the matrix. The background can be reduced to very low levels. Furthermore, only one fraction to be measured is generated. By the choice of suitable elution conditions (e.g., elution in an acid, as done for DSBSO 26,45), the need for a final desalting step is omitted. The downside of affinity-based enrichment workflows is that they are also selective for monolinked peptides. Monolinks are still usually more abundant than cross-links but are less informative, and they may hamper optimal analysis results.
In conclusion, no enrichment technique alone yields perfectly pure cross-link samples. However, the combination of orthogonal techniques likely improves the achieved purities. This was already impressively shown for the combination of SEC and SCX enrichment, 76 and the combination of affinity-based enrichment with SCX 77,78 was already successfully applied for in vivo studies as well. Very recently, ion mobility, which is performed online during measurement, was introduced as an additional separation dimension for cross-linking studies. Ions are thereby separated by their collisional cross-section, which is dependent on their size, shape, and charge. By combining an affinity enrichment with ion mobility separation done in between the chromatographic system and the MS, the number of interfering residual monolinked peptides was clearly reduced for protein samples of different complexities. Although monolinked peptides remained more abundant than cross-linked peptides for high-complexity samples, this resulted in a boost of obtained final cross-link numbers. 46 Furthermore, high-field asymmetric waveform ion mobility spectrometry (FAIMS) was successfully used to filter lower charge-state ions and reduce the background signals of linear peptides. When analyzing medium-complexity samples, this technique generated similar final results as SEC but omitted the need for fractionation. The combination of SCX with FAIMS was reported to boost the final cross-link identification numbers by 56% for cross-linked HEK293 cell lysates compared with using SCX alone. 79 The combination of orthogonal enrichment techniques is justified by the clearly improved reduction of sample complexity, which improves the spectra quality and facilitates data analysis. On the downside, each added sample preparation step will result in sample loss, which might be a limiting factor, especially if the initial sample input cannot be scaled up sufficiently. 
This makes careful planning necessary, aiming to maximize the sample recovery as well as minimize the final sample complexity. An overview of the enrichment strategies used in the field is shown in Figure 3.
In a nutshell: How does one enrich XL peptides accordingly?
• For cross-linked single recombinant proteins, enrichment is not mandatory; with increasing sample complexity, the need for an effective enrichment increases to maintain coverage.
• If cross-linkers without an affinity tag are used, then SEC, SCX, ion mobility, or a combination of those are preferred options.
• Affinity handles are the preferred choice for studies of whole cells, enabling a selective enrichment and stringent washing.
Step 3: Data Acquisition
Another central part of each XL-MS workflow is the actual data acquisition. The specific settings will depend on the type of sample, the cross-linker reagent (compare to Figure 2), and the available mass spectrometer. Significant improvements in the sensitivity and resolution of mass spectrometers have tremendously pushed the field. Usually a liquid chromatography (LC) system is coupled to the mass spectrometer. The optimization of the LC settings might be as crucial as the actual MS acquisition. The elution of (cross-linked) peptides is usually done by a gradient with increasing concentrations of organic solvent (i.e., acetonitrile, ACN) from a reversed-phase column. The gradient length might vary somewhere between 1 and 3 h depending on the sample complexity. To fully elute more hydrophobic cross-linked peptides, this gradient often ranges to higher organic solvent concentrations compared with the gradients used for linear peptides. Furthermore, low concentrations of DMSO (e.g., 5%) can be added to the sample tube prior to injection to minimize sample losses due to hydrophobic cross-linked peptides adhering onto the plastic material of reaction tubes. 24,45 Most data-dependent analysis (DDA) strategies will record only higher charged ions (usually with charge states z from 3+ to 8+), aiming to predominantly record cross-linked peptides. In particular, early studies with MS-cleavable cross-linkers often relied on an MS2−MS3-based acquisition strategy. Here the cross-linker reagent is specifically fragmented at lower collisional energies at the MS2 level to obtain the masses of each individual peptide (e.g., for linkers with sulfoxide labile groups; refs 24 and 26). Those peptides are selected for further fragmentation at higher energies at the MS3 level for peptide sequence identification. Alternatively, a second MS2 scan can be performed, enabling the application of a complementary fragmentation (e.g., ETD). 
Such MS2−MS2 strategies can produce more sequence-specific peptide ions. 80 The Bruce group further developed a specialized acquisition strategy for their PIR linkers called real-time analysis for cross-linked peptide technology (ReACT 27 ), taking advantage of a specific reporter ion formed at the MS2 level, which is used to specifically select ions for fragmentation at the MS3 level.
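The charge-state restriction mentioned above can be pictured with a minimal Python sketch. The `Precursor` record and the helper names are hypothetical and serve only as an illustration; on a real instrument this filter is a setting of the acquisition method rather than user code. The default limits (3 to 8) are taken from the text, and the proton mass is the standard value.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da

@dataclass(frozen=True)
class Precursor:
    mz: float    # measured m/z of the precursor ion
    charge: int  # assigned charge state z

def select_precursors(precursors, min_charge=3, max_charge=8):
    """Keep only precursors whose charge lies in [min_charge, max_charge]."""
    return [p for p in precursors if min_charge <= p.charge <= max_charge]

def neutral_mass(precursor):
    """Neutral monoisotopic mass from m/z and charge: M = z * (m/z - m_proton)."""
    return precursor.charge * (precursor.mz - PROTON_MASS)
```

Doubly charged precursors, which are mostly linear peptides, are discarded by such a filter, whereas the typically larger and higher charged cross-linked peptides pass.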
The aforementioned techniques, however, reduce the throughput. Stepped higher-energy collisional dissociation (HCD)-based fragmentation techniques were shown to be advantageous to boost the number of identifications. 81 Here again, the characteristic ions upon linker fragmentation are produced at lower collisional energies, and peptides are fragmented at higher energies, but all ions are contained in a single MS2 spectrum. This avoids intensity losses from MS2 to MS3, enables acquisition on devices that cannot record at the MS3 level, and simplifies data analysis, especially for software packages that cannot deal with MS3-based data (e.g., MeroX). Such stepped strategies are also commonly used for linkers that have a urea functionality as a cleavage site with a similar stability as a peptide bond. 82 Very recently, the combined use of several fragmentation methods (i.e., CID/HCD/EThcD) was reported to improve the confidence in cross-link identification; however, specialized software is needed to comprehensively analyze and compare the data. 83
In a nutshell: How does one optimally acquire XL-MS data?
• Keep in mind that cross-linked peptides are more hydrophobic, so adapting the LC gradient might be advantageous to fully elute all peptides.
• Select a proper acquisition strategy depending on the cross-linker molecule used. Stepped collisional energy acquisition obviates the need for the MS3 level while delivering excellent results. Also, the combinatorial use of different fragmentation methods can be advantageous to improve the confidence in cross-link IDs.
Step 4: Data Analysis
Because of the high complexity of cross-linked peptide fragmentation, the high variability of cross-link chemistries, and the high demand for reliable results, data analysis and validation are probably the most crucial parts of an experimental workflow. This has led to the development of a variety of software tools, as summarized in Table 3. In a first step, data is usually preprocessed, meaning that the RAW format from the MS device is converted to an open file format like MGF or mzML. This can be done by freely available tools such as MSConvert. 84 The (converted) input files are then used for the cross-link search. As previously mentioned, it is hard to deal with noncleavable linkers in combination with larger or more complex samples due to the n² problem. Much effort, however, has led to the implementation of algorithms capable of tackling this issue within a reasonable search time. As described earlier, pLink2, one of the most commonly used programs, creates a fragmentation index that is used to identify the alpha peptide first, followed by a search of the beta peptide against a peptide index. 32 Also, xiSEARCH aims to computationally unlink connected peptides to alleviate the n² problem. 22 For cleavable cross-linkers, XlinkX 85 and MeroX 82 are commonly used and user-friendly options that both allow searches with custom-defined cross-linkers. Both tools are additionally capable of searching noncleavable cross-linkers, and both support export functions for data visualization tools (see Step 5: Data Visualization). XlinkX exists as a stand-alone version or is integrated as nodes within the commercially available software Proteome Discoverer (Thermo). It is further compatible with MS2−MS3 acquisition strategies and can directly process Thermo RAW files without conversion (within Proteome Discoverer). MeroX is freely available as stand-alone software and is continuously updated.
The most recent version, in contrast with XlinkX, further estimates the FDR of inter-, intra-, and monolinked peptides separately to improve the reliability of the results.
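To make the n² problem mentioned above more tangible, the following minimal Python sketch counts the candidate peptide pairs that an exhaustive search with a noncleavable linker would have to consider. The function name is hypothetical, and the count deliberately ignores precursor mass filtering, which real search engines use to prune most of these pairs.

```python
def candidate_pairs(n_peptides: int) -> int:
    """Number of unordered peptide pairs, including a peptide linked to a copy of itself."""
    return n_peptides * (n_peptides + 1) // 2

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} peptides -> {candidate_pairs(n):>15,} candidate pairs")
```

A database of 100 linkable peptides gives 5050 pairs, whereas 10 000 peptides already give roughly 5 × 10^7 pairs. This quadratic growth is the reason why cleavable linkers, which allow each peptide to be searched individually, are attractive for proteome-wide samples.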
The search settings have to be optimized for each experimental setup. This is illustrated for the two more commonly used programs, MeroX and XlinkX, in Figure 4. Here BSA was cross-linked using DSBU, and the digested protein was analyzed on an Orbitrap using a stepped HCD method (data as published by Stieger et al.; 81 data, fasta files, and search settings are available via the PRIDE repository with the data set identifier PXD021648). In particular, the size of the database used and the search mode strongly influence the quality of the result. MeroX offers four different analysis modes. The quadratic mode is used for noncleavable cross-linkers. The rise mode is designed for MS-cleavable linkers and scans spectra for cross-link doublet ions. It exclusively searches for cross-linked spectra that contain at least one doublet signal of each connected peptide. The proteome-wide mode also scans for doublet ions from cross-linker fragments, requiring signals of at least one of the connected peptides. It subsequently tries to match fragments to this peptide. Once one peptide is matched, the mass of the second peptide can be calculated from the precursor mass, and fragments are then matched to the second candidate. This mode includes a prescoring mechanism and is suitable for very complex (e.g., proteome-wide) samples due to its increased speed (Figure 4C). The riseUP mode is basically a combination of the rise and the proteome-wide modes and therefore maximizes cross-link identifications. 25 We analyzed the DSBU-linked BSA against a database containing BSA and an increasing number of up to 10 000 human proteins. As shown in Figure 4A, the number of cross-links within BSA slightly decreases with the increasing size of the database. This is likely due to an increasing number of possible decoy hits and therefore a more stringent score cutoff chosen by the software. Similar results were obtained for XlinkX.
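The two ideas behind the doublet-based modes, finding characteristic doublets and deriving the partner peptide mass, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the MeroX implementation: the function names are hypothetical, the doublet spacing (about 25.979 Da) and the intact linker mass (about 196.085 Da) are assumed values for DSBU that should be verified against the reagent documentation, and real software additionally handles charge states, isotope peaks, and intensities.

```python
DSBU_DOUBLET_DELTA = 25.979  # Da, assumed spacing of the two linker fragments
DSBU_LINKER_MASS = 196.085   # Da, assumed mass added by the intact linker

def find_doublets(peaks_mz, delta=DSBU_DOUBLET_DELTA, charge=1, tol=0.01):
    """Return (low, high) m/z pairs separated by delta/charge within tol (m/z units)."""
    peaks = sorted(peaks_mz)
    spacing = delta / charge
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            diff = high - low
            if diff > spacing + tol:
                break  # peaks are sorted, so later peaks are even further away
            if abs(diff - spacing) <= tol:
                pairs.append((low, high))
    return pairs

def second_peptide_mass(precursor_mass, first_peptide_mass,
                        linker_mass=DSBU_LINKER_MASS):
    """Partner peptide mass, assuming precursor = peptide A + peptide B + intact linker."""
    return precursor_mass - first_peptide_mass - linker_mass
```

Because the second function needs only one identified peptide, a mode that requires the doublet of a single peptide can restrict the search for the partner to candidates of one calculated mass, which is consistent with the speed advantage described for very complex samples.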
On the basis of the specific experiment design, no score separation between the decoy and the target inter-cross-links is possible (as both are wrong here). This leads to a nonfunctional target-decoy analysis for inter-cross-links and the acceptance of very low scored (false-positive) inter-cross-links. However, such comparisons can be useful to empirically find a minimal score threshold that leads to an acceptable fraction of false-positives (e.g., 1 or 5%). When applying the software-recommended score thresholds of 50 (MeroX) 25 and 40 plus a delta score of 4 (XlinkX), 85 the fraction of false-positive non-BSA cross-links identified drops below 5% for database sizes up to 1000 proteins and is also clearly reduced for the largest database search (Figure 4B). On the basis of our experience, the riseUP mode should be preferred over the proteome-wide mode, assuming that the computer used is powerful enough, as riseUP uses more resources and more analysis time (Figure 4C). For this test data, MeroX generally outperforms XlinkX; however, results likely differ for other test systems with different cross-linkers. In conclusion, the analysis of data with several algorithms might be beneficial to obtain complementary information or to increase the confidence in the obtained results. In the case of lower scored cross-links that are of specific interest for a project, a manual inspection of the spectra is still highly recommended.
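The empirical search for a minimal score threshold described above can be sketched as follows. The sketch assumes that every identified cross-link is available as a `(score, is_entrapment)` pair, where `is_entrapment` marks links that involve a protein known to be absent from the sample (here, any non-BSA protein); the function names and this data layout are hypothetical.

```python
def entrapment_fraction(hits, threshold):
    """Fraction of accepted hits (score >= threshold) that involve an entrapment protein."""
    accepted = [is_entrapment for score, is_entrapment in hits if score >= threshold]
    if not accepted:
        return 0.0
    return sum(accepted) / len(accepted)

def minimal_threshold(hits, max_fraction=0.05):
    """Lowest observed score whose cutoff keeps the entrapment fraction <= max_fraction."""
    for threshold in sorted({score for score, _ in hits}):
        if entrapment_fraction(hits, threshold) <= max_fraction:
            return threshold
    return None  # no cutoff reaches the requested fraction
```

Note that the entrapment fraction counts only errors that land on absent proteins. Wrong matches within the true proteins remain invisible, so the value is best read as a lower bound of the actual error.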
To date, a standardized solution, especially for a reliable and robust FDR estimation, is still lacking, leading to many individualized strategies.
In line with the observations from Figure 4, Beveridge et al. 86 and Ser et al. 87 showed that the actual FDR of many tools is often much higher than the estimated one (up to 32% actual instead of 1% estimated FDR 86 ), which can be alleviated by using an empirical score cutoff. 87 A more universal and reliable strategy would be to improve the FDR estimation. The classical target-decoy approach used for non-cross-linked samples is commonly adapted for cross-linking MS. As discussed by Mintseris et al., 48 the decoy search space is much larger than the target search space for cross-linked samples. They proposed that reducing the size of the decoy database uniformly simplifies the FDR estimation and reduces the search time.
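One way to picture the adaptation of the target-decoy approach to cross-links is the estimator (TD − DD)/TT, where TT, TD, and DD denote matches in which both, one, or neither of the two peptides come from the target database. The sketch below assumes this estimator, which is in common use for cross-linking data; it is not necessarily what any of the tools discussed here compute, and the function names and the `(score, label)` layout are hypothetical.

```python
def crosslink_fdr(n_tt, n_td, n_dd):
    """Estimate the FDR among target-target matches as (TD - DD) / TT, floored at 0."""
    if n_tt == 0:
        return 0.0
    return max(n_td - n_dd, 0) / n_tt

def fdr_at_threshold(matches, threshold):
    """matches: iterable of (score, label) with label in {'TT', 'TD', 'DD'}."""
    counts = {"TT": 0, "TD": 0, "DD": 0}
    for score, label in matches:
        if score >= threshold:
            counts[label] += 1
    return crosslink_fdr(counts["TT"], counts["TD"], counts["DD"])
```

The subtraction of DD reflects that a random pair can be wrong in one or in both peptides. Treating cross-links like linear peptides (decoys/targets) ignores this and is one reason why estimated and actual error rates can diverge.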
Most algorithms calculate the FDR at the CSM (cross-link spectrum match) level. As demonstrated by the Rappsilber group, 31 this leads to a potentially much higher error for the actual cross-linked residue pairs of interest. As demonstrated in a preprint, 88 combining CSMs to unique cross-linked sites increases the actual FDR up to 47% (also dependent on the size of the search database used). It is demonstrated that a separate calculation for inter-, intra-, and monolinked spectra as well as the merging of CSMs to their respective protein−protein interaction (PPI) sites prior to the FDR calculation improve the reliability, leading to correct FDR estimations.
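The two measures named above, merging CSMs before the FDR calculation and treating intra- and interprotein links separately, can be sketched as follows. The sketch collapses CSMs to unique residue pairs (a further merge to protein pairs would follow the same pattern). It assumes a `DECOY_` prefix for decoy protein names, explicit `TT`/`TD`/`DD` labels, and the (TD − DD)/TT estimator; all names and the dictionary layout are hypothetical and not taken from a specific tool.

```python
from collections import defaultdict

DECOY_PREFIX = "DECOY_"  # assumed naming convention for decoy proteins

def base_protein(name):
    """Strip the assumed decoy prefix so that a decoy maps to its target protein."""
    return name[len(DECOY_PREFIX):] if name.startswith(DECOY_PREFIX) else name

def best_per_residue_pair(csms):
    """Collapse CSMs to unique residue pairs, keeping the best-scoring CSM of each pair.

    csms: iterable of dicts with keys 'site_a', 'site_b', 'score', 'label';
    each site is a (protein, position) tuple, label is 'TT', 'TD', or 'DD'.
    """
    best = {}
    for csm in csms:
        key = tuple(sorted((csm["site_a"], csm["site_b"])))
        if key not in best or csm["score"] > best[key]["score"]:
            best[key] = csm
    return best

def fdr_by_class(residue_pairs):
    """Estimate (TD - DD) / TT separately for intra- and interprotein residue pairs."""
    counts = defaultdict(lambda: {"TT": 0, "TD": 0, "DD": 0})
    for (site_a, site_b), csm in residue_pairs.items():
        same = base_protein(site_a[0]) == base_protein(site_b[0])
        counts["intra" if same else "inter"][csm["label"]] += 1
    return {
        cls: (max(c["TD"] - c["DD"], 0) / c["TT"] if c["TT"] else 0.0)
        for cls, c in counts.items()
    }
```

True cross-links tend to be supported by several CSMs, whereas false ones are mostly matched once. Collapsing to unique pairs therefore shrinks the number of true hits more than the number of false hits, which is why an error rate that is acceptable at the CSM level can be much higher at the residue-pair level.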
Another common approach to validate a software-generated error rate is by matching cross-links to a known 3D structure. As recently reported by Yugandhar et al., 89 this approach might significantly underestimate the actual error. They suggest quality measurements for cross-link data in addition to structure-based measurements: the fraction of misidentifications originating from an unrelated organism (similarly as done in Figure 4), the fraction of cross-links representing known interaction sites, and, for those interactions that are presumably novel, confirmation by orthogonal experiments. Such quality measurements should be included in every cross-linking study; in particular, the combination with complementary and already known data will clearly improve the confidence in the results.
Changes in the relative abundance of formed cross-links are further relevant for studies investigating, for example, conformational changes of protein complexes. For such studies, either isotope-labeled cross-linker reagents can be used or ion intensities are compared between runs to perform a label-free quantification. Some data analysis algorithms have such a function directly implemented. Alternatively, dedicated quantitation software such as MaxQuant, 90,91 Skyline, 92,93 or apQuant 94 can be used to quantify cross-links that were identified by different software.
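At its core, a label-free comparison of cross-link abundances reduces to a ratio of intensities between runs. The following Python sketch assumes that summed precursor intensities per cross-link are already available as dictionaries (the identifiers and the function name are hypothetical); normalization between runs, replicate handling, and statistics, which dedicated quantitation software provides, are left out.

```python
import math

def log2_fold_changes(condition_a, condition_b):
    """log2(B / A) for every cross-link with a positive intensity in both conditions."""
    shared = condition_a.keys() & condition_b.keys()
    return {
        link: math.log2(condition_b[link] / condition_a[link])
        for link in sorted(shared)
        if condition_a[link] > 0 and condition_b[link] > 0
    }
```

A value of 2 means that the cross-link is four times more intense in condition B. Cross-links detected in only one condition are omitted here, although they can be the most interesting ones and deserve a separate inspection.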
In a nutshell: How does one analyze XL data?
• Choose software that is suitable for the type of cross-linker and the complexity of the sample.
• For proteome-wide studies using MS-cleavable linkers, MeroX is frequently used. It is freely available and user-friendly due to the graphical user interface (GUI) and the quick-setup function.
• Consider including FDR controls (e.g., spike with peptides from a different organism), and note that no consensus has been reached yet on a proper validation due to the vast heterogeneity of study designs.
• Consider comparing results of different analysis algorithms to increase the confidence in the obtained results.
Step 5: Data Visualization
Depending on the search engine used, lists of cross-linked sites, cross-linked peptides, and monolinked peptides will be generated in a specific format. Whereas some tools, such as MeroX, directly provide limited visualization options, like showing distance constraints compared with a Protein Data Bank (PDB) structure or showing interprotein cross-links within a network graph, the data is usually exported and processed by different software for rearrangement, validation, or graphic visualization. Such visualizations are especially useful to get an overview of the existing PPIs or the sequence coverage of cross-link positions or to validate cross-links on a structure. Because of the diversity of search engines, variations in the output data format complicate cross-platform comparisons. CroCo 112 is a tool that was specially developed to alleviate this obstacle by converting output files of different search engines (e.g., from Kojak, Xi, pLink, or MeroX) to input files for several data visualization tools (e.g., xVis, xWalk, or xiNET). Most visualization tools are web-server-based and are best suited for specific tasks: The tools xWalk 113 and XlinkAnalyzer 114 (within Chimera 115 ) can be used to map cross-link data to a protein structure. This is usually done to validate the cross-link data. In the case of an (at least partly) unknown protein structure or investigations of conformational changes, cross-links can be used to create a model of the protein structure or to remodel an existing similar structure (e.g., using I-TASSER 116,117 or DisVis 118 ). Interaction networks as well as plots of intraprotein links on any sequence can be generated by using xiNET 119 or xVis. 120 XiView, 121 in addition to enabling the visualization of 2D networks, can show MS spectra upon clicking on a cross-link in the network and supports a 3D structure view.
To model the interaction site of two proteins, HADDOCK 122,123 can be used, and DynaXL 124 enables the investigation of protein dynamics (i.e., conformational changes, prediction of accessible space for amino acid side chains, and measurement of the shortest path for a distance constraint).
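The validation of cross-links on a known structure boils down to a distance check, which can be sketched in plain Python. The sketch assumes that Cα coordinates have already been extracted from a PDB file into a dictionary keyed by `(chain, residue number)`; the function name, the data layout, and the 30 Å cutoff are assumptions for illustration, as the appropriate cutoff depends on the spacer length of the linker and the flexibility of the linked side chains.

```python
import math

def check_crosslinks(ca_coords, crosslinks, max_distance=30.0):
    """Return {(site_a, site_b): (distance, satisfied)} for links resolved in the structure.

    ca_coords: dict mapping (chain, residue_number) -> (x, y, z) in angstroms.
    crosslinks: iterable of (site_a, site_b) tuples using the same keys.
    """
    report = {}
    for site_a, site_b in crosslinks:
        if site_a not in ca_coords or site_b not in ca_coords:
            continue  # residue not resolved in the structure
        distance = math.dist(ca_coords[site_a], ca_coords[site_b])
        report[(site_a, site_b)] = (distance, distance <= max_distance)
    return report
```

The straight-line distance used here can pass through the protein body, so it is only a first approximation; distances along the solvent-accessible surface are a stricter measure of whether a linker can physically bridge two residues.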
Whereas the previously mentioned tools use data from cross-linked peptides (type-2 cross-links), monolinked peptides (type-0 cross-links) deliver limited structural information as well. They are formed exclusively at solvent-accessible sites of a protein. Their advantage over type-2 links is that they are usually formed more abundantly, which is why including them in data analysis and visualization can increase the information density for structure modeling. In a recent work, the algorithm XLM-Tools was developed and was shown to improve the quality of protein models by combining type-2 and type-0 cross-link data. 125
In a nutshell: How does one get meaningful information out of XL data?
• In a first step, the validated output of the chosen analysis software needs to be converted into a proper input format for any visualization tool (e.g., by using a conversion software like CroCo).
• Data from different analysis tools can be combined and visualized in the same way to obtain complementary information.
• To us, XiView appears to be an excellent choice for easy and quick 2D and 3D data visualization as well as for the inspection of specific cross-links based on their spectra. In case MeroX was used for the data analysis, results can be directly exported to a format compatible with XiView.
• Cross-links can be validated by plotting them on a known structure. Confident links can be used to model protein structures or interaction sites or to generate PPI networks.

■ CONCLUSIONS AND FUTURE DIRECTIONS
Recent and ongoing advances in the field of proteomics, and especially in the field of XL-MS, have pushed the technique forward, which is why it is now capable of analyzing PPI topologies within very complex samples such as whole-cell lysates. Strategies to apply a cross-linker in vivo are emerging but are still limited by the reaction efficiency, reagent solubility, membrane permeability, and insufficient enrichment. As a consequence, interaction information is obtained exclusively for abundant proteins.
The increasing interest in XL-MS has led to the development of user-friendly data analysis, validation, and visualization tools that also enable nonexpert groups to apply the technique to their research questions. Simultaneously, there is an increasing need for harmonized standards for reporting acquisition and analysis details as well as for a reliable data validation strategy. In particular, for nonexperts, it is still hard to know which enrichment technique, analysis software, and software settings to choose. However, recent efforts regarding selective enrichment strategies (e.g., refs 44 and 45), the capture of lower abundant proteins, 49 the harmonization of standards, 126 clever FDR control, 86,88 as well as the improved sensitivity of mass spectrometers and increased computational power show that within the next 2−5 years, scientists will truly be able to dig deeper than ever before into the interactome of cells, organelles, or tissues. As mentioned in the Applications and Bottlenecks section, the combination with FAIMS 79 or ion mobility (caps-PASEF 46 ) as an additional separation dimension will further alleviate issues in the dynamic range and therefore boost sensitivity toward cross-link detection in future studies.
Combining XL-MS with data from other structural biology methods will also be very beneficial for future studies and will be essential to validate and complement results. Such methods could include Y2H (e.g., ref 127), proximity labeling (BioID, e.g., ref 128), affinity-purification mass spectrometry (AP-MS, e.g., refs 127 and 129), hydrogen−deuterium exchange mass spectrometry (HDX-MS, e.g., ref 130), cryo-electron microscopy (cryoEM, e.g., refs 131 and 132), X-ray crystallography (e.g., ref 133), or techniques for the direct visualization of protein interactions in cells (e.g., fluorescence confocal microscopy).
Finally, the analysis of relative cross-link abundances is highly informative to uncover interactome changes or to monitor conformational changes. However, this is still very challenging to do in complex biological matrices. Lately, most studies in this field, including those within a more complex matrix, have been performed by the Rappsilber group. 134−136 So far, isotope-labeled linker reagents have mainly been used for pairwise comparisons rather than proteome-wide interactome studies. In contrast, label-free workflows enable the parallel comparison of multiple conformations or interaction strengths with a wider range of linker reagents.
In summary, XL-MS has matured in the past decade. Although much work still has to be done, it can already help to significantly contribute to our understanding of biochemical processes.