Characterizing Endogenous Protein Complexes with Biological Mass Spectrometry

Biological mass spectrometry (MS) encompasses a range of methods for characterizing proteins and other biomolecules. MS is uniquely powerful for the structural analysis of endogenous protein complexes, which are often heterogeneous, poorly abundant, and refractive to characterization by other methods. Here, we focus on how biological MS can contribute to the study of endogenous protein complexes, which we define as complexes expressed in the physiological host and purified intact, as opposed to reconstituted complexes assembled from heterologously expressed components. Biological MS can yield information on complex stoichiometry, heterogeneity, topology, stability, activity, modes of regulation, and even structural dynamics. We begin with a review of methods for isolating endogenous complexes. We then describe the various biological MS approaches, focusing on the type of information that each method yields. We end with future directions and challenges for these MS-based methods.


INTRODUCTION
Proteins encoded by cellular DNA are the workhorses of the cell, carrying out diverse biochemical tasks that generate cellular phenotypes. 1 However, many proteins do not act alone but associate with other proteins to form functional complexes, greatly increasing the complexity of molecular species found in the cell. Moreover, complex formation can depend on protein location and post-translational modifications (PTMs), further increasing the diversity of cellular species. A major challenge in biology is to map the cellular protein complexes and determine how complex composition varies as a function of cellular state. 22% of the protein coding genes in humans are represented in CORUM, a repository of experimentally studied complexes  from mammalian organisms, 2 and many more protein complexes likely exist. 3 The topology and structural dynamics of protein complexes should be determined in the endogenous state, as similar to that as in cells or tissues, for the findings to have maximal physiological relevance. However, because of their limited quantity and heterogeneity, the range of biochemical and structural techniques that can be applied to endogenous complexes is restricted. Biological mass spectrometry (MS), a term that encompasses a wide range of MS-based techniques, has provided tremendous insights into the composition and structural features of endogenous protein complexes. MS-based techniques balance throughput with resolution, filling a gap between time-intensive, all-atom structural techniques such as NMR and X-ray crystallography and high-throughput molecular biology techniques that do not provide detailed structural information. Here we detail the different MS based methods, focusing on the insights they provide into endogenous complexes and comparing between them. Depending on the approach taken, biological MS can determine the interaction partners for a particular protein, describe complex topology and structure, and even provide insights into conformational fluctuations ( Figure 1). This review is intended as an overview of sample preparation techniques and biological MS approaches available for in-depth characterization of endogenous complexes. Rather than focusing on technical aspects of specific MS techniques, which are detailed in many excellent reviews cited here 4−6 and below, we focus on the compositional and structural insights that can be gained from biological MS and compare how different MS methods can yield this information. We also highlight recent exciting advances ranging from improvements in LC/MS technology to CRISPR/CAS9 technologies that enable epitope tagging of endogenous proteins, increasing relevance for a wider range of systems.
We begin by reviewing methods for isolating endogenous complexes, highlighting advantages and disadvantages of each. We define endogenous complexes as those composed of proteins expressed in the organism of origin, either under native promoters or under conditions as close to native promoters as possible. We then introduce the different biological MS methods for characterizing those complexes, pointing the reader to an updated technical review for each technology and detailing the advantages and disadvantages of each method. For each technology, we describe the biological information that it can provide, focusing on specific examples that highlight the power of biological MS and apologizing in advance for publications excluded due to lack of space. We end with a review of future directions and exciting developments that will open new frontiers in the study of protein complexes.

General Considerations for MS Protein Complex Sample Preparation
The information extracted from MS experiments is limited by the scale and quality of the input samples. Although many MS modalities can be applied to crude biological samples, as discussed below, typically assemblies must be extracted from cells or tissues and purified for MS analysis. It is crucial to remove as many contaminants, or copurifying proteins that are not part of the complex, as possible. In this section, we discuss general principles for the enrichment of protein complexes from different tissues and cell types. Once a suitable tissue and enrichment scheme has been selected, optimization of sample preparation protocol is a combination of art and science, and there is no substitute for screening many purification conditions. In fact, rapid screening platforms to test buffers and conditions in parallel, analogous to X-ray crystallography screens, have been used for MS analysis. 7 Complex abundance will directly affect both the sample purification method as well as the choice of MS technique. Generally, bottom-up proteomics requires significantly less material than top-down MS methods (see section 3 and Figure  2). Therefore, while we focus primarily on purification strategies for proteins that are not overexpressed, we include a few strategies for ectopic overexpression with the caveat that overexpression might lead to nonphysiological complex formation. Moreover, for heteromeric complexes of unknown composition, not all subunits can be selected for overexpression. In some cases, overexpression of a single subunit may lead to proportionally altered levels of associated subunits such that subunit stoichiometry is preserved in the cell. 8 Three main strategies exist for complex enrichment ( Figure  3): biochemical purification, immunoprecipitation, and affinity tagging. All three strategies have yielded robust insight into biological complexes and their structural and compositional properties. For any strategy, it may be desirable to begin with subcellular fractionation to enable the retrieval of spatially defined complexes of interest from different cellular compartments such as the nuclei, cytosol, and mitochondria. 9,10 Because compound assembly state can depend on cofactor binding, it can be important to include cofactors during purification to prevent complex disassembly. For example, the 26S proteasome dissociates readily once removed from cells if ATP is not included in the buffers in the physiological range of 1 mM. 11 If the complex is known to have catalytic or enzymatic functions that can be reconstituted in vitro, it is worth developing in vitro assays that can be performed on fractions or purified complexes to validate that the complex is functional. For example, in our lab, 20S proteasome containing fractions are tested for degradation activity with fluorescent activity peptides. 12 The desired final buffer and assembly state depend on the MS technique applied. For bottom-up proteomics analysis (see section 3.3.1), because proteins are digested and separated on a column, the complex does not need to be in an MS compatible buffer. In fact, the complex is typically denatured before digestion, and thus protein precipitation via organic solvents such as tricholoroacetic acid can be used to separate proteins from buffer components. After digestion, peptides are loaded onto a chromatography column that exchanges them into an MS compatible buffer. For chemical footprinting methods (see section 3.3.2), complexes should be in a native state for the foot-printing step but are then denatured and digested for MS analysis, relaxing complex purification requirements. However, buffers must be compatible with the chemistry of the label used. For example, for cross-linking MS with disuccinimidyl dibutyric urea, buffers containing primary amines should be avoided because they will react with the cross-linkers. 13 For native and top-down MS (see section 3.4), complexes should be eluted in an assembled, native state and exchanged into MS compatible buffers, as detailed in ref 14. In particular, high concentrations of salts can suppress signal intensity and require rounds of buffer exchange. A number of salts that can maintain physiological ionic strength and pH while remaining compatible with MS, including ammonium acetate, ethylene diammonium diacetate buffer, and others.

Biochemical Purification
Biochemical purification strategies rely on differences in physicochemical properties between the complex of interest and its biological matrix. Starting from a homogenate of biological material, sequential centrifugation, precipitation, and/or chromatographic steps gradually yield solutions enriched in the complex of interest. Chromatographic purification steps can separate complexes based on their size (size exclusion chromatography), charge (ion exchange chromatography), or hydrophobicity (hydrophobic interaction chromatography). Protein complexes can also be enriched based on density via sucrose gradient centrifugation or via selective precipitation of some components, most popularly using ammonium sulfate. We refer the reader to guides for design of protein purification strategies for complete discussion of these techniques. 15 The advantage of a biochemical purification is that it yields endogenous complexes that are unperturbed by the addition of non-native amino acid sequences or amplification of individual subunits. Complexes are also not bound to antibodies, which must be removed for downstream MS analysis. Disadvantages of biochemical purifications, and the reason that they have been primarily supplanted by affinity purification methodologies, is that they are lengthy, lossy, and must be optimized separately for each protein complex. The lengthiness can cause dissociation of transient interactions and the lossiness means that abundance of the complex must be quite high to provide enough starting material to survive the successive enrichment steps. Given these limitations, it is not surprising that biochemical purifications have been mainly applied to prepare stable and abundant complexes for MS analysis. These include the ribosome, 16 the proteasome, 12 the COP9 signalosome, 17 and varying complexes of vinculun/Arp 18 proteins.
Biochemical purification techniques are also used in cofractionation MS analysis of complexes, discussed below. In these experiments, complex biological samples are fractionated biochemically without tracking a particular component. Crude fractions are subjected to MS analysis, enabling identification of complexes either by proteomic colocalization or by directly observing the complex via native MS. 19 A wide range of biochemical fractionation techniques have been used for this purpose, including ion exchange chromatgraphy, 19−21 isoelectric focusing, 20−22 sucrose density gradient centrifugation, 20,21 size exclusion chromatography, 22,23 native gel-based electrophoresis strategies, 24 and capillary zone electrophoresis. 25

Direct Immunoprecipitation
The ability to generate antibodies that bind specific epitopes in a protein of interest has accelerated many areas of biology. In protein purification, antibodies conjugated to sepharose or magnetic beads are used to immunoprecipitate (IP) protein complexes. Usually an antibody targeting one subunit in a complex is chosen, and IP under nondenaturing conditions will lead to isolation of the entire complex, provided that the epitope chosen is accessible in the complex. Reference 26 describes a protocol for choosing and optimizing a primary antibody for IP followed by MS analysis.
As with biochemical purifications, direct IP has the advantage that the protein does not need to be modified for purification. However, it requires robust antibodies that recognize the epitope. Antibodies can be conformer or proteoform specific, leading to a biased sampling of complex distribution. Typically, polyclonal antibodies are preferred over monoclonal antibodies because they can potentially recognize a range of epitopes and maintain population heterogeneity.
An additional consideration is dissociating the tightly bound protein/antibody complex after IP for MS analysis. This can be done by elution with high salt 27 or via denaturing methods including low pH glycine buffer or urea. However, as discussed above, high salts and detergents are not compatible with all MS methods, and quaternary structure is destroyed by denaturing elution. Additionally, contamination of the purified sample with large quantities of the primary antibody can lead to masking of the MS signals of proteins of interest.

Affinity Tagging
Arguably, the most popular method for purifying endogenous protein complexes is affinity tagging, in which additional bases are added to the DNA sequence coding for the protein of Purification of endogenous complexes can start either from unmodified cells and tissues, in which immunoprecipitation and biochemical purification can be applied, or by genetic manipulation of cells to express an epitope tagged protein, preferably at the endogenous locus (see section 2.4.1). At the end of the purification, complexes can either be eluted in a denatured state, in which case bottom-up proteomics or intact protein MS can be applied, or eluted in a native state, which will enable application of native-MS, HDX, or cross-linking MS.
interest. This sequence is transcribed along with the protein of interest, which then contains a tag that can be isolated using commercially available affinity resins for the tag of interest. Affinity methods provide a single-step purification in a highthroughput fashion that can be theoretically applied to any target of interest. The speed of purification can also preserve transient or weak biomolecular interactions.
A wide range of affinity tags are available, with novel tags introduced frequently (reviewed in refs 28−30). Table 1 summarizes the properties of some popular tags for purification of endogenous complexes. Tandem affinity tags for tandem affinity purification (TAP) are also available that combine multiple epitopes for coupling multiple affinity steps. 31−34 TAP approaches generally lead to better contaminant removal at the expense of dissociating transient interactions. The sensitivity of modern mass spectrometers combined with improved methods for contaminant detection makes the use of TAP tags less important.
When choosing a tag, its size, charge, and the method of removal from affinity resin is critical. As discussed above, this last property can be an extremely important variable for downstream MS analysis. If competitive elution strategies which involve addition of peptides or other small molecules interfere with downstream MS analysis because they limit the dynamic range, gel filtration or buffer exchange can reduce their concentration in the sample. Host organism is also an important consideration, with His tags being effective for purification of proteins from E. coli but significantly less useful from mammalian cells due to nonspecific copurification of contaminants.
Placement of the tag on the protein of interest is also crucial to avoid disrupting functional complex formation. Any known facts about protein structure and assembly should be incorporated. Tags can be placed at the N-or C-terminus of the protein, as well as internal to the sequence in a position accessible in the final folded protein structure. 35 It is not possible to empirically predict which location is better for a protein of interest. For N-terminally tagged proteins, polypeptides that are improperly terminated early will contain the affinity tag and copurify along with the desired protein; for C-terminally tagged proteins, alternative initiation sites can also lead to a mixture of different proteins with the C-terminal tag. Ideally, biochemical or biological assays should confirm that the affinity tag has not impacted the protein's endogenous fold, function, interactions, and localization. These can either be in vitro assays that assess enzymatic function or protein structural properties after tag addition, or in cell assays to confirm the location of the tagged protein and normal physiological state after tagging. However, it is always possible that the tag perturbs a completely unknown function or association that the protein engages in, and this should be considered as part of the experimental design and analysis.
2.4.1. Methods for Generating Affinity Tagged Endogenous Complexes. To use affinity tags to purify endogenous complexes, the DNA coding for at least one subunit of the complex must be modified with the affinity tag. The gold standard, which presumably produces complexes as similar to native as possible, is to tag the protein at the endogenous gene locus in the appropriate host. Endogenous tagging strategies generally capitalize on the principle of homologous recombination, reviewed in refs 41 and 42, in which homologous regions of DNA recombine and regions in between the homology patches can be swapped. Efficient homologous recombination relies on the generation of precise double-stranded DNA breaks (DSBs); the introduction of CRISPR/Cas9 technology, 43 which revolutionized the simple and specific generation of DSBs, has advanced affinity tagging in a range of organisms, including Escherichia coli 44,45 and yeast. 46 However, it is particularly powerful for endogenous tagging in mammalian cells, and several groups have presented tools and approaches for CRISPR based epitope tagging in mammalian cells. 47−50 Moreover, the use of CRISPR to generate epitope tagged whole model organisms, such as mice, 51 can significantly advance comparative purification of complexes from different tissues and under different physiological stresses. However, users of CRISPR should be aware that it can be subject to off-target effects. 52 For E. coli, 41,53,54 and yeast, 55−59 robust tools for epitope tagging predate CRISPR/CAS9 technologies, and various libraries of epitope tagged yeast and E. coli strains are available, as well as yeast donor libraries that enable fast tagging of ORFs with any desired epitope. 60,61 For mammalian cells, epitope tagging pre-CRISPR was generally accomplished via transcription activator-like effector nuclease (TALENS) or zinc finger nucleases(ZNFs), 48 which are composed of protein domains that recognize specific DNA sequences fused to nonspecific DNA cleavage domains. These modalities are more difficult to scale than CRISPR technologies, because new protein constructs must be designed for each gene targeted. Therefore, for mammalian cells in particular, when tagging of endogenous loci is not feasible or practical, other approaches exist that result in endogenous or near-endogenous levels of protein. Bacterial artificial chromosomes containing the protein of interest under control of the endogenous promoter can be stably transfected into mammalian cells 62 for protein expression. Ectopic promoters can also generate nearnative levels of protein. For example, tetON systems, in which addition of tetracycline derivatives drive gene expression, are titratable, which allows researchers to control the level of ectopic expression. 63 These ectopic genes can be introduced into cells via transient transfection, random generation of stable clones, or episomal vectors, which are maintained in the nucleus in a nonintegrated state and replicate, allowing generation of a semistable cell line. 64 Lentivirus and adenoviruses can also be used to generate stable clones by integration into the mammalian genome; however, random integration or integration of multiple copies of the gene can lead to disturbances in cellular physiology and uncontrolled gene silencing. As a solution, ectopic genes can be targeted to "safe harbors" in the human genome, 65 such as AAVS1, 66 where integration of genes has been shown to minimally perturb cellular physiology or gene regulation.
Regardless of how the DNA coding for the tagged protein is introduced, some form of selection is typically required to generate cell populations containing the tagged protein. Most often this is done using antibiotic selection, where along with the desired mutations an antibiotic resistance cassette is introduced. 44,45,48,50 Fluorescent proteins can also be added which enable cell sorting for the desired tagged proteins. 67 Both of these methods do involve expression of exogenous proteins by the cell. Single cells can also be isolated and sequenced to cultivate clonal strains with a single mutation; this enables the addition of a tag without need for additional selection cassettes. 47

Cross-Linking to Preserve Noncovalent Interactions
Depending on the mode of MS analysis chosen, it may be helpful to add a cross-linking step as part of the purification process. Cross-linking refers to covalently linking together reactive amino acids in proteins and protein complexes. 68−70 Cross-linking can be coupled to any of the three enrichment strategies discussed above ( Figure 3) and enables samples to be purified stringently while retaining weak but physiological associations. Cross-linking generally increases the number of proteins that copurify 71,72 and is also used to generate distance constraints for cross-linking MS (see section 3.3.2.1).
A wide range of cross-linkers exist, 73 and the choice of crosslinker will depend on downstream application. Cross-linker concentration and reaction time should be optimized to ensure that only physiological interactions are captured. Formaldehyde and glutaraldehyde are two popular small, cellpermeable cross-linkers that are useful for simply preserving interactions, with glutaraldehyde functional at low temperatures. 74 Other cross-linkers are designed for subsequent crosslinking MS analyses, as reviewed in ref 73 and discussed below (section 3.3.2.1), many of which are also cell permeable. Some cross-linkers can be affinity purified from cell lysate via reactive handles, often based on click chemistry, for the enrichment of otherwise low-abundant cross-linked peptides from complex biological samples. 75 Cross-linking can occur at many points throughout the purification procedure. Cell-permeable compounds can be added before cellular lysis. For example, in rapid immunoprecipitation mass spectrometry of endogenous protein (RIME), a protocol developed by Carrol and co-workers, in-cell formaldehyde cross-linking followed by immunoprecipitation and MS analysis is used to identify protein complexes. 76−78 Intriguingly, Fabre and co-workers 79 found that cross-linking before cellular fractionation prevented leakage of proteins between different fractions and preserved native localization of proteins after fractionation. Cross-linking can also take place in lysates, 80 in cell powders, 74 or after complexes are captured on affinity beads 81 Chait and co-workers have engineered lysine-free anti-GFP nanobodies for on-bead cross-linking with lysine cross-linkers; the anti-GFP nanobodies will not be affected by crosslinking. 81 Cross-linking can also be performed in gel after native gel separation 82 of endogenous complexes. However, it is important to note that Zhang et al. found different crosslinking results between in-cell and in-lysate linking; 71 therefore, for optimal preservation of biological interactions, cross-linking should take place before cell lysis.
In addition to adding exogenous cross-linkers, photo-crosslinking amino acids can be incorporated into cellular proteins and used to capture protein−protein interactions in cell following light activation. 83 Photoleucine, 84 photomethionine, 84 and photolysine 85 are incorporated into all cellular proteins, with rates ranging from 4% to 40% and have been used for MS identification of histone binding proteins. Sitespecific incorporation of photoreactive amino acids into specific proteins can be achieved using unnatural amino acid incorporation 86 or split intein technologies 87 to specifically interrogate the interactome of one protein, such as the membrane protein IFITM3 for which traditional affinity purification methods failed to recover true interactors. 88

Biological MS Delivers Insights Across Multiple Levels of Complex Regulation
A range of biological MS techniques exist which differ in the details of sample preparation and data analysis but have in common the generation of gas-phase ions of proteins or protein fragments, mass separation, and detection ( Figure 2). Techniques can be classified as either "bottom-up", in which proteins are digested into peptides before gas-phase ion creation, or "top-down", in which proteins are transferred intact into the mass spectrometer. 89 Middle-down approaches, in which proteins are partially digested into large fragments (>7 kDa 89 ) are also emerging but will not be discussed further here. In all forms of MS, specific ions can be selected for gasphase ion fragmentation in tandem MS experiments. 90,91 By fragmenting ions and analyzing the daughter ions, sequence and structural information about the parent ion can be determined.
We will cover a range of bottom-up approaches, including shotgun proteomics, cross-linking MS, HDX MS, chemical labeling MS, as well as top-down MS and native-MS. After providing a brief overview of each technique, highlighting its pros and cons, we will describe what information it can provide for endogenous complexes. By combining different sample preparation protocols described above with these varied MS modalities, many different types of information can be extracted (Figure 1).
For a given complex, it is important to determine both its composition, namely the distinct subunits that form the complex, as well as the stoichiometry of those proteins, namely how many copies of each subunit is present. The subunit interaction network, describing which proteins specifically interact, should be defined, and subunits classified as core or peripheral. Under the core-attachment model, proteins form core complexes, which have a stable, permanent relationship but are often temporally regulated through the attachment of peripheral proteins. 92−94 Members of the core complex typically share functional annotations, protein localization, and even specific deletion or mutational phenotypes, while attachment members do not necessarily share these properties. The interaction of attachment members with the core complex is highly controlled. Bottom-up proteomics, cross-linking MS, and native-MS can inform on complex composition and stoichiometry for both core and attachment complexes, although the former are easier to preserve through an MS work-flow.
A given genetic sequence can produce multiple proteoforms, a term which refers to a defined sequence of amino acids with localized modifications. 95 Different proteoforms arise due to alternate initiation sites, splicing events, and post-translational modifications (PTMs). PTMs are chemical modifications that occur following protein biosynthesis. 96 As many as 300 PTMs are known to occur physiologically 96 and can range from addition of small acetyl groups to conjugation of ubiquitin and other signaling proteins. PTMs are a key way that the cell temporally regulates the activity of different proteins and protein complexes, with evidence suggesting that proteins at the center of interaction networks are more likely to be posttranslationally modified. 97 Analysis of PTMs as a function of cellular state is an important component of understanding protein complex biology. PTMs clearly manifest in the MS spectrum as shifts in protein and peptide mass, and both bottom-up and top-down MS can inform on the PTMs of specific complexes and on large-scale proteome wide screens of PTMs.
MS-based methods, in particular cross-linking MS and ion mobility (IM)-MS, can also determine the topologies of protein complexes, which is especially relevant for endogenous, heterogeneous complexes available in limited quantities that prohibit high-resolution structure determination. For large complexes, if high resolution structures of individual subunits are available, they can be combined with MS-based topologies to create a 3D map of the entire complex. MS-based information can be converted into restraints for a particular protein or protein complex and then used to determine structures and structural ensembles, often in combination with computational modeling tools (see section 3.5).
In addition to determining static topologies and structures, complexes undergo conformational fluctuations as they move between different functional states. Footprinting MS techniques can inform on these conformational fluctuations and the ensemble of different structures present in solution.
Lastly, MS methodologies can be used to study the binding of cofactors and ligands to complexes. It is especially important to identify which small molecules are bound to the complex in its endogenous state and to unravel how these small molecules regulate complex activity. Identifying small-molecule binding is primarily accomplished via native-MS, but information can also be extracted from specialized bottom-up experiments.

Sample Separation and Fractionation
The power of mass spectrometry as an analytical tool stems from its ability to distinguish between components with different m/z ratios, enabling the detection of the full distribution of coexisting states of a given peptide or protein complex. However, like many analytical techniques, MS has a limited dynamic range, 98,99 which causes intense ions to suppress the ionization, and consequently signal, of weaker, less abundant ions. The signal of protein and peptide ions can also be suppressed by the presence of salts and other matrix components. 100 Spectral complexity is also a challenge and congested spectra containing overlapping peaks are difficult to analyze and assign.
Extensive sample fractionation is used to combat these challenges. By separating samples into different fractions, broader coverage can be achieved. Chromatographic separations can also reduce ion suppression by matrix sample components by enabling effective buffer exchange of the desired analytes. As mentioned above, different types of chromatographic separations are an important component of isolating endogenous complexes; however, here we focus on separation methods that typically constitute part of the MS analytical workflow, acknowledging that they overlap to some extent.
3.2.1. Electrophoresis-Based Methods. MS proteomic analysis of protein mixtures was first coupled to polyacrylamide gel electrophoresis, or PAGE, as an offline chromatographic separation tool. In these experiments, protein spots were excised from gels, either Coomassie or silver-stained, and subjected to in-gel digestion, 101 followed by LC/MS analysis of the digested peptides. 102 Most typically, 2D PAGE was used, 103,104 in which proteins are first separated according to isoelectric point and then by molecular weight. However, its practicality is limited by the fact that it requires extensive sample handling, is time-consuming, and cannot be coupled directly to the mass spectrometer. Moreover, 2D PAGE has limited ability to identify medium-to-low abundant proteins. 105 2D PAGE also has limited loading power and a resolving power limited by the combination of isoelectric focusing and molecular weight. Therefore, it has been primarily supplanted by liquid chromatography methods for MS analyses, although some groups continue to advance methods for increasing the separation efficiency of 2D PAGE. 106 A related electrophoretic method is the GELFrEE method, or gel-eluted liquid fraction entrapment electrophoresis, which also separates proteins based on molecular weight. 107,108 In GELFrEE, proteins elute off the end of a polyacrylamide gel column and are trapped in a collection chamber in multiple fractions. While GelFREE is not as robust as other fractionation methods, 109 it has the advantage that it can be performed in denatured or native mode, 24 permitting sizebased separation of native proteins for MS applications.
3.2.2. Liquid Chromatography. Liquid chromatography, most commonly high-pressure liquid chromatography (HPLC), is one of the most popular methods for protein and peptide separation prior to MS/MS analysis. It is especially useful because, in combination with electrospray (ESI) methods of ionization, it can be directly coupled to the inlet of the mass spectrometer. Thus, samples can be simultaneously separated and exchanged into MS compatible buffers. Programs can be controlled by software, leading to high reproducibility and easy optimization. Chemical Reviews pubs.acs.org/CR Review The most popular HPLC method to couple to an MS system is reversed-phase HPLC(RP-HPLC). 110 RP-HPLC separates proteins and peptides based on hydrophobicity, using a hydrophobic stationary phase and eluting with organic solvents. These organic solvents are MS-compatible, making RP-HPLC an excellent tool for MS analysis. RP-HPLC also offers good resolution, easy tuning of elution conditions, and is particularly suited to peptide analysis because peptides are typically recovered well from RP columns. However, RP-HPLC requires denaturing and is not applicable to native-MS applications; moreover, recovery of intact proteins is variable.
For complex proteomic samples, 2D LC/LC/MS experiments increase resolution with multiple dimensions. For example, MudPIT, introduced by Yates and co-workers in 2001, combines strong cation exchange chromatography with RP-HPLC to separate and detect a wider range of peptides. 111 Generally, ion exchange columns can also be used for fractionation but suffer in terms of coupling to the mass spectrometry because the salt used for elution will be injected into the MS system, causing severe signal suppression. Directly coupling ion exchange columns to the mass spectrometer requires the use of MS compatible volatile buffers such as ammonium acetate for ion exchange. Therefore, for analysis of peptides, RP-HPLC is almost always used as the final step before the mass spectrometer.
LC-MS is emerging as a technique for buffer exchange 112 and even purification 113 of intact proteins and complexes for native-MS. In these experiments, a small size-exchange column buffer exchanges proteins into MS-compatible buffers directly prior to analysis, and tandem chromatography methods can be used to combine affinity columns with buffer exchange. Other advances in liquid chromatography for MS analysis, as reviewed in ref 114, include the development of smaller columns which enable faster separation and introduction of new resins.
3.2.3. Capillary Zone Electrophoresis. Capillary zone electrophoresis (CZE) separates ions in solution based on their electrophoretic mobility under applied voltage. 115 The electrophoretic mobility is dependent on charge, shape, and size of proteins or peptides. CZE provides orthogonal, complementary data to liquid chromatography. Advantages of CZE include low sample requirement and low carry-over between separation experiments. It also provides relatively fast separation, which can be an advantage or a disadvantage, because analytes can elute too quickly to be separately fragmented in tandem MS experiments.
CZE has been applied to proteomics workflows in bottomup mode, detecting patterns of PTMs 116 in endogenous proteins and to rapidly separate peptides in complex proteome digests. 117−119 However, a back-to-back comparison of CZE and RPLC indicates that RPLC may identify more peptides and post-translational modifications, 120 and CZE has limited loading capacity. Both methods can be combined for increased separation capacity. 120 CZE can also be used to analyze proteins in top-down mode, demonstrating separation of antibody variants differing by single deamidation events. 121 Importantly, CZE is compatible with separation under native conditions 122 in which proteins retain their tertiary and quaternary structure. Recent work by Kelleher and co-workers applied native CZE to analysis of endogenous nucleosomes, 123 using tandem MS to characterize differences in proteoforms between endogenous nucleosomes in different cell lines.
3.2.4. Ion Mobility. Ion mobility (IM) separates protein and peptide ions within the mass spectrometer after gas phase ionization. In IM experiments, ions traverse a gas-filled tube and are separated according to their collision cross section. The collision cross section is a property of the ion-gas pair and has dependence on the charge, size, and shape of the ion. 124 An extended ion will experience more collisions with the gas and thus travel more slowly than a ion with the same mass but a more compact structure. 125−127 There are multiple IM MS devices, such as drift tube, traveling wave, differential mobility, transversal modulation overtone, field asymmetric, and trapped IM-MS, each with its own advantages and disadvantages, and we refer the readers to detailed reviews on this topic. 128,129 For analysis of peptides in bottom-up proteomics experiments, including IM greatly increases the detection capacity of the experiment, 130 enhancing selectivity, resolution, and dynamic range by spreading peptides out over an addition dimension. 131,132 Addition of IM to proteomic workflows advances the ability of proteomics to study increasingly smaller quantities of protein 133 toward the goal of single-cell proteomics.
IM is also a powerful tool for native MS analysis of proteins. Particularly, it is used not only to separate sample components but also as a structural biology tool. Extraction of collision cross sections from IM data constitute a restraint that can be used to generate information the 3D shape and conformation of protein complexes 134 (see below sections 3.4 and 3.5). Enzymes can also be sequentially applied to improve protein identification. 139 As discussed above, peptides are then typically separated on RP-HPLC columns and injected into the mass spectrometer. Tandem-MS techniques 90,91 are then applied for sequence analysis via gas-phase fragmentation patterns. Different strategies for choosing which peptides to analyze via tandem-MS strategies are available, as even with the best liquid chromatography separation strategies, a large number of peptides elute simultaneously. In targeted proteomics, or selected reaction monitoring experiments, specific ions corresponding to proteins of interest are chosen a priori for fragmentation and quantification. 140 These experiments are fast, reproducible, and extremely sensitive to the protein of interest but cannot be used to discover new proteins. For analysis of complex mixtures, researchers rely on shotgun proteomics methods which attempt to sequence all proteins in complex peptide mixtures. In data-dependent acquisition (DDA), the mass spectrometer selects peptides meeting a Chemical Reviews pubs.acs.org/CR Review certain threshold in the first MS scan and then selects them for fragmentation. DDA is the most widely used method for shotgun proteomic analysis and it aids peptide identification because the precursor mass of the ion is known. However, it is estimated that an extremely large percentage of peptides are missed, 141 prompting the development of data independent acquisition (DIA) strategies. In these methods, first proposed by Yates and co-workers, 142 specific ranges of the MS spectrum are selected sequentially for tandem MS analysis automatically. DIA holds promise for better proteome coverage 143 but the data produced is significantly more complicated. Different variations of DIA are a promising area of research to advance shotgun protoeomics. 144,145 Protein amounts can also be quantified via quantitative proteomic methods. Quantification methods generally fall into label-free methods, which include peak intensity based methods and spectral counting 146 or through isotope labeling. Peak intensity based methods, such as iBAQ, 147 rely on integrating peak intensities, which were shown to correlate well with protein concentration. 148 In spectral counting, 146 the number of peptide spectra assigned to a protein of interest is used as a proxy for relative protein abundance. Generally, peak intensity based methods are considered to be more accurate than spectral counting, but both have limitations because they rely on accurate identification of peptides across multiple spectra. The gold standard for relative quantification is the Here, three different tagged proteins were used as "bait" (top row), namely NCPB1, NCBP2, and NCBP3, which are different nuclear cap binding proteins that bind to nascent RNA. The data can be visualized as a heatmap of control-subtracted label free quantification (LFQ) intensity for each "prey" protein identified by MS. This data demonstrates that NCPB3 differs from NCBP1 and NCBP2 in selectivity, coprecipitating with members of the THO/TREX complexes. These interactions were later confirmed as supporting a biological role for NCBP3 in mRNA expression that differs from NCBP1 and NCBP2. 351  isotope labeling methods. These methods rely on the fact that peptides with the same sequence but different isotope enrichment will coelute but be separated in a single spectrum for quantification. Labeling methods include stable isotope labeling in cell culture (SILAC), in which proteins are grown in isotope enriched media, 149 or the addition of mass tags to samples postisolation. 150,151 After the MS data is generated, significant software analysis is required to identify and quantify proteins in the MS spectra; choice of data processing strategy is an integral component of experimental design. The tandem-MS experiments must be analyzed to determine the peptides present, the proteins they belong to, and to quantify their percentages. Reference 152 describes some of the different strategies used for analyzing shotgun proteomic data. Typically, MS/MS spectra are collected and compared to databases involving all spectra corresponding to a digest of the parent proteome, either theoretical or experimentally generated. Software with this capability include MASCOT, 153 MaxQuant, 154 Progenesis, and PEAKS. 155 However, de novo sequencing, or forward prediction of peptide sequence based on tandem-MS data, is also an option, with other softwares incorporating this capability. 156 When analyzing targeted proteomic data, Skyline is the most widely used. 157 The ProteoWizard toolkit provides an open source platform for converting data from different mass spectrometer formats and cross-platform analysis. 158 Several groups have compared the different available softwares and shown that the software and data processing pipeline chosen has a significant impact on the results of the LC-MS/ MS analysis, 159,160 both finding that with suitable data processing, MaxQuant is the most robust. These researchers also demonstrated, as expected, that for quantification, targeted MS has a higher dynamic range and limit of linearity because specific peptides are targeted and chosen. 160 For proteomics-based methods, the absolute sensitivity threshold is quite low; a modern mass spectrometer is able to detect as little as 1 × 10 −18 mol of peptide. 161 Other sources of error throughout the workflow lead to proteins being missed or misidentified. In the sample preparation step, poor cellular lysis, poor solubilization of proteins, and contamination from abundant proteins, keratin, or dust are all significant sources of error. As mentioned above, another challenge is the dynamic range of the proteome; as discussed in ref 99, the dynamic range of MS is only 4 orders of magnitude while that of the proteins in the cell is 7 orders of magnitude at least. Therefore, many low abundant proteins, which may constitute real interactors, are missed in analyses of protein interactomics, and enhancing proteome coverage is a major challenge for bottomup proteomics. The limited ability to select and fragment peptides can also cause peptides to be missed, even if they are present in the spectrum, and moreover, the software can misidentify or miss peptides that are present at low signal-tonoise levels. Therefore, when analyzing a particular endogenous protein complex, users should be aware that lowabundant or transient interactions may be missed even in these extremely sensitive experiments.
One area of advance in proteomics is focused on platforms that can analyze proteins from single cells. Termed "single cell proteomics", 162,163 advances towards this effort include development of miniaturized automated nanodroplet processing platforms capable of bottom-up digestion of single cells, 164,165 usage of narrow-bore LC-MS 166 and new methods for cell lysis. 167,168 This increased sensitivity has relevance for studying complex heterogeneity at the single cell level.
In addition to identifying and quantifying proteins, bottomup proteomics can identify a wide range of PTMs, which manifest as shifts in peptide mass and fragmentation patterns. 169,170 However, a number of pitfalls exist in the use of bottom-up proteomics to identify PTMs, reviewed in ref 171. Information on the coexistence of different modifications in a single polypeptide is lost because a mixture of peptides is generated and analyzed together. Similarly, it is difficult to determine which proteoforms coexist in a complex together because the bottom-up results only report on the presence or absence of the PTM. Therefore, while post-translationally modified proteins are often detected in bottom-up proteomic experiments of protein complexes, including in cell crosslinking experiments, 172 it is difficult to functionally characterize how PTMs regulate complex formation from bottom-up data alone.
3.3.1.2. Bottom-Up Proteomics Illuminates Complex Composition. In terms of characterizing endogenous protein complexes, bottom-up proteomics is typically used to provide information on the complex composition when investigating a particular complex or configurational differences that occur under different cellular states by comparative analysis of multiple conditions ( Figure 4A). In this experimental format, protein complexes are predicted from observation of which peptides, and therefore proteins, are detected as copurifying. The experimental pipeline typically involves a method of copurifying proteins under native conditions, followed by bottom-up MS analysis of the purified fractions and computational filtering to predict complexes on the basis of which proteins copurify.
One method of generating complexes for copurification identification is intensive biochemical fractionation. This has been used to fractionate lysates from a range of cell types, 21,23,173−175 with cofractionating proteins identified by MS and endogenous complexes predicted by observing which proteins cofractionate. Importantly, multiple studies corroborate that many proteins elute at a MW larger than that predicted by their sequence, implying preservation of native complexes throughout native extraction methods. 23,175 In a recent example, Kastritiis et al. used SEC to characterize endogenous protein supercomplexes, a term that refers to megadalton-sized macromolecular assemblies composed of multiple functionally related protein complexes, from Chaetomium thermophilum. 175 More popularly, co-IP, followed by MS identification, is used to identify protein−protein interactions and thereby infer protein complex composition (see ref 176 for a review). Several high-throughput, broad studies combining IP and MS have been conducted which provide excellent starting resources for understanding cellular interactions and complexes. These include endogenous TAP tag based studies in yeast, 94,177 which identified 491 94 and 547 177 heteromeric complexes. Endogenous human complexes were IP'ed from HeLa cells using over a thousand primary antibodies 178 and previously unknown complexes identified, including a complex suggested to be involved in transcriptional coregulation. Human affinity-tagged lines were also created, with Hein et al. 179 using BAC transgenes while the Gygi lab took advantage of the human ORFeome project 180 to create lentiviral lines coding for a wide range of human proteins in HEK and HELa cells. 181,182 Both of these studies yielded rich data sets that Chemical Reviews pubs.acs.org/CR Review map a wide range of interactions connecting thousands of endogenous proteins.
To determine complex stoichiometries via copurification and bottom-up proteomic experiments, subunit stoichiometry must be inferred from relative abundance of peptides corresponding to different proteins. Label-free quantification of peptides can determine relative abundances of different proteins and was used to determine the stoichiometry of NPC complexes, 183 chromatin-associated complexes, 184 and SET/ MLL complexes, 185 to give some examples. Information about total cellular protein abundance can also be incorporated to extract subunit stoichiometries 179 because the amount of protein that copurifies with a particular bait will depend on the cellular abundance of the interacting components as well as on the stoichiometry of the individual complex.
The computational approach used to predict protein complexes is a key component of the experimental workflow. As reviewed in refs 186−188, a variety of strategies exist to generate complexes from protein−protein interaction lists. For example, one algorithm, SAINT-MS, 189 uses a Bayesian model to estimate the probability of a given interaction being true or false, incorporating information from negative controls and spectral intensities, while another, COMPASS, incorporates information about spectral counts and reproducibility to score interactions. 190 However, it should be mentioned that poor reproducibility was found between different algorithms applied to the same data set. 191 Including orthogonal pieces of information, such as gene ontology annotations and gene expression data, with MS-derived interaction data can increase reliability and help remove false positives.
Users should be aware that affinity copurification methods to identify protein complexes can suffer from lack of reproducibility, false positives, and false negatives. Gavin 94 et al. found that duplicate IP-MS measurements from budding yeast had only 69% replicability, although others found that the same protocol conducted by two laboratories had 81% reproducibility. 192 In a meta-analysis of three data sets, 21,179,182 Drew and co-workers observed limited overlap between different high-throughput studies 193 with many false positives. False positives, or contaminants, generally result from nonspecific binding to either the bait protein or the affinity resin used. Specific isotope enrichment strategies, such as I-DIRT, can distinguish contaminants, 194 and the CRAPome is a repository of common contaminants for different affinity purification strategies 195 that should be checked against identified complex components to identify contaminants. False negatives in affinity-purification MS measurements generally result from complex dissociation during purification, from poor tryptic digestion of peptide interacting partners, or from data undersampling in the mass spectrometer.
Besides for high-throughput experiments, affinity-purification proteomic experiments have broad applicability to studying specific proteins and their formed complexes under conditions as close to endogenous as possible. In an early example, Yates and co-workers analyzed proteins present in the ribosome by ribosome purification and bottom-up proteomic analysis of all present components. 196 Examples of proteins studied span the gamut of cellular proteins, from identifying shared targets of an E3 ligase substrate receptor 197 to the RNA exosome 198 and LINE-1 transposon elements. 199 Figure 4A describes a recent study examining interactions of different RNA cap binding proteins that revealed a unique role for one protein in regulating mRNA expression.
An emerging bottom-up proteomics-based method to identify endogenous complexes via MS is thermal proteome profiling, or TPP. In this experiment, whole cells lysates are subjected to a temperature gradient, and then protein abundance as a function of temperature is measured via isotope quantified bottom-up proteomic methods to construct a melting curve for each protein. Although first described as a method for identifying drug-induced protein stabilization, 200 correlation of melting profiles for different proteins, termed thermal protein coaggregation (TPCA), 201 can identify associations based on the assumption that bound proteins have correlated melting curves. In the initial report, 30% of CORUM qualified complexes exhibited correlated TPCA curves. 201 TPCA is less useful for de novo complex prediction and also suffers in the analysis of low-stoichiometry complexes, for whom melting curves may be dominated by the noncomplexed protein. Nevertheless, TPCA analysis is particularly powerful for analysis of differential protein complex association between states such as throughout the cell cycle 202 or at differing time-points following virus infection. 203 3.3.2. Protein Footprinting. The sample separation and analysis strategies outlined above represent a powerful tool for detecting and analyzing peptides originating from protein complexes. A wide range of strategies, broadly dubbed "protein footprinting", 204,205 use chemical or enzymatic sample treatment before MS analysis to encode information about the protein complex's structure, topology, and interactions in the MS spectrum. The vast majority of the time bottom-up proteomic methods are used to detect these chemical modifications, but top-down methods have be used occasionally. 206 68 and 208, samples in their native state are treated with bifunctional cross-linkers that react with reactive amino groups, acidic side chains, or cysteine groups. These cross-linkers will covalently link together amino acids that are proximal in the threedimensional structure of the protein or protein complex. After digestion, peptides generated will be composed of two smaller peptides linked by the cross-linker. LC-MS/MS detection and quantification of cross-linked peptides yields information about intra-and intermolecular linkages between residues. Chavez and co-workers estimate the sensitivity of cross-linking MS to be 1−10 × 10 −12 mol for purified proteins and 5−10 mg of sample for in situ cross-linking, 209   Chemical Reviews pubs.acs.org/CR Review focused on MS-cleavable linkers, while Xquest, 213 Kojak, 214 StavroX, 215 and pLink 216 can be used for noncleavable linkers. A recent comparison of different softwares using a benchmarked library of peptides demonstrated that false discovery rates vary greatly depending on the software used, with FDR rates ranging from 2.4 to 32%. 217 Reference 218 discusses future initiatives to make XL-MS more reliable and reproducible.
Once cross-linked peptides have been accurately detected and quantified, they can be converted to distance restraints based on assumptions about the maximal distance between cross-linked residues and used to determine complex topology. One challenge is generating sufficient cross-links to properly orient subunits within the complex; Chait and co-workers estimate that at least 4−5 cross-links are needed to determine dimer orientation. 219 Combining orthogonal cross-linkers 220,221 can generate enough cross-links to reliably determine the architectures of affinity purified complexes. As discussed below (section 3.5), cross-linking MS is also commonly combined with computational modeling 222 to derive full structures and architectures of endogenous complexes. For example, the MXNL web server was developed for model validation using restraints from cross-linking MS. 223 3.3.2.1.2. Application to Endogenous Complex Analysis. Cross-linking has illuminated endogenous protein complex composition by performing the cross-linking reaction before affinity purification to prevent complex dissociation and enrich for transient interactions. This has successfully identified interactors of M-Ras, 224 the proteasome, 79,225 and estrogen receptors, 76,226 among numerous others. Cross-linking can also be used to map protein−protein interactions without affinity purification. If cross-linkers are applied to cells and cross-linked peptides subsequently identified, proteins identified as crosslinked are presumed to interact. 80 However, this approach typically targets the most abundant proteins in the sample; 80,227 in one example, proteins identified as crosslinked in synaptosomes were three times more abundant than the average synaptic protein. 227 The main power of XL-MS is its ability to illuminate endogenous protein complex topology and architecture. This approach has been used to determine the architecture of diverse endogenous complexes ranging from the coatamer module of the nuclear pore complex (NPC), 219 chromatin− protein complexes, 228 variants of the 26S proteasome, 221,229,230 TAP-purified PolII-TFIIF complexes, 231 ATP-synthase, 232 and substrate-bound TriC, 233 among others.
One strength of XL-MS for determining complex architecture is that cell permeable cross-linkers can be added to intact cells and tissues, yielding inter-and intramolecular cross-links that reflect topologies found in vivo. 234 Workflows for XL-MS in cells and tissues are described in refs 209 and 235. This approach has been used to determine the topology of respirasome supercomplexes directly from mitochondria, 236 sarcomere protein complexes directly from heart tissue, 237 and the expressome composed of RNA polymerase and the ribosome linked by NusA and NusG 238 ( Figure 5A). Rappsilber and co-workers recently demonstrated that it is possible to generate de novo cross-linked structures of bovine serum albumin in plasma. 239 Blankenship and co-workers 240 applied XL-MS to intact cyanobacteria, using affinity purification of photosystem II to generate a structure of the megacomplex composed of photosystem I, photosystem II, and the phycobilisome.
Because cross-links report on a given distance cutoff between two residues, cross-links can be a source of information on the dynamics and conformational fluctuations in a complex. Cross-links can be compared between different samples, for example, following a biological treatment. Schmidt and co-workers noted that upon dephosphorylation of the Ftype ATPase, intersubunit cross-links in the head, stators, and stalk portions changed in abundance, likely reflective of phosphorylation dependent nucleotide binding. 241 Chavez and co-workers used a quantitative XL approach in live cells to monitor changes in conformations of HSP90 upon treatment with 17-AAG 242 ( Figure 6A). Cross-links can also be analyzed to extract information about pools of a protein complex population in a single sample. The coexistence of cross-links that satisfy mutually exclusive conformations can reflect intrinsic heterogeneity and flexibility of the complex. Sinz and co-workers recently used XL-MS to examine the conformational plasticity of ribosomes, mapping a wide range of cross-links onto the model of the E. coli 70S ribosome and demonstrating that no single static structure satisfies all crosslinks. 243 3.  244, the "footprint" is produced by exchange of protons in a protein sample with deuterons. Protonated protein samples are incubated in a deuterium-based buffer for variable periods of time to allow exchange of amide backbone protons with deuterium in the buffer. Exchange of amide protons depends on the position of the amide site (backbone or side chain) and its dynamics (see below), with rapidly fluctuating regions exchanging quickly while rigid regions can remain stable for hours. 245 Amide exchange is then quenched by pH and/or temperature 246 and the extent of deuterium exchange measured. Historically, HDX exchange rates were measured by a range of methods, including NMR, but the coupling of HDX with MS readout, which easily distinguishes between deuterated and protonated peptides based on mass, greatly accelerated the application of this technique. 247 The intrinsic exchange rate for an amide backbone is on the order of milliseconds. The rate and extent of deuterium exchange observed for a particular peptide in a folded protein, specifically reflects the local rate of hydrogen bonding for the amides in question. 248 Generally, it can be assumed that this exchange rate is faster for amides that are solvent exposed and not involved in hydrogen bonds. If a particular region is undergoing dynamic fluctuations, local hydrogen bonds will be disrupted and potentially engage in exchange. Therefore, quantification of peptide deuteration as a function of time is generally assumed to report on solvent exposure and dynamics of a particular region. Moreover, HDX rates can be measured as a function of protein complex or protein state and therefore report on differences in subunit interfaces or dynamics between the states.
Measurements of HDX at the residue level are challenging, and therefore typically exchange rates are reported on the peptide level, meaning that this method does not have residue level information but rather information corresponding to the peptide level. HDX-MS is not be applied in vivo or in cell because following deuterium exchange for a given period of time, exchange must be quenched and the level of deuterium exchange preserved until peptides are analyzed. This prohibits sample purification after labeling and means that HDX-MS is only applicable to purified protein complexes. Much effort has Chemical Reviews pubs.acs.org/CR Review been expended toward optimizing workflows to prevent backexchange, as back-exchange during sample digestion and LC-MS analysis is a key drawback of the method. Recent developments in HDX-MS are reviewed in ref 249. Effective HDX-MS requires approximately 100 × 10 −12 mol of protein, 250 which is a fairly high sample requirement relative to other MS methods. HDX-MS provides information on complex dynamics 251 and is especially powerful when applied in a comparative format between two different states, such as upon binding of regulatory subunits 252 or ligands. 253 It has also been used to study protein folding and aggregation kinetics. 254 3.3.2.2.2. Application to Endogenous Protein Complexes Analysis. While most examples of HDX-MS are for recombinant proteins, it has tremendous potential for the dynamic analysis of endogenous protein complexes. In a recent example, the 20S proteasome purified from erythrocytes, as well as the immunoproteasomes purified from spleens, were analyzed and compared via HDX-MS, 255 revealing differences in flexibility between the two isoforms ( Figure 6B). The interaction of these complexes with the PA28 proteasome regulators were also studied. Another example using F-type ATPase, probed E. coli derived membrane vesicles, in which Konerman and co-workers used HDX-MS to examine mechanical stress in the motor during active and inactive states. 256 3.3.2.3. Chemical Labeling of MS. 3.3.2.3.1. Overview of the Method. As opposed to HDX-MS, which represents a reversible change (the swapping of a proton for a deuteron), footprinting via irreversible modification of amino acids is also used. 204 This can be accomplished either with chemical modifiers or with radicals, most typically oxygen and hydroxyl radicals. These provide a permanent modification to the protein sequence that will not be lost during bottom-up proteomic analysis. Typically, solvent exposed amino acids will be chemically modified, again provided that the structure allows for reactive accessibility.
A wide range of nonradical chemical modifiers exist (reviewed in ref 204), which react on the second to millisecond time scale. Many specifically target one type of amino acid, such as n-ethylmaleimide, which targets cysteine, or glycine ethyl ester (GEE), which targets aspartic and glutamic acids. Diethylpyrocarbonate (DEPC) labels a broad variety of nucleophilic residues, 257 including histidine, tyrosine, threonine, serine, lysine, and cysteine, yielding approximately 30% coverage over the range of surface exposed amino acids in the average protein.
Radicals, and specifically hydroxyl radicals, are a more popular choice. They can modify 19/20 of the amino acids, and thus, similar to HDX, provide broader coverage over the sequence. Radicals are generally produced on the millisecond and microsecond time scales. A variety of methods for generating radicals exist, including Fenton chemistry and radiolysis of water; these methods are reviewed in refs 258 and 259. One broadly applied method of generating hydroxyl radicals, fast photochemical oxidation of proteins, or FPOP, 260 reviewed in ref 261, generates radicals on even faster time scales of the nanosecond to microsecond time scale. This enables FPOP to be used to study protein folding events 262 and other fast dynamic events. In FPOP, samples are incubated with H 2 O 2 , and laser irradiation of the sample creates OH radicals which broadly label nearby amino acids.
Similar to HDX, chemical labeling methodologies provide information on the solvent accessible surface area of the protein or protein complex and therefore report on the topology of the complex. They are typically applied comparatively between two states to examine changes that occur upon treatment, for example, during heat treatment of an antibody. 263 Fast labeling approaches, such as FPOP, can illuminate fast dynamic events including protein folding. 262 Drawbacks of these chemical labeling methods include the requirement to work in special buffers compatible with the chemical or radical of choice, as well as separation of peptides postlabeling. Moreover, it can be difficult to determine the exact location of an oxidative modification in the case of radical mediated footprinting, especially if multiple oxidations occur. Quantification of % of labeling based on spectral intensities is also difficult to assess accurately. However, Sharp and coworkers demonstrated that electron transfer dissociation fragmentation tandem MS can efficiently generate quantitative tandem MS product ions 264 for analysis.
3.3.2.3.2. Application to Endogenous Protein Complexes Analysis. Because of the irreversible nature of chemical labeling applied, the method can be applied in vivo to endogenous proteins, unlike HDX experiments. Zhu et al. applied hydroxyl radical footprinting to proteins in the E. coli outer membrane and studied the voltage gating of OmpF, observing changes in oxidation inside the porin upon changing the conditions of the cells. 265 Jones and co-workers have advanced the use of FPOP to analyze proteins both in live cells 266,267 and in Caenorhabditis elegans, 268 demonstrating its potential for use as a structural technique in a multiorgan system. Chemical labeling was also applied to study human vitamin K epoxide reductase in cell using cysteine alkylation to determine the redox status of key catalytic cysteines. 269 3.3.2.4. Proximity-Induced Labeling. A related approach is proximity induced labeling, in which a protein of interest is genetically tagged with a protein that catalyzes the addition of a chemical tag, usually biotin, onto spatially proximal proteins. Popular proteins include APEX, 270,271 an engineered version of ascorbate peroxidase, which is fast and quite promiscuous, 272 or variants of biotin ligase, including BioID 273,274 and TurboID. 275 This technology provides information on protein−protein interactions, as opposed to topology and dynamics. Proteins can be enriched specifically if interactors are tagged with biotin or other affinity handles. However, these methods can struggle to identify specific interactors because they can promiscuously label all proximal proteins, although approaches have been developed to filter for specific interactions. For example, Krogan and co-workers combined APEX labeling with spatial reference proteins to filter for specific interactors, identifying interactors of GPCRs in response to agonist stimulation. 272 3.3.2.5. Limited Proteolysis MS. In limited proteolysis (LiP) mass spectrometry, proteins are briefly exposed to a promiscuous protease that cleaves exposed regions before denaturation and digestion for MS quantification. 276,277 The promiscuous protein will produce a cleavage pattern, or footprint, reflective of the structure of the protein or complex. The peptides detected by bottom-up proteomics will reflect these structure-dependent cleavages. By comparing between two samples, the effects of an external condition can be monitored. Picotti and co-workers have advanced LiP MS for the unbiased analysis of crude biological samples 277 In top-down MS, intact proteins or protein complexes are transferred directly into the gas phase. 280 Once in the gas phase, tandem MS can be used to cleave proteins into fragments for sequence analysis. Top-down MS is uniquely suited to identification of proteoforms. 95 This is because the mass of the intact proteoform is known, and thus modifications to individual amino acids can be cross-correlated with each other for an accurate description of the whole proteoform's simultaneous or mutually exclusive PTMs.
However, top-down proteomics methods are much less developed than bottom-up methods and face a range of challenges described in ref 281, including poor solubility of intact proteins, challenges in adequate fractionation of intact proteins, low sensitivity due to dynamic range issues, lowthroughput, and the complexity of data analysis. Thus, topdown proteomics often detects only the most-abundant proteins or those present at lower molecular weights. Additionally, there can be a lack of residue level sequence coverage due to poor fragmentation of intact proteins; efforts to increase fragmentation of intact proteins are ongoing. 282,283 Top-down MS has been used to study PTMs to specific endogenous proteins, often in combination with affinity purification strategies. This strategy has been used to study glycan heterogeneity in plasma proteins, 284 phosphorylation of endogenous mouse cardiac troponin I, 285 post-translational modifications of pili proteins from Neisseria meningitidis upon host-cell contact, 286 and examined how endogenously isolated K-Ras proteoforms are correlated with mutational status. 287 Unbiased high throughput top-down proteomics workflows, pioneered by Kelleher and co-workers, have been used to study post-translational modifications in response to induction of DNA damage and treatment with chemical stressors 288 as well as PTMs of cardiac proteins upon myocardial infarctions. 289 Proteoforms of whole protein complexes, correlating PTMs on one subunit to those on another, can be studied by combining native-MS ionization of whole complexes (see section 3.4.2) with top-down sequencing. Skinner et al. developed a native proteomics workflow to identify 217 proteoforms, including those of complexes such as dimeric creatine kinase M or DJ-1. Hybrid native and proteomic techniques were applied to identify the proteoforms of the C8 complex directly from human plasma. 290 In our lab, a multistage native MS strategy was used to examine how the endogenous yeast FBP1 complex adapts to glucose-rich conditions and heat shock 291 (see Figure 7). This study revealed that the complex responds differently to changes in growth conditions by tuning phosphorylation dynamics, thus illuminating the role of this PTM in FBP1. In another study, we used a top-down native approach to identify a new proteoform of the 20S proteasome PSMA7 subunit, a proteoform which would have eluded detection by bottomup proteomics due to its proximity to a lysine-rich region subject to tryptic digestion. 292 3.4.2. Native MS of Protein Complexes. 3.4.2.1. Overview of the Method. Native mass spectrometry takes the topdown approach one step further by retaining aspects of the protein's native tertiary and quaternary structure in the gas phase. 293 In native MS, conditions are optimized to ionize protein complexes while retaining noncovalent interactions, with experimental evidence confirming that aspects of native complex structure are maintained in the gas phase. 294 Complexes can then be broken down in the gas phase by tandem-MS techniques. Native-MS can have increased signalto-noise relative to denatured top-down methods because folded proteins populate fewer charge state peaks, increasing the intensity of the remaining peaks. 295,296 The addition of IM (see section 3.2.4) to the native-MS workflow yields information on the 3D shape of protein complexes. 134 IM separates ions according to their collision cross section, which reflects complex conformation and mass, providing a source of structural information. 125−127 This data can be combined with modeling tools that use collision cross sections to refine and develop structural models 297,298 (see section 3.5).
Native MS studies of protein complexes directly measure both composition and stoichiometry of protein complexes because quaternary structure is preserved in the gas phase (see Figure 4B for an example). Moreover, different coexisting complexes are separated in the spectrum based on their masses and can be compared in terms of their relative abundances for a particular sample preparation. Tandem-MS dissociates the complexes into their components to assign and confirm complex composition, to distinguish between core and peripheral subunits and determine subunit network. Contaminants are also easily distinguished because they are not part of the complex charge series.
Analyzing native-MS spectra involves assigning different peaks in the spectra to charge state series for proteins or complexes of interest. This is can be done manually or via computational spectral deconvolution. A variety of softwares exist to enable this, including open source softwares such as UniDec, which deconvolutes spectra using Bayesian inference, 299,300 NaVia, a program to augment manual assignment, 301 as well as vendor-provided software. Software for analyzing native IM-MS experiments presents several challenges, as reviewed in ref 298. Data must be interpreted to assign and extrapolate charge states as well as determine collision cross sections, which can then be used to develop structural models. Determining the collision cross section is typically done via a standard calibration curve, 302 with an open source automated collision cross section software recently introduced by Metz and co-workers. 303 Software has also been introduced to analyze IM experiments coupled to gas-phase activation or collision induced unfolding experiments. 304,305 The main drawback of native-MS for determining protein complex composition is that relatively large quantities of complexes are needed for each experiment, with at least 10 μL of midnanomolar concentration required for a detailed native-MS experiment. Assignment of native MS peaks to particular complexes also requires some preknowledge of expected molecular weights in the sample. Therefore, it is useful to have preknowledge of the proteins present in solution and the exact molecular weights of the different subunits, information that can be obtained from proteomics and intact MS of denatured samples, respectively (see section 3.4).
Recent advances in native MS have the potential to extend its applicability to endogenous complexes available in limited quantities. One example is the development of individual ion MS via charge detection mass spectrometry, 306 in which the ion charge is measured directly to generate a mass spectrum rather than a spectrum showing mass-to-charge ratio. Implemented on various experimental platforms including Orbitrap 307 and ion trap FT instruments, 308 this method has provided isotope resolved spectra of complexes as large as βgalactosidase, 466 kDa. This can enable very high resolution spectra of endogenous assemblies in complex mixtures because the measurement of an ion's true mass resolves ambiguities in crowded spectra.
3.4.2.2. Application to Endogenous Complexes Analysis. Endogenous affinity purification followed by native-MS has been used by multiple laboratories to characterize the yeast exosome. 309−312 This work has determined both the stoichiometry of various exosome components as well as compositional differences between nuclear and cytoplasmic exosome complexes. Synowsky and Heck showed via native-MS that the Ski complex, which associates with the yeast exosome, is a heterotetramer as opposed to a heterodimer as previously assumed. 313 Novel complexes have also been directly identified via native-MS; using endogenous purification from muscle tissue followed by native-MS, we identified novel complexes consisting of Arp2/3 subunits and vinculin, likely involved in the formation of focal adhesion complexes 18 ( Figure 4B). New subunits for existing complexes can also be Chemical Reviews pubs.acs.org/CR Review seen in native-MS spectra. For example, COP9 signalosome (CSN) complexes purified directly from erythrocytes harbored a novel subunit, named CSNAP, that had previously evaded detection. 17 Semi-high throughput native proteomics workflows, analogous to the fractionation-MS bottom-up methods described above, have also been developed. 19,25 Intriguingly, Skinner et al. 19 detected novel homomers, which often elude bottom-up proteomic approaches that rely on indirect measurement of protein association. In addition to soluble complexes, native MS can be applied to membrane proteins, work pioneered by Robinson and coworkers. 314 By ejecting endogenous membrane complexes directly from native membranes, 315 complexes of the chaperone DnaK and the outer membrane protein OmpA were observed, confirming the involvement of DnaK in membrane porin assembly. Endogenous membrane proteins can also be purified, and Gross and co-workers studied endogenous photosynthetic reaction centers via native-MS in detergent micelles. 316 Moreover, the method can be applied to very large protein complexes such as the ribosome (2.30 mDa) and its subcomplexes. Native and tandem-MS of the ribosome stalk complexes from a wide range of organisms revealed differences in stalk composition and stoichiometry. 317 Intact protein MS of ribosomes examined the stoichiometry of different ribosomal proteins, revealing heterogeneity and substoichiometric binding of ribosome associated proteins. 318,319 Affinity captured NPC endogenous complexes, which are upward of 80 megadaltons, were analyzed by native MS, confirming the overall stoichiometry and membrane components of the NPC complex. 183 Because protein complexes retain their solution noncovalent interactions, complex topology can be determined by analyzing the pattern of subunit dissociation in tandem native MS experiments ( Figure 5B). Generation of subcomplexes can be aided by addition of small amounts of methanol or other solvents that disrupt hydrophobic interactions, thus encouraging dissociation across weaker quaternary interfaces. Subunit connectivity is then determined by assuming that when large complexes dissociate, they generate subcomplexes from proteins already proximal in the intact structure. This method was used to determine the topologies of a range of endogenous yeast complexes, [310][311][312][313]320 including structural models of the yeast exosome that were later confirmed by crystallographic studies. 310,320 As discussed above, IM can also be used to derive structural information on endogenous complexes. While IM has not been broadly applied to purified endogenous complexes, our lab recently characterized orthologous endogenous 20S proteasome complexes from a range of species using IM, demonstrating an enlargement of the complex size from the archaeal prokaryotic complex to the eukaryotic 20S proteasomes in yeast and mammals. 292 Protein complexes are often modulated by cofactor binding. Native mass spectrometry is uniquely suited to the analysis of cofactor/complex binding because cofactors will appear as mass shifts in the MS spectrum of the complex. Moreover, a single native-MS experiment can reveal the full range of coexisting complexes, allowing relative ratios to be quantified in a single experiment. 321 Skinner et al. in their top-down proteomic study 19 demonstrated that the holoenzyme of αenolase was only present in the dimer, as opposed to the monomer, indicating that Mg stabilizes the dimer, and that GDP and magnesium supported formation of the RHOA heterodimers with RHOGDI1. In a similar study from E. coli, Shen and co-workers 25 identified protein complexes bound to metals and glutamine.
The impact of biomolecule binding can be also studied in the context of membrane protein complexes. Robinson and coworkers have made seminal contributions to the use of mass spectrometry to study the binding of lipids to protein complexes, many of them endogenously isolated from host membranes 241,315 (Figure 8). Membrane proteins display preferences for which endogenous lipids they bind, 322 and special tandem MS methodologies enable the identification of lipids directly from samples of purified membrane proteins. 323 These experiments have revealed the stoichiometry of lipidmembrane protein binding 241 and investigated the effect of lipids on membrane protein stability. 324 For example, in complexes ejected directly from endogenous membranes, the BAM complex from E. coli membranes was observed bound specifically to cardiolipin (Figure 8), while ANT-1 from the mitochondrial membrane bound fatty acids with palmitate headgroups, among others. 315 These studies are often Native MS can also be accomplished directly from crude samples, detecting abundant proteins without the need for purification. 325 This approach uses the limited dynamic range of MS as an advantage, and clear spectra can be acquired of abundant proteins. In an extension of the direct-MS approach, native MS was recently applied to individual erythrocytes yielding spectra of hemoglobin, which constitutes the vast majority of protein in erthrocytes. 326 Individual cells were selected and a microinjector used to transfer the cells into an MS emitter. However, these methods are limited to very abundant proteins; to apply this technique to a broader range of proteins, online separation and increased signal-to-noise must be applied.

Integrative Approaches and Computational Modeling
As described above, each MS method provides specific information about endogenous complexes, and each has their own pros and cons. The information provided from different MS experiments is complementary; for example, bottom-up proteomics can be exquisitely sensitive to complex components while native-MS directly detects whole complexes. Therefore, it is often advantageous to combine multiple MS modalities to derive a more complete functional and structural model of a particular endogenous complex. Termed "hybrid" mass spectrometric approaches, 229,319 in this method, information from different experimental modalities is combined to characterize complexes. For example, Robinson, Aebersold, and co-workers combined information from native-MS, bottom-up proteomics, chemical cross-linking, and IM to derive integrative models of the 26S proteasome lid complex and its submodules. 229 Similarly, Heck and co-workers used hybrid MS approaches to analyze a ribosomes 319 from multiple kingdoms of life. For native MS experiments that determine complex stoichiometry, bottom-up proteomics are often important to identify the composing subunits, 310,313 with complex stoichiometry determined by native-MS measurement of the intact complex.
Moreover, information from MS can be used in combination with computational molecular modeling tools, with MS data serving as a restraint for the generation of complete structural models. Cross-linking MS is a natural companion to computational modeling, as described in ref 222. Chemical labeling methodologies can also be combined with computational methods to determine protein structures and topographies. Huang et al. advanced the field by converting hydroxyl radical footprinting data into a protection factor that can be used as a restraint in molecular modeling. 327 Similarly, hydroxyl radical footprinting data has been used to differentiate between molecular models 328 and combined with molecular dynamics data for model determination data. 329 IM can also be combined with modeling tools that use the collision cross section as a restraint to refine and develop structural models. 297,298,330 Because of the sensitivity and robustness of these methods, they hold great promise for the study of endogenous complexes available in limited quantity.
Moreover, modeling tools form the basis of integrative structural biology, 332 of which MS is often an important component. As described in refs 332 and 333, this experimental approach combines data from multiple experimental and theoretical methods to build a model of a large biological system for which no single method is sufficient. A wide range of experimental data can be used, including different MS modalities, cryo-EM, SAXS, FRET, and protein− protein interaction data. Different types of experimental and theoretical data are phrased as restraints, and an algorithm driven by a scoring function determines the structural and dynamic model that satisfies most data. Central to this experimental approach is flexible software that can generate the complete range of structures that satisfy the available data. Popular available programs include Rosetta 334 and the Integrative Modeling Platform (IMP), 335 although there are other options available that may be tailored to a particular experimental tool, as described in refs 333 and 336.
Several important and extremely large structural models have been built using integrative tools that use MS data as a central modality. For example, MS data has played an integral role in the structural biology of the nuclear pore complex. One of the first integrative models was reported in 2007 by Alber and co-workers, 337 later updated in 2018, 183 combining information from proteomic affinity experiments, charge detection native-MS, cross-linking MS, cryo-EM, among other information, to develop a full structural model of the complete NPC, which is greater than 40 mDa. Beck and coworkers also used integrative modeling, relying again on crosslinking MS and cryo-EM tomography, to analyze the full nuclear pore complex 338 and build a model of the inner scaffold ring. 339 Cross-linking MS along with cryo-EM data was used to describe the architecture of the 10 mega Da pyruvate dehydrogenase supercomplex. 331 Other examples of structural models relying on MS data include the 26S proteasome, 230 constructed by combining cross-linking MS, proteomics interaction data, and cryo-EM, and the human Polycomb repressive complex 2 complex, generated primarily by cryo-EM but also with cross-linking MS data. 340 Another related area is the increasing use of MS for structural proteomics efforts that combine MS and cryo-EM for the analysis of cryo-EM grids prepared from crude fractions as reviewed in ref 341. MS identification of components in SEC or sucrose gradient fractions is key to enabling the fitting of electron density maps to specific proteins. 342−344 Integrating the MS methods described here with these and other emerging structural biology techniques will only further the study of endogenous complexes.

CONCLUSION AND FUTURE DIRECTIONS
Biological MS has made seminal contributions to the analysis of endogenous protein complexes. Depending on the technology chosen, insights range from characterizing the interaction partners of a specific protein to determining the topology of endogenous complexes to analyzing conformational changes between different states. Importantly, by determining these properties for endogenously isolated protein complexes, results are generated that report on biologically relevant interactions and conformations.
Over the course of this review, it has been repeatedly highlighted that one of the main challenges in characterizing endogenous complexes is their low abundance, which is in part what makes MS a useful tool for characterizing such complexes. One of the main solutions to this issue is to process larger amounts of starting material. While it is relatively straightforward to scale up yeast and bacterial growth protocols, even in an academic setting, it is more difficult and costly for eukaryotic cells. HEK293 cells and other strains have Chemical Reviews pubs.acs.org/CR Review been adapted to growth in high density suspension culture but may not be the most relevant physiologically due to altered gene expression. 345 Importantly, for native-MS, emerging research indicates that enrichment, rather than complete purification, can yield well-resolved spectra of proteins. 325 Thus, for native MS efforts where intact complexes are detected, proteins may not need to be fully purified for complex identification. Advances in mass spectrometer technology are also increasing the sensitivity of measurements and enabling application of biological MS to less abundant proteins. Single-molecule MS, 346 in which both charge and m/z ratio are measured to directly detect the mass of the ion, was recently applied to native macromolecular protein assemblies on Orbitrap instruments 306,347 by two different groups. Singleparticle charge detection not only increases the sensitivity of native-MS but also permits the resolution of overlapped charge states in heterogeneous samples. Very small amounts of sample can be analyzed with novel microfluidic devices. 348 For bottom-up proteomics, new instruments that includes variants of IM devices can greatly increase sensitivity and sequencing speed 145 and were recently shown to identify proteins from single cells. 349,350 Applying these technologies to the analysis of protein complexes will require miniaturization of purification techniques but can potentially enable comparison of complexes across different tissues available in limited amounts. This latter goal, namely understanding how complexes vary between tissues and physiological states, is a crucial frontier for biological MS. Thus far, however, characterization of endogenous complexes has largely been limited to model organisms such as yeast and model cell lines such as HEK and HeLa. Extending this approach to the analysis of protein complexes and proteoforms from a range of tissues can be enabled by CRISPR genome tagging approaches, which permit affinity labeling of proteins from a broader range of germ lines. It is also conceivable to construct affinity tagged mice and purify and compare complexes from a broad range of tissues. We expect that in the future biological mass spectrometry will be applied to a broader range of organisms and under healthy and disease states.
It is our hope that this review, while broad and general in its outlook, will serve as a guide to scientists interested in studying endogenous protein complexes and will stimulate the application of biological MS to a broader range of problems.