Multimodal Mass Spectrometry Identifies a Conserved Protective Epitope in S. pyogenes Streptolysin O

An important element of antibody-guided vaccine design is the use of neutralizing or opsonic monoclonal antibodies to define protective epitopes in their native three-dimensional conformation. Here, we demonstrate a multimodal mass spectrometry-based strategy for in-depth characterization of antigen–antibody complexes to enable the identification of protective epitopes using the cytolytic exotoxin Streptolysin O (SLO) from Streptococcus pyogenes as a showcase. We first discovered a monoclonal antibody with an undisclosed sequence capable of neutralizing SLO-mediated cytolysis. The amino acid sequence of both the antibody light and the heavy chain was determined using mass-spectrometry-based de novo sequencing, followed by chemical cross-linking mass spectrometry to generate distance constraints between the antibody fragment antigen-binding region and SLO. Subsequent integrative computational modeling revealed a discontinuous epitope located in domain 3 of SLO that was experimentally validated by hydrogen–deuterium exchange mass spectrometry and reverse engineering of the targeted epitope. The results show that the antibody inhibits SLO-mediated cytolysis by binding to a discontinuous epitope in domain 3, likely preventing oligomerization and subsequent secondary structure transitions critical for pore-formation. The epitope is highly conserved across >98% of the characterized S. pyogenes isolates, making it an attractive target for antibody-based therapy and vaccine design against severe streptococcal infections.


Supplementary Figures
Figure S1. A) Annotated SLO domains with residue numbers, colored by domain. B) ELISA results for monoclonal antibody binding specificity against SLO; an unpaired t-test was used to compare conditions. C) Six views of the Fab interaction space relative to the SLO protein (shown in cartoon representation), constructed from any one to all four distance constraints using the DisVis 1 complete analysis mode. D) DisVis interaction analysis showing the residue interaction fraction (IF) at the interface across all possible Fab-SLO pairwise complex conformations consistent with at least one and up to four distance constraints. E) Redundancy map of the 202 common peptides identified in both apo SLO and the nAb-SLO complex in the HDX-MS experiments. Peptide fragments are displayed as bars whose length correlates with peptide length, with residue redundancy visualized by a color gradient from 0 to 16. F) Cumulative mass changes (Da) due to deuterium uptake across all shared peptides, shown in a butterfly plot. Peptide regions with differential deuterium uptake are framed by dashed lines. The first and last peptides of each protected region are labeled with their residue numbers.

Figure S2. A) The redesign process of the novel d3_m construct (orange) to mimic the protective epitope located in the native domain 3 (green) of the SLO protein.

Table S5. Summary of modeled Fab-SLO pairwise complexes generated with the HADDOCK 2.4 antibody-antigen docking protocol [6][7][8]. The 394 generated Fab-SLO structures were clustered into 5 groups, representing 98.5% of the final water-refined models. The top three clusters are shown above. The top-ranked Fab-SLO model (lowest HADDOCK combined score, -145) from cluster 1, which is also the largest cluster, was used for subsequent analysis (with both PRODIGY 9 and SpotOn 10) to identify the interface residues.

Protein production and purification
The complete SLO sequence (UniProt ID: P0DF96), excluding the signal peptide, was engineered to incorporate an N-terminal Strep-HA-His tag. The plasmid carrying the tag-SLO gene was synthesized and assembled by the Lund University Protein Production Platform and subsequently transformed into BL21(DE3) competent cells (Thermo Scientific). Induction and purification of the tagged SLO protein were carried out in-house following a previously published protocol 11. The reverse-engineered d3_m construct was expressed and purified by the Protein Production Sweden Umeå node.

Monoclonal antibody binding specificity and its neutralization against SLO hemolysis
For ELISA assays, an equivalent quantity of SLO protein or the d3_m construct (ca. 3 µg) was immobilized onto a MaxiSorp plate (Thermo Scientific), which was subsequently incubated with either 0.4 µg nAb (followed by serial two-fold dilution), Xolair, or 1x PBS (phosphate-buffered saline tablets, Sigma Aldrich) as a background control. After thorough washing with PBST buffer (1x PBS, 0.1% Tween 20), an HRP-conjugated goat anti-mouse IgG secondary antibody (Bio-Rad) and an HRP substrate kit (Bio-Rad) were applied in sequence. After 3 min of development, the plate was read in a microplate reader (BMG Labtech) at a wavelength of 415 nm. To compare binding of the nAb to full-length SLO and to the d3_m construct, the built-in non-linear regression model in Prism 10 (equation: One site -- Specific binding) was applied to derive Kd and Bmax with 95% confidence intervals. For the SLO cytolysis inhibition assay, sheep red blood cells (Thermo Scientific, Oxoid) were first diluted with 1x PBS as previously described 12, followed by the addition of 0.1 µg of active SLO protein reduced with TCEP (Sigma Aldrich) and 5 µg of the corresponding IgG or IgG fragments. Xolair was purchased from Novartis. P.IgG denotes a pool of immunoglobulin G isolated from the plasma of a donor who had recently recovered from a GAS infection. A FragIT kit (Genovis) was used to digest P.IgG and generate two fractions: F(ab')2 and Fc fragments. Following incubation in a ThermoMixer (Eppendorf) at 37 °C and 300 rpm for 30 minutes, the plates were centrifuged, and the supernatant was transferred to a new plate and read at 541 nm in a microplate reader (BMG Labtech). The measured absorbance correlates with the quantity of hemoglobin released by hemolysis. The positive control, SLO alone, was defined as 100% lysis.
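The one-site specific binding model referred to above has the form Y = Bmax * X / (Kd + X). The following is a minimal Python sketch of a least-squares fit of that model, given only as an illustration. It is not Prism's algorithm and it does not compute confidence intervals. The grid-scan strategy, the function names, and the synthetic readings are assumptions made for this example.

```python
def one_site(x, bmax, kd):
    """One-site specific binding: Y = Bmax * X / (Kd + X)."""
    return bmax * x / (kd + x)


def fit_one_site(xs, ys, kd_min=1e-4, kd_max=1e2, n_grid=6001):
    """Least-squares fit by scanning Kd on a log-spaced grid (illustrative).

    For a fixed Kd the model is linear in Bmax, so the best Bmax has the
    closed form sum(y*f) / sum(f*f) with f = x / (Kd + x).
    """
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        f = [x / (kd + x) for x in xs]
        denom = sum(v * v for v in f)
        if denom == 0.0:
            continue
        bmax = sum(y * v for y, v in zip(ys, f)) / denom
        sse = sum((y - bmax * v) ** 2 for y, v in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    if best is None:
        raise ValueError("no usable data points")
    return best[1], best[2]


# Hypothetical two-fold dilution series starting at 0.4 (µg nAb),
# with synthetic noise-free readings generated from known parameters.
doses = [0.4 / 2 ** i for i in range(8)]
signal = [one_site(x, bmax=2.0, kd=0.05) for x in doses]
bmax_hat, kd_hat = fit_one_site(doses, signal)
```

With real absorbance data the background control would be subtracted first, and a dedicated optimizer would normally replace the grid scan.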

De novo sequencing of the nAb, assembling protein and modeling of the antibody Fab
The full-length nAb was first reduced with 5 mM TCEP (Sigma Aldrich) and alkylated with 10 mM IAA (Sigma Aldrich), followed by overnight digestion in a ThermoMixer (Eppendorf) at 37 °C and 500 rpm with trypsin, chymotrypsin, elastase, or pepsin (Promega) at an enzyme-to-substrate ratio of 1:20. The digested peptides were then purified using a C18 clean-up spin column (Thermo Scientific), concentrated in a SpeedVac (Eppendorf), and reconstituted in buffer A (2% acetonitrile, 0.2% formic acid) prior to mass spectrometry analysis. Approximately 1 µg of peptides from each sample, quantified using a NanoDrop spectrophotometer (DeNovix), was loaded onto an EASY-nLC 1200 system interfaced with a Q Exactive HF-X hybrid quadrupole-Orbitrap mass spectrometer (Thermo Scientific). Each enzyme-digested sample was analyzed in duplicate injections. The peptides were first concentrated on a precolumn (PepMap100 C18 3 μm; 75 μm × 2 cm; Thermo Fisher Scientific) and then separated on an EASY-Spray column (ES903, column temperature 45 °C; Thermo Fisher Scientific), in accordance with the manufacturer's recommendations. Two solvents were used as mobile phases: solvent A (0.1% formic acid) and solvent B (0.1% formic acid, 80% acetonitrile). A linear gradient from 5 to 38% B was run over 180 minutes at a constant flow rate of 350 nl/min. Data were acquired with a data-dependent acquisition (DDA) method as follows.
An initial MS1 scan with a scan range of 350-1650 m/z, a resolution of 120,000, an automatic gain control (AGC) target of 3e6, and a maximum injection time (IT) of 45 ms was followed by the top 15 MS2 scans at a resolution of 15,000, an AGC target of 1e5, 30 ms IT, and a stepped normalized collision energy (NCE) of 20, 25, and 30. Charge states of 1, 6-8, and above were excluded, except in samples digested with enzymes other than trypsin, where singly charged ions were included. The performance of the LC-MS system was controlled by analyzing a yeast protein extract digest (Promega). A total of eight datasets were collected and subsequently processed using multiple de novo MS sequencing algorithms 13. A novel approach based on cumulative fragment-ion evidence was applied to enhance de novo peptide sequencing and the subsequent assembly of the primary protein structure. Peptide candidates were generated using three deep-learning-based de novo peptide sequencing tools: PointNovo 14, CasaNovo 15, and InstaNovo 16. For PointNovo, two in-house multienzyme-trained models 17 were used, whereas default models were used for CasaNovo and InstaNovo. Twelve fragment ion types were considered: a+1, a+2, b+1, b+2, y+1, y+2, a-H2O, b-H2O, y-H2O, a-NH3, b-NH3, and y-NH3. Candidates were selected based on a 20 ppm tolerance at both the MS1 and MS2 levels and the observation of at least four fragment ions. For each candidate, a positional evidence vector of length L was constructed and filled with the count of fragment ions found at each position. These vectors were then pooled across all spectra from all samples. Subsequently, peptides were segmented into 5-mers that retained the pooled positional fragment-ion information. Cumulative MS evidence for these new 5-mer vectors was compiled elementwise. Finally, a cumulative MS score (cMS) was calculated for each peptide candidate by condensing the global positional ion evidence and normalizing for peptide length. This MS-evidence-driven strategy shares information among redundant and overlapping segments identified across all analyzed MS samples. The highest-ranked peptide candidates were those that showed not only maximal overlap but also robust MS evidence supporting the target protein sequences. The positional confidence score 17 for the assembled heterodimeric chains of the Fab domain is illustrated in Figure 1E.
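The cumulative fragment-ion evidence described above can be illustrated with a simplified Python sketch. It is not the authors' implementation. It considers only singly charged b and y ions rather than all twelve ion types, and the way ions are credited to positions and the exact form of the score (sum of pooled 5-mer evidence divided by peptide length) are assumptions made for this example. All function names are hypothetical.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da). Cysteine is taken as carbamidomethylated
# because the sample was alkylated with IAA.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Yield (position, m/z) for singly charged b and y ions.

    Assumption: b_i is credited to residue index i-1 and y_i to index L-i.
    """
    masses = [MONO[a] for a in peptide]
    n = len(masses)
    for i in range(1, n):
        yield i - 1, sum(masses[:i]) + PROTON                 # b_i
        yield n - i, sum(masses[n - i:]) + WATER + PROTON     # y_i


def positional_evidence(peptide, observed_mz, tol_ppm=20.0):
    """Count matched fragment ions per residue position (vector of length L)."""
    evidence = [0] * len(peptide)
    for pos, mz in fragment_ions(peptide):
        tol = mz * tol_ppm * 1e-6
        if any(abs(obs - mz) <= tol for obs in observed_mz):
            evidence[pos] += 1
    return evidence


def pool_kmers(candidates, k=5, min_ions=4):
    """Sum positional evidence elementwise per k-mer over all candidates."""
    pooled = defaultdict(lambda: [0] * k)
    for peptide, evidence in candidates:
        if sum(evidence) < min_ions:  # require at least four matched ions
            continue
        for s in range(len(peptide) - k + 1):
            vec = pooled[peptide[s:s + k]]
            for j in range(k):
                vec[j] += evidence[s + j]
    return pooled


def cms_score(peptide, pooled, k=5):
    """Assumed score: pooled evidence of all k-mers, normalized by length."""
    total = sum(sum(pooled.get(peptide[s:s + k], ()))
                for s in range(len(peptide) - k + 1))
    return total / len(peptide)


# Idealized example: a spectrum containing every theoretical b/y ion.
peptide = "PEPTIDEK"
spectrum = [mz for _, mz in fragment_ions(peptide)]
evidence = positional_evidence(peptide, spectrum)
pooled = pool_kmers([(peptide, evidence)])
score = cms_score(peptide, pooled)
```

In practice the pooling step would receive candidates from all tools and all eight datasets, so 5-mers shared by overlapping peptides accumulate evidence from many spectra.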
Next, proABC-2 was used to predict the hypervariable region of the nAb Fab domain, which is relevant for the subsequent docking studies 18. The prediction was based on the previously determined sequences of the nAb heavy and light chains. The structure of the nAb Fab fragment was predicted with AlphaFold-Multimer (v2.3.1) 19 using the version-specific Docker container with default settings. The resulting structures were verified by calculating the inter-chain pDockQ value for each model 20. The predicted model with the highest pDockQ value was used for downstream analysis.
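The model selection step can be sketched as follows. pDockQ is a sigmoid function of the mean interface pLDDT multiplied by the log10 of the number of interface contacts. The constants below are the ones commonly quoted for the published pDockQ fit and should be checked against reference 20 before use. The model names and input values are hypothetical, and extracting the interface pLDDT and contacts from the predicted structures is not shown.

```python
import math


def pdockq(mean_interface_plddt, n_interface_contacts):
    """Sigmoid pDockQ estimate; constants are assumed, verify against ref. 20."""
    if n_interface_contacts < 1:
        return 0.0  # no interface contacts, no predicted interaction
    x = mean_interface_plddt * math.log10(n_interface_contacts)
    return 0.724 / (1.0 + math.exp(-0.052 * (x - 152.611))) + 0.018


def best_model(models):
    """models: dict of name -> (mean interface pLDDT, interface contacts)."""
    return max(models, key=lambda name: pdockq(*models[name]))


# Hypothetical values for three predicted Fab models.
models = {
    "model_a": (88.0, 120),
    "model_b": (75.0, 90),
    "model_c": (60.0, 40),
}
top = best_model(models)
```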

Cross-linking mass spectrometry and data analysis
The exploratory modeling relied on XL-MS, comprising an in-solution cross-linking reaction followed by identification of the cross-linked peptides. The nAb and SLO were mixed at a 1:1 molar ratio in 1x PBS and incubated at 37 °C with agitation at 500 rpm for 1 hour. The duplet linkers, DSS-H12/D12 or DSG-H6/D6 (Creative Molecules), were then added to cross-link the samples over the course of two hours. The reaction was quenched with 4 M ammonium bicarbonate (Sigma Aldrich), followed by the standard reduction and alkylation procedure described above. A two-step in-solution digestion (lysyl endopeptidase from FUJIFILM Wako Chemicals U.S.A. Corporation, followed by trypsin from Promega) was then used to generate the cross-linked peptide pairs. The peptides were cleaned, dried, and reconstituted before mass spectrometry analysis using the same protocol as described above. Approximately 800 ng of peptides from each sample, quantified using a NanoDrop spectrophotometer, was loaded onto an Ultimate 3000 UPLC system connected to an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Scientific). Each sample was analyzed in technical duplicate injections. Column equilibration and sample loading were performed according to the manufacturer's guidelines. The mobile phases were solvent A (0.1% formic acid) and solvent B (0.1% formic acid and 80% acetonitrile). A linear gradient from 5 to 38% B was run over 90 minutes at a constant flow rate of 300 nl/min. The DDA method consisted of one MS1 scan with a scan range of 350-1650 m/z, a resolution of 120,000, a standard-mode AGC target, and an auto-mode maximum injection time. MS2 scans were acquired within a 3-second cycle time at a resolution of 15,000, with a standard AGC target, 22 ms IT, and an NCE of 30. Precursors with charge states from 2 to 6 were included in all runs. The LC-MS performance was monitored prior to analysis using a HeLa protein digest standard (Thermo Fisher Scientific). The cross-linking datasets were analyzed using pLink2 2, with the DSG-H6/D6 and DSS-H12/D12 (Creative Molecules Inc.) linker modifications configured according to the manufacturer's product pages. The search database included the sequences of the nAb heavy and light chains and SLO along with common contaminants, and a maximum of two missed cleavage sites was allowed. The top two nAb-SLO cross-linked peptides were visualized with xiSPEC 21, based on the corresponding extracted peak lists.
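To illustrate what the missed-cleavage setting means for the search space, the following Python sketch enumerates tryptic peptides with up to two missed cleavages. It is a generic illustration and not pLink2's internal digestion routine. The cleavage rule used here (after K or R, but not before P) is a common convention adopted as an assumption, and the toy sequence is hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2, min_len=1):
    """Enumerate peptides from an in-silico tryptic digest.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for a in range(len(cuts) - 1):
        # Spanning one extra cut site corresponds to one missed cleavage.
        for b in range(a + 1, min(a + max_missed + 2, len(cuts))):
            pep = sequence[cuts[a]:cuts[b]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides


# Toy sequence (hypothetical): the R before P is not a cleavage site.
fully_cleaved = tryptic_peptides("AKRPGKC", max_missed=0)
with_missed = tryptic_peptides("AKRPGKC", max_missed=2)
```

Allowing missed cleavages matters in cross-linking searches because a lysine modified by the linker is generally no longer cleaved by trypsin, so cross-linked peptides typically contain at least one internal lysine.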

DisVis interaction analysis based on distance constraints generated by XL-MS
All cross-linked peptides connecting the nAb Fab fragment and SLO, identified in both cross-linking experiments with duplet linkers, were summarized. DisVis was used for quick scanning and interaction analysis 1, with the crystal structure of SLO and the modeled Fab structure as inputs for exploratory modeling. First, the Cα-Cα distance between inter-protein cross-linked residues was restrained to a range of 0-30 Å. Initially, all identified XL constraints were used to calculate the z-score and group the clusters. The cluster comprising four XLs, which included the two most abundant XLs, was then used in the interaction analysis with a more stringent Cα-Cα distance range, as detailed in Supplementary Table 4. Accessible residues of both Fab and SLO were predicted using NetSurfP-3.0 3, retaining those with a relative solvent accessibility (RSA) greater than 40%. The interaction fraction (IF) index was computed with the interaction analysis mode of DisVis 1, and residues with consistent IF values greater than 0.5 were designated as putative contributors to the suggested Fab-SLO binding interface. These cut-off values were set according to the DisVis developer manual to maximize the confidence and reliability of the modeling. A step-by-step tutorial recommended by the developers can be found at https://www.bonvinlab.org/education/Others/disvis-webserver/.
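The two filters described above, the Cα-Cα distance restraint and the RSA and IF cut-offs, can be sketched in Python as shown below. This is an illustration of the selection logic only and not of the DisVis search itself. The residue labels, coordinates, and values are hypothetical.

```python
import math


def satisfied_restraints(restraints, fab_ca, slo_ca, d_max=30.0):
    """Return cross-links whose Calpha-Calpha distance is within 0..d_max (Angstrom)."""
    return [(f, s) for f, s in restraints
            if math.dist(fab_ca[f], slo_ca[s]) <= d_max]


def putative_interface(residues, rsa_min=0.40, if_min=0.5):
    """residues: dict of id -> (RSA, interaction fraction).

    Keeps residues that are solvent accessible (RSA > 40%) and that have an
    interaction fraction above 0.5, following the cut-offs quoted in the text.
    """
    return sorted(r for r, (rsa, frac) in residues.items()
                  if rsa > rsa_min and frac > if_min)


# Hypothetical Calpha coordinates for one candidate pose of the complex.
fab_ca = {"H:K64": (0.0, 0.0, 0.0), "L:K50": (10.0, 0.0, 0.0)}
slo_ca = {"K300": (0.0, 0.0, 25.0), "K400": (0.0, 40.0, 0.0)}
restraints = [("H:K64", "K300"), ("L:K50", "K400")]
ok = satisfied_restraints(restraints, fab_ca, slo_ca)

# Hypothetical per-residue RSA and IF values.
residues = {"K300": (0.55, 0.8), "L310": (0.10, 0.9), "E320": (0.60, 0.3)}
interface = putative_interface(residues)
```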

Distance-information-driven docking of Fab-SLO protein complex by HADDOCK
The HADDOCK 2.4 antibody-antigen information-driven docking protocol was used to model the complex as accurately as possible and to identify the relevant interface contact residues [6][7][8]. Both the modeled Fab structure and the crystal structure of SLO were used as inputs, with the predicted HV loops of the nAb Fab fragment assigned as active residues and the putative interacting residues of SLO derived from the DisVis distance analysis designated as passive residues. Two prevalent cross-link sites with a Cα-Cα range of 0-30 Å were set as center-of-mass restraints to enforce contact. Sampling was set to 10000, 400, and 400 models for rigid-body docking, semi-flexible refinement, and final refinement, respectively.
From the final refined candidate models, the one with the lowest combined HADDOCK score (indicating the highest confidence) was selected for downstream interface extraction following manual inspection. Finally, PRODIGY 9 was used to classify the types of interaction within the interface, and SpotOn 10 was used to predict "hotspot" residues, which are presumed to contribute substantially to the intermolecular interactions.

HDX-MS experiment and data analysis
The HDX-MS setup consisted of a LEAP H/D-X PAL™ platform for automated sample preparation, interfaced with an LC-MS system comprising an Ultimate 3000 micro-LC connected to an Orbitrap Q Exactive Plus MS. HDX was carried out on SLO protein, both with and without the commercially obtained nAb, in 10 mM PBS. Apo state (unbound SLO) and epitope mapping (Ab-bound SLO) samples were incubated for t = 0, 60, 1800, and 9000 seconds at 20 °C in either PBS or an HDX labelling buffer of identical composition prepared in D2O. The experiment was conducted in a single continuous run, with three replicates for each state and timepoint. The labelling reaction was quenched by dilution with 1% TFA, 0.4 M TCEP, and 4 M urea at 4 °C. The quenched sample was directly injected and subjected to online pepsin digestion at 4 °C. A flow of 50 μL/min of 0.1% formic acid was applied for 4 minutes for online digestion and trapping of the samples. Digestion products underwent online solid-phase extraction and washing with 0.1% FA for 60 s on a trap column (PepMap300 C18), which was then switched in line with a reversed-phase analytical column (Hypersil GOLD). Separation was performed at 1 °C with mobile phases of 0.1% formic acid (A) and 95% acetonitrile/0.1% formic acid (B), using a gradient of 5-50% B over 8 minutes followed by 50-90% B over 5 minutes. The separated peptides were analyzed on a Q Exactive Plus MS equipped with a heated electrospray ionization (HESI) source operating at a capillary temperature of 250 °C with sheath gas 12 au, auxiliary gas 2 au, and sweep gas 1 au. For HDX analysis, MS full scan spectra were acquired at a resolution of 70,000, with an automatic gain control target of 3e6, a maximum ion injection time of 200 ms, and a scan range of 300-2000 m/z. Peptides were identified by analyzing separate un-deuterated samples using data-dependent acquisition MS/MS. A peptide library including peptide sequence, charge state, and retention time was curated for the HDX analysis by searching the pepsin-digested, un-deuterated samples against the SLO sequence in PEAKS Studio X (Bioinformatics Solutions Inc.). HDX data analysis and visualization were performed using HDExaminer v3.1.1 (Sierra Analytics Inc.). Ab-bound states were compared with apo states using a single charge state per peptide. Given the comparative nature of the measurements, deuterium incorporation for the peptic peptides was derived from the observed relative mass difference between the deuterated and non-deuterated peptides, without back-exchange correction using a fully deuterated sample. The spectra for all time points were manually inspected. Low-scoring peptides, obvious outliers, and peptides for which retention time correction could not be made consistent were removed. Deuteros 2.0 22,23 was then used to perform the hybrid significance test and to visualize the changes in deuterium uptake as a coverage plot, butterfly plot, volcano plot, and kinetic uptake plots, as well as to project the coordinates of the protected peptide residues onto the indicated 3D structure.
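The relative uptake calculation described above (mass difference between deuterated and non-deuterated peptides, without back-exchange correction) can be written out as a short Python sketch. It is illustrative only and does not reproduce HDExaminer or the Deuteros hybrid significance test. The centroid values and uptake numbers are hypothetical.

```python
def relative_uptake(centroid_mz_deut, centroid_mz_undeut, charge):
    """Relative deuterium uptake in Da for one peptide and charge state.

    No back-exchange correction is applied, matching the comparative design.
    """
    return (centroid_mz_deut - centroid_mz_undeut) * charge


def differential_uptake(apo, bound):
    """apo, bound: dict of labelling time (s) -> mean uptake (Da).

    Returns the per-timepoint difference (bound minus apo) and its sum over
    all shared timepoints. Negative values indicate protection on binding.
    """
    diffs = {t: bound[t] - apo[t] for t in sorted(apo) if t in bound}
    return diffs, sum(diffs.values())


# Hypothetical centroids for a doubly charged peptide.
d_60s = relative_uptake(501.25, 500.75, charge=2)

# Hypothetical mean uptake values for one peptide at the labelling times used.
apo = {60: 1.0, 1800: 2.5, 9000: 3.5}
bound = {60: 0.5, 1800: 1.5, 9000: 3.0}
per_timepoint, cumulative = differential_uptake(apo, bound)
```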

Carriage and epitope conservation analysis, and reverse-engineering of the construct
A dedicated BLAST database was built from Streptococcus pyogenes genomic data obtained from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) 24 as of March 28, 2023. Genomes classified as poor quality, derived from plasmids, or duplicated were excluded, yielding a curated database of 2216 genomes out of the original 2283 entries. The Streptolysin O sequence (UniProt ID: P0DF96) was queried against this database using BLASTp 25. Hits covering more than 98% of the query sequence were considered matches, with the threshold selected based on the characteristics of the data. Domain 3 of Streptolysin O from Streptococcus pyogenes M3, spanning residues 250-299 and 346-420, was chosen as the target for construct design. Because this domain is interrupted by a subsequence of domain 1, it had to be re-engineered into a continuous chain. The message-passing neural network ProteinMPNN 26 was used to determine an alternative amino acid sequence capable of maintaining the original backbone conformation of domain 3.
The redesign began by substituting the intervening domain 1 subsequence with a predefined tetra-glycine linker (-GGGG-), chosen for its structural flexibility. An -APNG- linker sequence was then predicted as the replacement for the domain 1 subsequence. The three-dimensional structure of this modified sequence was predicted using AlphaFold2 27 and aligned to the native structure in ChimeraX to confirm that the integrity of domain 3 was preserved after the substitution. The structural stability of the designed construct was also assessed by testing whether substituting any internal amino acids could yield a more stable conformation. ProteinMPNN 26 was used to evaluate potential replacements for the internal residues, while all surface residues were kept unchanged. This optimization indicated that the native internal residues were optimal and that no substitutions were necessary.
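Two of the steps above are simple enough to sketch in Python: the query-coverage filter used in the carriage analysis, and the splicing of the two domain 3 segments with a linker. The sketch makes several assumptions. Coverage is computed from a single alignment per hit, the hit tuples and genome identifiers are hypothetical, and the toy sequence is a placeholder and not the SLO sequence.

```python
def query_coverage(q_start, q_end, q_len):
    """Percentage of the query covered by one alignment (1-based, inclusive)."""
    return 100.0 * (q_end - q_start + 1) / q_len


def carriage(hits, q_len, n_genomes, min_cov=98.0):
    """Fraction of genomes with at least one hit covering more than min_cov %.

    hits: iterable of (genome_id, q_start, q_end) tuples (assumed layout).
    """
    matched = {g for g, s, e in hits if query_coverage(s, e, q_len) > min_cov}
    return len(matched) / n_genomes


def splice(sequence, segments, linker):
    """Join 1-based inclusive residue ranges of a sequence with a linker."""
    return linker.join(sequence[a - 1:b] for a, b in segments)


# Hypothetical hits against a query of hypothetical length 500.
hits = [("g1", 1, 500), ("g2", 6, 500), ("g3", 1, 300)]
fraction = carriage(hits, q_len=500, n_genomes=4)

# Placeholder sequence: D marks residues 250-299, E marks residues 346-420,
# X marks the intervening stretch that the linker replaces.
toy = "A" * 249 + "D" * 50 + "X" * 46 + "E" * 75 + "A" * 100
construct = splice(toy, [(250, 299), (346, 420)], linker="APNG")
```

The same splice call with linker="GGGG" gives the initial tetra-glycine variant.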
The workflow illustration was created with BioRender.com under an academic license. GraphPad Prism 10 was used to generate dot plots, bar plots, and heatmaps, and to perform statistics and regression analysis.
ChimeraX and Chimera 28 were used to visualize the corresponding cross-links as pseudo-bonds, to depict 3D structures of proteins, domains, and epitopes, and to perform superimposition and alignment.

Supplementary Figure 1:
Tang et al. Common peptides shared between the apo SLO and nAb-bound SLO states (95.7% coverage across 202 peptides).

Table S2.
Summary of DSG or DSS cross-linked peptide pairs found between the nAb Fab domain and SLO. The two most representative cross-links are highlighted in bold. Cross-link No. 12 was removed because it reflects an unspecific interaction.

Table S6.
List of hotspots (most important interface residues) and nullspots (non-hotspot residues) on the SLO protein, predicted by SpotOn 10 within the top-ranked Fab-SLO pairwise complex model. AA: amino acid.