Predicting and Experimentally Validating Hot-Spot Residues at Protein–Protein Interfaces

Protein–protein interactions (PPIs) are vital to all biological processes. These interactions are often dynamic, sometimes transient, typically occur over large topographically shallow protein surfaces, and can exhibit a broad range of affinities. Considerable progress has been made in determining PPI structures. However, given the above properties, understanding the key determinants of their thermodynamic stability remains a challenge in chemical biology. An improved ability to identify and engineer PPIs would advance understanding of biological mechanisms and mutant phenotypes and also provide a firmer foundation for inhibitor design. In silico prediction of PPI hot-spot amino acids using computational alanine scanning (CAS) offers a rapid approach for predicting key residues that drive protein–protein association. This can be applied to all known PPI structures; however there is a trade-off between throughput and accuracy. Here we describe a comparative analysis of multiple CAS methods, which highlights effective approaches to improve the accuracy of predicting hot-spot residues. Alongside this, we introduce a new method, BUDE Alanine Scanning, which can be applied to single structures from crystallography and to structural ensembles from NMR or molecular dynamics data. The comparative analyses facilitate accurate prediction of hot-spots that we validate experimentally with three diverse targets: NOXA-B/MCL-1 (an α-helix-mediated PPI), SIMS/SUMO, and GKAP/SHANK-PDZ (both β-strand-mediated interactions). Finally, the approach is applied to the accurate prediction of hot-spot residues at a topographically novel Affimer/BCL-xL protein–protein interface.


Supplementary Figures and Tables
Table: Summary of methods used for computational prediction. Footnotes: 1, only with scripting multiple separate calls to the command line and collating data; 2, requires a user-specified input file for residues; 3, interface residues must be manually assigned for each interface to allow for repacking.

Supplementary methods
Expression and purification of proteins
MCL-1 and BCL-xL were expressed, purified and characterized following minor adaptations to methods described previously 1, 2. For SUMO and SHANK1-PDZ, new protocols were developed.
Experimental methods for preparation of these proteins are detailed in the following sections.

MCL-1
The pET28a His-SUMO Mcl-1 (172-327) construct was over-expressed in the E. coli strain Rosetta 2. A 10 ml overnight starter culture was used to inoculate 1 L of 2xYT containing 50 μg/ml kanamycin and 50 μg/ml chloramphenicol. Cultures were grown at 37 °C with shaking until OD600 ~ 0.6-0.8; the temperature was then lowered to 18 °C and protein expression was induced by the addition of 0.8 mM IPTG.

SHANK1 PDZ
Human Shank1 PDZ domain (656-762) was cloned into the pGEX-6P-2 expression vector and transformed into the BL21 Gold cell line. A 10 ml overnight starter culture was used to inoculate 1 L of 2xYT medium containing 50 μg/ml chloramphenicol. Cells were grown at 37 °C until OD600 0.6-0.8, induced with 0.1 mM IPTG and incubated overnight at 18 °C. Cells were harvested and resuspended in 20 mM Tris, pH 8, 500 mM NaCl, containing protease inhibitor and 1 U of DNase I per litre of cell culture. Cells were lysed by sonication (8 cycles, 20 seconds on, 40 seconds off, 10 A) and centrifuged at 25,000 g for 45 minutes at 4 °C. The supernatant was filtered (0.22 μm membrane), applied to glutathione beads and washed with 10 CV of 20 mM Tris, pH 8, 500 mM NaCl. GST was cleaved on-column overnight at 4 °C using PreScission protease. The eluted fractions were concentrated and purified by size-exclusion chromatography on an S75 26/60 pg column in 20 mM Tris, 150 mM NaCl, 5% glycerol, pH 7.5 buffer.
Collected fractions were analysed by SDS-PAGE and concentrated. Pure protein was analysed by high-resolution mass spectrometry: expected m/z: 12326.3; measured m/z: 12325.6. Concentration was determined using UV-VIS spectroscopy in 6 M urea using an extinction coefficient of 8480 M⁻¹ cm⁻¹.
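The concentration determination above follows the Beer-Lambert law. A minimal sketch is given below; the absorbance reading, the 1 cm path length and the assumption that the coefficient applies at 280 nm are illustrative and are not stated in the protocol, and the expected mass is treated here as the molecular mass in Da.

```python
def molar_concentration(absorbance, extinction_coeff, path_length_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if extinction_coeff <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (extinction_coeff * path_length_cm)


EPSILON = 8480.0     # M^-1 cm^-1, extinction coefficient quoted in the text
MASS_DA = 12326.3    # expected mass quoted in the text, treated as Da (assumption)

a280 = 0.424         # hypothetical absorbance reading, not from the source
conc_molar = molar_concentration(a280, EPSILON)
conc_mg_per_ml = conc_molar * MASS_DA   # g/L is numerically equal to mg/mL

print(f"{conc_molar * 1e6:.1f} uM, {conc_mg_per_ml:.3f} mg/mL")
```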
Cells were harvested by centrifugation and resuspended in 25 mM TRIS pH 8.0, 500 mM NaCl, 15 mM imidazole, containing protease inhibitor and 1 U of DNase I per litre of cell culture. Cells were lysed by sonication (6 cycles, 20 seconds on, 40 seconds off, 10 A) followed by centrifugation at 25,000 g for 45 minutes at 4 °C. The supernatant was filtered and applied onto a pre-equilibrated 5 ml HisTrap column.
The cleared cell lysate was then allowed to flow through the HisTrap with the aid of a peristaltic pump.

BCL-xL
pET28a His-SUMO Bcl-xL (chimera with BCL-2, 1-198, missing 26-81) was expressed using a protocol similar to that described above for MCL-1. Briefly, bacterial cultures were grown at 37 °C to a density of OD600 = 0.5-0.8 and protein expression was then induced by addition of 0.5 mM isopropyl-β-D-…
The HisTrap was then washed with 10 CV of 50 mM TRIS pH 8.0, 500 mM NaCl, 15 mM imidazole, followed by 10 CV of 50 mM TRIS pH 8.0, 500 mM NaCl, 50 mM imidazole and 10 CV of 50 mM TRIS pH 8.0, 500 mM NaCl, 100 mM imidazole. The Affimer was then eluted from the HisTrap with 50 mM TRIS pH 8.0, 500 mM NaCl, 300 mM imidazole. Successful elution was confirmed on a gel before further purification was undertaken. The eluted Affimer was concentrated (Amicon Ultra centrifugal filter, MWCO 10,000) to approximately 5 ml. The sample was then filtered before being loaded onto a Superdex 75 column (GE Healthcare) equilibrated in 50 mM TRIS pH 8.0, 250 mM NaCl, 0.5 mM DTT, 2.5% glycerol. The protein eluted as a monomer from gel filtration. The purified protein was concentrated to ~6 mg/ml and stored at -80 °C with the addition of 5% glycerol.
All protein constructs were expressed in the E. coli strain Rosetta 2.

Usability and limitations of CAS methods
The command-line tools BudeAlaScan and FoldX readily process ensembles of structures, although the latter requires a script to run the program and collate the data. Likewise, Rosetta Flex_DDG can process structure ensembles with scripting but is several orders of magnitude slower than the other methods. Robetta is accessed via a website and processes a single structure at a time, with the interface automatically assigned. These jobs are batch processed, so the time for results to be returned depends on server load. BeAtMuSiC and mCSM are both accessed via web servers and used interactively. BeAtMuSiC can mutate the interface residues to all residue types, not just to alanine. mCSM requires the definition of one residue at a time and a structure upload for each calculation, but this can be scripted, in contrast with servers such as DrugScore PPI, which require human input for each submission.
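The collation step mentioned above for ensemble calculations can be sketched as follows. This is an illustrative outline only: the per-model dictionaries, the residue labels and the values are hypothetical, and parsing of any particular program's output files is deliberately left out.

```python
from statistics import mean, stdev


def collate_ensemble(per_model_ddg):
    """Combine per-model alanine-scan results into per-residue statistics.

    per_model_ddg: list of dicts mapping a residue label to its predicted ddG
    for one ensemble member. Returns residue -> (mean, standard deviation, n).
    """
    by_residue = {}
    for model in per_model_ddg:
        for residue, ddg in model.items():
            by_residue.setdefault(residue, []).append(ddg)

    summary = {}
    for residue, values in by_residue.items():
        # A single observation has no sample standard deviation; report 0.0.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[residue] = (mean(values), sd, len(values))
    return summary


# Hypothetical values for three ensemble members (not data from this work).
models = [
    {"A:L10": 2.0, "A:D15": 0.4},
    {"A:L10": 3.0, "A:D15": 0.6},
    {"A:L10": 2.5},  # residue missing from one model is tolerated
]
summary = collate_ensemble(models)
for residue, (avg, sd, n) in sorted(summary.items()):
    print(f"{residue}: mean={avg:.2f} sd={sd:.2f} n={n}")
```

Keeping the count alongside the mean makes it visible when a residue was only scored in part of the ensemble.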

BudeAlaScan
BudeAlaScan is a command-line Python application for computational alanine scanning. It employs ISAMBARD 4 for structure manipulation, a customised version of the Bristol University Docking Engine (BUDE) 5 for energy calculations and SCWRL 6 for side-chain repacking (this latter feature is not used in this paper, but is required for calculating hot-regions or clusters of interfacial residues, which will be described elsewhere). Side-chain flexibility in BudeAlaScan is addressed by calculating the interaction energy with the "receptor" for a set of rotamers of the residue being replaced by alanine, and averaging over all rotamers with a favourable energy. This feature is enabled by default for residues of the class DERKH and is designed to account for the entropic cost of freezing a residue in a salt bridge. The program was run in scan mode with default parameters. The application will be available via the BAlaS server: http://coiledcoils.chm.bris.ac.uk/balas
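The rotamer-averaging idea described above can be illustrated with a short sketch. This is not the BudeAlaScan source code: the function name, the definition of "favourable" as an interaction energy below zero, and the return value when no rotamer is favourable are all assumptions made for illustration.

```python
def averaged_rotamer_energy(rotamer_energies, cutoff=0.0):
    """Average the interaction energies of favourable rotamers only.

    rotamer_energies: interaction energy of each rotamer with the "receptor".
    cutoff: energies below this value count as favourable (assumed to be 0).
    Returns 0.0 when no rotamer is favourable (an assumption of this sketch).
    """
    favourable = [energy for energy in rotamer_energies if energy < cutoff]
    if not favourable:
        return 0.0
    return sum(favourable) / len(favourable)


# Hypothetical energies for four rotamers of a charged side chain.
energies = [-4.0, -2.0, 1.5, 3.0]
print(averaged_rotamer_energy(energies))  # averages -4.0 and -2.0 only
```

Averaging over several favourable rotamers, rather than taking the single best one, lowers the apparent contribution of a residue that is only favourable in a few conformations.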

Molecular Dynamics Simulations
All simulations were performed using the GROMACS 5.1.4 suite and the following general protocols.

Cycles for Automated Peptide Synthesis
Peptides were prepared on a microwave-assisted Liberty Blue CEM peptide synthesiser following this cycle:
For methods that did not use microwave assistance, the reaction cycle was the same, except that the microwave method for deprotection and coupling was replaced by agitation of the resin at r.t. for 15 min and 25 min, respectively.
After the final residue, the resin was ejected from the reaction vessel and any further linker coupling, capping, cleavage and deprotections were performed manually using Methods A to G. Arginine was subjected to double coupling as standard.

Method
Fmoc-PEG linker was treated as any other conventional amino acid and coupled using microwave-assisted synthesis.

Method A: Coupling of Aminohexanoic acid (Ahx) and gamma-aminobutyric acid (Ga)
Following ejection from the automated synthesiser, the resin was placed in an empty fritted SPE tube. The desired unnatural amino acid (5 equiv.), DIPEA (5 equiv.) and HCTU (5 equiv.) were dissolved in DMF (1 mL) and added to the resin, followed by agitation for 1 h. For double couplings, this step was repeated. After removal of the reagents by filtration, the resin was washed with DMF (3 × 2 mL × 2 min) and the success of the coupling determined by a negative colour test (Method C). Deprotection of the Fmoc-protected N-terminus then followed (Method B).

Method B: Deprotection of N-Fmoc protecting groups
N-terminal Fmoc protecting groups were removed by the addition of 20% piperidine in DMF (v/v) (5 × 2 mL × 2 min), followed by rinsing the resin with DMF (5 × 2 mL × 2 min). Successful deprotection was determined by a positive colour test (Method C).

Method C: Kaiser Test 7
The Kaiser Test was used to determine successful coupling or deprotection for any residue coupled manually. A small number of resin beads were rinsed with CH2Cl2 and placed in a vial, followed by the addition of two drops of each of the three Kaiser test solutions. The mixture was then heated to ca. 100 °C for five minutes. A successful coupling gave no change in the colour of the beads, whereas bright blue beads demonstrated a successful deprotection.

Method D: N-terminal acetylation
Acetic anhydride (10 equiv.) and DIPEA (10 equiv.) were dissolved in DMF (1 mL) and the solution was transferred to the resin. After 2 h, the resin was drained, washed with DMF (3 × 2 mL × 2 min) and successful capping determined by a negative colour test (Method C).

Method E: N-terminal FITC labelling
Fluorescein isothiocyanate (6 equiv.) was dissolved in 12:7:5 pyridine:DMF:CH2Cl2 and the solution transferred to the resin in the dark. After 16 h, the resin was washed with DMF (3 × 2 mL × 2 min) ahead of cleavage and deprotection. The solvents were of anhydrous grade and distilled before use.

Method F: Cleavage and deprotection of Rink amide MBHA resin
After elongation and N-terminal capping were complete, the resin was washed with CH2Cl2 (…
Peptides were purified by preparative UV- or MD-HPLC using a Jupiter Proteo or a Kinetex EVO C18 preparative column (reversed phase) on an increasing gradient of acetonitrile in water + 0.1% formic acid (v/v) at a flow rate of 10 mL min⁻¹. Crude peptides were dissolved in H2O or DMSO at an approximate concentration. Purification runs injected a maximum of 5 mL of crude peptide solution and were allowed to run for 30 min, with acetonitrile increasing at a stated gradient. For UV-HPLC, the eluent was scanned with a diode array at 210, 220 and 280 nm. For MD-HPLC, the mass-directed chromatography software Masshunter by ChemStation (Agilent) was used to allow the collection of the desired peptide by mass, with the eluent split into an Agilent 6120 Quadrupole LCMS, which triggers collection of eluent at a programmed m/z. Fractions containing purified peptide were combined, concentrated in vacuo and lyophilised.

Isothermal titration calorimetry (ITC)
ITC experiments were carried out using a Microcal ITC200i instrument (Malvern) at 25 °C in 20 mM Tris, 150 mM NaCl, pH 7.5 buffer. ShankPDZ was dialysed against the buffer prior to the experiment; lyophilized peptides were dissolved in the same buffer. ShankPDZ (150 μM) was present in the cell and titrated with 1.4-2 mM peptide solutions loaded into the syringe, using 20 injections of 2 μL with 120 s spacing between injections. Heats of peptide dilution were subtracted from the raw data of each measurement. Data were analysed using Microcal Origin 8 and fitted to a one-binding-site model.

Fluorescence anisotropy
Fluorescence anisotropy assays were performed in 384-well plates (Greiner Bio-one). Each experiment was run in triplicate and the fluorescence anisotropy measured using a Perkin Elmer EnVision™ 2103 MultiLabel plate reader, with excitation at 480 nm (30 nm bandwidth), polarised dichroic mirror at 505 nm and emission at 535 nm (40 nm bandwidth, S and P polarised) for FAM- and FITC-labelled peptides.
The excitation and emission wavelengths for the Bodipy-labelled BAK peptide were set to 531 nm and 595 nm, respectively. The excitation and emission wavelengths for the FITC-labelled BID peptide were set to 490 nm and 535 nm, respectively.

Direct binding assays
Fluorescence anisotropy direct titration assays were performed with the protein concentration diluted over 16-24 points using two-fold serial dilutions. First, 20 µL of buffer was added to each well, and 20 µL of a protein solution was added to the first column. The solution was mixed well, and 20 µL was removed and added to the next column, and so on, serially diluting the protein across the plate.
Finally, 20 µL of tracer was added to the wells (… 150 nM, respectively). For control wells, the tracer peptide was replaced with an identical volume of assay buffer. The total volume in each well was 60 μL. Plates were read after 1 h (and 16 h for BCL-xL assays) of incubation at room temperature.
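The two-fold serial dilution described above can be sketched as follows. The stock concentration is a hypothetical value, and the concentrations returned are those in each well after mixing but before any further additions (such as tracer), which dilute every well by the same factor.

```python
def serial_dilution(stock_conc, n_points, buffer_vol=20.0, transfer_vol=20.0):
    """Concentration in each well of a serial dilution series.

    Each well starts with buffer_vol of buffer; transfer_vol is carried over
    from the stock (first well) or from the previous well (later wells).
    """
    factor = transfer_vol / (transfer_vol + buffer_vol)  # 0.5 for 20 + 20 uL
    concentrations = []
    current = stock_conc
    for _ in range(n_points):
        current *= factor
        concentrations.append(current)
    return concentrations


# Hypothetical 100 uM protein stock diluted over 16 points.
concs = serial_dilution(100.0, 16)
print([round(c, 4) for c in concs[:4]])
```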

Data analysis
The data from both the P (perpendicular intensity) and S (parallel intensity) channels, corrected by subtracting the corresponding control wells, were used to calculate the intensity and anisotropy for each well following Equations 1 and 2, where I is the total intensity, G is an instrument factor, which was set to 1 for all experiments, and r is the anisotropy. The average anisotropy (across three experimental replicates) and the standard deviation of these values were then calculated and fitted to a sigmoidal logistic model (Equation 3) using OriginPro 9.0, which provided the IC50 and error values.
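A minimal sketch of the intensity and anisotropy calculation is shown below. It assumes the standard definitions, I = S + 2GP and r = (S − GP)/I, which are consistent with the symbols defined above but are not reproduced from the original equations; the well readings are hypothetical.

```python
from statistics import mean, stdev


def intensity_and_anisotropy(s, p, s_control=0.0, p_control=0.0, g=1.0):
    """Return (total intensity, anisotropy) for one well.

    s, p: parallel and perpendicular channel readings.
    s_control, p_control: readings from the matching control well.
    g: instrument factor (set to 1 in the text).
    Assumes I = S + 2*G*P and r = (S - G*P) / I (standard form, an assumption).
    """
    s_corr = s - s_control
    p_corr = p - p_control
    total = s_corr + 2.0 * g * p_corr
    if total == 0:
        raise ValueError("total intensity is zero; anisotropy is undefined")
    return total, (s_corr - g * p_corr) / total


# Hypothetical triplicate readings (S, P) with one shared control well.
replicates = [(3100.0, 2050.0), (3130.0, 2040.0), (3080.0, 2060.0)]
results = [intensity_and_anisotropy(s, p, 100.0, 50.0) for s, p in replicates]
r_values = [r for _, r in results]
print(f"mean r = {mean(r_values):.4f}, sd = {stdev(r_values):.4f}")
```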
Circular dichroism (CD)
Samples were prepared in 10 mM Na-phosphate buffer, pH 7.5, at 100 μM peptide concentration. CD spectra were acquired from 185-260 nm using a 1 nm step size and 2 nm bandwidth at 25 °C in a 1 mm path length quartz cuvette. Samples were measured twice and averaged, the buffer baseline spectrum was subtracted from each measurement, and data were converted to mean residue ellipticity.
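The conversion to mean residue ellipticity can be sketched as below, using the common relation MRE = θ / (10 · c · l · n) with θ in mdeg, c in mol/L and l in cm. The ellipticity value and residue count are hypothetical, and whether n counts residues or peptide bonds is a convention that the text does not specify.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to MRE in deg cm^2 dmol^-1 res^-1."""
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, path length and n must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


CONC_MOLAR = 100e-6   # 100 uM peptide, from the text
PATH_CM = 0.1         # 1 mm cuvette, from the text
N_RESIDUES = 20       # hypothetical peptide length
theta_222 = -20.0     # hypothetical baseline-subtracted reading in mdeg

mre = mean_residue_ellipticity(theta_222, CONC_MOLAR, PATH_CM, N_RESIDUES)
print(f"MRE = {mre:.0f} deg cm^2 dmol^-1 res^-1")
```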
Tabulated HRMS data of synthesised peptides are shown below. Peptide identity was confirmed by the inspection of multiple charge states, and masses are quoted as the monoisotopic peak for the Expected (Exp d ) and Observed (Obs d ) values.
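The relationship between a monoisotopic mass and the m/z values of its protonated charge states, which underlies the confirmation by multiple charge states, can be sketched as follows. The neutral mass used is hypothetical and positive-mode protonation ([M + zH]^z+) is assumed.

```python
PROTON_MASS = 1.007276466  # Da, mass of a proton


def mz_for_charge(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion for a given neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Invert mz_for_charge to recover the neutral monoisotopic mass."""
    return charge * (mz - PROTON_MASS)


# Hypothetical peptide mass; several charge states should agree on M.
mass = 2000.0
for z in (2, 3, 4):
    mz = mz_for_charge(mass, z)
    print(f"z={z}: m/z={mz:.4f}, back-calculated M={neutral_mass_from_mz(mz, z):.4f}")
```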

SIMS and Variant Peptides
Tabulated HRMS data of synthesised peptides are shown below. Peptide identity was confirmed by the inspection of multiple charge states, and masses are quoted as the monoisotopic peak for the Expected (Exp d ) and Observed (Obs d ) values.