A General Small-Angle X-ray Scattering-Based Screening Protocol Validated for Protein−RNA Interactions

We present a screening protocol utilizing small-angle X-ray scattering (SAXS) to obtain structural information on biomolecular interactions independent of prior knowledge, so as to complement affinity-based screening and provide leads for further exploration. This protocol categorizes ligand titrations by computing pairwise agreement between curves, and separately estimates affinities by quantifying complex formation as a departure from the linear sum properties of solution SAXS. The protocol is validated by a sparse sequence search around the native polyuridine RNA motifs of the two-RRM domain Sex-lethal protein (Sxl). The screening of 35 RNA motifs of 4 to 10 nucleotides reveals strong variation in the resulting complexes, which structural modeling attributes to preference-switching between 1:1 and 2:2 binding stoichiometries. Validation of select sequences by isothermal titration calorimetry and NMR titration retrieves domain-specific roles and the function of a guanine anchor. These findings reinforce the suitability of SAXS as a complement in lead identification.

Small-angle X-ray scattering (SAXS) intensities from dilute macromolecular solutions contain global structural information averaged over orientations and types of components in the sample, 1 which, in the absence of interactions, can be expressed as a linear combination

$$I(q) = \sum_i v_i \, I_i(q) \qquad (1)$$

where $v_i$ and $I_i(q)$ are the volume fraction and intensity of the ith component, respectively, q = 4π sin θ/λ is a function of the scattering angle 2θ, and λ is the photon wavelength. This linear decomposition property of SAXS enables its usage as a structure-based screening method, where molecular interactions can be detected via the associated change in scattering intensities. Departure from the sum of component signals points to binding, oligomerization, or ligand-induced conformational changes. SAXS is far more sensitive to changes in the global structure than to changes due to binding of small ligands alone, and this distinguishes SAXS-based screens from affinity-based techniques such as isothermal titration calorimetry (ITC). The combination of structural and energetic approaches can thus distinguish a binding event from downstream impacts on macromolecular structure and dynamics. This possibility is tantalizing for the identification of ligands as either agonists or antagonists, but also for interactions between proteins and nucleic acids.
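The linear sum property of eq 1 can be illustrated with a minimal sketch. The function names and the toy intensity values below are hypothetical and chosen only for illustration; they are not part of the SAXScreen codebase.

```python
import math

def momentum_transfer(two_theta_rad, wavelength_nm):
    """q = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_nm

def linear_sum(volume_fractions, component_curves):
    """Eq 1: I(q) = sum_i v_i * I_i(q), with all curves on a shared q-grid."""
    n_q = len(component_curves[0])
    return [
        sum(v * curve[k] for v, curve in zip(volume_fractions, component_curves))
        for k in range(n_q)
    ]

# Hypothetical toy intensities at three q-values (not measured data).
protein = [100.0, 60.0, 20.0]
rna = [10.0, 8.0, 5.0]

# Expected signal of a non-interacting mixture with equal volume fractions.
mixture = linear_sum([0.5, 0.5], [protein, rna])
```

A measured mixture curve that cannot be reproduced by any such weighted sum of its components is the signature of an interaction that the screen looks for.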
The identification of a high-affinity RNA motif with a well-defined binding configuration is a prerequisite for structure determination of the resulting protein−RNA complexes via X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. However, RNA-binding proteins (RBPs) often exhibit both specific and nonspecific binding depending on the oligonucleotide sequence. 3 This complicates the identification of cognate RNA sequences for RBPs. Despite recent methodological advances, 4−7 RNA sequence identification also suffers from a number of limitations: (i) a maximal sequence length of nine nucleotides, 5−7 (ii) expensive RNA library generation and/or sequencing steps, and (iii) limited transferability of in vivo motifs to in vitro conditions due to potential cooperativity with other RBPs in the live cell. Utilizing synchrotron beamlines for screening mitigates the costs of expensive RNA synthesis, where measurement automation 8 further permits a throughput and reduced sample requirement that is more efficient than alternative structural methods such as NMR, 9 and comparable to the recently published switchSENSE. 10 While SAXS-based ligand identification has been previously reported, 11,12 we sought to create a generalizable analysis workflow that can be adapted to the wide range of systems measurable by SAXS. The resulting workflow for SAXS-based screening (SAXScreen) ranks putative interactions based on intensities alone, but can be further augmented by field-specific knowledge and software. This method is applied to describe the affinity profile and binding mechanisms of the two-RRM domain protein Sex-lethal (Sxl) 13 based on its known interaction with G-substituted poly-U rich oligonucleotides, 14,15 thus illustrating SAXScreen's role as a hypothesis generator to identify cognate RNA motifs and their potential binding mechanisms.

Figure 1. Illustration of the SAXScreen workflow on a single screening against 13 RNAs at the European Synchrotron Radiation Facility (ESRF) beamline BM29. Raw scattering intensities in (a) of the protein and RNA mixtures plus their respective buffers can be used directly to compute χ lin . To compute standard parameters, such as V C and R g , buffer subtraction is performed using calculated protein−RNA addition ratios (b), and the curves are then smoothed using AUTOGNOM of ATSAS 2.8 2 to reduce noise and remove low- and high-q artifacts (c). These three parameters measuring individual aspects of the SAXS curve can be formulated as titration curves (d), which can be fitted simultaneously to derive K D eff (e). Variations in K D eff reflect the effect of binding on different parameters. An alternative method to visualize variations is to compute the pairwise reduced-χ between buffer-subtracted I(q) that share the same titration ratio (f).

Figure 2. [...] Figure S7. Note that misclassification can occur due to sample contamination (e.g., [# G 2]UGU 8 in Figure S3). (d) Binding mechanism suggested by ensemble-based SAXS modeling fitted against SAXS curves of apo Sxl 10GS and mixtures at 1:1 protein−RNA ratios. Optimized R g distributions of three ensembles are included, with full results recorded in Figure S6. (e) Schematic of proposed site selectivity based on combined structural knowledge. The majority of sites enforce little selectivity due either to lack of nucleotide interactions or flexibility of protein contacts in solution. The G-anchor sits on site 4, but also accepts U. Sites 3 and 6 exert U-selection via simultaneous hydrogen-bonding and π-stacking interactions. 19,20 Site 10 exerts minor pyrimidine preference through steric arrangements.

ACS Combinatorial Science (Letter)
An illustrated example of the SAXScreen workflow is shown in Figure 1, containing results of a short synchrotron session. In biological SAXS, the raw scattering intensity I(q) contains contributions from sample molecules I S as well as their buffer components I bS . When interactions between a large receptor R and a small ligand L obey the simple equilibrium relation R + L ⇌ RL, mixtures between R and L can be decomposed into five components: the two unbound molecules, their respective buffers, plus the bound complex RL. Since all components except I RL can be independently derived by measurement in isolation (Figure 1a), I RL can be formulated as a perturbation from the linear sum (eq 1) and quantified via weighted χ-minimization (eq 2). The stepwise formation of RL in a titration can therefore be modeled by changes in χ lin , until saturation at an unknown maximum value max χ lin . When well-sampled with observed ligand saturation, this dosage response curve can be fitted to yield the dissociation constant K D .
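For reference, the dosage response of a simple R + L ⇌ RL equilibrium can be written in closed form. The first line below is the standard mass-balance solution for the complex concentration; the second line, which takes χ lin to be proportional to the bound fraction, is an assumption made here for illustration and is not stated in the text. The symbols $R_t$ and $L_t$ (total receptor and ligand concentrations) are likewise introduced here.

```latex
\begin{align}
  [RL] &= \tfrac{1}{2}\left[(R_t + L_t + K_D)
          - \sqrt{(R_t + L_t + K_D)^2 - 4 R_t L_t}\,\right] \\
  \chi_{\mathrm{lin}}(L_t) &\approx \max\chi_{\mathrm{lin}} \cdot \frac{[RL]}{R_t}
  \qquad \text{(assumed proportionality)}
\end{align}
```

In the limit of very tight binding ($K_D \to 0$), the first expression reduces to $\min(R_t, L_t)$, which is why the maximum value is only constrained by the data once ligand saturation is observed.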
To reduce RNA synthesis costs, we assume tentatively that all ligands share a comparable binding mechanism and thus the same max χ lin . This assumption captures ligand-specific influences upon max χ lin and other deviations from two-state binding within a modified affinity K D eff (see Supporting Information), which expresses the efficacy of a ligand relative to the most effective member of a screen under nonsaturating conditions. This allows all dosage response curves to be fitted simultaneously, which produces a final ligand ranking based on K D eff . Aside from χ lin , specific characteristics of biomolecular SAXS, such as particle volume PV, radius of gyration R g , and volume of correlation V C , 16 can also be fitted. Here, automated processing of buffer-subtracted curves is utilized to reproducibly derive titration curves (Figure 1b−e), resulting in comparable qualitative K D eff rankings. In addition, SAXS curves at equivalent titration points can be directly compared via the pairwise reduced-χ (Figure 1f), which can be further averaged across all points to compute a net deviation between two titrations for clustering. These matrices are analogous to the structural similarity map adopted by Hura et al. 17 and the correlation maps adopted by Franke et al. 18

To validate the SAXScreen procedure, we investigated the RNA interactions of a Sxl mutant containing a 10-residue GS-linker extension (Sxl 10GS ) across multiple synchrotron sessions. This linker mutation was added to enhance the magnitude of SAXS changes upon RNA binding, noting that equivalent results have been obtained for wildtype Sxl (Figure S1). The full data of three sessions covering 35 titrations are presented in Figures S1 and S2, while we summarize the main Sxl 10GS findings in Figure 2.
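The simultaneous fit with a shared max χ lin described above can be sketched as a nested grid search: for each trial value of the shared maximum, every titration is assigned its best-fitting K D eff, and the shared maximum with the lowest total residual wins. This is a minimal sketch under stated assumptions. The proportionality between the signal and the bound fraction, the grid-search strategy, and all names and synthetic values are hypothetical; the fitting procedure actually used is described in the Supporting Information and the SAXScreen repository.

```python
import math

def complex_conc(r_tot, l_tot, kd):
    """[RL] for R + L <=> RL, from the quadratic mass-balance solution."""
    s = r_tot + l_tot + kd
    return 0.5 * (s - math.sqrt(s * s - 4.0 * r_tot * l_tot))

def sse_for(kd, amp, r_tot, l_tots, signal):
    """Squared residuals for the ASSUMED model: signal = amp * [RL] / R_tot."""
    return sum(
        (amp * complex_conc(r_tot, l, kd) / r_tot - y) ** 2
        for l, y in zip(l_tots, signal)
    )

def global_fit(r_tot, l_tots, titrations, kd_grid, amp_grid):
    """One shared amplitude (max chi_lin) plus one effective K_D per ligand."""
    best = None
    for amp in amp_grid:
        kds, total = {}, 0.0
        for name, signal in titrations.items():
            sse, kd = min(
                (sse_for(kd, amp, r_tot, l_tots, signal), kd) for kd in kd_grid
            )
            kds[name] = kd
            total += sse
        if best is None or total < best[0]:
            best = (total, amp, kds)
    _, amp, kds = best
    ranking = sorted(kds, key=kds.get)  # strongest effective binder first
    return amp, kds, ranking

# Hypothetical noiseless example: 8-point titrations from apo to 1:1 at 50 uM.
r_tot = 50.0
l_tots = [50.0 * k / 7 for k in range(8)]
kd_grid = [10 ** (k / 10) for k in range(-10, 31)]  # 0.1 to 1000 uM
amp_grid = [0.5 * k for k in range(1, 13)]
true_kd = {"ligand_A": kd_grid[15], "ligand_B": kd_grid[30]}
titrations = {
    name: [2.0 * complex_conc(r_tot, l, kd) / r_tot for l in l_tots]
    for name, kd in true_kd.items()
}
amp_fit, kd_fit, ranking = global_fit(r_tot, l_tots, titrations, kd_grid, amp_grid)
```

Sharing the amplitude is what makes weak, nonsaturating titrations rankable at all: on their own, such curves cannot separate a low affinity from a small maximum response.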
Using the χ lin metric, the closely related U 10 and UGU 8 exhibit widely disparate K D eff , contrasting with the small 2- and 10-fold differences detected via ITC and NMR (Table S1 and Figure S4). This arises from a difference in preferred binding stoichiometry: when ensemble modeling 21 is applied to SAXS curves at 1:1 protein−RNA ratios (Figure S5), U 10 replicates are consistent with 1:1 complexes, while two of three UGU 8 curves are instead consistent with 2:2 complexes. All optimized ensembles are dominated by RRMs in close contact, which depends on neither the presence nor the sequence identity of added RNA (Figure S6). Notably, both of the above findings of 2:2 stoichiometry 22 and transient inter-RRM interactions 23 have been previously reported in tandem RRM proteins.
A broader screen further confirms that the stoichiometric preference of Sxl 10GS is RNA-sequence dependent. The clustering of titration responses (Figure 2c) identifies four major clusters, three of which correspond to the preferred stoichiometry seen in ensemble modeling, while the fourth corresponds to short sequences that are unable to induce significant SAXS perturbations. Here, the additional active cluster corresponds to sequences such as U 5 C 5 that are consistent with both 1:1 and 2:2 stoichiometry, suggesting that both species are present in solution. Read in closer detail, the categorizations and measured K D eff together reveal the impact of the RRM2 G-anchor motif and the influence of G-positioning: guanine at the second, third, and eighth positions specifically encourages 2:2 binding, while guanine at the ninth position is exclusively associated with 1:1 binding. Meanwhile, C 5 U 5 is 2:2-binding, but U 5 C 5 and UGU 3 C 5 are mixed. Given that during crystallization RRM2 orients upon the 5′ end and RRM1 upon the 3′ end, 19 these results suggest that RRM2 significantly outcompetes RRM1 in poly-U interactions, thus dominating the sequence selection.
To validate our assumptions based on SAXScreen, we performed NMR titrations for three pairs of sequences: U 10 vs UGU 8 , U 7 GU 2 vs U 8 GU, and U 5 C 5 vs C 5 U 5 (Figures S8 and S9). The four poly-U sequences exhibit similar ¹H−¹⁵N chemical shift perturbations (CSPs) across both RRM1 and RRM2, indicating that the local RNA contacts are retained regardless of stoichiometry. In contrast, the presence of poly-C leads to a selective and significant reduction of CSPs relative to poly-U sequences (Figure S10): U 5 C 5 CSPs indicate preferential reduction of canonical RRM1 contacts, while C 5 U 5 CSPs indicate elimination of inter-RRM contacts. These observations reinforce the SAXS results, and further suggest that the 2:2 stoichiometry complexes containing poly-C are supported by nonspecific charged interactions of RRM1 rather than specific interactions at the canonical sites. We thus summarize the binding model and site selectivity in Figure 2d,e.
The role performed by SAXScreen in the study of Sxl 10GS involves exploration of RNA sequence space and classification of binding modes and affinities for further investigation. It successfully confirms optimal sequence length and composition, including G-anchor positioning and anti-C selectivity. The use of χ lin as the primary metric here shows that structure-specific knowledge, although useful, is not required to derive valuable information, and highlights possible adaptation to other applications, such as whole-cell SAXS. 11 We caution, however, that χ lin can suffer from spurious overfitting, for example, in cases where binding-associated perturbations lead to changes that emulate the differences in buffer scattering, or when parasitic scattering or aggregation has not been fully eliminated. The former is illustrated in Figure S11, where A 10 and UGU 4 titration curves are artificially suppressed in χ lin but not V C . We note that structural quantities can themselves suffer from lack of sensitivity in cases where the overall binding process does not result in a net change; for example, an internal domain reorientation may result in insignificant R g changes. A cross-validation across several metrics is therefore always recommended, and this is achieved here by monitoring R g , V C , and χ lin together. Such redundant measurements assist in the identification of spurious artifacts such as sample contamination and aggregation.
Several directions are pertinent for ongoing development of SAXScreen. Although practical expenses and Sxl's sequence-dependent oligomerization hinder us from measuring RNA-excess conditions and thus evaluating absolute affinities, we can estimate the theoretical sensitivity for this ∼20 kDa protein to span 0.1−1000 μM at 1 mg mL⁻¹ concentrations. The detection limits for discovering minor populations in SAXS depend upon the extent of conformational change and the signal-to-noise ratio of beamline setups, while the applicable affinity range depends upon available molar concentrations as dictated by the mass concentrations amenable to scattering measurements. In this context, our protocol represents a balance between throughput and the ability to discriminate binding processes by measuring eight titration points below ligand saturation. More measurements per titration are likely required to reproducibly predict binding affinities, as evidenced by sporadic deviations between replicates in our screen (cf. Figures S1 and S2). Recommendations and additional considerations are discussed in the Supporting Information.
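The link between mass and molar concentration mentioned above is simple arithmetic, sketched here with a hypothetical helper function. For a protein of roughly 20 kDa, 1 mg mL⁻¹ corresponds to about 50 μM, which matches the protein concentration quoted for the titrations in the Experimental Procedures.

```python
def molar_conc_uM(mass_conc_mg_per_mL, mol_weight_kDa):
    """Convert a mass concentration to micromolar.

    mg/mL equals g/L, and 1 kDa equals 1000 g/mol, so
    uM = (g/L) / (1000 * kDa g/mol) * 1e6 = 1000 * (mg/mL) / kDa.
    """
    return mass_conc_mg_per_mL * 1000.0 / mol_weight_kDa

# Approximate values taken from the text: ~20 kDa protein at 1 mg/mL.
protein_uM = molar_conc_uM(1.0, 20.0)
```

Because the scattering signal scales with mass concentration while binding equilibria scale with molar concentration, larger receptors measured at the same mg mL⁻¹ probe a correspondingly lower molar range.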
The optimized protocol at the ESRF and PETRA-III synchrotron beamlines allows for 10³ measurements per day, corresponding to 10² titrations depending on the desired precision of affinity estimation. An increase of 1−2 orders of magnitude is expected in the near future with improvements to beamline setups aimed at reducing downtime between measurements. We also note that similar automated setups are available at synchrotron sites around the world 24−27 and can be utilized for screening. This puts SAXScreen close to the throughput of biophysics-based screens, and it can also serve as a source of complementary information where simultaneous affinity and structural information is required to further refine the data from higher-throughput techniques or from in vivo sources. SAXS-based screening is expected to be particularly useful for biochemical processes associated with measurable structural changes.

■ EXPERIMENTAL PROCEDURES
We describe below an essential summary of experimental and analysis protocols, and refer readers to the full description in the Supporting Information.
Sample Preparation. The preparation of wildtype Sxl and Sxl 10GS follows previously published protocols, 20,28 noting that the mutant plasmid contains an additional 10-residue stretch (GGSGSGGGGS) after position 204, between two glycines in the interdomain-linker ARPGGESIK. RNA oligonucleotides were ordered from commercial suppliers Microsynth and IBA to specifications of highest purity, separately for each synchrotron session, and resuspended in Milli-Q water to the required concentration for SAXS and NMR experiments.
SAXS Measurements. Preliminary and production screening sessions were conducted at the Hamburg P12 and ESRF BM29 beamlines, respectively, using the automated sample changer setup 8 to simplify measurement on 96-well plates. The plate geometry was leveraged to conduct 6- or 8-point titrations spanning apoprotein to 1:1 mixtures, at 50 μM protein concentration and 40 μL final volume. Exposure times were left near the recommended values specified by the site, noting that no radiation damage was observed throughout screening. The frame-averaged scattering intensities were taken as the starting point for automated analysis in SAXScreen below, discarding data outside a q-range of approximately 0.3 to 3.0 nm⁻¹ to universally exclude aggregation, parasitic scattering, and high-noise regions.
Ensemble-Based Modeling. The ensemble-optimization method (EOM) software 21 was used to derive structural models consistent with experimental scattering. The crystallographic 1:1-Sxl complex 19 was assumed to represent canonical binding, with the attachment of RNA to RRM domains varied to represent different encounter complexes. Both 1:1 and 2:2 stoichiometry models were proposed, and initial ensembles of 10⁴ and 10⁶ models, respectively, were generated for fitting against SAXS curves at 1:1 protein−RNA ratios.
SAXScreen Analysis. Buffer subtraction of raw intensity curves was carried out manually to take into account different input ratios of protein buffer versus Milli-Q water used to resuspend RNA, using the averaged buffer-water difference signal across the beamtime session as the correction curve.
Hierarchical Clustering. The pairwise reduced-χ between two SAXS curves, χ ab , is evaluated after minimization with respect to two free parameters, F and C, to mitigate concentration and subtraction differences. The distance metric required for hierarchical clustering of titrations is produced by calculating χ ab for all pairs of SAXS curves at identical protein−RNA ratios, and then defining the distance d ab between two titrations as the symmetrized average of χ ab across all protein−RNA ratios. Analogous clustering procedures can be carried out using information from structural similarity maps 17 and correlation maps, 18 which removes the bias of χ ab toward high-intensity regions with lower signal-to-noise ratios.
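The pairwise comparison and the titration distance described above can be sketched as follows. The defining equations are not reproduced in this text, so the form used here is an assumption: F is taken as a scale factor and C as a constant offset applied to the second curve, the residuals are weighted by the uncertainties of the first curve, and the symmetrized value is the mean of χ ab and χ ba . All function names are hypothetical.

```python
import math

def reduced_chi(i_a, i_b, sigma_a):
    """Reduced chi between two curves after fitting I_a ~ F * I_b + C.

    F (scale) and C (offset) are obtained by weighted linear least squares;
    this functional form is an assumption made for illustration.
    """
    w = [1.0 / (s * s) for s in sigma_a]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, i_b))
    sy = sum(wi * y for wi, y in zip(w, i_a))
    sxx = sum(wi * x * x for wi, x in zip(w, i_b))
    sxy = sum(wi * x * y for wi, x, y in zip(w, i_b, i_a))
    det = sw * sxx - sx * sx  # zero only if I_b is constant over the q-grid
    f = (sw * sxy - sx * sy) / det
    c = (sy - f * sx) / sw
    chi2 = sum(wi * (y - f * x - c) ** 2 for wi, x, y in zip(w, i_b, i_a))
    return math.sqrt(chi2 / (len(i_a) - 2))  # two fitted parameters

def titration_distance(tit_a, tit_b):
    """Symmetrized chi averaged over matching protein-RNA ratios.

    Each titration is a list of (intensities, sigmas) pairs, one per ratio,
    with both titrations sampled at identical ratios and q-values.
    """
    total = 0.0
    for (ia, sa), (ib, sb) in zip(tit_a, tit_b):
        total += 0.5 * (reduced_chi(ia, ib, sa) + reduced_chi(ib, ia, sb))
    return total / len(tit_a)
```

The resulting symmetric matrix of distances can be handed to any standard agglomerative clustering routine; which linkage criterion was used in this work is not stated here.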
Computation of I(q) Features. The ATSAS package 29 version 2.8 has been utilized to derive smoothed scattering curves for computation of I(0), R g , V C , and unscaled particle volume PV. χ lin is derived directly from the raw scattering intensities using the measured RNA curve, the averaged apoprotein curve, and the averaged buffer and water curves as the four component curves in eq 2. This value is again minimized over free parameters a, b, c, and d, which are allowed to be negative so as to conduct buffer subtraction. Uncertainties are computed by 100-trial repetition with added Gaussian noise using δI(q).
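The χ lin computation described above amounts to a weighted linear least-squares fit of the mixture curve by its component curves, followed by a noise-resampling estimate of the uncertainty. The sketch below assumes that eq 2 is a reduced χ over the fit residuals and that the Monte Carlo trials perturb only the mixture curve; both are assumptions, as are all names and the synthetic toy curves. The actual implementation is in the SAXScreen repository.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def chi_lin(mixture, sigma, components):
    """Reduced chi of the best weighted linear combination of components.

    Coefficients (a, b, c, d in the text) are unconstrained and may be negative.
    """
    n_c, n_q = len(components), len(mixture)
    w = [1.0 / (s * s) for s in sigma]
    normal = [[sum(w[k] * components[i][k] * components[j][k] for k in range(n_q))
               for j in range(n_c)] for i in range(n_c)]
    rhs = [sum(w[k] * components[i][k] * mixture[k] for k in range(n_q))
           for i in range(n_c)]
    coef = solve_linear(normal, rhs)
    chi2 = sum(
        w[k] * (mixture[k] - sum(c * comp[k] for c, comp in zip(coef, components))) ** 2
        for k in range(n_q)
    )
    return math.sqrt(chi2 / (n_q - n_c)), coef

def chi_lin_uncertainty(mixture, sigma, components, trials=100, seed=0):
    """Mean and standard deviation of chi_lin over noise-perturbed trials."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        noisy = [i + rng.gauss(0.0, s) for i, s in zip(mixture, sigma)]
        values.append(chi_lin(noisy, sigma, components)[0])
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / (trials - 1)
    return mean, math.sqrt(var)

# Hypothetical toy curves on a shared q-grid (nm^-1); not measured data.
q = [0.3 + 0.1 * k for k in range(20)]
protein = [math.exp(-(2.0 * x) ** 2 / 3.0) for x in q]
rna = [math.exp(-(1.0 * x) ** 2 / 3.0) for x in q]
buffer_curve = [1.0] * len(q)
water = [0.5 * x for x in q]
components = [protein, rna, buffer_curve, water]
sigma = [0.01] * len(q)
# "free": a pure linear sum; "bound": the same plus an extra complex-like term.
free = [0.5 * p + 0.3 * r + 0.9 * b + 0.1 * w for p, r, b, w in zip(*components)]
bound = [i + 0.2 * math.exp(-(3.0 * x) ** 2 / 3.0) for i, x in zip(free, q)]
chi_free, coef_free = chi_lin(free, sigma, components)
chi_bound, _ = chi_lin(bound, sigma, components)
mean_mc, sd_mc = chi_lin_uncertainty(free, sigma, components)
```

Solving for all four coefficients at once folds buffer subtraction into the same fit, which is why χ lin can be computed directly from raw intensities without a separate subtraction step.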
ITC Measurements. The energetics of RNA oligonucleotides binding to Sxl 10GS was studied using a MicroCal iTC200 System (MicroCal). All proteins were dialyzed overnight against ITC buffer comprising 10 mM sodium phosphate, 50 mM sodium chloride, and 1 mM DTT at pH 6.5. Titrations were performed at 25 °C by stepwise addition of 200−500 μM RNA solution to the 10−20 μM protein solution in the cell. Data were corrected for the heat of dilution and analyzed using the MicroCal Origin™ software package.
NMR Titration Experiments. Replicate samples were prepared under the same conditions as for SAXS, except that ¹⁵N-labeled Sxl was used to measure RNA binding via chemical shift perturbations (CSPs). The NMR buffer comprised 10 mM sodium phosphate, 50 mM sodium chloride, and 10 mM DTT at pH 6.5. ¹H−¹⁵N HSQC spectra were recorded on a Bruker Avance III with a magnetic field strength corresponding to a proton Larmor frequency of 700.303 MHz, starting at pure Sxl 10GS with stepwise addition of the corresponding RNA. See the Supporting Information for a detailed titration protocol.
Plotting Software. Visualization of crystal structures was done in VMD-1.9.2 30 and rendered with Tachyon. All HSQC spectra presented here were plotted using the Python modules nmrglue and matplotlib. Most graphs were visualized in xmgrace, while χ-matrices were visualized in Python or gnuplot-5.
Code Availability and Maintenance. The SAXScreen workflow codebase used for this work has been made available on GitHub under the MIT license: https://github.com/zharmad/SAXScreen.
Data Availability. The titration data over SAXS, ITC, and NMR are available from the authors upon request. Four representative SAXS data sets have been deposited in SASBDB with accession codes SASDDX4, SASDDY4, SASDDZ4, and SASDD25, containing respectively apoprotein curves for Sxl, Sxl 10GS , a 1:1 mixture of Sxl 10GS :[# G 2]U 8 GU, and a 1:1 mixture of Sxl 10GS :[# G 3]UGU 8 .

■ ASSOCIATED CONTENT

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscombsci.8b00007.
Detailed methods, affinity K D and efficacy K D eff from ITC and SAXS measurements, SAXS measurements for three of the measurements, K D eff histogram of RNA binding screens, RNA binding screens involving Sxl 10GS across three ESRF Grenoble sessions, buffer-subtracted SAXS curves for RNA sequences, isothermal titration calorimetry experiments, summary of ensemble-based modeling using EOM from ATSAS, similarity of SAXS curves between three synchrotron sessions at ESRF Grenoble, NMR chemical shift perturbations of Sxl 10GS −RNA interactions, 1 H− 15 N HSQC spectra of NMR titrations, summary of SAXS data for two RNA sequences that do not perform well, and theoretical modeling of the Sxl−RNA interaction (PDF)