A Switch between Two Intrinsically Disordered Conformational Ensembles Modulates the Active Site of a Basic-Helix–Loop–Helix Transcription Factor

We report a conformational switch between two distinct intrinsically disordered subensembles within the active site of a transcription factor. This switch highlights an evolutionary benefit conferred by the high plasticity of intrinsically disordered domains, namely, their potential to dynamically sample a heterogeneous conformational space housing multiple states with tailored properties. We focus on proto-oncogenic basic-helix–loop–helix (bHLH)-type transcription factors, as these play key roles in cell regulation and function. Despite intense research efforts, the understanding of structure–function relations of these transcription factors remains incomplete as they feature intrinsically disordered DNA-interaction domains that are difficult to characterize, theoretically as well as experimentally. Here we characterize the structural dynamics of the intrinsically disordered region DNA-binding site of the vital MYC-associated transcription factor X (MAX). Integrating nuclear magnetic resonance (NMR) measurements, molecular dynamics (MD) simulations, and electron paramagnetic resonance (EPR) measurements, we show that, in the absence of DNA, the binding site of the free MAX2 homodimer samples two intrinsically disordered conformational subensembles. These feature distinct structural properties: one subensemble consists of a set of highly flexible and spatially extended conformers, while the second features a set of “hinged” conformations. In this latter ensemble, the disordered N-terminal tails of MAX2 fold back along the dimer, forming transient long-range contacts with the HLH-region and thereby exposing the DNA binding site to the solvent. 
The features of these divergent substates suggest two mechanisms by which protein conformational dynamics in MAX2 might modulate DNA-complex formation: by enhanced initial recruitment of free DNA ligands, as a result of the wider conformational space sampled by the extended ensemble, and by direct exposure of the binding site and the corresponding strong electrostatic attractions presented while in the hinged conformations.

after the addition of 15% glycerol to avoid crystallization.
The final total protein concentration was 0.4 mM for NMR and 0.1 mM for EPR experiments.
EPR experiments.
Continuous-wave (CW) X-band measurements were recorded on a Bruker E500 instrument (9.4 GHz, TE012 resonator). All CW experiments were recorded at room temperature; a 1 mm capillary sealed at one end was inserted into a 4 mm tube. Data were recorded with a modulation frequency of 100 kHz and a modulation amplitude of 1.0 G.
Pulsed EPR experiments at Q-band were performed on a Bruker ELEXSYS E580 spectrometer with a SuperX-FT microwave bridge and a Bruker ER EN4118X-MD4 dielectric resonator. Cryogenic temperatures (50 K) were achieved with an Oxford flow cryostat. The field-swept EPR spectra were recorded by electron spin echo (ESE) detection using the pulse sequence π/2-τ-π-τ-echo. The microwave pulse lengths were tπ/2 = 8 ns and tπ = 16 ns at a τ value of 204 ns. A two-step phase cycle was applied to remove all unwanted echoes.
PELDOR/DEER measurements were performed at 50 K on a Bruker ELEXSYS E580 Q-AWG (arbitrary waveform generator) pulse Q-band spectrometer equipped with a 50 W amplifier. A 4-pulse sequence with Gaussian, non-selective observer and pump pulses of 8 or 16 ns length and 55 MHz frequency separation was used. Eight-step phase cycling was performed together with 0-π phase cycling to remove unwanted effects of running echoes from the DEER trace. The DEER data were evaluated using DeerAnalysis2018 [2]. The background of the primary DEER traces was corrected using exponential functions with homogeneous dimensions. A model-free Tikhonov regularization analysis was employed to extract distance distributions from the background-corrected form factors [3][4][5]. The home-written MATLAB routine (script) for fitting the room-temperature CW EPR spectra can be found in the Supporting Information.
NMR experiments.
1H-15N transverse relaxation optimized spectroscopy (TROSY) spectra for PRE measurements were recorded at 20 °C on a Bruker HDIII wide-bore 800 MHz spectrometer. Spectra were recorded in the States-TPPI/PFG mode for quadrature detection, with carrier frequencies for 1HN and 15N of 4.73 and 120.0 ppm, respectively. The samples contained 0.4 mM MAX, 25 mM MES, and 25 mM NaCl (pH …). All NMR spectra were processed and analyzed using NMRPipe and SPARKY [6,7]. A squared and 60° phase-shifted sine bell window function was applied in all dimensions for apodization. Time domain data were zero-filled to twice the data set size prior to Fourier transformation. 1H-15N cross peak assignments were obtained from the Biological Magnetic Resonance Data Bank (BMRB) entry 5956 and the work by Sauvé et al. [8,9]. Full NMR spectra are shown in the Supporting Information (Fig. S11a).
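The processing chain described above (apodization, zero-filling, Fourier transformation) can be illustrated for a single dimension with the following minimal sketch. It is not the NMRPipe script used for the actual data. It assumes that the squared sine bell runs from 60° to 180° across the acquired points, and it uses a synthetic, hypothetical free induction decay; all function names are illustrative.

```python
import cmath
import math


def squared_sine_bell(n_points, shift_deg=60.0):
    """Squared sine bell running from `shift_deg` to 180 degrees (assumed end point)."""
    start = math.radians(shift_deg)
    span = math.pi - start
    return [
        math.sin(start + span * i / (n_points - 1)) ** 2
        for i in range(n_points)
    ]


def zero_fill(fid, factor=2):
    """Pad the time-domain signal with zeros to `factor` times its length."""
    return list(fid) + [0j] * (len(fid) * (factor - 1))


def dft(signal):
    """Plain O(N^2) discrete Fourier transform; sufficient for a short illustration."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def process_fid(fid, shift_deg=60.0):
    """Apodize, zero-fill to twice the size, and Fourier transform one dimension."""
    window = squared_sine_bell(len(fid), shift_deg)
    apodized = [w * x for w, x in zip(window, fid)]
    return dft(zero_fill(apodized, factor=2))


# Synthetic, hypothetical FID: one decaying complex exponential (8 cycles in 64 points).
N = 64
fid = [cmath.exp(2j * math.pi * 8 * k / N) * math.exp(-k / 40.0) for k in range(N)]
spectrum = process_fid(fid)
peak_index = max(range(len(spectrum)), key=lambda j: abs(spectrum[j]))
```

Zero-filling to twice the size doubles the digital resolution, so a signal at bin 8 of the 64-point FID appears at bin 16 of the 128-point spectrum.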

Molecular Dynamics.
We performed molecular dynamics (MD) simulations using GROMACS 2019.1 [10]. The solution structure of a DNA-free MAX2 homodimer (PDB ID: 1R05) was used to build the initial atomistic model. The complex was solvated with 37199 water molecules in a dodecahedral box such that the edges of the box were always at least 1 nm away from the complex. The system was neutralized with Cl− ions, and Na+ and Cl− ions were added to achieve a salt concentration of 25 mM, as used in the experiments. NVT equilibration was performed at P = 1 bar and T = 310 K in a constrained box for 80 ps with a time step of 2 fs, using the Verlet cutoff scheme with a cutoff of 1.2 nm. A modified Berendsen thermostat was used, with separate coupling groups for protein and non-protein atoms. Subsequently, NPT equilibration was performed under similar conditions. Nosé-Hoover thermostat coupling was used, which allows wide fluctuations and produces more natural dynamics than the Berendsen coupling. The extended simple point charge water model (SPC/E) and the AMBER03 protein force field [11] were used. The production MD simulations were continued from the NPT equilibration under unconstrained conditions. The MD trajectories were sampled every 2 ns, for a total simulation time of up to 200 ns.
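As a rough plausibility check of the setup, the number of salt ion pairs corresponding to 25 mM in a box of 37199 water molecules, and the number of integration steps implied by the stated time step, can be estimated as sketched below. The water molarity of 55.5 mol/L is an assumption and not a value from the simulation input; neutralizing counter-ions are not included, and the function names are illustrative.

```python
# Assumption: bulk water molarity of about 55.5 mol/L (not taken from the text).
WATER_MOLARITY = 55.5


def ion_pairs_for_salt(n_water, salt_molar, water_molar=WATER_MOLARITY):
    """Estimate the number of salt ion pairs matching a target molar concentration."""
    return round(n_water * salt_molar / water_molar)


def n_steps(duration_fs, step_fs):
    """Number of integration steps for a duration and a time step, both in fs."""
    return duration_fs // step_fs


n_pairs = ion_pairs_for_salt(37199, 0.025)   # 25 mM salt, 37199 water molecules
equil_steps = n_steps(80_000, 2)             # 80 ps equilibration at 2 fs
production_steps = n_steps(200_000_000, 2)   # 200 ns production at 2 fs
```

Under these assumptions, about 17 Na+/Cl− pairs correspond to 25 mM, the 80 ps equilibration spans 40 000 steps, and a 200 ns trajectory spans 10^8 steps.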
The extracted PRE signal suppression ratios V for an entire trajectory are shown in the Supporting Information, Fig. S11b.
To simulate the distance distribution of the spin-labeled R5C mutant, the non-native protein was modelled using the YASARA software package. The AMBER03 force field was employed with periodic boundary conditions [12]. Non-bonded interactions were cut off at 10.5 Å. Long-range Coulombic interactions were treated by a smoothed particle-mesh Ewald method [13]. Non-native amino acids were built using YASARA and semi-quantum-mechanically parameterised (YAPAC-AM1). The modelled structures were subjected to energy minimisation in vacuum, subsequently placed randomly in the simulation box, solvated with water at pH 7.4, charge-neutralised by addition of 1 % NaCl, and minimised again (steepest descent minimisation followed by simulated annealing). The chosen time increment was 2 fs. An MD trajectory of 50.0 ns length was accumulated. Intermolecular forces were recalculated at every second simulation sub-step. Temperature rescaling was employed with a set temperature of 37 °C. The box dimensions (dodecahedral, 92 Å side length) were controlled so as to yield a solvent pressure of 1 bar. Snapshots of the simulations were taken every 10 000 fs.

Supplementary MD analysis
To further corroborate that the two sub-ensembles depicted in Fig. 1 of the main text are indeed clearly distinct, we performed an additional analysis of the MD data using the center of mass of MAX2 as the reference point. Figure S1 shows a scatter plot of the distances r(5-5) between the Cα positions of residues R5 in both subunits of MAX2 (yellow dots in the insert in Fig. S1a) vs. the distances r(5-M) between the Cα position of residue R5 of one chain and the center of mass of MAX2 (red dot in the insert in Fig. S1a). These distances were extracted from the trajectory underlying Fig. 1 of the main text.
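For a single trajectory frame, the two distances can be computed as in the following minimal sketch. It assumes that the Cα coordinates of R5 in both chains, together with the coordinates and masses of all protein atoms, have already been extracted from the trajectory; the function names and the toy coordinates are hypothetical.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total_mass = sum(masses)
    return tuple(
        sum(m * xyz[axis] for xyz, m in zip(coords, masses)) / total_mass
        for axis in range(3)
    )


def frame_distances(ca5_a, ca5_b, coords, masses):
    """Return r(5-5) and r(5-M) for chains A and B in one frame."""
    com = center_of_mass(coords, masses)
    return math.dist(ca5_a, ca5_b), math.dist(ca5_a, com), math.dist(ca5_b, com)


# Toy frame (hypothetical coordinates in nm): two equal-mass atoms define the protein.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0, 1.0]
r55, r5m_a, r5m_b = frame_distances(
    (1.0, 3.0, 0.0), (1.0, -1.0, 0.0), coords, masses
)
```

Applying `frame_distances` to every sampled frame yields the (r(5-5), r(5-M)) pairs of the scatter plot, one set per subunit.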
The contour levels in Fig. S1 indicate clusters of data obtained by fitting a Gaussian mixture distribution to the data (as implemented in the MATLAB 'fitgmdist' function). Fig. S1a and b represent data for the two different subunits of MAX2, i.e., the two different r(5-M) vectors. It can clearly be seen that the hinged sub-ensemble (annotated as cluster 1) is well separated from the data points representing the extended ensemble (annotated as cluster 2). This indicates that only a small number of intermediate states are sampled during the transition between hinged and extended conformations and that the two states can be considered distinct sub-ensembles within the conformational space of MAX2.
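The principle of such a cluster assignment can be sketched with a small expectation-maximization (EM) routine. This is not the MATLAB 'fitgmdist' implementation: for brevity it assumes diagonal covariances and a fixed number of iterations, and it is applied to synthetic, hypothetical (r(5-5), r(5-M)) pairs rather than to the MD data.

```python
import math
import random


def diag_gaussian(point, mean, var):
    """Density of a 2D Gaussian with diagonal covariance (simplifying assumption)."""
    density = 1.0
    for x, m, v in zip(point, mean, var):
        density *= math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return density


def responsibilities(points, weights, means, variances):
    """E-step: posterior probability of each component for each point."""
    resp = []
    for p in points:
        dens = [w * diag_gaussian(p, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        resp.append([d / total for d in dens])
    return resp


def fit_gmm(points, init_means, n_iter=100, min_var=1e-6):
    """EM fit of a Gaussian mixture; returns weights, means, variances, hard labels."""
    k_comp = len(init_means)
    means = [list(m) for m in init_means]
    variances = [[1.0, 1.0] for _ in range(k_comp)]
    weights = [1.0 / k_comp] * k_comp
    for _ in range(n_iter):
        resp = responsibilities(points, weights, means, variances)
        for k in range(k_comp):  # M-step: update weight, mean, variance per component
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(points)
            for axis in range(2):
                mean = sum(r[k] * p[axis] for r, p in zip(resp, points)) / nk
                var = sum(r[k] * (p[axis] - mean) ** 2 for r, p in zip(resp, points)) / nk
                means[k][axis] = mean
                variances[k][axis] = max(var, min_var)
    resp = responsibilities(points, weights, means, variances)
    labels = [max(range(k_comp), key=lambda k: r[k]) for r in resp]
    return weights, means, variances, labels


# Synthetic, hypothetical data in nm: a compact "hinged" and a broad "extended" cluster.
random.seed(0)
hinged = [(random.gauss(2.0, 0.3), random.gauss(1.5, 0.2)) for _ in range(150)]
extended = [(random.gauss(6.0, 0.6), random.gauss(4.0, 0.4)) for _ in range(350)]
points = hinged + extended
weights, means, variances, labels = fit_gmm(points, [min(points), max(points)])
```

Initializing the component means at the points with the smallest and largest r(5-5) is a convenience of this sketch. Well-separated clusters, as in Fig. S1, are recovered largely independently of the starting values, whereas a substantial population of intermediate states would show up as strongly overlapping components.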
Additionally, the reference to the distance to the center of mass, i.e., to r(5-M), shows that short distances r(5-5) indeed correlate with proximity of residues R5 to the center of the protein. In other words, it can be excluded that the observed short r(5-5) distances stem from transient encounters of the two NTDs in their extended states.
Figure S1. Cluster analysis of the MD data shown in Fig. 1 in the main text for the two sub-chains (shown in a) and b), respectively). The hinged and extended sub-ensembles constitute clearly distinct clusters. The distance between the Cα positions of residues R5 is denoted r(5-5). The distances between the Cα position of residue R5 and the center of mass of MAX2 are denoted r(5-M).

Supplementary Information (EPR)
To support the CW EPR data of the main text, further assays involving three different mutants of MAX2 were performed; in addition to R5C-R5C, the R5C-G35C and R5C-R55C mutants were investigated. Experimental spectra and best fits for the CW data of all mutants are shown in Fig. S1-S3. The three mutants exhibit similar trends, i.e., good fits could be obtained assuming a superposition of a faster and a slower correlation time τc of the spin labels. According to the fit results, summarized in Table S4, the fast species clearly dominates.
Importantly, all three samples yielded a similar contribution of the slow component of ca. 15 %.
Upon DNA binding the CW spectral features are significantly affected (Table S1 and Fig. S4): the conformational flexibility is reduced and the ratios between the two main components of the spectrum shift towards the slow component (Table S4). Indeed, the presence of the DNA ligand inverts the ratios of the fast and slow-motion components for all three mutants.
CW spectra were fitted with home-written MATLAB routines supported by the EasySpin package, as detailed in the main text. To cross-check the choice of two correlation times, a fit of the CW EPR spectra of mutant R5C-R55C was additionally carried out assuming three components (Supporting Information, Fig. S5). The improvement achieved by adding a third conformer to the fitting procedure was only marginal, as indicated by the final RMSD values (0.0071 vs. 0.0080). Thus, we chose the minimal number of species (i.e., two components) for the fitting procedure to avoid unnecessary degrees of uncertainty.
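The superposition model and the RMSD criterion can be illustrated with the following minimal sketch. It is not the home-written MATLAB/EasySpin routine: it assumes that the line shapes of the fast and slow components are already known, so that only their relative weight is fitted, and it uses first-derivative Gaussian lines as hypothetical stand-ins for simulated nitroxide spectra.

```python
import math


def derivative_line(field, center, width):
    """First-derivative Gaussian line; a stand-in for a simulated CW EPR component."""
    z = (field - center) / width
    return -z * math.exp(-0.5 * z * z) / width


def fit_fast_fraction(spectrum, fast, slow):
    """Least-squares weight w in spectrum = w*fast + (1-w)*slow, clipped to [0, 1]."""
    num = sum((y - s) * (f - s) for y, f, s in zip(spectrum, fast, slow))
    den = sum((f - s) ** 2 for f, s in zip(fast, slow))
    return min(1.0, max(0.0, num / den))


def rmsd(a, b):
    """Root-mean-square deviation between two spectra on the same field axis."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical field axis (mT) and component line shapes (narrow = fast, broad = slow).
field = [330.0 + 0.05 * i for i in range(400)]
fast = [derivative_line(b, 340.0, 0.3) for b in field]
slow = [derivative_line(b, 340.0, 1.5) for b in field]
measured = [0.85 * f + 0.15 * s for f, s in zip(fast, slow)]  # synthetic "experiment"

w_fast = fit_fast_fraction(measured, fast, slow)
fit = [w_fast * f + (1.0 - w_fast) * s for f, s in zip(fast, slow)]
residual = rmsd(measured, fit)
```

Comparing the RMSD of fits with different numbers of components in this way shows whether an additional species improves the agreement enough to justify the extra parameters.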
To validate the DEER results obtained for the R5C-R5C mutant, the two additional mutants were also subjected to PELDOR/DEER analysis. As for the R5C-R5C mutant, the cosinusoidal DEER/PELDOR form factor (dipolar evolution function) exhibits a first maximum around 1.5 µs in the free state of the dimeric structure (Supporting Information, Fig. S7, for R5C-G35C); upon binding to E-box DNA, these maxima diminish owing to the presence of another component with a second maximum at longer times