Sequence-Selective Minor Groove Recognition of a DNA Duplex Containing Synthetic Genetic Components

The structural basis of minor groove recognition of a DNA duplex containing synthetic genetic information by hairpin pyrrole-imidazole polyamides is described. Hairpin polyamides induce a higher melting stabilization of a DNA duplex containing the unnatural P·Z base-pair when an imidazole unit is aligned with a P nucleotide. An NMR structural study showed that the incorporation of two isolated P·Z pairs enlarges the minor groove and slightly narrows the major groove at the site of this synthetic genetic information, relative to a DNA duplex consisting entirely of Watson-Crick base-pairs. Pyrrole-imidazole polyamides bind to a P·Z-containing DNA duplex to form a stable complex in which the P·Z pair effectively mimics a G·C pair. A structural hallmark of minor groove recognition of a P·Z pair by a polyamide is the reduced level of allosteric distortion induced by polyamide binding to the DNA duplex. Understanding the molecular determinants that influence minor groove recognition of DNA containing synthetic genetic components provides the basis to further develop unnatural base-pairs for synthetic biology applications.


INTRODUCTION
Fundamental to all living organisms is the storage and transmission of genetic information in the form of a four-letter nucleic acid code. DNA is the predominant genetic repository used for this purpose, with information typically embedded in an anti-parallel B-type duplex where A pairs with T, and G pairs with C. Whilst Nature uses two sets of Watson-Crick base-pairs, expanding this code by incorporating synthetic variants offers the opportunity to develop artificial genomes, which have the potential to perform novel functions not possible with natural systems [1-3].
Rearranging the hydrogen-bond donor (D) and acceptor (A) groups is one design approach to expand the information content in DNA beyond Nature's four-letter alphabet [4]. The P•Z pair is the most prominent exemplar of these artificially expanded genetic information systems (AEGIS), where the mode of selective recognition is via a three hydrogen-bond arrangement not present in either a G•C or an A•T pair (Figure 1a) [5-8]. This strategy is distinct from other artificial base-pairs (e.g., Romesberg's dNAM•dTPT3 pair [9] and Hirao's Ds•Pa pair [10, 11]) as P•Z pairing is more closely aligned with Watson-Crick-like hydrogen-bonding rather than shape complementarity [8, 12-15].
From a topological perspective, the minor groove hydrogen-bond profile of a P•Z pair is equivalent to that of G•C (Figure 1a).
Recent structural studies have revealed that the incorporation of AEGIS results in a transition from a conventional B-type to an A-type duplex when the density of P•Z in a duplex is increased from one to six consecutive pairs [7, 8]. This suggests that AEGIS could impact auxiliary molecular recognition processes such as groove recognition, which is essential for transcriptional initiation. At present, the molecular basis for sequence-selective recognition of synthetic genetic components is not known. Herein, we report the first NMR-based structural analysis of a DNA duplex containing AEGIS base-pairs and show how this synthetic genetic information is selectively targeted by a minor groove-binding pyrrole-imidazole polyamide (PIP) (Figure 1b) [38, 41, 42].

Experimental approach
The objectives of this study were to understand how a P•Z base-pair incorporated into a DNA duplex influences (i) double-helix structure in solution, and (ii) minor groove recognition by PIPs (PA1-3, Figure 2). The well-established recognition rules of PIPs for Watson-Crick pairs [37, 38, 40, 43-47] rendered PIPs excellent candidates to investigate whether the N3 atom of an N-methyl imidazole (Im) unit hydrogen-bonds with the exocyclic amine (N2) of P, akin to the hydrogen-bonding observed with the cognate N2 amine of G. Furthermore, 8-ring PIPs such as PA1-3 predictably bind to their target dsDNA sequences in a 'forward' orientation, where the N-terminus of each PIP (in this study, all three PIPs contain an N-terminal Im8 unit) binds in a 5'→3' direction with respect to the DNA backbone [40, 48]. All three PIPs in the series (i.e., PA1-3) bind to 7 base-pair dsDNA sequences. The PIPs vary in their recognition of the base pair in position X (5'-WWGXWCW-3', where W = A/T), where position X is aligned with a Py2/Py7 (PA1), a Py2/Im7 (PA2) or an Im2/Py7 (PA3) pairing.
PA1 was chosen for NMR-based structural studies as it has a well-demonstrated high-affinity binding profile for the palindromic sequence 5'-WWGWWCW [44, 49-51], whereas PA2 and PA3 preferentially bind to 5'-WWGGWCW and 5'-WWGCWCW, respectively [52-55]. Introduction of a P•Z pair into position X•Y of a target DNA sequence (Figure 3a) would allow a greater understanding of how each Py and Im pairing combination influences duplex stabilization.
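The consensus sites quoted above can be written as a small lookup, which may be convenient when checking whether a given strand contains a match site for PA1-3. The following is a minimal Python sketch and not part of the original study: the names `TARGET_SITES`, `site_to_regex` and `find_sites` are hypothetical, only the three consensus sites stated in the text are encoded, and the example sequence is an assumed all-natural dodecamer chosen purely for illustration. Only the strand supplied (read 5'→3') is scanned.

```python
import re

# Consensus target sites as stated in the text (5'->3', W = A or T).
TARGET_SITES = {
    "PA1": "WWGWWCW",  # Py2/Py7 at position X
    "PA2": "WWGGWCW",  # Py2/Im7 at position X
    "PA3": "WWGCWCW",  # Im2/Py7 at position X
}


def site_to_regex(site):
    """Expand the degenerate code W into the character class [AT]."""
    return "".join("[AT]" if base == "W" else base for base in site)


def find_sites(polyamide, sequence):
    """Return 0-based start indices of every match on the given strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=" + site_to_regex(TARGET_SITES[polyamide]) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    example = "CGATGTACATCG"  # assumed illustrative sequence, not taken from the text
    for name in TARGET_SITES:
        print(name, find_sites(name, example))
```

The sketch deliberately handles natural bases only; extending it to P and Z would require a choice of how each polyamide pairing treats the unnatural nucleotides, which is the question addressed by the melting experiments that follow.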

Im building blocks incorporated into polyamides preferentially hydrogen-bond with P nucleotides in P•Z-containing DNA duplexes
To gain insight into the duplex stabilization and sequence preferences of PIPs (PA1-3) in the presence of P•Z pairs relative to naturally-occurring Watson-Crick base-pairs, UV-vis melting experiments were conducted using duplexes DNA1-6 (Figure 3a, Figures S1-6, Table S1) [56-58]. These duplexes were chosen in order to determine (i) whether hairpin PIP pairings discriminate a P nucleotide over a Z in a P•Z pair, analogous to the preferential pairing of an ImN3 unit with G in a G•C base-pair (versus a C•G), and (ii) the relative differences in duplex stability of PIP binding to dsDNA when a G […] Consistent with the lower binding affinity determined for PA3 relative to PA2 [52], the extent of duplex stabilization is lower. Taken collectively, these studies reveal a preferential recognition mode of a PIP binding to a target dsDNA sequence when an Im unit is aligned with a P/G nucleotide relative to Z/C. Furthermore, Im units show a higher duplex stabilization with a P•Z pair relative to a G•C in the same position.

NMR structural characterization of a DNA duplex containing unnatural P•Z base-pairs
Solution-based NMR studies were undertaken to explore how the incorporation of two P•Z base-pairs in a self-complementary dodecamer d(CGATPTAZATCG)2 (DNA7) impacts duplex structure relative to a duplex incorporating two G […] Figure S9). Stronger NOE intensities were also observed for all correlations between the nucleobase protons (PH8 and ZH4) and H2' compared with H3' (Figure S14) [59]. In addition, correlations between H1' and H2'/H2'' were observed in COSY and TOCSY spectra, which is consistent with all the deoxyribose sugars present in DNA7 and DNA8 adopting the C2'-endo conformation [61]. Finally, the structural impact of incorporating a P•Z base-pair into a DNA duplex is evident from comparative analysis of the 31P NMR spectrum of DNA7 relative to DNA8. Perturbations of the sugar-phosphodiester backbone in DNA7 are evident at the site of the P•Z pair (i.e., P5/Z8) and the flanking base pairs T6•A19 and T4•A21 (Figure S10). Whilst the data confirm that DNA7 adopts an overall B-type duplex, the P•Z pair imparts local distortions to the phosphodiester backbone relative to DNA8, which contains a G•C pair in the same position.
Two independent NMR-restrained molecular dynamics (MD) production runs in explicit solvent were obtained after a simulated annealing protocol for DNA7. The trajectories were analyzed separately and an ensemble of ten structures was obtained by combining the five most relevant geometrical conformations for each run (Figure S15), whereas the ensemble for DNA8 was previously reported (PDB 5OCZ) [49]. NMR-restrained MD structures of DNA7 and DNA8 provided further insight into the structural impact of a P•Z pair relative to a G•C in the same position (Figure 4a-b). Subtle but distinct differences in the geometry of the P•Z pairing (DNA7) relative to equivalent G•C pairs (DNA8) are present. In particular, the PO6-ZN6 hydrogen-bond (1.84 Å, Figure 4c) is slightly shorter than the equivalent GO6-CN4 (1.93 Å, Figure 4d). Our MD structures also suggest an average hydrogen-bond distance of 2.11 Å between the exocyclic ZN6 and the Z-NO2 group, which is consistent with previously observed distances for other 2-nitroanilines [60].
Whilst our MD calculations show that DNA7 adopts an overall B-type duplex (Figure 4a-b), there is a change in the shearing, stretching and propeller twist at the site of the P•Z pair in DNA7 relative to the cognate G•C pair in DNA8 (Figures 4e and S20). In addition, the twist of the TP/ZA and ZA/TP steps of DNA7 is larger (approx. 40°) than that of the corresponding TG/CA and CA/TG steps of DNA8 (approx. 25°). As a result, the local inclination of DNA7 is reduced relative to DNA8 (Figure S21). Lastly, the roll and inclination parameters of the central AT/TA base-pair step in DNA7 and DNA8 have similar magnitudes but opposite directions. These local differences could be due to the stacking of the Z-NO2 group with the adjacent adenine nucleobase (Figure S23). The overall result of these structural changes is a slightly enlarged minor groove but a narrower major groove at the central AT/TA base step for DNA7 compared to DNA8.
In summary, although the incorporation of isolated P•Z pairs maintains a B-type structure, these unnatural base-pairs induce structural perturbations, which may arise from a stronger hydrogen-bond network between the P and Z nucleotides relative to a G•C pair in the same position [62]. The electron-withdrawing Z-NO2 group contributes to the stronger P•Z pairing through a combination of molecular features, including an intramolecular hydrogen-bond with the exocyclic amine ZN6 [60] and base-stacking of the Z-NO2 group with A7/A21. This is also consistent with previous crystal structures of Z•P-containing duplexes [7, 8].
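The residue labels used in this section (P5, Z8, T6•A19, T4•A21) follow from DNA7 being self-complementary. As an illustrative check, the short Python sketch below extends the Watson-Crick complement table with the P•Z pair and tests whether d(CGATPTAZATCG) is its own reverse complement. The helper names are hypothetical, and the continuous numbering of the second strand from 13 to 24 is an assumption inferred from the residue labels quoted in the text.

```python
# Complement table extended with the AEGIS P•Z pair.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "P": "Z", "Z": "P"}

DNA7 = "CGATPTAZATCG"  # sequence of one strand, 5'->3', as given in the text


def reverse_complement(seq):
    """Return the 5'->3' sequence of the strand that pairs with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_self_complementary(seq):
    """True if the strand can pair with a second copy of itself."""
    return reverse_complement(seq) == seq


def partner_index(i, length):
    """1-based index of the residue paired with residue i.

    Assumes strand 1 is numbered 1..length and strand 2 continues
    length+1..2*length, both 5'->3' (an inferred convention).
    """
    return 2 * length + 1 - i


if __name__ == "__main__":
    print(is_self_complementary(DNA7))
    for i in (4, 5, 6):
        j = partner_index(i, len(DNA7))
        # Residue j on strand 2 carries the same base as residue j - length on strand 1.
        print(f"{DNA7[i - 1]}{i} pairs with {DNA7[j - len(DNA7) - 1]}{j}")
```

Under this numbering, residue 6 pairs with residue 19 and residue 4 with residue 21, which matches the flanking pairs named in the 31P NMR discussion.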

DISCUSSION
This work was designed to gain insight into how the incorporation of P•Z within a DNA duplex influences duplex structure and minor groove molecular recognition in solution. We highlight here several key observations that have emerged from our results.

Incorporation of P•Z base-pairs in a DNA duplex induces local structural perturbations
Previous crystallographic analysis of a DNA duplex containing two isolated P•Z pairs highlighted a widening of both grooves at the site of the synthetic nucleotides [8]. In addition, increasing the density of P•Z up to six consecutive base-pairs induces the formation of an A-type duplex where the Z-NO2 group prefers to stack on top of the adjacent nucleobase [7, 16]. Thus, the steric and/or electronic properties of the Z-NO2 group play an influential role in altering dsDNA structure, particularly in duplexes containing consecutive P•Z pairs.
Our NMR-derived structure of DNA7 also shows a widening of the minor groove at the site of the P•Z pair (Figures 4a and 6c). The P•Z pairing geometry differs markedly from that of a G•C pair in this position. In our structure, a slight narrowing of the major groove at the site of the P•Z base-pair was observed (Figure 6d). We attribute this to a combination of (i) a shorter hydrogen-bond between PO6 and ZN6 (Figure 4c-d), most likely facilitated by the Z-NO2 group forming an intramolecular hydrogen-bond, and (ii) stacking of the Z-NO2 group with the adjacent adenine base.
Concerning the latter point, we speculate that the Z-NO2 group projecting into the major groove could be playing a role in constraining conformational freedom both at the site of a P•Z pair and on adjacent pairs [7, 8]. These localized perturbations are particularly evident on the adjacent sugar atoms (A7C2'/H2'-H2''), which are positioned ~1 Å further away from Z8-NO2 compared to the C8H5 of DNA8 (Figure S22). These localized changes are also reflected in an upfield shift at A7C2' (0.74 ppm) and a downfield shift of the A7H2'-H2'' resonances (0.054 and 0.237 ppm) relative to the equivalent atoms present in DNA8 (Tables S2-S7).

Allosteric perturbations induced by polyamide binding to naturally-occurring DNA8 relative to Z•P-containing DNA7
A structural hallmark of previous PIP•dsDNA complexes is that PIP binding induces extensive helical bending, widening of the minor groove and, concomitantly, compression of the major groove directly opposite to the site of binding [43, 44, 49, 63-65]. Whereas this is observed for PA1•DNA8, it is less evident in the PA1•DNA7 complex, where two P•Z pairs are incorporated within the PIP-binding sequence (Figure 5b). In fact, PA1 binding to DNA7 results in only minor structural perturbations to the overall duplex. Our thermal UV studies reveal the existence of high-stability complexes formed between P•Z-containing dsDNA and PIPs (Figure 3b-c), which demonstrates that the reduced level of allosteric perturbation observed upon binding of PA1 to DNA7 does not negatively impact minor groove recognition. We surmise that the binding profile of PA1 to DNA7 is more akin to a lock-and-key model than to induced fit, as previously observed for other minor groove binders such as Hoechst 33258 in complex with A-tract dsDNA sequences [66, 67].
Taken collectively, the unique structural features of a P•Z pair not only influence groove width but also reduce allosteric modulation in a DNA duplex. This is manifested in the reduced level of allosteric perturbation upon minor groove recognition by a PIP; we speculate that the P•Z-containing duplex is pre-organized for more optimal binding. Since sequence-selective recognition of DNA duplexes involves an interplay between direct base contacts and shape complementarity of the duplex [29, 30], our work could be used as a basis to design next-generation molecules that preferentially bind to synthetic genetic information with enhanced selectivity.

CONCLUSION
In summary, this work demonstrates sequence-selective minor groove recognition of synthetic genetic information incorporated in dsDNA by PIPs. Although the unnatural P•Z base-pair mimics the hydrogen-bond profile of a G•C pair projected into the minor groove, the distinct differences in the P•Z base-pairing geometry play an influential role in modulating the local helical structure, both in the free duplex (DNA7) and in complex with a PIP (PA1•DNA7). We envisage that the unique structural signatures of DNA duplexes containing synthetic genetic information could offer new opportunities in the field of synthetic biology, including the development of new strategies to regulate gene expression [68] or as orthogonal pathways for transcriptional initiation and elongation.

ASSOCIATED CONTENT
Distance restraints and structure ensembles are deposited in the PDB with accession codes 6I4O (DNA7), 6I4N (PA1•DNA7), 5OCZ (DNA8) and 6RIO (PA1•DNA8). PIP characterization, NMR chemical shifts and molecular dynamics statistics can be found in the supporting information. This material is available free of charge via the Internet at http://pubs.acs.org.