Major G-Quadruplex Form of HIV-1 LTR Reveals a (3 + 1) Folding Topology Containing a Stem-Loop

Nucleic acids can form noncanonical four-stranded structures called G-quadruplexes. G-quadruplex-forming sequences are found in several genomes including human and viruses. Previous studies showed that the G-rich sequence located in the U3 promoter region of the HIV-1 long terminal repeat (LTR) folds into a set of dynamically interchangeable G-quadruplex structures. G-quadruplexes formed in the LTR could act as silencer elements to regulate viral transcription. Stabilization of LTR G-quadruplexes by G-quadruplex-specific ligands resulted in decreased viral production, suggesting the possibility of targeting viral G-quadruplex structures for antiviral purposes. Among all the G-quadruplexes formed in the LTR sequence, LTR-III was shown to be the major G-quadruplex conformation in vitro. Here we report the NMR structure of LTR-III in K+ solution, revealing the formation of a unique quadruplex–duplex hybrid consisting of a three-layer (3 + 1) G-quadruplex scaffold, a 12-nt diagonal loop containing a conserved duplex-stem, a 3-nt lateral loop, a 1-nt propeller loop, and a V-shaped loop. Our structure showed several distinct features including a quadruplex–duplex junction, representing an attractive motif for drug targeting. The structure solved in this study may be used as a promising target to selectively impair the viral cycle.


■ INTRODUCTION
G-quadruplexes are alternative secondary structures formed by guanine-rich nucleic acids. Four runs of at least two guanines linked by short mixed nucleotide sequences are prone to fold into a monomolecular G-quadruplex structure, built up from planar G-tetrads in which four guanines interact through Hoogsteen hydrogen bonds. 1 Different strand polarities and different loops and groove dimensions give rise to a large variety of G-quadruplex topologies. 2,3 Physiological concentrations of potassium and sodium cations efficiently stabilize G-quadruplexes. 4−6 Potential G-quadruplex-forming sequences are widespread in the human genome and implicated in key genomic functions, such as transcription, replication, repair, and telomere maintenance. 7−11 In particular, G-quadruplexes are overrepresented in the promoter regions of oncogenes, where they act as regulatory elements of gene expression. 9,10 Targeting the G-quadruplex structures in the promoters of the oncogenes c-myc, c-kit, and bcl-2 with G-quadruplex-stabilizing agents leads to gene transcription inhibition and decreased levels of gene expression, 12 suggesting G-quadruplexes as promising anticancer targets. 13,14 Besides the human genome, viral genomes also contain G-quadruplex-forming sequences, and emerging evidence suggests that they could be implicated in the regulation of key steps in the viral cycles. 15 In Epstein−Barr virus (EBV), an RNA G-quadruplex regulates translation of EBNA1 mRNA. 16,17 Multiple G-quadruplex-forming sequences located in the long control regions of some human papilloma virus (HPV) genomes suggest G-quadruplex involvement in transcriptional regulation. 18 In herpes simplex virus-1 (HSV-1), G-quadruplexes that form in the virus DNA genome were visualized in infected cells and were shown to peak at the time of virus genome replication; 19 in addition, the DNA replication step was affected by a G-quadruplex ligand, BRACO-19, implying a regulatory role of G-quadruplexes in viral replication. 20
G-quadruplexes have also been described in the human immunodeficiency virus 1 (HIV-1), a lentivirus that is the etiological agent of the acquired immunodeficiency syndrome (AIDS). HIV-1 is characterized by a ssRNA genome that, once retrotranscribed by the viral reverse transcriptase enzyme, integrates into the host cell chromosome in the provirus form. The provirus can then undergo a productive replicative cycle or remain in a dormant state known as "latency". Effective progression of the viral cycle relies on the proper function of the 5′-long terminal repeat (5′-LTR), which is characterized by transcription factor binding sites and serves as the unique viral promoter. 21 Formation of multiple G-quadruplexes in the viral and proviral genome, 22,23 and in particular in the LTR promoter, 22,24 has been reported. LTR G-quadruplexes act as repressor elements of viral transcription initiation: stabilization by G-quadruplex ligands intensifies this effect, 25,26 while cellular proteins modulate viral transcription by inducing/unfolding LTR G-quadruplexes. 27,28 The observation that 5′-LTR G-quadruplex-forming sequences are conserved in all primate lentiviruses 29 further validates viral G-quadruplexes as novel antiviral targets. However, selective targeting of viral G-quadruplexes with small molecules is challenging, and very few compounds have been shown to recognize specific G-quadruplex structures. 30 High-resolution structures of viral G-quadruplexes may give new insights to achieve a higher level of selectivity and specificity.
Within the LTR G-rich sequence, in the U3 region of the proviral genome, formation of multiple G-quadruplex conformations involving different G-tracts is possible. This sequence was divided into three main G-quadruplex-forming components, namely LTR-II, LTR-III, and LTR-IV (Figure 1A). In previous studies, the G-quadruplex formed by the LTR-III sequence showed the highest thermal stability in circular dichroism and FRET melting experiments. Moreover, a Taq polymerase stop assay on the full-length LTR sequence, in K+ solution, revealed a stop site occurring predominantly at the LTR-III site, and this effect was enhanced by G-quadruplex ligands, such as BRACO-19 (Figure 1B). 22,25 The cellular protein nucleolin is involved in the regulation of viral promoter activity through binding to the LTR G-quadruplex structures. 27 Specifically, the LTR G-quadruplex-stabilizing effect of nucleolin binding translates into a decrease in viral promoter activity. In contrast, the cellular protein hnRNP A2/B1 binds and unfolds the LTR G-quadruplexes, i.e., LTR-II and LTR-III, activating viral transcription. 28 Interestingly, the activity of promoters with mutations totally or partially abolishing LTR-III G-quadruplex formation is not affected by nucleolin and hnRNP A2/B1 binding, in contrast to the wild-type sequence.
This evidence supports the key role of LTR-III G-quadruplex within the LTR G-quadruplex-folding motif in the regulatory events of HIV-1 transcription. Thus, selective targeting of the LTR-III G-quadruplex conformation with stabilizing ligands may represent an attractive strategy to inhibit virus production.
Here we report the high-resolution NMR solution structure of the 28-nt LTR-III G-quadruplex 5′-GGGAGGCGTGGCCTGGGCGGGACTGGGG-3′, containing an interesting duplex−quadruplex junction that can potentially be specifically targeted. We also demonstrate that the LTR-III G-quadruplex structure persists in a longer LTR sequence, suggesting LTR-III as a major G-quadruplex structure formed in the HIV-1 LTR.

■ MATERIALS AND METHODS
DNA Sample Preparation. Unlabeled and site-specific labeled DNA oligonucleotides were synthesized using reagents from Glen Research (Sterling, USA). Samples were deprotected in ammonium hydroxide solution, purified using Poly-Pak cartridges following the Glen Research protocol, and then dialyzed overnight against 20 mM KCl solution. Excess KCl was removed by dialysis against water for 2 h. After lyophilization, the DNA was obtained in powder form. DNA samples were dissolved in buffer containing 70 mM potassium chloride and 20 mM potassium phosphate (pH 7).
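The buffer composition above can be related to the total K+ concentration of roughly 100 mM quoted for the melting experiments. The following is a minimal sketch of that arithmetic, assuming the phosphate buffer is a KH2PO4/K2HPO4 mixture described by the Henderson-Hasselbalch equation with a textbook pKa2 of 7.20 (ionic-strength effects ignored). The function names and the pKa value are illustrative assumptions, not part of the reported protocol.

```python
def potassium_from_phosphate(total_phosphate_mM, pH, pKa2=7.20):
    """Estimate K+ (mM) contributed by a KH2PO4/K2HPO4 buffer.

    Assumption: pKa2 = 7.20 (textbook value, no ionic-strength correction).
    """
    # Fraction of phosphate present as HPO4(2-), which carries two K+ counterions.
    dibasic_fraction = 1.0 / (1.0 + 10.0 ** (pKa2 - pH))
    # H2PO4(-) brings one K+ per phosphate, HPO4(2-) brings two.
    return total_phosphate_mM * (1.0 + dibasic_fraction)


def total_potassium(kcl_mM, phosphate_mM, pH, pKa2=7.20):
    """Total K+ (mM) from KCl plus the potassium phosphate buffer."""
    return kcl_mM + potassium_from_phosphate(phosphate_mM, pH, pKa2)


# Buffer used in this work: 70 mM KCl and 20 mM potassium phosphate, pH 7.
k_total = total_potassium(kcl_mM=70.0, phosphate_mM=20.0, pH=7.0)
print(f"Estimated total K+: {k_total:.1f} mM")
```

Under these assumptions the estimate falls slightly below 100 mM, consistent with the approximate value used in the text.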
Gel Electrophoresis. DNA samples at 100 μM strand concentration in potassium phosphate buffer were loaded on a 15% native polyacrylamide gel containing 10 mM KCl. Electrophoresis was run at 90 V for 30 min at room temperature in Tris-Borate-EDTA-KCl buffer, and DNA bands were visualized by UV shadowing.
Circular Dichroism. CD spectra were recorded on a Jasco J-815 CD spectrometer at 20°C using a quartz cuvette of 10-mm optical path length. The reported spectra of DNA samples at 5 μM concentration in potassium phosphate buffer (pH 7) were the average of 3 scans over the 220−320 nm wavelength range, at the scanning speed of 50 nm/min, baseline-corrected for buffer contribution.
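The scan averaging and buffer subtraction described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the ellipticity values are made up, and the instrument software may apply these steps differently.

```python
def average_scans(scans):
    """Point-by-point mean of repeated scans recorded on the same wavelength grid."""
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must share the same wavelength grid")
    return [sum(values) / len(scans) for values in zip(*scans)]


def baseline_correct(sample, buffer_blank):
    """Subtract the buffer-only spectrum from the sample spectrum."""
    if len(sample) != len(buffer_blank):
        raise ValueError("sample and buffer spectra must have the same length")
    return [s - b for s, b in zip(sample, buffer_blank)]


# Made-up ellipticities (mdeg) at four wavelengths, for illustration only.
wavelengths_nm = [220, 260, 290, 320]
sample_scans = [
    [1.0, 8.2, 3.1, 0.2],
    [1.2, 7.8, 2.9, 0.0],
    [1.1, 8.0, 3.0, 0.1],
]
buffer_scans = [
    [0.1, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.1],
]

corrected = baseline_correct(average_scans(sample_scans), average_scans(buffer_scans))
for wavelength, value in zip(wavelengths_nm, corrected):
    print(f"{wavelength} nm: {value:.2f} mdeg")
```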

Journal of the American Chemical Society
Article
Thermal Denaturing. Thermal denaturing experiments were performed on a Jasco V-650 UV spectrometer. DNA samples at 5 μM or 100 μM strand concentration were initially heated to 95°C for 5 min and cooled to 20°C at a temperature ramp rate of 0.1°C/min, followed by heating from 20 to 95°C at the same rate. UV absorbance at 295 nm was measured every 0.5°C. The data were plotted as folded fraction against temperature, and the melting temperature was determined as the temperature at which the folded fraction was 0.5.
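The determination of the melting temperature from the folded fraction can be sketched as below. The baseline treatment is not specified in the text, so this example assumes flat baselines (first point fully folded, last point fully unfolded), which is a simplification. The absorbance curve is synthetic, generated with a midpoint of 65.5°C purely to mirror the reported value. All function names are hypothetical.

```python
import math


def folded_fraction(absorbance):
    """Normalize A295 readings to a folded fraction between 1 and 0.

    Assumption: flat baselines, with the first reading taken as fully folded
    and the last reading as fully unfolded.
    """
    a_folded = absorbance[0]
    a_unfolded = absorbance[-1]
    span = a_folded - a_unfolded
    if span == 0:
        raise ValueError("absorbance shows no transition")
    return [(a - a_unfolded) / span for a in absorbance]


def melting_temperature(temps_c, fractions, midpoint=0.5):
    """Temperature at which the folded fraction crosses the midpoint.

    Uses linear interpolation between the two neighboring data points.
    """
    points = list(zip(temps_c, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= midpoint >= f1 and f0 != f1:
            return t0 + (f0 - midpoint) * (t1 - t0) / (f0 - f1)
    raise ValueError("folded fraction never crosses the midpoint")


# Synthetic two-state heating curve sampled every 0.5 degC from 20 to 95 degC.
temps = [20.0 + 0.5 * i for i in range(151)]
a295 = [0.30 + 0.10 / (1.0 + math.exp((t - 65.5) / 3.0)) for t in temps]

tm = melting_temperature(temps, folded_fraction(a295))
print(f"Tm = {tm:.1f} degC")
```

With real data, sloped lower and upper baselines are often fitted before normalization; the interpolation step stays the same.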

■ RESULTS
LTR-III Forms a Stable Intramolecular G-Quadruplex Structure. The 28-nt LTR-III sequence d[GGGAGGCGTGGCCTGGGCGGGACTGGGG] contains six tracts of 2−4 guanines (G1−G3, G5−G6, G10−G11, G15−G17, G19−G21, and G25−G28). Using UV, CD, and NMR spectroscopy, we investigated the G-quadruplex formation of LTR-III in K+ solution. The NMR spectrum of LTR-III showed 12 well-resolved peaks from 10.5 to 12.5 ppm, suggesting the formation of three G-tetrads, and three peaks from 12.5 to 13.5 ppm, suggesting the formation of Watson−Crick base pairs (Figure 2A). The CD spectrum of LTR-III showed a maximum at 260 nm and a shoulder around 285 nm, suggesting the formation of a nonparallel G-quadruplex topology (Figure 2B). 33,34 The melting temperature of LTR-III, measured by UV absorption (Figure S1A) in ∼100 mM K+, was found to be 65.5°C and independent of the DNA strand concentration (5 to 100 μM), consistent with the formation of a monomeric G-quadruplex structure. Additionally, on a native gel the migration of LTR-III was similar to that of a monomeric three-layered G-quadruplex structure 35 (Figure S1B). Overall, these data support the formation of an intramolecular monomeric G-quadruplex structure.
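The six guanine tracts of the LTR-III sequence can be located with a short script. The sketch below uses 1-based residue numbering to match the text; the function name is a hypothetical helper, and the minimum tract length of two follows the "2−4 guanines" description.

```python
import re

LTR_III = "GGGAGGCGTGGCCTGGGCGGGACTGGGG"


def g_tracts(sequence, min_length=2):
    """Return (start, end, length) for each run of guanines, 1-based inclusive.

    Isolated guanines shorter than min_length (such as G8) are skipped.
    """
    pattern = re.compile(r"G{%d,}" % min_length)
    return [
        (match.start() + 1, match.end(), match.end() - match.start())
        for match in pattern.finditer(sequence)
    ]


tracts = g_tracts(LTR_III)
for start, end, length in tracts:
    print(f"G{start}-G{end}: {length} guanines")
print(f"{len(tracts)} tracts in a {len(LTR_III)}-nt sequence")
```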
LTR-III G-Quadruplex Adopts a (3 + 1) Folding Topology Containing a Diagonal Stem-Loop. To elucidate the folding topology of LTR-III, NMR spectral assignment was performed using well-established protocols. 36 Imino protons (H1) involved in base-pairing formation were assigned using site-specific low-enrichment (2−4%) 15N-labeling (Figure 3A), 37 except for G11, for which H1 was assigned using NOE connectivities observed at low temperature (10°C) (Figure S2). Subsequently, imino protons of guanines were correlated to their corresponding aromatic protons (H8) using a through-bond JR-HMBC experiment 38 (Figure 3B). Other aromatic protons were assigned or confirmed using H-to-D site-specific labeling and correlations through bond and space (TOCSY, 13C-HSQC, and NOESY experiments).
The V-shaped loop is formed between G25 and G26 residues with structural features similar to those observed in a G-quadruplex formed by an intronic human sequence. 39 Within the long 12-nt diagonal loop, six nucleotides interact through Watson−Crick hydrogen bonds to form a hairpin (or stem-loop) structure with a capping G8-T9-G10 loop (Figure 3E). A possible additional base pair (A4•T14 or G3•T14) at the junction bridging the large distance (>20 Å) between the diagonal corners of the G-tetrads 32 was not observed in our experiments, even at low pH and temperature (Figure S4).
Solution Structure of LTR-III G-Quadruplex. NMR solution structures of LTR-III were calculated based on restraints obtained from NMR experiments (Table 1). The ten lowest-energy structures were superimposed using heavy atoms in the G-tetrad core and are shown in Figure 4A. Both the G-tetrad core and the stem-loop are well converged individually (Figure 4, Figure S5, Table 1); however, the orientations between them vary (Figure 4), mainly due to the lack of constraints involving the G3 and A4 residues, for which few inter-residue NOEs were detected. In addition, peak broadening was observed for G3, indicating a possible flexible linker between the G-tetrad core and the stem-loop.
The stem-loop is composed of three Watson−Crick base pairs (G5•C13, G6•C12, and C7•G11) showing regular B-DNA-like features. In our calculated model, T14 is stacked on top of the G2•G26•G15•G19 tetrad, as seen by numerous NOEs (Figure S6), while the G3 and A4 residues point outward. In the lateral loop A22-C23-T24, A22 and T24 stack below G21 and G25, respectively, while C23 is positioned below A22 and T24. The V-shaped loop between G25 and G26 bridges the last and first G-tetrads, with both G25 and G26 residues in the syn conformation.
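As a consistency check, the stem base pairs and loop segments named above can be read directly off the 28-nt sequence. The sketch below is illustrative: the helper names are hypothetical, and the residue range given for the diagonal loop (G3 to T14) is inferred here from its stated 12-nt length and the junction residues rather than quoted from the text.

```python
LTR_III = "GGGAGGCGTGGCCTGGGCGGGACTGGGG"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def residue(sequence, position):
    """Nucleotide at a 1-based position."""
    return sequence[position - 1]


def segment(sequence, start, end):
    """Nucleotides from start to end, 1-based inclusive."""
    return sequence[start - 1:end]


def is_watson_crick(sequence, i, j):
    """True if residues i and j could form a canonical Watson-Crick pair."""
    return COMPLEMENT[residue(sequence, i)] == residue(sequence, j)


# Stem base pairs reported in the text.
STEM_PAIRS = [(5, 13), (6, 12), (7, 11)]

# Loop segments; the diagonal-loop range is an inference (see lead-in).
SEGMENTS = {
    "hairpin cap": (8, 10),
    "lateral loop": (22, 24),
    "diagonal loop (inferred range)": (3, 14),
}

for i, j in STEM_PAIRS:
    pair = f"{residue(LTR_III, i)}{i}-{residue(LTR_III, j)}{j}"
    print(pair, "Watson-Crick" if is_watson_crick(LTR_III, i, j) else "mismatch")

for name, (start, end) in SEGMENTS.items():
    print(f"{name}: {segment(LTR_III, start, end)}")
```

The check only tests sequence complementarity; it says nothing about whether a pair actually forms in solution, which is established by the NMR data.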
LTR-III Sequence Mutations: Probing the Stem-Loop and Quadruplex-Duplex Junction. We investigated the effects of different sequence mutations on the LTR-III G-quadruplex structure (Table 2). In particular, we mutated residues in the diagonal stem-loop and at the quadruplex−duplex junction (Figure 5, Figure S7). The imino proton spectrum of the G10A sequence showed three peaks in the 12.5 to 13.0 ppm region that were significantly sharper than those of LTR-III, suggesting a more stable hairpin formation.
To replace the G6•C12 base pair by an A•T base pair, G6 and C12 were substituted by A6 and T12, respectively, in the G6A-C12T sequence. The NMR imino proton spectrum of G6A-C12T showed one significantly downfield-shifted peak at ∼13.5 ppm (Figure 5, Figure S7), supporting the formation of an A•T base pair. The junction between the stem-loop and the G-tetrad core is an important structural feature. Deletion of the G3 base in the ΔG3 sequence led to 1D NMR and CD spectra with features similar to those of LTR-III (Figure 5, Figure S7): peaks at 12.5−13.0 ppm remained sharp and slight variations were observed for peaks from 10.8 to 12.2 ppm. This indicates that the G3 base is not crucial for the formation of the G-tetrad core or the stem-loop, consistent with NOE data and our calculated structure. In contrast, mutation/deletion of A4 and T14 resulted in the disappearance or broadening of the resonances in the 12.5−13.0 ppm region, while 10−12 resonances in the 10.8−12.2 ppm region were still observed despite a pronounced chemical shift variation. These data suggest a possible role of A4 and T14 in the quadruplex−duplex junction and the stabilization of the duplex stem.
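The mutant sequences discussed above can be derived from the wild-type LTR-III sequence with two small helpers. This is a sketch under assumptions: the helper names are hypothetical, and the G10A and ΔG3 sequences are reconstructed here from their names (a G-to-A substitution at position 10 and deletion of G3), not copied from Table 2.

```python
LTR_III = "GGGAGGCGTGGCCTGGGCGGGACTGGGG"


def substitute(sequence, position, old, new):
    """Replace the nucleotide at a 1-based position, checking the expected base."""
    if sequence[position - 1] != old:
        raise ValueError(f"expected {old} at position {position}")
    return sequence[:position - 1] + new + sequence[position:]


def delete(sequence, position, old):
    """Remove the nucleotide at a 1-based position, checking the expected base."""
    if sequence[position - 1] != old:
        raise ValueError(f"expected {old} at position {position}")
    return sequence[:position - 1] + sequence[position:]


# Improved hairpin cap (name-based reconstruction).
g10a = substitute(LTR_III, 10, "G", "A")

# G6•C12 replaced by an A•T pair, as described in the text.
g6a_c12t = substitute(substitute(LTR_III, 6, "G", "A"), 12, "C", "T")

# Deletion of the G3 base at the junction.
delta_g3 = delete(LTR_III, 3, "G")

for name, seq in [("LTR-III", LTR_III), ("G10A", g10a),
                  ("G6A-C12T", g6a_c12t), ("dG3", delta_g3)]:
    print(f"{name:9s} {seq} ({len(seq)} nt)")
```

Checking the expected base before each edit guards against off-by-one numbering errors when positions are taken from the text.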
Similar results were also observed for mutated sequences containing both an improved cap, as in the G10A sequence, and mutations at the quadruplex−duplex junction (Figure S8).
Whereas in most of our calculated models T14 is stacked on the top G-tetrad and G3 and A4 point outward, in some models the A4 base is close to T14. To test the hypothesis of a transient Watson−Crick base pair between A4 and T14, we ran structure calculations with additional Watson−Crick A4•T14 base pair constraints. The formation of an A•T Watson−Crick base pair was compatible with the structure and our collected NOEs (Figure S9). We also tested the formation of a possible G3•T14 base pair in our structural calculation by adding hydrogen-bond constraints, but no stable base pair could be observed without a large NOE violation or a large increase in energy penalty.
LTR-III G-Quadruplex Structure Persists in a Longer LTR Sequence. Formation of the LTR-III G-quadruplex was assessed in a longer sequence containing the LTR-III and LTR-IV sequences. 27 In principle, the LTR-III+IV sequence is able to alternatively form both LTR-III and LTR-IV G-quadruplexes. However, the NMR spectrum of LTR-III+IV displayed 12 well-resolved peaks at 10−12.5 ppm and 3 broad peaks at 12.5−13.5 ppm in the imino proton region, which shared many similarities with the 1D NMR spectrum of the LTR-III sequence (Figure 6), suggesting that LTR-III+IV might form a G-quadruplex fold containing a stem-loop similar to that of LTR-III.
Using a site-specific labeling strategy, we demonstrated that the 12 well-resolved peaks in the imino proton region of the LTR-III+IV sequence originate from the LTR-III part of the sequence, while the guanines involved only in the LTR-IV G-quadruplex structure (G30, G32, and G33) are not engaged in Hoogsteen hydrogen bond formation (Figure 6, Figure S10).
Moreover, solvent exchange experiments showed that the guanines involved in the central tetrad of LTR-III+IV G-quadruplex (G1, G16, G20, and G27) exactly correspond to the guanines in the same position of LTR-III ( Figure 6, Figure S11).
These data indicate that the longer sequence favors the single conformation of the LTR-III G-quadruplex, conserving its unique features. This suggests that the LTR-III G-quadruplex, previously demonstrated to be the major and most stable form of the considered region, is prevalent in the longer and more dynamic context.

■ DISCUSSION
In this work, we demonstrated that LTR-III folds into a hybrid quadruplex−duplex conformation with a three-layered G-tetrad core arranged in a (3 + 1) topology and a long 12-nt loop forming a hairpin structure. NMR analysis of the LTR-III+IV sequence, which can in principle form both LTR-III and LTR-IV G-quadruplexes, showed that the folding topology of LTR-III is still conserved, suggesting the preferential folding of the LTR-III G-quadruplex within the dynamic context of multiple conformations.
Hybrid quadruplex−duplex structures have been described previously as artificial constructs with different relative quadruplex−duplex orientations, exploring junction and connection varieties. 32 Our structure of the LTR-III G-quadruplex containing a duplex hairpin across a diagonal loop reveals a significant tilt between the helical axis of the duplex and that of the quadruplex, in contrast to the arrangement observed for an artificial hybrid quadruplex−duplex also containing a duplex hairpin across a diagonal loop (PDB code 2M91) (Figure S12). This difference arises from the junction composition: in the 2M91 structure an adaptor G•A base pair suitably bridges the large distance (>20 Å) across the diagonal corners of a G-tetrad, while in LTR-III the junction formed with T14 on one strand and G3-A4 on the other strand of the duplex might be more flexible and dynamic, providing an opportunity for targeting. Even though bioinformatic studies on the human genome showed that over 80 000 sequences have the potential to fold into such a structure, 41 high-resolution structures of naturally occurring and biologically relevant hybrid quadruplex−duplex topologies have not been reported so far. 41−45 The guanine content in the HIV-1 G-quadruplex-forming region is highly conserved. 22 Mutations in the sequence that forms the stem-loop may disrupt Watson−Crick base pairing in the duplex component of the structure. Therefore, the conservation of the nucleotides participating in stem-loop formation has also been assessed, revealing high conservation (70−99%) for all the nucleotides with the exception of the cytosine in position 7, which displayed around 50% probability of mutation to thymine.
Conserved multiple G-quadruplex structures in the LTR promoter region of HIV-1 and primate lentiviruses have been proposed as regulatory elements of viral transcription 22,29 and therefore as promising targets for viral cycle inhibition. Stabilization of viral G-quadruplexes by the well-known G-quadruplex ligand BRACO-19 resulted in the inhibition of viral production. 25 Recently, newly synthesized naphthalene diimide (NDI) compounds with an extended core were found to act as antiviral agents with a G-quadruplex related mechanism, selectively targeting viral over telomeric G-quadruplexes. 26 Moreover, a novel NDI Cu(II) complex was found to act as DNA-cleaving agent, targeting the LTR-III G-quadruplex with high selectivity. 46 In particular, the binding geometry of the NDI Cu(II) derivative to the LTR-III structure resolved here defined the proximity of the Cu catalytic site to the nearby regions and helped explain the sharp cleavage observed at two main sites of the LTR-III sequence.

Interestingly, we found that mutations in the LTR-IV G-quadruplex component do not abolish the inhibitory effect on viral transcription, probably due to the stable presence of the LTR-III conformation. 47 Therefore, selective targeting of the major LTR-III G-quadruplex component may be a promising strategy for viral transcription inhibition.
Such a singular structure of the LTR-III G-quadruplex opens the possibility of improving selectivity by targeting the quadruplex−duplex junction. Examples of compounds targeting this feature may come from DFHBI fluorogens intercalating at the junction between the G-quadruplex and hairpin of RNA light-up aptamers, 48 or from recently published molecules that can simultaneously bind a G-quadruplex and a proximal duplex. 49,50

■ CONCLUSION
The emerging importance of the LTR G-quadruplexes as antiviral targets has opened the possibility of exploring novel G-quadruplex ligands as anti-HIV-1 agents. Considering the high G-quadruplex content in human cells, one of the main challenges is to achieve selectivity toward viral G-quadruplexes. We provide here a starting point for rational drug design by defining the solution structure of the LTR-III G-quadruplex, the major component within the LTR G-quadruplex-forming motif. Given that most G-quadruplex-binding ligands tested so far display structural features directed at targeting G-tetrads, predominantly by stacking interactions, 51−60 and that high selectivity can be achieved at the duplex region, our findings open new perspectives on the possibility of discriminating among different G-quadruplex conformations. Future approaches may thus be directed to the development of small molecules with structural features compatible with unique loop sequences and arrangements. Applied to the LTR-III structure presented in this work, this strategy may provide new selective anti-HIV-1 agents with a G-quadruplex-mediated mechanism of action.