Diaminopurine in Nonenzymatic RNA Template Copying

In the RNA World before the emergence of an RNA polymerase, nonenzymatic template copying would have been essential for the transmission of genetic information. However, the products of chemical copying with the canonical nucleotides (A, U, C, and G) are heavily biased toward the incorporation of G and C, which form a more stable base pair than A and U. We therefore asked whether replacing adenine (A) with diaminopurine (D) might lead to more efficient and less biased nonenzymatic template copying by making a stronger version of the A:U pair. As expected, primer extension substrates containing D bound to U in the template more tightly than substrates containing A. However, primer extension with D exhibited elevated reaction rates on a C template, leading to concerns about fidelity. Our crystallographic studies revealed the nature of the D:C mismatch by showing that D can form a wobble-type base pair with C. We then asked whether competition with G would decrease the mismatched primer extension. We performed nonenzymatic primer extension with all four activated nucleotides on randomized RNA templates containing all four letters and used deep sequencing to analyze the products. We found that the DUCG genetic system exhibited a more even product distribution and a lower mismatch frequency than the canonical AUCG system. Furthermore, primer extension is greatly reduced following all mismatches, including the D:C mismatch. Our study suggests that D deserves further attention for its possible role in the RNA World and as a potentially useful component of artificial nonenzymatic RNA replication systems.

1 Materials and Methods

General information
Materials. Reagents and solvents were obtained from Fisher Scientific, Sigma-Aldrich, Alfa Aesar, Acros Organics, and Combi-Blocks and were used without further purification unless otherwise noted. Diaminopurine nucleoside (≥99%) was purchased from ChemImpex.
Phosphoramidites and reagents used for solid-phase RNA synthesis were purchased from Glen Research (Sterling, MA) and ChemGenes (Wilmington, MA). Deuterated solvents for NMR were purchased from Cambridge Isotope Laboratories (Tewksbury, MA).
Low-resolution Mass Spectrometry (LRMS). All samples were diluted to 200 μM in Milli-Q water (18.2 MΩ·cm) with a few drops of acetonitrile immediately prior to analysis. The spectra were obtained by direct injection on an Esquire 6000 mass spectrometer (Bruker Daltonics) operated in the alternating ion mode.

High-resolution Liquid Chromatography Mass Spectrometry (HR LC-MS).
The samples were separated and analyzed using an Agilent 1200 high-performance liquid chromatography (HPLC) system coupled to an Agilent 6230 time-of-flight mass spectrometer (TOF MS) equipped with a diode array detector. The samples were separated by IP-RP-HPLC on a 100 mm × 1 mm (length × i.d.) XBridge C18 column with 3.5 μm particle size (Waters, Milford, MA). The samples were eluted with a gradient of 2.5 to 15% methanol in 200 mM 1,1,1,3,3,3-hexafluoro-2-propanol with 1.25 mM triethylamine at pH 7.0 over 16 minutes at a flow rate of 0.1 mL/min at 50°C. The samples were analyzed in negative mode from 239 m/z to 3200 m/z with a scan rate of 1 spectrum/s.

5ʹ imidazolium bridged dinucleotides
Phosphorylation of diaminopurine nucleoside. The 5ʹ-OH of the diaminopurine nucleoside was phosphorylated using a previously reported procedure [1]. To a pre-chilled mixture of diaminopurine nucleoside (1 equiv.) in OP(OMe)3 (0.1 M with respect to the nucleoside) was added POCl3 (4 equiv.) under vigorous stirring at 0°C. After complete solubilization of the nucleoside, DIPEA (0.5 equiv.) was added dropwise to the stirring reaction. Three additional portions of DIPEA (0.5 equiv. each) were added at 20-minute intervals. Once the starting material had disappeared, as monitored by LRMS, the reaction was quenched with 1 M TEAB (5X volume, pH 7.5). The product was purified by reverse-phase flash chromatography using gradient elution between (A) aqueous 2 mM TEAB (pH 7.5) and (B) acetonitrile. The product was eluted between 0% and 15% B over 13 CVs at a flow rate of 40 mL/min. Fractions containing product were collected and lyophilized. The diaminopurine nucleotide was used for downstream activation without further purification.

Synthesis and characterization of 5ʹ-5ʹ imidazolium-bridged dinucleotides. The synthesis of A*A and D*D follows a previously reported procedure [3]. The detailed characterization (NMR and HR-MS) of A*A can be found in reference 3 and that of D*D is listed below.

Synthesis of 2-aminoimidazole activated trinucleotides
The 5ʹ-phosphorylated trinucleotides (GAC and AGG) were prepared by solid-phase synthesis on a MerMade 6 DNA/RNA synthesizer. The trinucleotides were subsequently deprotected and purified by reverse-phase flash chromatography on a 50 g C18Aq column over 12 CVs of 0-10% acetonitrile in 2 mM TEAB buffer (pH 8.0). The fractions corresponding to the 5ʹ-phosphorylated trinucleotides, as monitored by LRMS, were collected and lyophilized to complete dryness.

Synthesis of oligonucleotide primers, templates and blocker
The oligonucleotides composed of canonical nucleobases were purchased from Integrated DNA Technologies (Coralville, IA). The oligonucleotides containing D were prepared on an Expedite 8909 DNA/RNA synthesizer. The oligonucleotides were cleaved from the solid support and deprotected with AMA (ammonium hydroxide/40% aqueous methylamine 1:1 v/v) at 65°C for 20 minutes. The mixtures were lyophilized, 2ʹ-deprotected by removal of the 2ʹ-TBDMS groups, and purified using Glen-Pak RNA cartridges (Glen Research, Sterling, MA). The oligonucleotides were then lyophilized and resuspended in 7 M urea for further purification by 20% (19:1) polyacrylamide gel electrophoresis (PAGE, National Diagnostics, Atlanta, GA) and Sep-Pak C18 Plus Short cartridges (Waters, Milford, MA).

Nonenzymatic primer extension reactions
Michaelis-Menten kinetics. The annealing solutions consisting of the primer/template/blocker complexes were prepared at 5X final concentration: 7.5 μM primer, 12.5 μM template, 17.5 μM blocker, 50 mM Tris-Cl pH 8.0, 50 mM NaCl, and 1 mM EDTA. The solution was heated at 85°C for 30 s and then slowly cooled to 25°C at a rate of 0.1°C/s in a thermal cycler. The sequences of the oligonucleotides can be found in Table S1.
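As an illustration of how Michaelis-Menten parameters can be extracted from rate measurements at a series of substrate concentrations, the sketch below fits k_max and K_m by Hanes-Woolf linearization. The fitting procedure actually used for these experiments is not stated here, so this is only one possible approach; the function name, the concentrations, and the rates are all hypothetical, and the data are generated without noise from assumed parameter values.

```python
def fit_michaelis_menten(conc, rate):
    """Estimate (k_max, K_m) from rate = k_max * S / (K_m + S).

    Uses the Hanes-Woolf form S/rate = S/k_max + K_m/k_max and an
    ordinary least-squares line through (S, S/rate).
    """
    x = list(conc)
    y = [s / v for s, v in zip(conc, rate)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # equals 1 / k_max
    intercept = mean_y - slope * mean_x  # equals K_m / k_max
    k_max = 1.0 / slope
    k_m = intercept * k_max
    return k_max, k_m


# Hypothetical, noise-free data generated with k_max = 20 h^-1 and K_m = 2 mM.
conc = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]       # mM bridged dinucleotide (assumed)
rate = [20.0 * s / (2.0 + s) for s in conc]  # h^-1 (assumed)

k_max, k_m = fit_michaelis_menten(conc, rate)
print(f"k_max = {k_max:.2f} h^-1, K_m = {k_m:.2f} mM")
```

With real, noisy data a nonlinear least-squares fit of the untransformed equation is usually preferred, because linearization distorts the error structure.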
Primer extension reactions with activated mononucleotide and downstream activated trinucleotide helper. The annealing solutions consisting of primer/template complexes were prepared at 5X final concentration: 7.5 μM primer, 12.5 μM template, 50 mM Tris-Cl pH 8.0, 50 mM NaCl, and 1 mM EDTA. The solution was annealed as previously described and diluted to yield the following final concentrations: 1.5 μM primer, 2.5 μM template, 200 mM Tris-Cl pH 8.0, and 100 mM MgCl2.
To initiate the reactions, activated mononucleotides (*A, *D, *U, *C, *G) and activated trinucleotides (*GAC or *AGG) were freshly prepared and added to the annealed primer/template solution to yield final concentrations of 20 mM activated mononucleotide and 0.5 mM activated trinucleotide. At each time point, the reaction was quenched as previously described.
The sequences of the oligonucleotides can be found in Table S4.
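The relationship between the 5X annealing stock and the final oligonucleotide concentrations is a simple five-fold dilution, sketched below. The helper name is hypothetical; the concentrations are those given in the text.

```python
def dilute(stock_conc, fold=5.0):
    """Return the concentration after a `fold`-fold dilution."""
    return stock_conc / fold


# 5X annealing stock concentrations from the text (μM).
stock_5x = {"primer": 7.5, "template": 12.5}

final = {name: dilute(conc) for name, conc in stock_5x.items()}
print(final)  # {'primer': 1.5, 'template': 2.5}
```

The same dilution carries over only 10 mM Tris-Cl from the annealing buffer, so the remaining Tris-Cl and the MgCl2 needed to reach 200 mM and 100 mM are presumably supplied at the dilution step.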
Stalling effect of the D:C mismatch. The reaction conditions of the stalling experiments were as described for the primer extension reactions with activated mononucleotide and downstream activated trinucleotide, except that the reactions were initiated by adding bridged dinucleotides (G*G) to a final concentration of 10 mM. The sequences of the oligonucleotides can be found in Table S5.

Crystallization
Self-complementary RNA sequences (0.33 mM) in nuclease-free water (Invitrogen, Waltham, MA) were heated to 90°C for 2 minutes and then slowly cooled to room temperature. Crystal Screen S8 HT, Index HT, Natrix HT (Hampton Research, Aliso Viejo, CA) and Nuc-Pro HTS (Jena Bioscience, Jena, Germany) were used to screen crystallization conditions at 20°C using the sitting-drop vapor diffusion method. An NT8 robotic system and Rock Imager (Formulatrix, Waltham, MA) were used for crystallization screening and for monitoring the crystallization process.
The sequences of the self-complementary RNA duplexes are listed in Table S6 and the optimal crystallization conditions are listed in Table S7.

Crystal data collection, structure determination and refinement
Diffraction data were collected under a liquid nitrogen stream at 99 K at a wavelength of 1.038413 Å or 1.033216 Å on Beamline 201 at the Advanced Light Source at Lawrence Berkeley National Laboratory (USA). The crystals were exposed for 0.25 s per image with a 0.25° oscillation angle. The crystal-to-detector distance was set to 200-300 mm. The data were processed with HKL2000 [4] or XDS. The structures were solved by molecular replacement with PHASER [5] using the structure 3ND4 as the search model [6]. All structures were refined with Refmac5 in CCP4i [7] or Phenix [8]. After several cycles of refinement, water molecules were added in Coot [9]. Data collection, phasing, and refinement statistics of the determined structures are listed in Supplementary Tables S8 and S9.

Sequencing
RNA sample preparation for Illumina Sequencing. The procedure for RNA sample preparation for sequencing is adapted from the reported protocol [10]. Table S10 lists all sequences used for the sequencing experiments. The annealing solutions consisting of the self-complementary hairpin constructs were prepared at 5X final concentration: 5 μM hairpin construct (oligo 6N for the AUCG system and oligo 6D for the DUCG system, Table S10), 6 μM blocker, 50 mM Tris-Cl pH 8.0, 1 mM EDTA, and 50 mM NaCl. The solution was annealed as previously described and diluted to yield the following final concentrations: 1 μM hairpin construct, 1.2 μM blocker, 200 mM Tris-Cl pH 8.0, and the indicated amount of MgCl2 (10 mM or 100 mM). Bridged dinucleotide (N*N) mixtures (for the AUCG system, N = A, U, C, G; for the DUCG system, N = D, U, C, G) were obtained by equilibrating freshly prepared bridged homo-dinucleotides at room temperature for 2 hours (A*A, U*U, C*C, and G*G for the AUCG system; D*D, U*U, C*C, and G*G for the DUCG system) [11]. The N*N mixtures were added to the solution at a final concentration of 10 mM or 20 mM.

For the Michaelis-Menten kinetics experiments, the annealed primer/template/blocker solution was diluted to yield the following final concentrations: 1.5 μM primer, 2.5 μM template, 3.5 μM blocker, 200 mM Tris-Cl pH 8.0, and 100 mM MgCl2. Stock solutions of bridged dinucleotides (A*A or D*D), freshly prepared at 2X the titrating concentrations, were added to the annealed primer/template/blocker solution to initiate the templated primer extension reactions. At each time point, 0.5 μL of the reaction sample was added to 25 μL of quenching buffer containing 25 mM EDTA, 1X TBE, and 4 μM of a DNA sequence complementary to the template in formamide.

The difference in binding free energy between the D*D and A*A substrates is $\Delta\Delta G^\circ = RT \ln (K_d^{DD} / K_d^{AA})$. Kd can be approximated using Km by assuming that the on and off rates of the imidazolium-bridged dinucleotide substrate are fast relative to the chemical step [3]. The equation then becomes $\Delta\Delta G^\circ = RT \ln (K_m^{DD} / K_m^{AA})$. Inserting the values of $K_m^{AA}$ and $K_m^{DD}$ from Figure 2C, we computed ΔΔG° = −1.76 kcal/mol.

According to the nearest-neighbor (NN) model [12], for the RNA duplex construct used in Figure 2 (Table S1), the Gibbs free energies of the AA and DD systems are each calculated as the sum of $\Delta G^\circ_{37,\text{intermolecular initiation}}$, $\Delta G^\circ_{37,\text{symmetry}}$, and the nearest-neighbor stacking terms of the duplex, with coaxial stacking energies approximated using the NN values for contiguous base pairs.
Assuming $\Delta G^\circ_{37,\text{intermolecular initiation}}$ and $\Delta G^\circ_{37,\text{symmetry}}$ are constant in both equations, the change in Gibbs free energy reduces to the difference between the summed stacking terms of the DD and AA systems: the ΔG associated with D is in general smaller than that with A (Table S2). Consequently, the negative change in Gibbs free energy predicted by the NN model indicates that substituting A with D can increase the thermodynamic stability of RNA duplexes, a conclusion consistent with the trend in our experimentally measured value.
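The free-energy relation used above can be sketched numerically as follows. This is only an illustration under stated assumptions: the temperature is not given in this section, so 25°C (298.15 K) is assumed as a default, and the function names are hypothetical. The second helper inverts the relation to show what fold change in Km a given ΔΔG° corresponds to.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_km(km_dd, km_aa, temp_k=298.15):
    """ΔΔG° = RT ln(Km_DD / Km_AA), in kcal/mol (temperature is an assumption)."""
    return R_KCAL * temp_k * math.log(km_dd / km_aa)


def fold_from_ddg(ddg, temp_k=298.15):
    """Return Km_AA / Km_DD implied by a given ΔΔG° (kcal/mol)."""
    return math.exp(-ddg / (R_KCAL * temp_k))


# The reported ΔΔG° of -1.76 kcal/mol, evaluated at two assumed temperatures.
for temp in (298.15, 310.15):
    print(temp, round(fold_from_ddg(-1.76, temp), 1))
```

Under these assumed temperatures, −1.76 kcal/mol corresponds to a roughly 17- to 20-fold lower Km for D*D than for A*A.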

Figure S1. The DUCG system decreases biases in complementary product incorporation by enriching the distribution of D*N and U*N bridged dinucleotides. (A) Schematic representation of the bridged dinucleotides' distribution. The exact sequences of the bridged dinucleotides are inferred from the template. (B-D) Position-dependent frequency of bridged dinucleotides in the AUCG and DUCG systems and the frequency ratio between AUCG and DUCG. Heatmaps are generated for the following reaction conditions: (B) 10 mM MgCl2 and 10 mM N*N; (C) 10 mM MgCl2 and 20 mM N*N; (D) 100 mM MgCl2 and 20 mM N*N. For the frequency ratio heatmap, red represents greater frequency in the DUCG system whereas blue represents greater frequency in the AUCG system.

Figure S2. Mismatch heatmaps show the position-dependent distribution of mismatched pairs (template:product) among the mismatched incorporations. (A) Schematic representation of the mismatched incorporation. At each position, the frequencies of all 12 possible T:P mismatch pairs are normalized to 1. (B-G) Heatmaps are generated for the following reaction conditions: 10 mM MgCl2 and 10 mM N*N in (B) AUCG and (E) DUCG systems; 10 mM MgCl2 and 20 mM N*N in (C) AUCG and (F) DUCG systems; 100 mM MgCl2 and 20 mM N*N in (D) AUCG and (G) DUCG systems.
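The per-position normalization described in this caption can be sketched as below. This is a minimal illustration, not the analysis pipeline used for the figure: the function name and the example reads are hypothetical, and the complement table shown is for the AUCG system (for the DUCG system one would presumably pair D with U instead).

```python
from collections import Counter

# Watson-Crick complements for the AUCG system (template base -> expected product base).
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def mismatch_frequencies(pairs, complement=COMPLEMENT):
    """Normalize the counts of template:product mismatches at one position to sum to 1.

    `pairs` is an iterable of (template_base, product_base) tuples observed
    at a single position; correctly paired observations are ignored.
    """
    counts = Counter((t, p) for t, p in pairs if complement[t] != p)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical observations at one position.
observed = [("C", "G"), ("C", "A"), ("C", "A"), ("G", "U")]
freqs = mismatch_frequencies(observed)
print(freqs)
```

With four template bases and three non-complementary products each, the returned dictionary can hold at most the 12 T:P mismatch pairs mentioned in the caption.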

Figure S3. The effect of mismatches at position 1 on subsequent primer extension. (A) Extension probability following each mismatch at position 1. (B) Stalling factor for each type of mismatch at position 1, defined as the ratio of the extension probability after a complementary pair to that after a mismatched pair.
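The stalling factor defined in this caption is a simple ratio of two extension probabilities, sketched below. The function names and the read counts are hypothetical, and the extension probability is assumed here to be the fraction of products that extend past position 1.

```python
def extension_probability(extended_reads, total_reads):
    """Fraction of products with a given pair at position 1 that extend further."""
    return extended_reads / total_reads


def stalling_factor(p_after_match, p_after_mismatch):
    """Extension probability after a complementary pair divided by that after a mismatch."""
    return p_after_match / p_after_mismatch


# Hypothetical read counts for illustration only.
p_match = extension_probability(extended_reads=800, total_reads=1000)
p_mismatch = extension_probability(extended_reads=50, total_reads=1000)

factor = stalling_factor(p_match, p_mismatch)
print(round(factor, 1))
```

A factor greater than 1 indicates that extension is slower after the mismatch than after the correct pair.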

Figure S4. Stacked bar plots of mismatch extension show the position-dependent relative frequency of a mismatch followed by a mismatch versus a mismatch followed by a non-mismatch at positions 1-4 in the AUCG and DUCG systems. (A) Schematic representation of a mismatch followed by a mismatch or a non-mismatch. At each position, the frequencies of mismatch followed by mismatch and mismatch followed by non-mismatch are normalized to 1. (B-G) Stacked bar plots are generated for the following reaction conditions: 10 mM MgCl2 and 10 mM N*N in (B) AUCG and (C) DUCG systems; 10 mM MgCl2 and 20 mM N*N in (D) AUCG and (E) DUCG systems; 100 mM MgCl2 and 20 mM N*N in (F) AUCG and (G) DUCG systems.

Table S5. Sequences of the primers, templates, and complementary oligonucleotides used in the stalling effect experiments.
1 Underlined bases vary between sets of sequences (primer, template, and complementary strand), while all other bases remain the same.

Table S6. Sequences of the self-complementary RNA duplexes used in crystallographic studies.
1 Underlined bases indicate the probed base pair interactions.