Sequence-Dependent Fluorescence of Cy3- and Cy5-Labeled Double-Stranded DNA

The fluorescence intensity of Cy3 and Cy5 dyes is strongly dependent on the nucleobase sequence of the labeled oligonucleotides. Sequence-dependent fluorescence may significantly influence the data obtained from many common experimental methods based on fluorescence detection of nucleic acids, such as sequencing, PCR, FRET, and FISH. To quantify sequence-dependent fluorescence, we have measured the fluorescence intensity of Cy3 and Cy5 bound to the 5′ end of all 1024 possible double-stranded DNA 5mers. The fluorescence intensity was also determined for these dyes bound to the 5′ end of fixed-sequence double-stranded DNA with a variable-sequence 3′ overhang adjacent to the dye. The labeled DNA oligonucleotides were made using light-directed, in situ microarray synthesis. The results indicate that the fluorescence intensity of both dyes is sensitive to all five bases or base pairs, that the sequence dependence is stronger for double-stranded (vs single-stranded) DNA, and that the dyes are sensitive to both the adjacent dsDNA sequence and the 3′-ssDNA overhang. Purine-rich sequences result in higher fluorescence. The results can be used to estimate measurement error in experiments with fluorescently labeled DNA, as well as to optimize the fluorescent signal by considering the nucleobase environment of the labeling cyanine dye.


■ INTRODUCTION
The fluorescence of molecules is always sensitive to environmental conditions, although the magnitude of changes in the fluorescence intensity of any particular fluorophore depends on its specific modes of interaction with its environment. 1 Fluorescent molecules can be used as molecular environmental probes by selecting dyes with strong responses to, for example, pH, 2 viscosity, 3 polarizability, 4 elasticity, 5 and polarity; 6 however, in applications where the fluorescence intensity is to serve as a proxy for the abundance of the labeled molecule, environmental sensitivity is a liability that can reduce measurement accuracy. 7 The cyanine dyes Cy3 and Cy5 are among the most widely used and versatile 8 oligonucleotide labels in, e.g., microarray experiments, fluorescence in situ hybridization (FISH), real-time PCR (qPCR), and FRET studies 9,10 and are considered to be relatively environmentally insensitive. 11 However, Cy3 and Cy5 consist of two indole rings connected by a three- or five-carbon polymethine bridge, which can undergo cis−trans isomerization from the first excited singlet state in competition with fluorescence. 12−15 In viscous or restrictive environments, or with conformationally locked dye variants, the rate of isomerization is reduced or eliminated and the dyes are more fluorescent. 16 When Cy3 and Cy5 are tethered to the end of double-stranded DNA, they assume a planar capping configuration similar to that of an additional base pair, 17,18 which inhibits isomerization and increases their fluorescence quantum yield and lifetime. 15 At least in the case of Cy3, the dye is not fully immobilized when attached to either single- or double-stranded DNA: time-resolved fluorescence anisotropy measurements show decay components corresponding to rotation together with the DNA as well as rotation relative to it.
Recent experiments have shown that both Cy3 and Cy5 are also quite sensitive to the particular nucleobase sequence of the ssDNA oligonucleotide to which they are attached, 19,20 with the fluorescence intensity varying by a factor of about 2 between the brightest and the darkest labeled oligonucleotide in the case of Cy3, and by a factor of about 3 in the case of Cy5. The variation in fluorescence intensity for ssDNA is strongly correlated with purine content: purine-rich sequences are associated with high intensity, and high pyrimidine content, particularly cytosine, with low intensity. 19 The magnitude of the sequence-dependent fluorescence is large enough to affect the accuracy of experimental data derived from Cy3- and Cy5-labeled single-stranded DNA, but no data are currently available on sequence-dependent effects in double-stranded DNA. In experimental methods based on labeled oligonucleotides, fluorescence is recorded either from the double-stranded hybrid (e.g., Sanger and next-generation sequencing, and molecular beacons 21 ) or from the unhybridized strand alone (e.g., hydrolyzed labeled TaqMan probe fragments 22 ). High-throughput DNA sequencing-by-synthesis is likely to be particularly vulnerable to sequence-dependent fluorescence because all short nucleobase sequences will be encountered repeatedly, and detection failures (deletion errors) caused by sequences highly unfavorable to fluorescence would be systematic and therefore not easily detected by resequencing. Furthermore, the optical systems of sequencers must balance detection dynamic range against throughput, which makes their throughput sensitive to dyes whose fluorescence varies significantly. 23 Although our fluorescence data are obtained on microarrays, most genomic microarray data are fairly insensitive to sequence-dependent fluorescence because labeling is typically based on reverse transcription using labeled random primers or other quasi-random methods. 24
Nevertheless, gene-specific fluorescence intensity effects, due to differences in the relative abundance of nucleobases in particular genes, have been detected. 25 Since both Cy3- and Cy5-labeled single- and double-stranded oligonucleotides are commonly used, we present here comprehensive results for double-stranded DNA to complement and strengthen previous results for Cy3 and Cy5 5′-labeled single-stranded DNA. 19 Two types of sequence-dependent dye−dsDNA interactions, illustrated in Figure 1, have been measured: the relative intensity of the dyes at the 5′ end of each of the 1024 possible double-stranded DNA 5mers (Figure 1B), and the relative intensity of the dyes bound to the 5′ end of a fixed-sequence double helix with a variable 5mer 3′ overhang adjacent to the dyes (Figure 1C). The sequence-dependent contribution of the overhang is relevant because in many experimental contexts, such as PCR and FISH, a short 5′-labeled oligonucleotide is used to quantify the presence of much longer DNA or RNA molecules. Detailed data on the sequence-dependent fluorescence of cyanine dyes on single-stranded DNA (Figure 1A) have been reported previously for Cy3, Cy5, Dy547, and Dy647. 19,26 Over the range of all possible 5mers, these ssDNA data showed that the intensity of Cy3 varied by about a factor of 2, and that of Cy5 by a factor of about 3. There was also a clear pattern to the data: the fluorescence follows, to a good approximation, the cumulative distribution function of a normal distribution, with purine-rich sequences resulting in high intensities and pyrimidine-rich sequences resulting in low intensities. In addition, a 5′ guanine promotes higher fluorescence much more than a 5′ adenine does, and a 5′ cytosine results in much lower fluorescence than a 5′ thymine. Here we show that broadly similar trends also hold for double-stranded DNA.

■ RESULTS AND DISCUSSION
The results for the sequence-dependent fluorescence of cyanine dyes have been highly consistent: adjacent purine bases promote fluorescence relative to pyrimidine bases both in single-stranded DNA 19,26 and in the double-stranded DNA studied here. In addition, for both ssDNA and dsDNA, a guanine immediately adjacent to the dye consistently results in the highest fluorescence, whereas in the more distal positions adenine, rather than guanine, typically results in higher fluorescence. Of the pyrimidines, cytosine, rather than thymine, is most strongly associated with low fluorescence.
Cy3 and Cy5 dsDNA Interactions. Figure 2 summarizes the results of the 5′ Cy3 and Cy5 terminal labeling experiments on dsDNA. These data correspond to the case where the random linker is used and the permuted nucleobases form a double strand (Scheme 1A). Here, the dye interactions with the single-stranded segment are present, but the data will reflect the average over all possible sequences. As with the data from Cy3- and Cy5-labeled ssDNA, 19 the overall range of fluorescence intensity is about a factor of 2 for Cy3 and a factor of 3 for Cy5 (Figure 2A). To compare the fluorescence intensity data for dsDNA with those for ssDNA, the array design included reference ssDNA sequences. These sequences have a very similar design, but their bases are rearranged to prevent hybridization. Figure 2A shows that both Cy3 and Cy5 on dsDNA have a somewhat extended range of fluorescence intensity compared with Cy3 and Cy5 on ssDNA (horizontal lines). Most of the additional range is at the low end of intensity; i.e., the sequences resulting in the highest fluorescence give similar intensities for both ssDNA and dsDNA.

Figure 2. … (Figure 1B). (A) Relative fluorescence intensity of Cy3 and Cy5 end-labeled 5mers, ranked from most to least intense. The intensity falls by 55% for Cy3 and almost 70% for Cy5. The horizontal lines show the fluorescence intensity of single-stranded reference sequences on the same arrays. Fluorescence intensity consensus sequences of all 1024 dsDNA 5mers 5′-end-labeled using (B) Cy3 and (C) Cy5. The fluorescence intensity range was divided into eight bins of equal width, and the consensus sequence of the 5mers in each octile is plotted.
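The percentage drops quoted in the caption and the fold ranges quoted in the text are related by simple arithmetic. If the brightest sequence is normalized to 1 and the darkest falls by p percent, the bright-to-dark ratio is 1/(1 − p/100). The short Python sketch below makes the conversion explicit; the function name `fold_range` is ours, for illustration only.

```python
def fold_range(percent_drop: float) -> float:
    """Bright-to-dark intensity ratio, given the percent drop from the
    brightest sequence (normalized to 1.0) to the darkest one."""
    if not 0.0 <= percent_drop < 100.0:
        raise ValueError("percent_drop must be in [0, 100)")
    darkest = 1.0 - percent_drop / 100.0  # darkest intensity relative to brightest
    return 1.0 / darkest


if __name__ == "__main__":
    for dye, drop in [("Cy3", 55.0), ("Cy5", 70.0)]:
        print(f"{dye}: {drop:.0f}% drop -> {fold_range(drop):.2f}-fold range")
```

A 55% drop corresponds to about a 2.2-fold range, and a 70% drop to about a 3.3-fold range. This is consistent with the "factor of 2" and "factor of 3" in the text. Because the Cy5 drop is "almost" 70%, its true ratio is slightly below 3.3.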

Bioconjugate Chemistry
The fluorescence intensity of most, or perhaps all, dyes depends on the nucleobase environment. In many cases the mechanism is photoinduced charge transfer between the bases and the dye (fluorescein, 27 coumarin, 28 rhodamine, 29 and pyrene 30 ). In that case the quenching efficiency is determined by proximity and by base redox potential: dG < dA < dC < dT when the bases are reduced, or the reverse order when they are oxidized. 28 Ethidium bromide, another well-known dsDNA fluorescence label, undergoes quenching via proton transfer to the solvent; intercalation enhances its fluorescence by reducing solvent exposure. 31 In the case of the cyanine dyes, however, charge transfer is not thermodynamically favored. 32,33 Instead, the intensity of cyanine dyes conjugated to DNA is attributed to modulation of the rotational isomerization barrier in the excited state. 12−14 NMR data indicate that Cy3 and Cy5, 5′-linked to dsDNA, are positioned at the end of the double helix in a capping configuration similar to that of a base pair. 17,18 This arrangement should restrict the rate of cis−trans isomerization of the dyes, increasing fluorescence relative to the free dye. Relative to the same dyes bound to the end of ssDNA, however, differences in the rate of isomerization are less clear, since the dyes stack with the terminal base in both cases.
Simulations and experiments indicate that the quantum yield of Cy3 is higher on ssDNA than on dsDNA, and that on dsDNA the strength of the stacking interaction depends on the identity of the terminal base pair. 15,34,35 Our experiments indicate that the fluorescence of Cy3 and Cy5 is somewhat greater on dsDNA. Previously published results 15 show a 2-fold greater fluorescence of Cy3 on ssDNA, and the difference from our results may be due to the particular choice of cyanine dye. We conjugate the dyes to DNA using the Cy3 and Cy5 phosphoramidites. Sanborn et al. 15 instead used the sulfonated versions of these dyes, which are more commonly used for protein labeling. The sulfonates increase the hydrophilicity of the dyes, which could affect the strength of their stacking interactions with the nucleobases. We have previously measured the intensity of sulfonated Cy3 and Cy5 on DNA and found a very strong pattern of sequence-specific fluorescence distinct from that of the unsulfonated dyes. 19 To visualize the relationship between nucleobase sequence and fluorescence intensity, the consensus sequences for each octile of intensity are plotted in Figure 2B and C for Cy3 and Cy5, respectively. These data are quite similar to those obtained with the same dyes on ssDNA. 19 The most apparent differences in the dsDNA data are that cytosine is less prominent in the weakly fluorescent sequences and more prominent in the distal positions of the strongly fluorescent sequences, particularly for Cy5. Suppose, as previous studies have indicated, that the fluorescence intensity of cyanine dyes is greater on ssDNA. The consensus might then be biased toward adenine- and thymine-rich sequences, which tend to destabilize the double helix near the dyes and so increase the locally single-stranded ("frayed" end) population of DNA. Compared with our previous data for Cy3 and Cy5 on ssDNA, this trend is not apparent.
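The consensus plots in Figure 2B and C can be approximated in a few steps. Divide the intensity range into eight equal-width bins, then take the most frequent base at each position within each bin. This is a simplified sketch that assumes a dictionary mapping each 5mer to its relative intensity. A real sequence logo displays the full base frequencies and information content rather than only the most frequent base. `octile_consensus` is a hypothetical helper, not the authors' analysis code, and the toy intensities are invented.

```python
from collections import Counter


def octile_consensus(intensities, n_bins=8):
    """Return one consensus string per equal-width intensity bin, brightest
    bin first (None for empty bins). Ties between equally frequent bases
    are broken by first occurrence."""
    lo, hi = min(intensities.values()), max(intensities.values())
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for seq, value in intensities.items():
        # Bin 0 is the dimmest; the maximum value is clamped into the top bin.
        idx = 0 if width == 0 else min(int((value - lo) / width), n_bins - 1)
        bins[idx].append(seq)

    consensus = []
    for members in reversed(bins):  # brightest bin first
        if not members:
            consensus.append(None)
            continue
        length = len(members[0])
        consensus.append("".join(
            Counter(s[i] for s in members).most_common(1)[0][0]
            for i in range(length)
        ))
    return consensus


# Toy intensities (invented for illustration, not measured data).
toy = {
    "GAAAA": 1.00, "GAAGA": 0.97, "GGAAA": 0.95,
    "ATGCA": 0.50,
    "CTCTC": 0.10, "CCTTC": 0.12, "CTTCC": 0.14,
}
print(octile_consensus(toy))
```

With the toy data, the brightest bin yields GAAAA and the dimmest yields CTTTC, mirroring the purine/pyrimidine trend in the text.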
In the dsDNA data (Figure 2), the melting temperatures of the consensus sequences for the most intense octiles are higher than those of the equivalent octiles in the ssDNA data, for both Cy3 and Cy5, owing to the larger number of cytosines.
Cy3 and Cy5 Overhang Interactions. In the results described above, the dyes must also interact with the immediately adjacent ssDNA overhang segment, as illustrated in Figure 1C and Scheme 1B. To estimate how this ssDNA modulates the fluorescence, the random nucleobase linker was replaced with segments representing all possible 5mers. To limit the total number of permutations, only two dsDNA sequences were used: one associated with strong fluorescence (GAAAA) and one with weak fluorescence (CGTGG). About 10 replicates of each of the resulting 2048 sequences (2 × 1024) fit on a single microarray, allowing accurate relative intensity comparisons between sequences. In the dsDNA data shown in Figure 2, GAAAA ranked 33rd (Cy3) and 100th (Cy5) in brightness out of 1024, and CGTGG ranked 1008th (Cy3) and 898th (Cy5). The results of the overhang experiment using Cy3 as the dye are shown in Figure 3. In Figure 3A, the intensity of each sequence has been normalized to that of the most intense sequence, which, as expected, belongs to the Cy3-dsGAAAA set. Most of the sequences with Cy3-dsCGTGG are darker than any of those with GAAAA. The intensity difference between the two curves in Figure 3A is similar to the range of intensities within each curve. This clearly shows that the dye intensity is determined to a similar extent by the dsDNA segment and by the adjacent ssDNA segment.
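The comparison made for Figure 3A can be written down explicitly: normalize both curves to the single brightest sequence, then compare the gap between the curves with the spread within each curve. The sketch below is only an illustration under stated assumptions. `compare_curves` is a hypothetical helper, the gap between medians is one arbitrary choice of measure, and the numbers are invented.

```python
import statistics


def compare_curves(bright, dark):
    """Normalize both intensity sets to the single brightest value, then
    report each curve's internal range and the gap between their medians."""
    top = max(max(bright), max(dark))
    b = sorted((x / top for x in bright), reverse=True)
    d = sorted((x / top for x in dark), reverse=True)
    return {
        "range_bright": b[0] - b[-1],
        "range_dark": d[0] - d[-1],
        "median_gap": statistics.median(b) - statistics.median(d),
    }


# Invented raw intensities for a "bright" and a "dark" fixed dsDNA sequence.
print(compare_curves([100, 90, 80, 70, 60], [60, 54, 48, 42, 36]))
```

Here the median gap (0.32) is comparable to the within-curve ranges (0.40 and 0.24), which is the situation described for Figure 3A.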
The relationship between the nucleobase sequence of the permuted overhang and the fluorescence intensity is shown using consensus logos in Figure 3B and C for Cy3-dsGAAAA and Cy3-dsCGTGG, respectively. The consensus sequences show a pattern similar to those of the ssDNA data and the dsDNA data with the random overhang. The most fluorescent signal results from sequences with high purine content, and the least fluorescent signal results from sequences with high pyrimidine content, particularly cytosine. Two additional trends are clearly visible in the consensus sequence data. First, the information content (bits) for each position is typically lower than that for the data with the random overhang. This is because in the present case there is no single dominant base at any position; e.g., both purines are approximately equally probable in the most fluorescent sequences. This trend can also be anticipated from the shape of the intensity curves in Figure 3A. For the same number of permuted sequences, they span a lower range of intensity than those in Figure 2, indicating a reduced sequence dependence of fluorescence. Second, the more distal bases are more prominent in the consensus sequences, which suggests that the dye interacts more strongly with these more distal bases. One possibility is that the dye on the terminus of the double-stranded segment tends to displace the more proximal overhang bases into conformations where they cannot affect the cis−trans isomerization rate. This is consistent with NMR data indicating that Cy3 occupies much of the available stacking space at the end of dsDNA. 18

Scheme 1. Sequence Design for the 5′-Dye Self-Hybridizing DNA Strands. Sequence (A) is used to measure the interaction of the dyes with dsDNA, and sequence (B) is used to measure the interactions of the dyes with the ssDNA overhang of dsDNA.

Data for Cy5 on double-stranded DNA with a permuted overhang are shown in Figure 4. These data were collected using the same methods and the same microarray design, only with Cy5 instead of Cy3. As with Cy3, the intensity difference between the two curves in Figure 4A is similar to the range of intensities within each curve. This clearly shows that the intensity of Cy5 is determined to a similar extent by the dsDNA segment and by the adjacent ssDNA overhang segment. Unlike with Cy3, all of the Cy5-dsCGTGG sequences are darker than the darkest of the Cy5-dsGAAAA sequences. In the random-linker data set, Cy5-dsGAAAA had an intensity of 0.8 relative to that of Cy5-dsGAACC, the most intense sequence. This suggests that the gap between the curves in Figure 4A could be increased significantly by using GAACC as the fixed double-stranded sequence.
Although the two curves in Figure 4A appear to have different shapes, this is due only to the large difference in fluorescence intensity between them. Independently normalizing the Cy5-dsCGTGG data would make it overlap very closely with the Cy5-dsGAAAA data. This indicates that both double-stranded sequences modulate the interaction of the dye with the overhang bases to a similar extent.

Figure 4. … (Figure 1C). The dsDNA strand to which the Cy5 is attached has one of two sequences: GAAAA (bright) or CGTGG (dark). (A) Relative fluorescence of Cy5-GAAAA and Cy5-CGTGG, ranked from most to least intense over the range of all ssDNA 3′ overhang 5mers. The intensity falls by ∼40% for both Cy5-GAAAA and Cy5-CGTGG. Fluorescence intensity consensus sequences of all 1024 5mers on the 3′ overhang of (B) Cy5-dsGAAAA and (C) Cy5-dsCGTGG. The fluorescence intensity range was divided into eight bins of equal width, and the consensus sequence is plotted for each bin.
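The statement about independent normalization can be checked numerically. Scale each ranked curve to its own maximum, then take the largest pointwise difference between the curves. This is a minimal sketch with invented numbers, assuming both curves contain the same number of sequences; here each curve falls by 40%, as in Figure 4A. The helper names are hypothetical.

```python
def normalize_ranked(values):
    """Sort descending and scale so that the brightest value equals 1."""
    top = max(values)
    return sorted((v / top for v in values), reverse=True)


def max_pointwise_gap(a, b):
    """Largest absolute difference between two equally long ranked curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


bright = [100, 90, 80, 70, 60]  # invented
dark = [50, 45, 40, 35, 30]     # invented: half as bright, same relative shape

# On a common scale the dark curve spans 0.5 to 0.3 and looks flatter ...
joint = [v / max(bright) for v in sorted(dark, reverse=True)]
# ... but after independent normalization the two curves coincide.
gap = max_pointwise_gap(normalize_ranked(bright), normalize_ranked(dark))
print(joint, gap)
```

On the shared axis the dark curve spans only 0.2 while the bright curve spans 0.4, so their shapes appear different. After independent normalization the pointwise gap vanishes.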

The relationship between the nucleobase sequence of the permuted overhang and the fluorescence intensity is shown using consensus logos in Figure 4B for Cy5-dsGAAAA and in Figure 4C for Cy5-dsCGTGG. As with Cy3, the highest fluorescence is strongly associated with purines, while the lowest fluorescence is strongly associated with pyrimidines. Of the purines, guanine clearly promotes fluorescence more than adenine does. Cytosine is also much more common than thymine in the sequences associated with low fluorescence. Because these two bases dominate, the information content of the consensus sequences is higher for Cy5. The trend observed for Cy3, that the dye interacts more strongly with the more distal bases, also holds for Cy5.
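In the usual sequence-logo convention, the information content at a position is 2 − H bits, where H = −Σ p_b log2 p_b is the Shannon entropy of the base frequencies p_b. The minimal sketch below omits the small-sample correction that logo software often applies, and `information_content` is a hypothetical helper. It shows that a position shared equally by A and G carries only 1 bit, whereas a position dominated by a single base, such as G, approaches 2 bits.

```python
import math
from collections import Counter


def information_content(seqs):
    """Per-position information content in bits, 2 - H, for aligned DNA
    sequences of equal length (no small-sample correction)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    ic = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        n = len(seqs)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


# First position: always G (2 bits). Second position: A or G equally often (1 bit).
print(information_content(["GA", "GG", "GA", "GG"]))
```

This is why a logo in which G or C dominates, as for Cy5, shows taller stacks than one in which both purines are about equally likely, as for Cy3.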
For both Cy3 and Cy5, the sequences with the lowest intensity in the dye-dsCGTGG subset have intensities similar to the darkest sequences in the random-overhang data sets in Figure 2. Using a random nucleobase linker should be equivalent to averaging over all linker base permutations. We therefore expected the minimum fluorescence measured in the permuted-overhang experiments to be significantly lower than that measured with the random overhang. One possibility is that the range over which the fluorescence intensity of cyanine dyes can be modulated via interactions with DNA is restricted. This seems reasonable: the total range over which restricting the rate of cis−trans isomerization can increase the fluorescence quantum yield of Cy3 is about a factor of 8 at room temperature, and Cy3 on DNA appears to be limited to the lower half of this range. 15 Nevertheless, some additional range of fluorescence intensity could likely be measured with permuted sequences longer than 5mers. In most of the consensus sequences in Figures 2, 3, and 4, there is information content in the fifth, most distal base. This indicates that this base also participates in modulating the intensity, so a sixth or seventh base is also likely to contribute. Another perspective is that the shapes of the curves in Figures 2A, 3A, and 4A can be interpreted as cumulative distribution functions of the normalized relative fluorescence. To a good approximation, the fluorescence intensities of Cy3 and Cy5 on random DNA sequences have probability mass functions approximating those of binomial distributions, where the two outcomes are purine or pyrimidine. 19 Most random 5mer sequences contain a mix of purines and pyrimidines, which results in intermediate fluorescence in the central region of the distribution.
A few sequences contain mostly or exclusively purines or pyrimidines, resulting in fluorescence at the high or low tail of the intensity distribution, respectively. Increasing the permuted sequence length (the number of Bernoulli trials) should add a few sequences in the tails of the distribution that extend the range of fluorescence.
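The binomial picture can be made concrete by counting how many of the 4^n possible n-mers contain k purines. Each position is a purine (A or G) for two of the four bases, so the count is C(n, k)·2^n. The sketch below is a simplification: it assumes only purine/pyrimidine identity matters. It ignores the G/A and C/T differences and the position dependence discussed above.

```python
from math import comb


def purine_count_distribution(n):
    """Number of DNA n-mers with k purines, for k = 0..n.
    Each position has 2 purines (A, G) and 2 pyrimidines (C, T), so the
    count is comb(n, k) * 2**n, a Binomial(n, 1/2) shape scaled to 4**n."""
    return [comb(n, k) * 2**n for k in range(n + 1)]


for n in (5, 7):
    counts = purine_count_distribution(n)
    tails = counts[0] + counts[-1]  # all-pyrimidine plus all-purine n-mers
    print(n, counts, f"tail fraction = {tails / 4**n:.4f}")
```

For 5mers, only 32 sequences are homopurine and 32 homopyrimidine. For 7mers there are 128 of each, and they contain more purines (or pyrimidines) than any 5mer can. This is consistent with the expectation that longer permuted segments extend the tails.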
These results are consistent with previous experiments on the fluorescence of Cy3 and Cy5, which have also shown similar patterns of nucleobase dependence. Studies of the interactions of Cy3 with nucleoside monophosphate solutions found a pattern of nucleobase-specific enhancement of fluorescence, dG > dA > dT > dC > no DNA. 36 Experiments on an intercalating cyanine dye derived from thiazole orange demonstrated a strong association of fluorescence with purine DNA homopolymers but not with pyrimidine homopolymers. The resulting relative fluorescence intensities followed the pattern dG > dA ≫ dC > dT > no DNA (100, 39, 2.3, 1.8, and 0.5, respectively). 37 Computer simulations in that study also indicated that the dyes associate poorly with poly(dC) and poly(dT) while binding strongly to poly(dG) and poly(dA). All these results fit well with the model that π−π interactions between cyanine dyes and nucleobases decrease the cis−trans isomerization rate. Purines, with a more extensive π system, are more effective than pyrimidines. The extent of the π system follows the order dG(14) > dA(12) > dT(10) = dC(10) in terms of number of π electrons, and dG(153 Å²) > dA(142 Å²) = dT(142 Å²) > dC(127 Å²) in terms of surface area. 38 These results apply directly to the 5′ nucleobase in our terminal labeling experiments, since this is the base directly adjacent to the dye. For both single- and double-stranded DNA, we consistently observe that cyanine dye fluorescence follows the same trend, dG > dA > dT > dC, indicating that the terminal base directly affects rotational isomerization. The data also consistently show that adjacent nonterminal bases modulate dye fluorescence, with an influence that depends on distance from the dye. This indicates that the sequence-dependent rigidity of the single- or double-stranded DNA also contributes to the observed fluorescence of Cy3 and Cy5.
We hypothesize that the terminal base is better able to hinder the rotational isomerization of the dye when it is part of a more rigid sequence of bases. The flexibility of DNA, particularly dsDNA, is of ongoing interest because of its role in packing and in the formation of protein−DNA complexes. 39 Many degrees of freedom of the bases contribute to DNA rigidity or flexibility, and not all of them may be relevant to restricting the isomerization of the terminal dye. Nevertheless, multiple experimental approaches indicate that purine stacks are more rigid than pyrimidine stacks in ssDNA. 40,41 A similar pattern is observed in dsDNA, again related to differences in base stacking area, dG (139 Å²) > dA (128 Å²) > dC (102 Å²) > dT (95 Å²), and in stacking free energy, dA ≫ dG > dT ≈ dC (2.0, 1.3, 1.1, and 1.0 kcal·mol⁻¹), for B-form geometry, based on melting temperature changes. 38 Other experiments based on 5′ dangling ends of DNA hairpins and 3′ unpaired RNA nucleotides give similar stability results: A ≈ G > T/U > C. 42,43 The sequence specificity of the flexibility of di- and tetramers, 44,45 obtained from crystal structures and molecular dynamics simulations, appears to be less relevant here. These analyses treat paired bases symmetrically and as a single rigid unit, such that, e.g., the deformability of AA(TT) = TT(AA). This treatment is relevant to the ability of dsDNA to bend. However, the hydrogen bonding between Watson−Crick pairs does not contribute to duplex stabilization; instead, duplex stability is mainly determined by base-stacking interactions. 46 This suggests that, at short length scales, the relevant modes of DNA dynamics are largely decoupled from the complementary strand. These modes would interact with the cyanine dyes by restricting the available torsional volume and by changing high-frequency coordinates of the potential energy surface of the excited state. 47
Our experiments are based on the two cyanine dyes commonly used for DNA labeling, but sulfonated variants of Cy3 and Cy5 appear to interact differently with nucleobases. 15,19 The sulfonates increase water solubility but may modify the stacking interaction with DNA bases. Stacking stability is dominated by hydrophobic effects, with contributions from dispersion and electrostatic forces, 38 all of which are likely to be affected by the charges on the sulfonates.
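One quick comparison of the base descriptors quoted above with the observed terminal-base order, dG > dA > dT > dC, is to check whether each descriptor is non-increasing along that order, with ties allowed. The values below are copied from the text. The check itself is only an illustrative sketch, not an analysis from the study, and the names are hypothetical.

```python
OBSERVED_ORDER = ["G", "A", "T", "C"]  # brightest to dimmest adjacent base

# Descriptor values as quoted in the text (ref 38).
DESCRIPTORS = {
    "pi electrons": {"G": 14, "A": 12, "T": 10, "C": 10},
    "surface area (A^2)": {"G": 153, "A": 142, "T": 142, "C": 127},
    "dsDNA stacking area (A^2)": {"G": 139, "A": 128, "C": 102, "T": 95},
    "stacking free energy (kcal/mol)": {"A": 2.0, "G": 1.3, "T": 1.1, "C": 1.0},
}


def follows_order(values, order=OBSERVED_ORDER):
    """True if the descriptor never increases along the observed order."""
    seq = [values[base] for base in order]
    return all(x >= y for x, y in zip(seq, seq[1:]))


for name, values in DESCRIPTORS.items():
    print(f"{name}: {follows_order(values)}")
```

The π-electron count and the surface area track the observed order. The dsDNA stacking area swaps T and C, and the stacking free energy ranks A above G.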

■ CONCLUSION
With the data presented here, we have sought to clarify and quantify the impact of sequence-dependent fluorescence of Cy3 and Cy5 tethered to double-stranded DNA. The results are consistent with previous results for Cy3, Cy5, and similar cyanine dyes tethered to single-stranded DNA. 19,26 They are also consistent with measurements of the fluorescence yield of Cy3 in solution with each of the DNA nucleoside monophosphates, which also follow the pattern G > A > T > C. 36 The preponderance of evidence supports the following hypothesis: cyanine dyes stack more strongly with purines than with pyrimidines, and this stronger stacking restricts the cis−trans isomerization rate of the dyes, enhancing fluorescence. The results can be used in the planning and analysis of experiments based on labeling DNA (and probably RNA) with cyanine dyes. For example, TaqMan or molecular beacon PCR probes and FISH probes using cyanine dye reporters can be designed with one or more guanines or adenines immediately adjacent to the dye for increased signal. The sequences of the latter two types of probes can also be adjusted so that, upon hybridization, the reporter dye is adjacent to a purine-rich segment of the target. Next-generation sequencing-by-synthesis achieves high throughput by keeping the low end of the dynamic range near the noise threshold. 48,49 There, the data analysis pipeline can account for the effect of adjacent nucleobases on measured fluorescence when determining the probability of a correct nucleobase assignment.

■ EXPERIMENTAL PROCEDURES
Microarray Synthesis. Glass slides (Schott Nexterion D, cleanroom-cleaned) were functionalized with N-(3-triethoxysilylpropyl)-4-hydroxybutyramide (Gelest SIT8189.5). The slides were loaded in a stainless steel rack, placed in a plastic container, and covered with 0.5 L of a solution of 10 g of the silane in 95:5 (v/v) ethanol/water plus 1 mL of acetic acid. The slides were gently agitated for 4 h at room temperature and then washed twice for 20 min each with the same solution without the silane. The slides were drained, blown dry with argon, cured overnight in a preheated vacuum oven (120 °C), and stored in a desiccator cabinet.
For the synthesis of terminally labeled oligonucleotides on microarrays, we used maskless array synthesis (MAS). 50,51 MAS was developed for in situ synthesis of high-density DNA microarrays and consists of an optical system and a chemical delivery system. The optical system is built around a digital micromirror device (DMD), an array of individually tiltable mirrors that direct ultraviolet light from a mercury lamp to the corresponding features on the microarray via 1:1 imaging optics. The microarray layout and oligonucleotide sequences are determined by selective removal of the photocleavable protecting groups on the phosphoramidites at the 5′ termini of the growing oligonucleotides.
A computer synchronizes the light exposure pattern with reagent delivery to the synthesis surface. The chemical delivery system consists of a slightly modified PerSeptive Biosystems Expedite 8909 synthesizer. The oligonucleotide synthesis chemistry is similar to that used in conventional solid-phase synthesis, except that the standard acid-labile 5′-OH protecting group of the phosphoramidites is replaced with the photocleavable nitrophenylpropyloxycarbonyl (NPPOC) group. 52 Upon absorption of light near 365 nm, the NPPOC group is cleaved, leaving a free hydroxyl group that can react with an activated phosphoramidite in the next coupling cycle. An exposure solvent consisting of 1% (m/v) imidazole in DMSO is required during ultraviolet exposure to promote cleavage of the NPPOC group. 51 The coupling reactions were performed with 30 mM NPPOC phosphoramidite monomers and 0.25 M dicyanoimidazole (both from SAFC) for 60 s. For the Cy3 and Cy5 phosphoramidites (GE Healthcare 28−9172−98 and Glen Research 10−5915−95; Figure 5), the coupling time was extended to 10 min at a monomer concentration of 15 mM. After each coupling reaction, the oligonucleotides were acetylated with a 1:1 mix of tert-butylphenoxyacetyl acetic anhydride in tetrahydrofuran (Cap A) and 10% N-methylimidazole in tetrahydrofuran/pyridine (8:1) (Cap B). This capping step ensured that only correctly synthesized sequences received the fluorescent label.
After microarray synthesis, the substrate was vigorously washed for 2 h with acetonitrile in a 50 mL Falcon tube to remove uncoupled Cy3 or Cy5 phosphoramidites, which tend to adhere nonspecifically to the glass surface. The base and phosphate protecting groups were removed by immersing the glass slide in 1:1 (v/v) ethylenediamine in ethanol for 2 h at room temperature. Following deprotection, the microarrays were washed twice with distilled water and dried with argon.
Microarray Design. In principle, the resolution of the digital micromirror device, 768 × 1024 mirrors, allows simultaneous measurement of all possible n-mers up to n = 9 (4^9 = 262 144 sequences). In these experiments, however, only permutations of 5mers were included. This left room for multiple replicates and for more microarray surface area per sequence, and therefore a good signal-to-noise ratio. The sequences were laid out in a "25 in 36" pattern. Each feature (a contiguous area where a single sequence is synthesized) corresponded to a 5 × 5 block of mirrors surrounded by a one-mirror-wide margin where no DNA was synthesized. Each of the 1024 single-sequence features was replicated 20 times on each microarray in the double-stranded experiments (Figure 1B). In the double-stranded DNA with single-stranded overhang experiments (Figure 1C), which used 2048 sequences, each feature was replicated 10 times.
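A few lines of arithmetic check the layout numbers. A 768 × 1024 DMD has 786 432 mirrors: enough for all 4^9 = 262 144 9mers at one mirror each, but not for all 4^10 10mers. We read the "25 in 36" pattern as a 6 × 6-mirror pitch, that is, a 5 × 5 feature plus a one-mirror gap shared between neighbors; that reading is an assumption. The function names are illustrative.

```python
DMD_ROWS, DMD_COLS = 768, 1024


def max_complete_nmer(rows=DMD_ROWS, cols=DMD_COLS):
    """Largest n for which all 4**n sequences fit at one mirror each."""
    n = 0
    while 4 ** (n + 1) <= rows * cols:
        n += 1
    return n


def feature_capacity(block=5, gap=1, rows=DMD_ROWS, cols=DMD_COLS):
    """Number of block x block features on a regular grid with a gap of
    `gap` mirrors between neighbors (pitch = block + gap). Assumed layout."""
    pitch = block + gap
    return (rows // pitch) * (cols // pitch)


capacity = feature_capacity()
print(max_complete_nmer(), capacity)  # largest complete n-mer set, feature count
print(1024 * 20 <= capacity, 2048 * 10 <= capacity)
```

Under this assumed pitch, 128 × 170 = 21 760 features fit. That is enough for either 1024 × 20 or 2048 × 10 = 20 480 features.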
Bioconjugate Chemistry
Article
Double-Stranded DNA Annealing. To promote hairpin-loop formation and self-hybridization, the array was incubated after deprotection in 40 mL of PBS buffer (0.65 M Na+, pH 7.4), starting at 50°C and cooling to room temperature over 30 min. It was then washed with final wash buffer for a few seconds and dried with a microarray centrifuge. Successful hairpin-loop formation was verified by hybridizing a Cy3-labeled oligonucleotide (5′-Cy3-GGC GGC GGG TTC A-3′) to two unlabeled complementary sequences on the array: (1) a sequence (TGA ACC CGC CGC CGT CCA TCCT TGG ACG GCG GCG GGT TCA) that self-hybridized via hairpin-loop formation in the previous step and is therefore blocked from hybridizing with the added oligonucleotide, and (2) a sequence (TGA ACC CGC CGC C) that cannot self-hybridize but is fully complementary to the added labeled oligonucleotide.
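The logic of this control can be checked directly on the sequences given above. The Python sketch below is written for illustration only (the names `revcomp`, `probe`, `hairpin_ctrl`, and `linear_ctrl` are hypothetical). It splits the hairpin control into two 18-nt stems around the TCCT loop, a split inferred from the listed sequence, and confirms that the probe's target site lies in the 5′ stem, where it is occupied once the hairpin has folded, and is fully present in the linear control.

```python
# Complement table for str.translate
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string (spaces removed)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]

probe = "GGC GGC GGG TTC A"  # the 5'-Cy3 label is omitted
hairpin_ctrl = "TGA ACC CGC CGC CGT CCA TCCT TGG ACG GCG GCG GGT TCA".replace(" ", "")
linear_ctrl = "TGA ACC CGC CGC C".replace(" ", "")

# Inferred layout: 18-nt 5' stem, 4-nt TCCT loop, 18-nt 3' stem
stem5, loop, stem3 = hairpin_ctrl[:18], hairpin_ctrl[18:22], hairpin_ctrl[22:]

print("loop:", loop)
print("hairpin stems pair:", revcomp(stem5) == stem3)
print("probe target = linear control:", revcomp(probe) == linear_ctrl)
print("probe target inside 5' stem:", revcomp(probe) in stem5)
```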
Sequence Design. Three principal considerations guided the sequence design: (1) all double-stranded sequences should have equal melting temperatures, since they must all form duplexes equally well under the single hybridization condition of the microarray; (2) the melting temperature should be relatively high to ensure stable duplex formation; and (3) the surface density of labeled oligonucleotides should be the same for all experimental oligonucleotides on the microarray, so that differences in fluorescence intensity between them can be attributed to sequence-dependent effects. To meet these requirements, the double-stranded oligonucleotides were designed as illustrated in Scheme 1.
The sequences contain self-complementary segments that allow duplex formation. The central TCCT sequence is known to bend easily, promoting hairpin-loop formation. 53 $N_i$ denotes the experimental 5mer nucleobases, which base pair with the complementary $N_{ic}$. On the 3′ side of $N_i$ is the fixed sequence CCGCCGCC, which hybridizes with the GGCGGCGG sequence on the opposite side of the hairpin; this GC-rich stretch raises the melting temperature. The sequence $P_1P_2P_3P_4P_5$ is derived from the experimental 5mer $N_1N_2N_3N_4N_5$ by a nonidentity, noncomplementarity rule: for each $i$, if $N_i$ = dA then $P_i$ = dC; if $N_i$ = dC then $P_i$ = dT; if $N_i$ = dG then $P_i$ = dA; and if $N_i$ = dT then $P_i$ = dG. These strands hybridize with their complementary sequences $P_{5c}P_{4c}P_{3c}P_{2c}P_{1c}$. The $P_i$ and $P_{ic}$ sequences serve two purposes: (1) they equalize the base composition, ensuring an equal number density of all experimental sequences on the array, and (2) they raise and homogenize the melting temperatures (to $T_m$ = 63°C, salt-adjusted, 50 mM Na+) by giving every sequence on the array exactly five of each nucleobase (plus the fixed GC sequences) while retaining self-complementarity. The sequences are separated from the glass substrate by a random 10mer linker synthesized from an equimolar mix of the four DNA phosphoramidites. The random linker replaces the traditional poly(dT) linker to avoid the potential bias of any particular interaction between the dye and a dT homopolymer. Put differently, the dye may interact with both the double-stranded and single-stranded segments, but its interaction with the single-stranded linker will be averaged over all possible sequences. In the second set of experiments, the single-stranded sequence is permuted. The results of both data sets can be used to estimate the relative contributions of the single- vs double-stranded segments to the variation in dye intensity.
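A minimal Python sketch of how one hairpin could be assembled from an experimental 5mer, following the substitution rule above. The exact position of the $P_i$ block relative to the CCGCCGCC segment is defined in Scheme 1, which is not reproduced here, so the layout used below (N, then CCGCCGCC, then P, then the TCCT loop, then the reverse complement of that arm) is an assumption. The dye at the 5′ end and the random linker at the 3′ end are omitted, and the function names are hypothetical.

```python
# Nonidentity, noncomplementarity substitution N_i -> P_i, as stated in the text
P_MAP = {"A": "C", "C": "T", "G": "A", "T": "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq: str) -> str:
    """Reverse complement, returned 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def design_hairpin(n: str) -> str:
    """Hypothetical 5'->3' hairpin for experimental 5mer n (dye and linker omitted).

    Assumed layout: N1..N5 + CCGCCGCC + P1..P5 + TCCT + revcomp(5' arm), so the
    3' arm reads P5c..P1c, GGCGGCGG, N5c..N1c.
    """
    if len(n) != 5 or set(n) - set("ACGT"):
        raise ValueError("expected a 5mer over A, C, G, T")
    p = "".join(P_MAP[b] for b in n)
    arm = n + "CCGCCGCC" + p
    return arm + "TCCT" + revcomp(arm)

# Sanity check of the rule: P_i is never identical or complementary to N_i
for base, sub in P_MAP.items():
    assert sub != base and sub != COMPLEMENT[base]

print(design_hairpin("GAAAA"))
```

Because the 3′ arm is generated as the reverse complement of the 5′ arm, the CCGCCGCC/GGCGGCGG pairing described in the text follows automatically.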
With these rules, every sequence (excluding the linker) contains exactly 5 adenosines, 15 cytidines, 13 guanosines, and 7 thymidines. Because the coupling efficiencies of the four DNA phosphoramidites can differ and can vary over time and between batches, having equal numbers of each base in every sequence ensures equal representation of the experimental oligonucleotides. This sequence design, in conjunction with acetic anhydride capping after the coupling reactions, ensures equal density and melting temperature, and ensures that only accurately synthesized sequences receive the final coupling with the Cy3 or Cy5 phosphoramidite. An alternative approach, using simpler sequences and then correcting the data for the measured coupling efficiencies, is less reliable: the coupling efficiencies of the phosphoramidites used in maskless array synthesis are measured in terminal fluorescent-dye labeling experiments, 54−57 whose accuracy is limited by the sequence-dependent fluorescence intensity of single-stranded DNA. 19 The second set of experiments, with the dyes attached to fixed-sequence double-stranded DNA and a variable single-stranded overhang, has a similar design (Scheme 1B). Here, the permuted overhang sequence $N_1N_2N_3N_4N_5$ is added at the 3′ end, placing it adjacent to the 5′ fluorescent label. $F_i$ and $F_{ic}$ are complementary but are no longer permuted; $F_1F_2F_3F_4F_5$ is either GAAAA or CGTGG. GAAAA and CGTGG were chosen from the initial double-stranded experiments as sequences giving high and low fluorescence intensity, respectively, for both Cy3 and Cy5.
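The stated base composition follows from the substitution rule: at each position $i$, the four bases $N_i$, $P_i$, $N_{ic}$, and $P_{ic}$ are one each of A, C, G, and T. The Python sketch below checks this for all 1024 5mers. Because base composition does not depend on segment order, it needs no assumption about the layout in Scheme 1, only the fixed segments named in the text (CCGCCGCC, GGCGGCGG, and the TCCT loop); the function name `composition` is illustrative.

```python
from collections import Counter
from itertools import product

P_MAP = {"A": "C", "C": "T", "G": "A", "T": "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
FIXED = "CCGCCGCC" + "GGCGGCGG" + "TCCT"  # GC-rich stem segments and the loop

def composition(n: str) -> Counter:
    """Base counts of a designed sequence (linker excluded); segment order is irrelevant."""
    p = "".join(P_MAP[b] for b in n)
    n_c = "".join(COMPLEMENT[b] for b in n)
    p_c = "".join(COMPLEMENT[b] for b in p)
    return Counter(n + p + n_c + p_c + FIXED)

all_5mers = ["".join(t) for t in product("ACGT", repeat=5)]
distinct = {tuple(sorted(composition(n).items())) for n in all_5mers}
print(len(all_5mers), "5mers ->", distinct)
```

The same counts hold for the non-self-complementary ssDNA controls, since inverting segments and reordering GGCGGCGG to GCGGCGGG leave the composition unchanged.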
To allow direct comparison between the relative fluorescence intensities of the dyes on single- vs double-stranded DNA, each dsDNA microarray design included sequences that cannot self-hybridize to form dsDNA but have a very similar overall design and base composition. Because most of the microarray features were needed for the dsDNA permutations, only a sample of 57 labeled ssDNA permutations was included, chosen to represent the range of ssDNA fluorescence intensities found in previous experiments. 19 To prevent self-hybridization of these sequences, the $N_{5c}N_{4c}N_{3c}N_{2c}N_{1c}$ segment was inverted to $N_{1c}N_{2c}N_{3c}N_{4c}N_{5c}$, the $P_{5c}P_{4c}P_{3c}P_{2c}P_{1c}$ segment was inverted to $P_{1c}P_{2c}P_{3c}P_{4c}P_{5c}$, palindromic $N_i$ 5mers were avoided, and the GGCGGCGG segment was reordered to GCGGCGGG.
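The effect of these modifications can be illustrated with a short Python sketch. Inverting the complementary segment replaces the antiparallel partner of $N_1N_2N_3N_4N_5$ (its reverse complement) with a plain complement, which is still fully complementary if, and only if, the 5mer reads the same in both directions; this is why palindromic 5mers are excluded. The segment ordering below (N, CCGCCGCC, P, TCCT, then the modified 3′ arm) is an assumption rather than the published Scheme 1, and the function names are hypothetical.

```python
from itertools import product

P_MAP = {"A": "C", "C": "T", "G": "A", "T": "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def comp(seq: str) -> str:
    """Base-by-base complement in the same direction (not reversed)."""
    return "".join(COMPLEMENT[b] for b in seq)

def revcomp(seq: str) -> str:
    """Reverse complement: the antiparallel partner, read 5'->3'."""
    return comp(seq)[::-1]

def design_ss_control(n: str) -> str:
    """Hypothetical non-self-complementary control for 5mer n (dye and linker omitted)."""
    if n == n[::-1]:
        raise ValueError("palindromic 5mers are excluded")
    p = "".join(P_MAP[b] for b in n)
    # 3' segments inverted (complemented but not reversed); GGCGGCGG reordered to GCGGCGGG
    return n + "CCGCCGCC" + p + "TCCT" + comp(p) + "GCGGCGGG" + comp(n)

eligible = ["".join(t) for t in product("ACGT", repeat=5) if t != t[::-1]]
print(len(eligible), "non-palindromic 5mers")
print(design_ss_control("GAAAA"))
```

Of the 1024 5mers, 64 are palindromic, so 960 remain as candidates from which a sample such as the 57 controls could be drawn.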
Data Extraction and Analysis. Fluorescence images of the microarrays were obtained with a GenePix 4100A scanner at 5 μm resolution, with PMT voltages of 350 V (Cy3) and 450 V (Cy5), chosen to give similar intensity ranges for both dyes and no saturated pixels. Cy3 and Cy5 were excited with 532 and 635 nm solid-state lasers, respectively, and their fluorescence was collected through 550−600 nm and 655−695 nm bandpass filters, respectively, using a 0.68 NA objective lens with a focal length of 3.1 mm. Microarray scanners are designed to provide intensity values that are highly consistent across the scanned surface, which allows reliable relative fluorescence comparisons between microarray features. The nearby microarray surface, a lossless glass−air dielectric interface, does not influence the relative emission intensity or wavelength of the fluorophores. 58 In addition to the high throughput of microarray experiments, a significant advantage is that the density of fluorophores can be closely controlled to avoid the aggregation-induced quenching artifacts that can occur in solution experiments with hydrophobic dyes such as Cy3 and Cy5.
The fluorescence intensity data were extracted from the scan images with NimbleScan v2.1 software (NimbleGen) and further processed in Excel. For each microarray, the fluorescence intensity of each sequence was calculated as the average of its replicates, which were randomly located on the microarray. The double-stranded experiment had 20 replicates of each sequence per array. The overhang experiment had 10 replicates per array because it included two experimental sets: one with a double-stranded sequence that strongly promotes fluorescence (dye-GAAAA) and one with a double-stranded sequence that results in weak fluorescence (dye-CGTGG). Error was calculated as the standard error of the mean. The consensus sequence figures were generated by ranking the 1024 sequences by fluorescence intensity and dividing them into 8 bins spanning equal ranges of intensity. Consensus logos for the sequences in each of these octiles of fluorescence intensity were generated using WebLogo (http://weblogo.berkeley.edu/). 59 The 8 consensus sequence logos per fluorescent label, each representing 1/8 of the intensity range, are arranged left to right in order of decreasing intensity to depict compactly the relationship between sequence and fluorescence for the entire data set. The relative fluorescence intensity data for all experimental sequences are available as Supporting Information.
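The original processing used Excel and WebLogo; the Python sketch below is a hypothetical equivalent, shown only to make the averaging and binning steps explicit. It takes (sequence, intensity) pairs, one per feature, computes each sequence's replicate mean and standard error of the mean, and then splits the sequences into 8 bins of equal intensity range, brightest bin first. Equal-range bins generally hold different numbers of sequences, unlike equal-count octiles. The function names are illustrative, and the toy intensities are made up, not data from this study.

```python
import math
import statistics
from collections import defaultdict

def replicate_stats(features):
    """Map each sequence to (mean, standard error of the mean) over its replicate features."""
    by_seq = defaultdict(list)
    for seq, intensity in features:
        by_seq[seq].append(intensity)
    result = {}
    for seq, values in by_seq.items():
        mean = statistics.fmean(values)
        sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else math.nan
        result[seq] = (mean, sem)
    return result

def equal_range_bins(means, n_bins=8):
    """Split sequences into n_bins bins spanning equal intensity ranges, brightest bin first."""
    hi, lo = max(means.values()), min(means.values())
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for seq, m in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        # Distance below the maximum in bin widths; clamp so the minimum lands in the last bin
        idx = min(int((hi - m) / width), n_bins - 1) if width > 0 else 0
        bins[idx].append(seq)
    return bins

# Toy example with made-up intensities (two replicates per sequence)
features = [("AAAAA", 10.0), ("AAAAA", 12.0), ("GGGGG", 6.0),
            ("GGGGG", 6.0), ("CCCCC", 2.0), ("CCCCC", 2.0)]
summary = replicate_stats(features)
bins = equal_range_bins({seq: mean for seq, (mean, _) in summary.items()})
print(summary)
print(bins)
```

Each bin's list of sequences would then be passed to a sequence-logo tool; how that is done depends on the tool and is not shown here.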