Self-Sorting Governed by Chelate Cooperativity

Self-sorting phenomena are the basis of manifold relevant (bio)chemical processes in which a set of molecules is able to interact without interference from other sets, and they are ruled by a number of codes that are programmed in molecular structures. In this work, we study the relevance of chelate cooperativity as a code for achieving high self-sorting fidelities. In particular, we establish qualitative and quantitative relationships between the cooperativity of a cyclic system and the self-sorting fidelity when combined with other molecules that share identical geometry and/or binding interactions. We demonstrate that only systems displaying sufficiently strong chelate cooperativity can achieve quantitative narcissistic self-sorting fidelities, either by dictating the distribution of cyclic species in complex mixtures or by ruling the competition between the intra- and intermolecular versions of a noncovalent interaction.


General procedure for the Sonogashira cross-coupling reaction
A dry THF, DMF, THF/NEt3 or DMF/NEt3 (4:1) mixture was deoxygenated by three freeze-pump-thaw cycles with argon. It was then poured into a round-bottom flask containing the corresponding amount of the compound bearing the ethynyl group, the appropriate proportion of the halogenated species, Pd(PPh3)2Cl2 (0.02 eq.) and CuI (0.01 eq.). The resulting mixture was stirred under an argon atmosphere at the temperature indicated in each case. Once the reaction was complete, the mixture was filtered through a Celite plug and the solvent was evaporated under reduced pressure. The residue was purified by silica gel column chromatography using the respective eluent to give the desired products. Any slight modification of this procedure is noted in each case.
diG. Following the general procedure for the Sonogashira cross-coupling reaction described above, this compound was prepared from d-Br 2 (1.0 eq., 31 mg, 0.071 mmol), iG (1.0 eq., 33 mg, 0.071 mmol), NEt3 (1.2 eq., 12 µL, 0.085 mmol) and DMF as solvent (3 mL). The reaction mixture was stirred at 70 °C overnight.
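The catalyst amounts implied by the general procedure follow directly from the stated equivalents. The short sketch below works them out for the 0.071 mmol scale of the diG preparation. The molar masses are standard values that are not given in the text, and the function name is purely illustrative.

```python
# Molar masses in g/mol (standard values; not stated in the procedure).
MOLAR_MASS = {"Pd(PPh3)2Cl2": 701.90, "CuI": 190.45}

def catalyst_amounts(limiting_mmol, equivalents):
    """Return {reagent: (mmol, mg)} for the given equivalents of each reagent."""
    result = {}
    for name, eq in equivalents.items():
        mmol = limiting_mmol * eq
        # mmol multiplied by g/mol gives mg directly.
        result[name] = (mmol, mmol * MOLAR_MASS[name])
    return result

# Scale of the diG preparation: 0.071 mmol of each coupling partner.
amounts = catalyst_amounts(0.071, {"Pd(PPh3)2Cl2": 0.02, "CuI": 0.01})
for name, (mmol, mg) in amounts.items():
    print(f"{name}: {mmol:.5f} mmol, {mg:.2f} mg")
```

At this scale both catalysts fall at or below the milligram level, so in practice they would probably be easier to dispense from a stock solution than to weigh directly.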

Synthesis of Ga2C and iGdiC
Scheme S0C. Synthesis of the dye-labelled Ga2C and iGdiC dinucleosides via two consecutive Sonogashira reactions between the central block and the corresponding nucleobase derivatives.

S1. Building Speciation Profiles
Speciation curves were generated using the HySS (Hyperquad Simulation and Speciation) software, version 4.0.31 (http://www.hyperquad.co.uk/index.htm). Simulations were built considering the following equilibrium constants (K; see Figure S1A) and effective molarity (EM) values:
Figure S1A. Dimerization (red-shadowed area) and association constants (in CHCl3) between nucleobases used in the HySS simulations. The triply H-bonded Watson-Crick and reverse Watson-Crick pairs are shown within blue- and green-shadowed areas, respectively. Please note that the A:U pair may also bind through reverse Watson-Crick interactions.
Figure S1A below), an arbitrary association constant of 10² M⁻¹ in CHCl3 (which is probably an upper limit) or in CHCl3/CCl4 2:3 was used.
3) The same association constants between the nucleobases were employed in the dinucleosides. In
At the right or at the bottom, the different graphs in each row or column, respectively, are overlaid. It is clear that narcissistic self-sorting is complete over a wider range of concentrations only when chelate cooperativity is sufficiently high, which can be achieved by increasing either EM or K. Otherwise, the cyclic assemblies are in equilibrium with non-sorted linear oligomers and, at low concentrations, with the unbound monomers.
Figure S1B. Self-sorting of a mixture of dinucleosides. Speciation curves showing the distribution of species in a hypothetical 1:1 mixture of two dinucleoside monomers (called here M1 and M2) with the same supramolecular features as the experimentally studied GC+iGiC system, as a function of the association constant (K; horizontal direction) and/or the hypothetical effective molarity of both the c(M1)4 and c(M2)4 macrocycles (EM; vertical direction).
It is clear that virtually complete narcissistic self-sorting would be achieved when chelate cooperativity is strong, that is, at high K·EM values (top-right corner in Figure S1C). Even so, complete self-sorting can only be achieved within an intermediate concentration window. For instance, at K = 10⁵ M⁻¹ and EM = 10³ M (top-right simulation), within the 10⁻⁶ to 10⁻² M concentration range, >95% of GC molecules are associated as cycles, while G and C establish an equilibrium between G:C associated and dissociated species. Lower concentrations obviously lead to dissociation of both bimolecular G:C complexes and cyclic assemblies. On the other hand, higher concentrations disfavor intramolecular association, and the trimolecular C:GC:G complex, which integrates all species, starts to compete. Such competition becomes more important as the EM of the cyclic system diminishes (see the evolution along the right column), so that, for instance, if EM were 10⁻² M (bottom-right simulation), the c(GC)4 macrocycle would only reach a maximum relative abundance of <75%, and narcissistic self-sorting would be lost, even at relatively high K values. It is interesting to note, on the other hand, that the association constant K does not influence the relative abundance of C:GC:G, but instead dominates the relative abundance of dissociated vs associated species, either cyclic or non-cyclic (see the evolution along the top row).
If we now analyze the simulations shown in Figure S1C from the opposite corner (bottom-left), self-sorting is totally absent when both K and EM are low, and supramolecular association can actually only be achieved at relatively high concentrations. But again, even at low association strength, a decent degree of narcissistic self-sorting can be achieved if EM values are sufficiently high (see the evolution along the left column), so that the c(GC)4 macrocycle could reach 90% relative abundance at K = 10² M⁻¹ and EM = 10³ M (top-left simulation) close to a 10⁻² M concentration.
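The dependence of cyclization on K and EM discussed above can be illustrated with a much simpler model than the HySS simulations. The sketch below is an assumption-laden toy, not a reproduction of Figures S1B or S1C: it treats a single self-complementary monomer that forms open chains isodesmically (constant K) and one cyclic tetramer whose concentration is taken as EM·(K·[M])⁴, ignoring statistical factors and any competing mononucleosides. The function name and the bisection solver are illustrative choices.

```python
def cyclic_fraction(c_total, K, EM, ring_size=4, iterations=200):
    """Fraction of monomer held in the cyclic assembly at total concentration c_total (M)."""
    def monomer_balance(m):
        x = K * m                                # dimensionless, must stay below 1
        linear = m / (1.0 - x) ** 2              # sum over n of n * K**(n-1) * m**n
        cyclic = ring_size * EM * x ** ring_size # monomer units stored in the ring
        return linear, cyclic

    # The free monomer concentration is bounded by 1/K in the isodesmic model.
    lo, hi = 0.0, (1.0 - 1e-12) / K
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        linear, cyclic = monomer_balance(mid)
        if linear + cyclic > c_total:
            hi = mid
        else:
            lo = mid
    linear, cyclic = monomer_balance(0.5 * (lo + hi))
    return cyclic / (linear + cyclic)

# Four corners of the (K, EM) grid discussed in the text, at 1e-4 M total monomer.
for K in (1e2, 1e5):
    for EM in (1e-2, 1e3):
        frac = cyclic_fraction(1e-4, K, EM)
        print(f"K = {K:.0e} M^-1, EM = {EM:.0e} M, K*EM = {K * EM:.0e}: cyclic fraction = {frac:.3f}")
```

Even in this reduced picture, the cyclic species dominates only when the product K·EM is large, which is the qualitative trend described for the full simulations.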

S2.1. ¹H NMR and NOESY Spectroscopy Measurements of Mononucleoside Mixtures
We started by examining the ¹H NMR spectra of 1:1 mixtures of complementary mononucleosides (G+C, A+U and iG+iC) at a fixed concentration (1.0·10⁻² M) and temperature (298 K) in CDCl3. As can be observed in Figure
These mononucleoside pairs were further combined in 1:1:1:1 mixtures (G+C+A+U and G+C+iG+iC). In the case of the G+C+A+U mixture, no significant changes in the ¹H NMR spectrum were detected (Figure S2Ab). Only for the G+C+iG+iC combination were a slight broadening and shift of some signals observed.
2D NOESY experiments performed under the same conditions could confirm the proximity of the relevant H-bonded protons in the complementary pairs and provide an assessment of whether or not they self-sort in their 1:1:1:1 mixtures. As shown in Figure S2Ac, the G+C+iG+iC mixture exhibits cross-peaks between all possible combinations of Watson-Crick and reverse Watson-Crick pairs (G:C, iG:iC, G:iC and iG:C), but also between G and iG. To our surprise, the G+C+A+U mixture also displayed cross-peaks between all possible pairs (G:C, A:U, G:U, A:C, G:A and C:U). This may be due to the formation of non-complementary (or mismatched) pairs and/or to association into higher-order species (trimolecular complexes, etc.), but in any case these results clearly show that no binding selectivity is observed in the quaternary mononucleoside mixtures and that self-sorting of any kind is absent.

S2.2. ¹H NMR and NOESY Spectroscopy Measurements of Dinucleoside Mixtures
We then turned our attention to the behavior of 1:1 mixtures of dinucleosides under similar conditions. We first recorded the ¹H and NOESY NMR spectra of the individual dinucleosides (Figure
were not detected. Therefore, in this particular case, narcissistic self-sorting is clearly not ruled by H-bonding complementarity, but by chelate cooperativity, that is, by the strong tendency of both dinucleoside molecules to form cyclic tetramers with high EMs. Only when GC and iGiC associate independently can each cyclic tetramer species be assembled, because a Watson-Crick 90° angle is required.
Unfortunately, we were not able to properly study the 1:1 mixtures of iGiC + AU (or the 1:1:1 ternary mixtures of the three dinucleosides GC + iGiC + AU) due to a combination of solubility and stability problems.
On the one hand, a solvent of lower polarity than CDCl3 or low temperatures are needed to assemble cAU4 quantitatively. Figure S2Ca shows the downfield region of the ¹H NMR spectra of a 1:1 GC + AU mixture. In 100% CDCl3, while cGC4 is formed quantitatively, the cAU4 assembly is in equilibrium with mixtures of short open oligomers (signal around 10.5 ppm; see our previous work).[4] In 100% CCl4, the sample containing these two dinucleosides was not totally dissolved and the spectrum was not well resolved. As increasing amounts of CDCl3 were added to CCl4 while maintaining the overall concentration, the solubility of both monomers was enhanced and cAU4 could be formed quantitatively, in the presence of cGC4, in a CDCl3/CCl4 (2:3) mixture. Unfortunately, however, iGiC revealed rather broad ¹H NMR spectra already in 100% CDCl3 (or CD2Cl2) and the samples are not particularly soluble. As shown in Figure S2Cb

S3.1. CD and Emission Spectroscopy Measurements of Mononucleoside Mixtures
While NMR experiments already provided a reasonably clear picture of the self-assembly of mixtures of mono- and dinucleosides, we complemented these studies with CD and emission spectroscopy experiments using donor and acceptor FRET pairs.[5] In these measurements, the concentration was lowered to the 10⁻⁴ to 10⁻⁶ M regime and toluene was used to increase the binding strength between base pairs (particularly the weaker A:U pair). As we determined in previous studies,[2] association constants in this apolar solvent are increased by about one order of magnitude with respect to CHCl3 (over 10⁵ M⁻¹ for G:C/iG:iC and over 10³ M⁻¹ for the A:U pair).
CD and emission spectroscopy thus work under these conditions as complementary tools to study our self-assembling mixtures. First, CD reveals the existence of macrocycles since, as determined in all of our previous work,[1,4,5,12] the dinucleoside molecules show clear Cotton effects only upon cyclotetramerization, whereas the monomer, possible open oligomers or bimolecular Watson-Crick pairs are not CD-active. This is attributed to the fact that cyclization, in contrast to unbound molecules or open oligomers, fixes the conformation of the π-conjugated backbone to a greater extent. Since we have endowed our dinucleosides with chromophores that absorb in different regions of the spectral window, it should be easy to differentiate macrocycles made of any of these dyes by CD spectroscopy. Second, if mono- or dinucleosides that bear FRET-complementary donor and acceptor pairs (i.e. d and a1 or a1 and a2) are in close proximity because of intermolecular binding, an energy transfer process is activated that quenches donor fluorescence emission and, most often, enhances acceptor emission.
We followed the same rationale as in the previous NMR experiments: the spectroscopic features of mononucleoside complementary pairs or of dinucleosides were examined first at a given concentration, and then the relevant mixtures were generated at that concentration and spectroscopic changes were monitored with time until equilibrium was reached. To this end, stock solutions of the mono- and dinucleosides were prepared and divided into two fractions: one of them was diluted to reach the desired concentration of the individual monomers, whereas the remaining stock fractions were mixed. Thus, depending on whether they were employed in binary, ternary or quaternary mixtures, stock solutions were prepared at double, triple or quadruple, respectively, the concentration at which the experiment would be carried out.
We again started examining the mononucleoside mixtures by emission spectroscopy, since they are not
Figure S3A shows an example of three of these combinations. For instance, Figures S3Aa,b display the emission spectra of the dG+dC+a1G+a1C and diG+diC+a1G+a1C mixtures, respectively, compared to the emission spectra of their parent solutions at the same concentration. In both cases, emission intensity in the donor region is noticeably quenched, which suggests that a FRET process becomes active in these mixtures. This is likely due to the formation of dG:a1C and dC:a1G pairs in the first case, and diG:a1G and diC:a1C pairs in the second case, where FRET donor and acceptor dyes are interacting strongly. Self-sorting is therefore absent in these control mixtures due to the fact that we are not employing two pairs of self-

S3.2. CD and Emission Spectroscopy Measurements of Dinucleoside Mixtures
These results were then contrasted with the behavior of the 1:1 dinucleoside mixtures under the same conditions (Figures S3B-E). We again first performed control experiments in which energy donor and acceptor couples were combined in monomers having the same base pairs (i.e. GdC+Ga1C and AdU+Aa1U) and recorded the spectroscopic changes experienced by the system as a function of time until chemical equilibrium was reached. At equilibrium, a statistical mixture of six different cyclic tetramers should be formed (see Figure S3B), since the central π-conjugated blocks have identical lengths and are end-capped with the same nucleobases. Donor and acceptor moieties are closely positioned in some of these macrocycles, thus allowing resonance energy transfer to take place, which should be evidenced by a decrease in donor emission and, frequently, an increase in acceptor emission. It is interesting to note that, in contrast to what was seen with the mononucleosides, equilibrium is reached very slowly with these dinucleoside mixtures in toluene, on a timescale of several hours, which underlines the extraordinarily high kinetic stability of the cyclic assemblies. When the same experiments were performed in CHCl3 or THF, equilibrium was instead reached within a few minutes. Figure S3Ba and
We then studied the scenario where the bases in the dye monomers are different. GdC+Aa1U and AdU+Ga1C mixtures were examined first (Figure S3Ca-b). In sharp contrast to what was observed before, negligible changes were detected over a period of 24 hours in the emission or CD spectra when these dinucleoside combinations were mixed together in the 10⁻⁴ to 10⁻⁶ M concentration range in toluene or CHCl3.
This indicated that a strong narcissistic self-sorting process takes place in solution, each dinucleoside interacting only with itself in the form of cyclic tetramers. The same results were found when combining iG-iC and A-U dinucleosides (Figure S3Cc
The question now arises whether a donor-acceptor iG-iC + G-C mixture, having nucleobase pairs that do not promote self-sorting, would also self-sort into the corresponding cyclic tetramers, as the NOESY NMR experiments demonstrated. These experiments had to be performed with compound iGiC as donor and in CHCl3 as solvent, due to the low solubility generally found for iG-iC dinucleosides, as noted above in the NMR measurements. In any case, Figure S3Da shows that when iGiC and Ga1C dinucleosides are combined, their equilibrium mixture exhibits virtually the same spectroscopic features as the sum of the spectra of the separately analyzed samples. This is also the case when iGiC and Ga2C are combined (Figure S3Db). This implies that iG-iC and G-C dinucleosides self-associate independently into their corresponding cyclic tetramers and no mixed assemblies, where G would bind to iC or iG to C, are formed.
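The six cyclic tetramers expected for the control mixtures with identical nucleobases (e.g. GdC+Ga1C) can be enumerated with a short counting sketch. It assumes that each ring position is filled by either monomer with equal probability (equal concentrations and identical binding) and that, because the dinucleosides assemble head-to-tail, only rotations of the ring are equivalent. The labels "D" and "A" are placeholders for the donor- and acceptor-labelled monomers, and the function name is illustrative.

```python
from collections import Counter
from itertools import product

def cyclic_assemblies(monomers, ring_size=4):
    """Map each distinct ring to the number of linear sequences that collapse onto it."""
    rings = Counter()
    for seq in product(monomers, repeat=ring_size):
        # Canonical form: the lexicographically smallest rotation of the sequence.
        canonical = min(seq[i:] + seq[:i] for i in range(ring_size))
        rings[canonical] += 1
    return rings

rings = cyclic_assemblies(("D", "A"))
total = sum(rings.values())
for ring, weight in sorted(rings.items()):
    print("".join(ring), f"statistical weight {weight}/{total}")

# Rings that contain both dyes are the ones in which FRET can operate.
mixed = sum(w for ring, w in rings.items() if len(set(ring)) > 1)
print("fraction of rings containing both dyes:", mixed / total)
```

Under these assumptions most rings contain both dyes, which is consistent with the donor quenching seen in the control mixtures. Passing three labels instead of two gives the corresponding count for a ternary control such as GdC + Ga1C + Ga2C.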
In short, these experiments using optical spectroscopy and dyes that absorb and emit in different spectral regions also support the notion that narcissistic self-sorting is primarily governed by the strong chelate cooperativity manifested by each dinucleoside monomer when assembled as a cyclic tetramer. Finally, once the study of self-sorting processes in binary mixtures of dinucleosides was completed and understood, we proceeded with the analysis of more complex ternary mixtures of the G-C, A-U and iG-iC dinucleosides. To this end, we wanted to employ the three chromophores d, a1 and a2, which absorb and emit in different regions of the spectrum and constitute two FRET couples. Owing to the higher association strength, solubility and reliability of the G-C scaffold, we decided to install a2 in it (Ga2C), while, for the solubility reasons mentioned above, iGdiC was substituted by iGiC, which actually presented very similar absorption and emission features, only slightly blue-shifted with respect to iGdiC. Hence the actual ternary mixture was iGiC + Aa1U + Ga2C, which, as clearly shown in Figure S3Ea, displayed virtually the same emission spectrum as the sum of the spectra of the three components. This indicates, as demonstrated for the binary mixtures, that owing to the strong self-sorting induced by the high chelate cooperativities of these systems, the three dinucleoside molecules can be mixed and each of them will associate independently into the corresponding cyclic tetramer. As a control experiment, d, a1 and a2 were mixed in dinucleosides with the same complementary nucleobases at the edges, namely GdC + Ga1C + Ga2C.
As shown in Figure S3Eb, this ternary mixture exhibits substantial quenching of the GdC chromophore emission, weaker quenching of Ga1C emission, and significant emission enhancement of Ga2C, which strongly suggests that a mixture of all possible macrocycles is formed in solution where donors and acceptors are combined in the same assembly and FRET is activated.
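The criterion used throughout this section, namely that a self-sorted mixture shows the same spectrum as the sum of its separately measured components, can be expressed as a simple additivity check. The sketch below is only illustrative: the intensity values are invented, the 5% tolerance is an arbitrary assumption, and the function names are hypothetical. Spectra are taken as equal-length lists of intensities sampled at the same wavelengths.

```python
def additivity_deviation(mixture, components):
    """Largest deviation between the mixture and the summed components, relative to the summed maximum."""
    summed = [sum(values) for values in zip(*components)]
    scale = max(abs(v) for v in summed)
    return max(abs(m - s) for m, s in zip(mixture, summed)) / scale

def looks_self_sorted(mixture, components, tolerance=0.05):
    """True when the mixture spectrum matches the sum of the component spectra within the tolerance."""
    return additivity_deviation(mixture, components) <= tolerance

# Invented intensities at three wavelengths (donor band, overlap, acceptor band).
donor = [10.0, 6.0, 1.0]
acceptor = [0.0, 2.0, 8.0]

sorted_mixture = [10.0, 8.0, 9.0]  # simple sum: no energy transfer between the dyes
fret_mixture = [4.0, 5.0, 12.0]    # donor quenched, acceptor enhanced

print(looks_self_sorted(sorted_mixture, [donor, acceptor]))
print(looks_self_sorted(fret_mixture, [donor, acceptor]))
```

A real analysis would also need to account for baseline drift and inner-filter effects, which this sketch ignores.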

S4. Selective Dissociation Studies
As stated in the main text, previous work performed in our group concluded that the thermodynamic stability of the cAU4 macrocycle is considerably lower than that of the cGC4 and ciGiC4 analogues due to both a weaker binding strength between complementary bases and a reduced chelate cooperativity stemming from the symmetric nature of the DAD-ADA H-bonding pattern.[4,5] We hence reasoned that gradually taking the binary or ternary systems to conditions where association is disfavored, whether by a decrease in concentration, an increase in temperature or the addition of a polar cosolvent, would result in the selective and sequential dissociation of the cyclic tetramers as a function of their relative thermodynamic stability. In this section, we collect a number of experiments that demonstrate this idea using different spectroscopic techniques.
For instance, in temperature-dependent experiments in CDCl3 within the 253-323 K range (see Figure S4Aa), only the cAU4 macrocycle is dissociated at high temperatures, whereas cGC4 remains intact over the whole temperature range. This is clearly evidenced by the disappearance of the H-bonded U-imide and A-amine proton signals at 14.0 and 8.6 ppm, and the concomitant appearance of the solvent-bound U-imide proton signal at around 11-10 ppm. A very similar result was observed upon changing the solvent composition. Addition of DMSO-D6 to (2:3) CDCl3:CCl4 solutions of GC+AU mixtures led to the observation of two clear regimes (Figure S4Ab). In the first one, from 0 to 12% v/v of DMSO-D6, cAU4 is progressively dissociated in the presence of the stronger cGC4 macrocycle, which shows no sign of denaturation. This is evidenced by the appearance of the AU monomer U-imide signal at ca. 11.8 ppm. In the second regime, starting above ca. 20% DMSO-D6, cGC4 is then dissociated to the monomeric species, showing a G-amide signal at 10.9 ppm.
It should be remarked that both cyclic tetramers are in slow exchange on the NMR timescale with their respective monomeric species, and that no other associated species is detected in these experiments, which highlights the extraordinarily strong cooperativity of the cyclotetramerization process. Figure
Figure S4Bc, the CD spectrum of this mixture remains invariable in the studied concentration range due to the high thermodynamic stability of the c(iGiC)4 and c(Ga1C)4 assemblies. Finally, we studied the same iGiC + Aa1U + Ga2C ternary mixture as before (see Figure S3E and main text) by variable-temperature CD spectroscopy (Figure S4Ca). As expected, only the weaker Aa1U cyclic tetramer is dissociated at high temperatures, while the other two macrocycles, iGiC and Ga2C, resist. In the control GdC + Ga1C + Ga2C mixture (Figure S4Cb), however, the CD spectrum remains invariable because all macrocycles present a similarly high stability and do not break under these conditions.

S5. Self-sorting in mixtures of mono- and dinucleosides.
We next examined whether self-sorting occurred in a mixture of mononucleosides and dinucleosides that share the same Watson-Crick H-bonding interaction. We selected two systems of very different cooperativity: c(AU)4 (KAU (CDCl3) ~ 3·10² M⁻¹; EMAU ~ 10⁻¹ to 10⁻² M) and c(GC)4 (KGC (CDCl3) ~ 3·10⁴ M⁻¹; EMGC ~ 10² to 10³ M), and combined them with 1:1 mixtures of the corresponding A+U and G+C mononucleosides.
Figure S5A, the cyclic tetramer is seen to disappear as increasing amounts of mononucleosides are added, and mixed, non-sorted associated species, like U:AU, AU:A or U:AU:A, are formed, which coexist in fast exchange with other non-cyclic oligomers, the A:U pair, and dissociated A and U. The same applies to the GC+G+C mixture. The difference between the two dinucleosides is the amount of 1:1 mononucleoside mixture required to fully destroy the cyclic self-sorted assembly. This is lower than 3 equivalents for c(AU)4, whereas c(GC)4 can resist up to ca. 25 equivalents. This means that the intra- and intermolecular versions of the G:C Watson-Crick pair can indeed coexist in solution without much interference, giving rise to self-sorted assemblies, as long as the relative amount of competing mononucleoside mixture is not too high.
Figure S5B shows the changes observed in the ¹H NMR spectra of a 1:1:1 AU+A+U and a 1:2:2 GC+G+C mixture as a function of temperature in CDCl3:CCl4 (2:3) and THF-D8, respectively. As happens for the c(AU)4 macrocycle alone,[4] as the temperature increases the cycle is dissociated into short, non-cyclic (AU)n oligomers and AU monomer, which are in fast exchange with each other, and under these conditions also with mixed species like U:AU, AU:A or U:AU:A, the A:U pair, and dissociated A and U. This is observed as a progressive decay in the intensity of the characteristic c(AU)4 ¹H signals, obtained by integration and represented in the graph at the right side, in favor of the non-sorted mixture of species.
Also, as the temperature increases, the abundance of dissociated AU, A and U increases, and the signals corresponding to this fast-exchanging mixture shift upfield. For the GC+G+C mixture, due to the much higher stability of the c(GC)4 macrocycle, its relative population remains constant even as the temperature is increased and it is only the G:C pair that is seen to dissociate, since the signals for this complex shift upfield with temperature. Hence, due to the sufficiently strong chelate cooperativity of c(GC)4, we can selectively break the intermolecular association without affecting the self-sorted, intramolecularly bound species.
where k is the exchange rate constant, m is the mixing time, XA and XB are the molar fractions of molecules in states A and B, respectively, IAA and IBB are the diagonal peak intensities, and IAB and IBA are the cross-peak intensities. However, in the GC+G+C mixture no exchange cross-peaks could be detected even at the longest mixing times, which highlights the kinetic stability of the self-sorted c(GC)4 + G:C mixture.
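The symbols defined above (k, m, XA, XB, IAA, IBB, IAB and IBA) match those of the standard two-site EXSY analysis of NOESY spectra. Assuming that this is the relation intended here, it can be written as follows, where r is an auxiliary quantity introduced only for compactness and k is the sum of the forward and reverse exchange rate constants:

```latex
k = \frac{1}{m}\,\ln\!\left(\frac{r+1}{r-1}\right),
\qquad
r = \frac{4\,X_A X_B\,\left(I_{AA} + I_{BB}\right)}{I_{AB} + I_{BA}} - \left(X_A - X_B\right)^2
```

In this form, vanishing cross-peak intensities make r very large and k tends to zero, which is consistent with the absence of detectable exchange reported for the GC+G+C mixture.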
Finally, Figure S5D displays the DOSY NMR spectra of the same 1:1:1 AU+A+U and 1:2:2 GC+G+C mixtures, where two sets of diffusing species in slow exchange are clearly seen:
1) The c(AU)4 and c(GC)4 macrocycles, which are larger in size and thus display smaller diffusion coefficients.
2) The mixture of fast-exchanging oligomers (for AU) or the G:C pair (for GC).
Figure S5A. Titration of the dinucleoside, initially associated as cyclic tetramers, with increasing amounts of the corresponding 1:1 mixture of complementary mononucleosides. (a) AU with A + U in CDCl3:CCl4 (2:3); (b) GC with G + C in THF-D8. In both cases, the ca. 8-15 ppm region of the ¹H NMR spectra is shown, where the most relevant H-bonded proton signals are found. At the right, the abundance of dinucleoside molecules associated as cyclic tetramers is represented as a function of the equivalents of 1:1 mononucleoside mixture added.
(a) A 1:1:1 mixture of AU + A + U in CDCl3:CCl4 (2:3); (b) a 1:2:2 mixture of GC + G + C in THF-D8. In both cases, the ca. 8-15 ppm region of the ¹H NMR spectra is shown, where the most relevant H-bonded proton signals are found. At the right, the abundance of dinucleoside molecules associated as cyclic tetramers is represented as a function of the temperature.
Figure S5C. Exchange dynamics of the dinucleoside associated as cyclic tetramers and as open oligomers. NOESY NMR spectra at different mixing times (m) of (a) a 1:1:1 mixture of AU + A + U in CDCl3:CCl4 (2:3); (b) a 1:2:2 mixture of GC + G + C in THF-D8. In both cases, the ca. 11-15 ppm region of the ¹H NMR spectra is shown, where the U-imide and G-amide proton signals can be found.
Figure S5D. Diffusion of the mixture of dinucleoside and complementary mononucleosides. DOSY NMR spectra of (a) a 1:1:1 mixture of AU + A + U in CDCl3:CCl4 (2:3); (b) a 1:2:2 mixture of GC + G + C in THF-D8.
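As a rough worked comparison of the two systems studied in this section, the dimensionless product K·EM can be computed from the approximate values quoted above for c(AU)4 and c(GC)4. The sketch below only multiplies the stated ranges. The dictionary layout and function name are illustrative, and the K and EM values are the approximate ones given in the text.

```python
# Approximate values quoted in the text (K in M^-1, EM range in M).
systems = {
    "c(AU)4": {"K": 3e2, "EM": (1e-2, 1e-1)},
    "c(GC)4": {"K": 3e4, "EM": (1e2, 1e3)},
}

def cooperativity_range(K, EM):
    """Return the (low, high) range of the dimensionless product K*EM."""
    low, high = EM
    return K * low, K * high

for name, values in systems.items():
    low, high = cooperativity_range(values["K"], values["EM"])
    print(f"{name}: K*EM between {low:.0e} and {high:.0e}")
```

With these numbers, K·EM for c(GC)4 exceeds that of c(AU)4 by five to seven orders of magnitude, in line with the much larger amount of competing mononucleoside that the c(GC)4 macrocycle tolerates.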