Sequence Programming with Dynamic Boronic Acid/Catechol Binary Codes.

The development of a synthetic code that enables a sequence-programmable feature like DNA represents a key step toward intelligent molecular systems. Herein, we developed the well-known dynamic covalent interaction between boronic acids (BA) and catechols (CA) into synthetic nucleobase analogs. Along a defined peptide backbone, BA or CA residues are arranged to enable sequence recognition of their complementary strand. Dynamic strand displacement and errors were elucidated thermodynamically to show that sequences are able to specifically select their partners. Unlike DNA, the pH dependency of BA/CA binding enables the dehybridization of complementary strands at pH 5.0. In addition, we demonstrate sequence recognition at the macromolecular level by conjugating the cytochrome c protein to a complementary polyethylene glycol chain in a site-directed fashion.

Molecular interactions in Nature are often involved in a complex network of energy landscapes where individual components, from small molecules to organelles, form transient systems on demand. 1 Proteins have their origin within the genome, where molecular information is encoded and translated based on precise complexation and spatially controlled chemistry. In this respect, there has been much progress in the creation of artificial DNA prototypes to program molecular functions, 2,3 adapting both supramolecular 4−8 and dynamic covalent interactions. 9−12 Despite these successes in pursuit of artificial life, progress in synthetic chemistry in this field remains far slower than in its biochemical counterparts. 13−16 The development of synthetic DNA-type systems is deceptively challenging. On the sequence level, the interacting motifs must be small so that neither steric demand nor synthesis is compromised by increasing sequence length. Molecularly, the binding event should be selective for complementary sequences while remaining dynamic and exchangeable in mild aqueous conditions. At first glance, synthetic functional groups in supramolecular chemistry (e.g., cyclodextrin, 17 cucurbituril, 18 ureido-pyrimidinones 19 ) or dynamic covalent chemistry (e.g., spiropyran, 20 diarylethenes 21 ) seem to satisfy the above criteria. However, most motifs are sterically bulky, have binding constants that are already too high for the first binding event, and/or are laborious to incorporate along a sequence.
Unlike most recognition motifs, the interaction between boronic acids (BAs) and electron-rich vicinal diols such as catechols (CAs) fulfills these criteria well. 22 The binding complex features (1) a small spatial requirement, (2) fast binding kinetics and (3) pH-responsiveness within the physiological range (5.0−7.4). 23 Hence, its dynamic covalent binding capabilities have been applied broadly in therapeutics, 24 biosensors, 25,26 stimulus-responsive ligation tools 27 and self-regenerative materials. 28,29 Herein, we propose that BA/CA motifs in a binary sequence would encode molecular recognition in a stimulus-responsive fashion.
The permutation between binary "1/0" events and the length of the backbone defines the coding space (Figure 1). Inspired by peptide nucleic acids, a peptide backbone was designed to ensure greater hydrolytic stability. 30 Moreover, existing peptide synthesis methodologies facilitate the preparation of longer sequences, the installation of customizable end groups and the flexible choice of spacer groups, e.g., lysine or alanine, to control solubility, steric demand or surface charges. As a demonstration in a broader context, a protein (cytochrome c) and a polymer (polyethylene glycol) were conjugated via these dynamic covalent tags to demonstrate recognition specificity at the macromolecular level.
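The size of this coding space can be illustrated with a minimal Python sketch. It assumes that each coding position holds either a boronic acid ("A") or a catechol ("B"), as in the sequence labels used in this work; the function names `coding_space` and `complement` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# "A" = boronic acid residue, "B" = catechol residue (labels follow the text).
COMPLEMENT = {"A": "B", "B": "A"}


def coding_space(n_positions):
    """Return every binary BA/CA code with n_positions coding residues."""
    return ["".join(code) for code in product("AB", repeat=n_positions)]


def complement(code):
    """Swap each boronic acid for a catechol and vice versa."""
    return "".join(COMPLEMENT[residue] for residue in code)


if __name__ == "__main__":
    codes = coding_space(3)
    print(len(codes))          # number of codes for three positions: 2**3
    for code in codes:
        print(code, "pairs with", complement(code))
```

For n coding positions the sketch yields 2^n codes, so each additional BA/CA position doubles the number of addressable sequences.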
The unnatural amino acid 7 containing the BA was synthesized based on a protocol for iodo-phenylalanine (Scheme 1). 31 L-Phenylalanine was iodinated to afford 1, and the carboxylic acid and amine groups were then protected as the methyl ester (2) and with the Boc group (3), respectively. Suzuki−Miyaura borylation was conducted to install the BA pinacol ester moiety (4), with a subsequent global deprotection to yield the boronated L-phenylalanine 5. Here, a switch in protecting groups is necessary because the pinacol ester is sensitive to microwave conditions. Additionally, longer sequences containing multiple BAs cause aggregation on the solid phase. Thus, after the installation of the Fmoc moiety (6), the BA was protected with a highly bulky pinanediol to afford 7. The corresponding CA-containing amino acid, dihydroxy-L-phenylalanine, was commercially available.
Peptide sequences containing all permutations of BA and CA in a hexa-/octapeptide format were synthesized to elucidate the influence of sequence, length and positional defects. The positions that do not participate in the coding segment are filled with lysines (X) to improve water solubility. To demonstrate the chemical versatility, the N-terminus was modified with reactive functionalities, i.e., amine, thiol or maleimide (Scheme 1). In this way, oligopeptides containing one, two and three BA [(AX)₁, (AX)₂, (AX)₃] as well as their complementary CA counterparts [(BX)₁, (BX)₂, (BX)₃] were synthesized. The binding affinities of the dynamic covalent interactions between (AX)₁-(BX)₁, (AX)₂-(BX)₂ and (AX)₃-(BX)₃ were evaluated by fluorescence microscale thermophoresis in 300 mM phosphate buffer, pH 7.4 (Figure 2a).
In each series, fluorescein-labeled BA peptides act as the template strand and are titrated with their complementary CA peptides. For a single BA/CA binding event, (AX)₁-(BX)₁, a binding affinity of 1300 ± 300 M⁻¹ was observed, which is consistent with published data. 32 By increasing the number of binding events to divalent (AX)₂-(BX)₂ and trivalent (AX)₃-(BX)₃, respective 10-fold (12 500 ± 1100 M⁻¹) and 70-fold (81 400 ± 7300 M⁻¹) increases were observed. Importantly, the absence of binding errors that would form unstructured aggregates was supported by the defined fluorescence decay and dynamic light scattering (Figure S4). The binding of (AX)₃-(BX)₃ was also confirmed independently by Förster resonance energy transfer (FRET) (Figure S5). The lesser increase in binding affinity with each subsequent binding code suggests that energy is required to compensate for the backbone structure in the bound state. Putting these findings into the perspective of DNA hybridization,

Journal of the American Chemical Society
Communication

the binding affinity of (AX)₃-(BX)₃ is comparable to about eight base pairs (with 50% G-C content) on a DNA level. 33,34 The binding structure of the (AX)₃-(BX)₃ complex was characterized by multidimensional ¹H NMR spectroscopy. The binding of the boronic acids and catechols leads to a complete transformation of the chemical environment, shifting the aromatic ¹H signals of the components (Figures 2b and S6). Resonance peaks of the boronic acid aromatic protons (¹H AR) in total correlation spectroscopy (TOCSY) were shifted upfield, confirming that the boron center becomes less electron-withdrawing upon binding (Figures S7 and S8). Nuclear Overhauser effect spectroscopy (NOESY) analysis of (AX)₃-(BX)₃ shows many additional and shifted intramolecular through-space ¹H couplings compared to the separate components (Figures 2c, S9 and S10). These new interactions indicate the formation of chemical environments that increase the intramolecular through-space interactions between the aromatic groups and the Hα and Hβ of the boronic acids or catechols. Diffusion-ordered NMR spectroscopy (DOSY) confirms the binding event as an increase in the diffusion time of the complex (Figures 2d and S11). Additionally, variable-temperature NMR shows that the monovalent complexes are stable to heat up to 70 °C (Figures S15−S19).
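The association constants reported for the mono-, di- and trivalent pairs can be converted into approximate binding free energies with ΔG = −RT ln K. The following Python sketch does this using the central values only. The temperature of 298.15 K and the 1 M standard state are assumptions made here for illustration (the measurement temperature is not stated in this section), and the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ/(mol K)
T_ASSUMED = 298.15      # K; assumed, not taken from the text

# Central values of the reported association constants (M^-1).
K_ASSOC = {
    "(AX)1-(BX)1": 1300.0,
    "(AX)2-(BX)2": 12500.0,
    "(AX)3-(BX)3": 81400.0,
}


def binding_free_energy(k_assoc, temperature_k=T_ASSUMED):
    """Standard binding free energy in kJ/mol from an association constant."""
    return -R_KJ * temperature_k * math.log(k_assoc)


def fold_increase(k_assoc, k_reference):
    """Ratio of two association constants."""
    return k_assoc / k_reference


if __name__ == "__main__":
    k_mono = K_ASSOC["(AX)1-(BX)1"]
    for name, k in K_ASSOC.items():
        dg = binding_free_energy(k)
        ratio = fold_increase(k, k_mono)
        print(f"{name}: dG = {dg:.1f} kJ/mol, {ratio:.1f}-fold vs monovalent")
```

Under these assumptions, each added BA/CA pair contributes a smaller free-energy increment than the previous one, which is the trend the text attributes to the energetic cost of organizing the backbone.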
The observed BA−CA interaction was supported independently by matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) MS, where the formation of (AX)₃-(BX)₃ was characterized by an m/z value of 2079.01 at pH 7.4 (Figure 2e, top). Notably, significant fragmentation of the complex occurs due to the acidic nature of the matrix (α-cyano-4-hydroxycinnamic acid, α-CHCA). At pH 5.0, only the separate components (AX)₃ and (BX)₃ were found (Figure 2e, bottom). In the Fourier-transform infrared (FTIR) spectrum of (AX)₃-(BX)₃, the vibrational mode at 1289 cm⁻¹ was lost while a new peak was observed at 1495 cm⁻¹, suggesting an increase in bond energy and C=C character for the dynamic covalent interaction (Figure S20). 35,36 The electron-deficient boron withdraws electrons through the vicinal diols, which reduces the electron density of the catechol aromatic system. Circular dichroism spectroscopy found that, at pH 7.4, the (AX)₃-(BX)₃ complex shows a strong negative molar ellipticity at 210 nm, corresponding to the n-π* transition of the carbonyl group (Figures 2f and S21). Together with the absence of Cotton effects, these observations imply that the tetrahedral boron center has a strong through-space effect on the C=O bond. Density functional theory (DFT) calculations confirmed that hydrogen bonding interactions between the hydroxyl groups of the tetrahedral boron and the carboxyl groups of lysines contribute to lowering the electronic energy of (AX)₃-(BX)₃ (Figures 3a, S23 and S24). The combination of these observations appears to corroborate the shifts associated with the TOCSY and NOESY cross-peaks. In contrast, at pH 5.0, there is no interaction between the complementary sequences (Figure S21). Here, DFT calculations using a simplified structure revealed a low activation energy (10 kcal mol⁻¹) for the first boron−oxygen (catechol) bond-breaking step, representing the fast hydrolysis of the tetrahedral boron observed at acidic pH (Figure S26).
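To give a sense of scale for a 10 kcal mol⁻¹ barrier, the Eyring equation k = (k_B T/h) exp(−ΔG‡/RT) can be evaluated as a rough order-of-magnitude estimate. This Python sketch treats the computed activation energy as a free-energy barrier, takes a transmission coefficient of 1 and assumes 298.15 K; all three are simplifying assumptions made here and are not part of the reported calculations. The function name `eyring_rate` is hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34 # Planck constant, J s
R_KCAL = 1.98720425864e-3 # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal_per_mol, temperature_k=298.15):
    """First-order rate constant (s^-1) from the Eyring equation.

    Assumes the barrier is a free energy of activation and a
    transmission coefficient of 1.
    """
    prefactor = K_B * temperature_k / H_PLANCK
    return prefactor * math.exp(-barrier_kcal_per_mol / (R_KCAL * temperature_k))


if __name__ == "__main__":
    k = eyring_rate(10.0)
    print(f"k = {k:.2e} s^-1, half-life = {math.log(2) / k:.2e} s")
```

Under these assumptions the estimated rate constant is on the order of 10⁵ s⁻¹, which is qualitatively in line with the description of the bond-breaking step as fast.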
Interestingly, sequences containing consecutive BA/CA residues without the alternating spacers (A₃X₃, B₃X₃) do not bind to their complementary partners at concentrations ≤500 μM (Table 1, Figures S22 and S27). Above this concentration, precipitation of A₃X₃ occurs, indicating the importance of the lysine spacer in the alternating (AX) arrangement, which provides both solubility and relief of steric constraints for the binding event.
Next, we encoded a sequence-specific binding event by using (1) mixed sequences (ABA and BAB) and (2) inclusion of a nonbinding event Y. Importantly, mixed sequences contain partially complementary parts, whereas the nonbinding event provides an "error" in the sequence. We observed that ABA-BAB interacts with a binding affinity of 79 400 ± 5200 M⁻¹, which is comparable to the homogeneous (AX)₃-(BX)₃ pair (Table 1). However, the interaction took significantly longer (about 8 h), suggesting that error correction requires a certain time frame, similar to DNA. 37,38 On the other hand, sequence hybridization can be weakened correspondingly in AAYA-BYBB, where Y is a nonbinding, noninteracting amino acid (alanine). The increase in chain length also increases the energy needed to compensate, reducing the binding affinity (21 100 ± 6100 M⁻¹) to a divalent level. Mismatched partners such as (AX)₃-BYBB further weaken complementarity. For another mismatched pair of identical length, BAB-(BX)₃, nonbinding residues appear in the ¹H NMR spectrum (Figure S12), indicating very weak interactions.
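The matched, error-containing and mismatched pairs described above can be compared with a simple counting sketch in Python. It assumes that the two strands pair position by position in the same register, which is an illustrative assumption (the strand orientation is not specified in this section), and it counts a contact only where a boronic acid ("A") faces a catechol ("B"). The function name `count_contacts` is hypothetical, and the count is not a predictor of the measured affinities.

```python
def count_contacts(strand_1, strand_2):
    """Count positions where a boronic acid (A) faces a catechol (B).

    Assumes equal-length coding segments aligned position by position;
    any other residue (e.g. the nonbinding Y) contributes no contact.
    """
    if len(strand_1) != len(strand_2):
        raise ValueError("coding segments must have the same length")
    return sum(
        1
        for res_1, res_2 in zip(strand_1, strand_2)
        if {res_1, res_2} == {"A", "B"}
    )


if __name__ == "__main__":
    pairs = [("AAA", "BBB"), ("ABA", "BAB"), ("AAYA", "BYBB"), ("BAB", "BBB")]
    for s1, s2 in pairs:
        print(s1, s2, count_contacts(s1, s2))
```

Under this assumption, ABA-BAB gives three contacts like the homogeneous trivalent pair, AAYA-BYBB gives two, and BAB against the all-catechol strand gives one.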
Since strand displacement dynamics is an important tool in DNA nanotechnology, 38−40 we investigated the displacement dynamics of our conjugates through the competitive binding of monovalent (BX)₁ and trivalent (BX)₃ to an (AX)₃ template. The fluorescein-(AX)₃ was first titrated with (BX)₁ until binding saturation (Figure 3b, black). Displacement and sorting were then achieved by titrating the (AX)₃-(BX)₁ complex against DyLight650-(BX)₃, monitored independently by FRET. The intersection between the measurements indicates the stoichiometric ratio at which (BX)₁ and (BX)₃ can competitively displace each other by at least 50% (Figure 3b). For (BX)₁ to displace (BX)₃ by 50%, a stoichiometric factor of >4000 mol % is required. On the other hand, (BX)₃ would only require >2.5 mol % due to its multivalent effect.
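The two stoichiometric factors quoted above describe the same crossover point from opposite sides, which a short arithmetic sketch in Python makes explicit. The helper names are hypothetical, and the comparison with the ratio of association constants is included only as an order-of-magnitude observation, not as a binding model.

```python
def mol_percent_to_equivalents(mol_percent):
    """Convert a stoichiometric factor in mol % to molar equivalents."""
    return mol_percent / 100.0


def reciprocal_mol_percent(mol_percent):
    """Express the same molar ratio from the other component's side, in mol %."""
    return 100.0 / mol_percent_to_equivalents(mol_percent)


if __name__ == "__main__":
    excess = mol_percent_to_equivalents(4000.0)
    print(excess, "equivalents of (BX)1 per (BX)3")
    print(reciprocal_mol_percent(4000.0), "mol % of (BX)3 relative to (BX)1")
    # Central values of the association constants reported in this work (M^-1).
    print(81400.0 / 1300.0, "ratio of trivalent to monovalent affinity")
```

A 4000 mol % excess corresponds to 40 equivalents, whose reciprocal is 2.5 mol %; this is of the same order of magnitude as the ratio of the trivalent to monovalent association constants.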
Based on these findings, the established binary codes could be used to enable recognition of macromolecules, i.e., proteins and polymers. As PEGylation of proteins remains an important aspect of protein therapeutics, 41,42 we functionalized PEG₅₀₀₀ and yeast cytochrome c (CytC) with (AX)₃ and (BX)₃, respectively. The conversion to PEG₅₀₀₀-(AX)₃(BX)₃-CytC was quantified by a fluorogenic sensor, Alizarin Red S, in a titration assay (Figure 3d). The construct was additionally characterized by MALDI-TOF MS, albeit with partial dissociation of the construct due to the acidic matrix (Figure 3e). Additionally, the increase in topographical height upon complexation was visualized by atomic force microscopy (Figure S19). 43
In summary, we have demonstrated, to the best of our knowledge, the first application of boronic acid chemistry in molecular sequence programming under physiological conditions. By combining the recognition and binding customization of boronic acid/catechol chemistry with a peptide