Initial Amino Acid:Codon Assignments and Strength of Codon:Anticodon Binding

The ribosome brings 3′-aminoacyl-tRNA and 3′-peptidyl-tRNAs together to enable peptidyl transfer by binding them in two major ways. First, their anticodon loops are bound to mRNA, itself anchored at the ribosomal subunit interface, by contiguous anticodon:codon pairing augmented by interactions with the decoding center of the small ribosomal subunit. Second, their acceptor stems are bound by the peptidyl transferase center, which aligns the 3′-aminoacyl- and 3′-peptidyl-termini for optimal interaction of the nucleophilic amino group and electrophilic ester carbonyl group. Reasoning that intrinsic codon:anticodon binding might have been a major contributor to bringing tRNA 3′-termini into proximity at an early stage of ribosomal peptide synthesis, we wondered if primordial amino acids might have been assigned to those codons that bind the corresponding anticodon loops most tightly. By measuring the binding of anticodon stem loops to short oligonucleotides, we determined that family-box codon:anticodon pairings are typically tighter than split-box codon:anticodon pairings. Furthermore, we find that two family-box anticodon stem loops can tightly bind a pair of contiguous codons simultaneously, whereas two split-box anticodon stem loops cannot. The amino acids assigned to family boxes correspond to those accessible by what has been termed cyanosulfidic chemistry, supporting the contention that these limited amino acids might have been the first used in primordial coded peptide synthesis.


■ INTRODUCTION
Molecular recognition events control the specificity of the writing and reading steps of translation according to the genetic code.The code is written by aminoacyl-tRNA synthetases recognizing specific tRNAs and attaching cognate amino acids to their 3′-termini.The code is read by sequential anticodon:codon recognition in the decoding center of the ribosome.How such a process could have evolved is an intriguing question with the mechanism of individual amino acid−codon assignments at its heart.Before the advent of aminoacyl-tRNA synthetases�be they ribozymes or en-zymes�it is difficult to imagine selective tRNA aminoacylation unless molecular recognition between the amino acid residue and the tRNA itself was responsible for this.If such recognition took place, parsimony suggests that it should have been between the amino acid residue and the anticodon, but this is countered by the fact that the 3′-terminal aminoacylation site is some 75 Å distant from the anticodon.Having a trinucleotide sequence elsewhere in the tRNA acting as the aminoacylation specificity control element alleviates this distance constraint 1 but decouples the nature of the aminoacyl residue and the sequence of the anticodon, necessitating an alternate mechanism for amino acid−codon assignments.
Our recent finding that the terminal trinucleotide sequence of a tRNA acceptor stem analogue influences the specificity and stereochemistry of 3′-terminal aminoacylation suggests that loosely coded aminoacyl-tRNA acceptor stem-overhang domains could have been produced by prebiotic chemistry. 1,2owever, the degree of intrinsic coding is low, and we suspect that higher fidelity coding is unlikely without extrinsic auxiliary control elements such as ribozymes.To us, this implies that if the initial loosely coded aminoacyl-tRNAs had participated in the early stages of ribosomal peptide synthesis, the short peptides thus produced would not have been advantageous to the system per se.We have thus started to investigate the possibility that the primary function of early peptide synthesis was not the produced oligopeptide.Rather, that in a scenario of adventitiously produced aminoacyl-RNA molecules, peptide bond formation would have provided a mechanism to deblock RNA 3′-termini so that they were rendered extension/ligationcompetent.For example, a self-priming substrate for chemical or in trans ribozymic copying in which a duplex 3′-overhang primes strand displacement synthesis via a folded-back conformation�such overhangs also being susceptible to aminoacylation by a variety of mechanisms. 2,3According to this idea, the selective pressure behind the development of transpeptidation would have been to allow more efficient RNA replication�peptides produced by the process would simply have been waste products.Second-order transpeptidation would have had to be faster than pseudo-first-order hydrolysis of aminoacyl-and peptidyl-RNAs to be selected for this purpose.One can envisage that the proto-peptidyl transferase center of the ribosome initially catalyzed transpeptidation between acceptor stem overhangs simply by binding them to increase their effective molarities.Subsequent fusion of the acceptor stem-overhang domains with anticodon stem-loop domains could have facilitated this process if it further contributed to the proximity required for transpeptidation (Figure 1a). 4 One way in which this could have occurred would have been through anticodon binding of two such fusions to contiguous codons on a short piece of singlestranded RNA (ssRNA). 5In investigations of extant biochemistry, it has been shown that two tRNA Phe (UUC) molecules do not appreciably bind contiguous codons on a short oligonucleotide simultaneously absent the ribosomal decoding center. 6However, we reasoned that this might be a specific case of weak binding between Phe-codon (UUC) and anticodon (GAA) loops.Accordingly, we decided to investigate the effective molarity of aminoacyl-/peptidyl-RNAs required to favor uncatalyzed transpeptidation relative to hydrolysis and to study the energetics of codon:anticodon loop binding across the whole genetic code.

■ RESULTS AND DISCUSSION
The first assessment of the effective molarity of tRNA termini required to ensure that transpeptidation can compete with hydrolysis by 55 M water was published by Weber and Orgel. 7,8These authors found that 0.4 M 2′/3′-glycyladenosine-5′-(O-methylphosphate) gave rise to diglycine and its diketopiperazine (DKP) in ∼15% combined yield along with a trace of triglycine after 2 h at pH 8.2 or 9.0.We confirmed their results by 1 H NMR analysis of similar reactions (415 mM, pH 8.2), observing yields of 6% diglycine, 12% DKP, and <0.5% triglycine (Figures 1b and S1).Incubation of equimolar glycine with 2′/3′-glycyl-adenosine-5′-O-methylphosphate demonstrated a substantial reaction between free amino acids and aminoacyl-nucleotides (Figure S2).This indicates that a translational system in the presence of free amino acids�either initially endogenous or those produced by substantial aminoacyl ester hydrolysis�would produce high yields of uncoded peptides.This data was supported by functionalization of the crude products of our reactions with dansyl groups and analysis by HPLC, which demonstrated similar yields for diglycine and triglycine (Tables S1 and S2 and Figure S3).The vast majority of the aminoacyl esters, however, underwent hydrolysis to free glycine as also reported by Orgel.Dipeptidyl species were hardly detectable by HPLC or NMR when the aminoacyl ester concentration was reduced by more than an order of magnitude (35 mM, Figure S4).
DKP formation is a known hurdle to be overcome for the production of oligopeptides. 9Incubating conventionally synthesized 2′/3′-diglycyl-adenosine-5′-O-methyl-phosphate at a low concentration at pH 8.2 gave DKPs and linear dipeptides in a 3:1 ratio (Figure S6).One of the ways biology overcomes the DKP problem is by initiating peptide synthesis with an N-formyl amino acid.To reduce DKP formation in our model system and to investigate whether transpeptidation is more efficient when one reaction partner is N-acylated, we incubated a 1:1 mixture of 2′/3′-glycyl-adenosine-5′-Omethylphosphate with 2′/3′-N-formyl-glycyl-adenosine-5′-Omethylphosphate at a combined concentration of 415 mM.While this reduced DKP formation and consequently improved the yields of linear peptides, the reaction still suffered from similar low total peptide yields (1.5% diglycine, 5.4% N-formyl-diglycine, 5.2% DKP, and ∼0.1% tripeptides, Figure S7).We thus conclude that uncatalyzed transpeptidation, even using a combination of a free nucleotide aminoacyl ester and an N-acylated one, is an intrinsically inefficient process in water.Any early transpeptidation catalyst would therefore have had to bring about mutual proximity of the reaction partners equivalent to an effective molarity of both species of the order of hundreds of mM.
There have been several recent attempts to recapitulate the function of the extant ribosomal peptidyl-transferase center using heavily truncated variants of the large ribosomal subunit, but thus far, only trace transpeptidation activity has been detected. 10,11We have also started efforts to (re)produce a minimal peptidyl-transferase center capable of inducing proximity between a free aminoacyl-nucleotide and an Nblocked one.Given the high proximity requirement for transpeptidation to outcompete hydrolysis, we anticipate that any primitive entropic catalyst will benefit from other contributions to the proximity of the reaction partners.As mentioned above, one such additional contribution could be afforded by fusing the acceptor stem-overhang domains to anticodon domains.Simultaneous binding of two such fusions to contiguous codons on a short oligonucleotide would augment the proximity induced by the minimal peptidyltransferase center.Production of intact tRNAs by this sort of domain fusion has been suggested previously, 4,12 but the potential benefit afforded to transpeptidation caused by contiguous anticodon binding to ssRNA has not been alluded to.It is reasonable to expect this benefit to scale with the strength of binding between tRNA codons and anticodons. 5−15 However, since there is limited experimental data regarding the binding of tRNA anticodon loops to codons, we investigated their binding affinity using biolayer interferometry (BLI).
First, we prepared biotin-labeled ssRNAs containing our desired codon sequences.We also prepared RNA hairpins with an identical 5-base pair stem and differing 7-mer loops containing 64 anticodon triplets (Tables S3−S5).The remaining four nucleotides of the loop were limited to bases equivalent to the anticodon loop residues 32C, 33U, 37A, and 38A of a tRNA based on the consensus sequences of existing tRNAs. 16Thus, these hairpins are suitable models for both an early translation system and for other tRNA-like molecules containing no base modifications in their anticodon loop.However, it is known that variations at these positions can alter the geometry of the anticodon loops and subsequently influence the binding strength (Figure S8). 17,18Using these synthetic hairpins, we determined the association (k on ) and dissociation rate constants (k off ) for binding of the anticodon loop to the corresponding immobilized ssRNA using BLI, thus enabling calculation of the equilibrium dissociation constant (K D ) (Figures 2 and S9 and Table S6).For all the measurable K D s, the average value of (4-fold degenerate) family-box 19 codon:anticodon binding is 100 μM, while the average of splitbox codon:anticodon interaction is 460 μM (P < 0.0001,

Journal of the American Chemical Society
Figure 2A).As we had anticipated, the previously studied UUC codon:GAA anticodon pairing showed weak binding as evidenced by a high K D (470 μM).The disparity between the two classes of codons is most substantial when codons end with cytosine (average 49 vs 486 μM, P < 0.005, Figures 2B,C  and S10).With this terminal codon base, a threshold of 90 μM can be set to divide the two classes of codons.The precise rules for codons ending with bases other than C are less clear.Further data gained using isothermal titration calorimetry (ITC), with all oligonucleotides in free solution, substantially agree with the BLI results (Figure S11 and Tables S7−S9).Stronger codon:anticodon binding pairs would have provided a more pronounced effect on the catalysis of early transpeptidation. 5,20Weaker pairs would have been less useful until auxiliary contributors, for example, nucleobase modifications, emerged to compensate for their lower binding affinity.We note that the third position of the codon has been hypothesized to be either promiscuous (through various wobble interactions) or inoperative in early genetic codes. 19,21ome of our earlier research has focused on the prebiotic synthesis of Strecker amino acid precursors, which are products of reductive homologation of hydrogen cyanide. 22he precursors of Asp, Asn, Glu, and Gln derived from cyanoacetylene which we originally sourced by in situ stoichiometric oxidative coupling of acetylene and hydrogen cyanide by Cu(II).However, we now favor a model in which atmospherically derived cyanoacetylene is stored as a dicyanoimidazole adduct for exclusive incorporation into pyrimidine ribonucleotides, 23 so our current view is that these amino acid precursors are not part of the cyanosulfidic set.The remaining Strecker precursors correspond precisely with the amino acids encoded by family-box codons in extant biology.Hence, we suggest that the current assignment of amino acids to codons is in part due to this correlation between the prebiotically accessible early amino acids and the stronger family-box codon:anticodon interactions.There are two split-box examples where codon:anticodon binding is of similar strength to those of the family boxes; AGG (85 μM) and GAG (140 μM) encoding Arg and Glu.Arg is also a family-box amino acid, but Glu is only encoded by a split box in extant biochemistry.We thus contend that this suggests that the family-box amino acids were used in the earliest forms of coded peptide synthesis and that Glu either was postbiosynthetically assigned to GAG or, if a plausible cyanosulfidic synthesis is uncovered, was also one of the initially assigned prebiotic amino acids.
Transpeptidation requires the proximity of two stemoverhang domains.This proximity would be most enhanced if both overhangs were coimmobilized, much as they are in modern biology when two tRNAs are bound to contiguous codons.Again using BLI, we demonstrated that two anticodon loop domains can bind to an ssRNA simultaneously without additional partners but that the binding is typically only strong when both pairings involve family-box codons (Figure 3A).
We immobilized a biotinylated oligoribonucleotide 5′-AGCCCUCC containing two contiguous family-box codons ending in C (GCC and CUC, Figure 3B and Table S10).We saturated the CUC codon with a corresponding hairpin RNA to give a complex which was subsequently incubated with a second hairpin designed to bind the GCC codon.The K D for the second binding (GCC) was found to be slightly lower (13  μM) than when binding to ssRNA alone (15 μM), suggesting that there was no significant synergy in the termolecular interaction.We were also able to reverse the order of hairpin addition and saw a slightly stronger binding for the second hairpin (CUC K D = 50 μM, compared to 65 μM for when binding singly to ssRNA).It has previously been observed that two complementary anticodon loops can bind each other tightly forming a loop− loop complex. 24We also found that specific pairs of anticodons can bind each other with affinities higher than we had seen for codon:anticodon binding (e.g., GCC to GGC 1.0 μM; GCC to GGU 1.7 μM; Figure S12 and Table S7).To avoid loop−loop complexes, we hypothesized that only a subset of noninteracting anticodon loops could have been used at the origin of genetic coding.Through exhaustive search, we identified 14 such subsets containing 8 anticodons each encoding a different family-box amino acid of sufficiently high binding affinity (<130 μM), that cannot bind any of the other anticodons in the same subset significantly (Tables S11 and S12).We selected one subset and, using four different ssRNAs, measured the binding affinity to matching contiguous codons from that subset (Figures 3C and S13 and Table S10).The affinities for sequential binding remained largely similar to those of single binding.Those showing a slight weakening in binding affinity could be accounted for by steric clash between the first and second anticodon loop.A strengthening in binding affinity of the second hairpin could be accounted for by an energetic reorganization of the ssRNA template, much like the kinking of mRNA in the ribosome. 25e then tested an immobilized ssRNA containing two contiguous split-box codons UUC and CAU (Table S10 and Figure S14).The weak ssRNA binding affinity of the corresponding split-box anticodon loops and the limit of their solubility meant that presaturation by one hairpin was not possible.Instead, both hairpins were incubated simultaneously affording an apparent K D (190 μM) nearly identical to the single site binding for the stronger codon (180 μM).This suggests that only single-site occupation could be observed, consistent with the observations of Labuda et al. 6 We found that similar experiments using ITC qualitatively complemented the results seen with BLI (Figure S11 and Tables S7 and S9).Furthermore, when a split-box anticodon loop was added to a preformed complex of family-box anticodon bound to an ssRNA (5′-AGACGCCC; family-box codon underlined), we observed reduced binding of the split-box anticodon loop (sequential binding K D = 580 μM vs single binding K D = 240 μM) (Figure S13 and Table S10).When the order of addition was reversed, family-box binding also decreased by a factor of 2.4 (sequential binding K D = 34 μM vs single binding K D = 14 μM).A second example, this time using an ssRNA containing a lower-affinity split-box codon (5′-AGUCUACC; family-box codon underlined), also demonstrated reduced binding.Preincubation of an anticodon loop to UAC reduced the subsequent binding affinity to GUC by a factor of 3, from 80 to 240 μM, while reversing the order made the split-box binding event unmeasurable.This case is remarkable in that the binding to GUC is reduced even though the affinity between the split-box codon UAC and its anticodon complement is extremely low (3400 μM).In both these examples, it is plausible for partial anticodon:anticodon interactions between the different hairpins to occur (GGC to GUC and GAC to GUA; Watson−Crick binding underlined, wobble italicized).The weakened binding demonstrates that there would exist an evolutionary driving force to restrict to an orthogonal set of family-box anticodon loops in a background lacking significantly complementary split-box anticodon loops, thereby avoiding inhibition of the former's binding to ssRNAs.
Coded peptide synthesis by translation requires two tRNAs to be bound adjacently to an ssRNA.Our results (albeit from a limited number of pairwise combinations) suggest that the earliest forms of translation could only utilize subsets of orthogonal family-box codons.Incorporation of split-box codons would have required further stabilization, for instance, by the evolution of the decoding center in the rRNA small subunit. 26Furthermore, as more codons were assigned, nascent biology would have had to evolve strategies to avoid the inhibition of codon binding by anticodon:anticodon complementarity.

■ CONCLUSIONS
Coded peptide synthesis by translation is ubiquitous in extant biology and its emergence would have constituted an extremely beneficial evolutionary innovation.Key to the mechanism of translation is spatial restriction of two aminoacyl residues to enable transpeptidation.Complementing previous work, we have shown that an effective concentration of aminoacyl (oligo) nucleotides of hundreds of mM is required to form peptide bonds at a synthetically useful level in water.
The ribosome constrains transpeptidation substrates, in part, by localizing aminoacyl-tRNA and peptidyl-tRNA to an ssRNA template through codon:anticodon binding.A proto-translational mechanism could have benefited greatly from such an increase in effective molarity in proportion to the binding strength.The interaction of codons with anticodon loops has not been studied previously in the absence of ribosomal architecture, but various relationships have been hypothesized.
Herein, we present experimental data demonstrating the binding strength between anticodon loop domains and their complementary codons on ssRNA.In general, our findings indicate that family-box codons exhibit stronger binding compared to codons from split boxes.This is most evident for codons ending in base C. Furthermore, contrary to previous reports, 6 two distinct anticodon loops can simultaneously bind to an RNA strand containing contiguous familybox codons without auxiliary components like the ribosome.Fourteen different subsets of orthogonal, strong binding affinity, family-box codons encoding all eight prebiotic amino acids can be envisaged, which avoid the problems of loop− loop complexation while maintaining high binding affinity to ssRNA.The binding of the second hairpin is generally slightly weakened, suggesting that inherent strong binding affinity of each individual hairpin is the essential property required for binding two hairpins contiguously.ssRNA containing weakbinding split-box codons failed to simultaneously bind two respective anticodon loops.
These findings provide a strong argument for the selection of strong binding family-box codons over split-box codons at the origin of translation.Expansion of the early genetic code would have required additional components to stabilize weaker binding loops.Combining this selection mechanism with the prebiotic availability of amino acids produced by the cyanosulfidic scenario 22 would explain why glycine, alanine, valine, leucine, serine, threonine, arginine, and proline are all assigned to family-box codons in modern biology.In light of our recent work on coded aminoacylation, 1 which family-box codons code for each amino acid would be dependent on the fusion of acceptor stems and anticodon domains.It is possible that the precise relationship between any particular family-box codon and its encoded amino acid is a "frozen accident". 27ince the evolution of functional coded peptides by transpeptidation of (oligo)nucleotide aminoacyl esters would have faced extreme challenges, we propose that trans-Journal of the American Chemical Society peptidation was initially developed instead as an RNA deblocking mechanism.Activation of RNA in the presence of (N-acyl-)amino acids would also lead to RNA diol termini becoming (N-acyl-)aminoacylated. 1,28 Acylated RNA termini would prevent extension/ligation chemistries, preventing RNA replication until the (N-acyl-)aminoacyl group was removed.Development of a catalyst which freed RNA termini by transpeptidation would have been evolutionarily favored because it would have increased the efficiency of RNA replication.The initial catalyst might have acted on (N-acyl-)aminoacyl RNA domains in free solution but would have been greatly improved upon by conformational restriction of the two hairpin RNAs.Fusing aminoacyl stem overhangs to family-box anticodon-containing hairpins would have enabled ssRNA chains to act as a template, increasing the effective concentration of (N-acyl-)aminoacyl groups.The strength of template binding could have been additionally improved by later emerging auxiliary molecules.These auxiliaries could also develop further, along with anticodon loop nucleotide modifications, to stabilize split-box codon:anticodon interactions thus enabling the use of the full set of 64 possible codons.The production of waste peptides with partial coding from various mechanisms could later undergo exaptation to produce useful peptides, hence building the modern translation system.Understanding the nature of the initial transpeptidation catalyst is essential and a current focus in our laboratory.

Figure 1 .
Figure 1.Evolution of tRNA and the mechanism of transpeptidation.(A) Schematic representation of aminoacyl-transfer on the acceptor stem overhang 1 and the fusion of the acceptor stem domain and anticodon stem-loop domain; (B) high effective concentration of aminoacylated tRNAs and the power of exclusion of water by the ribosome during translation is demonstrated by the incubation of 0.4 M 2′/3′-glycyl-adenosine-5′-Omethylphosphate in solution producing a minority of di-and tripeptide species compared to free glycine.Created with BioRender.com,tRNA PDB ID: 4TNA.R�H or formyl group.

Figure 2 .
Figure 2. Equilibrium dissociation constant (K D ) of codon:anticodon loop binding as determined by BLI.Box and whisker plots of K D of (A) all family-box codons and split-box codons and (B) C-ending family-box codons and split-box codons; (C) plots of association constants and dissociation constants of measured anticodon loop/codon ending with C/U/G/A in two codon classes.P-Value of permutation test for Pearson correlation coefficient is displayed.

Figure 3 .
Figure 3. Sequential binding of two anticodon loops to ssRNA.(A) Schematic representation of anticodon loops binding to a biotin-labeled ssRNA containing the respective codons; (B) typical BLI curves of ssRNA containing two family-box codons.Equilibrium dissociation constants of the anticodon:codon complex after the first loop binding to the ssRNA (i.e., sequential binding) are annotated beneath their respective codons.Bracketed equilibrium dissociation constants are those single anticodon loop:ssRNA binding; (C) bar chart comparing the K D of single binding and sequential binding of one orthogonal anticodon loop subset [i.e., a subset of anticodon loops which show significant (K D < 130 μM) binding to ssRNA and are not able to bind each other by loop−loop interactions].Conditions: 200 μM for the first anticodon loop; 0−400 μM for the second anticodon 100 mM NaCl, 100 mM MgCl 2 , 50 mM HEPES, pH 7.2, 20 °C.