Stochastic Emergence of Two Distinct Self-Replicators from a Dynamic Combinatorial Library

Unraveling how chemistry can give rise to biology is one of the greatest challenges of contemporary science. Achieving life-like properties in chemical systems is therefore a popular topic of research. Synthetic chemical systems are usually deterministic: the outcome is determined by the experimental conditions. In contrast, many phenomena in nature are not deterministic but stochastic, i.e. driven by random fluctuations. Here, we report how two different self-replicators emerge stochastically from a mixture of two synthetic molecules. Under identical experimental conditions, the two self-replicators form in varying ratios across repeats of the experiment. We show that this variation is caused by a stochastic nucleation process and that the stochasticity is more pronounced close to a phase boundary. While stochastic nucleation processes are common in crystal growth and chiral symmetry breaking, they are unprecedented for systems of synthetic self-replicators.


Preparation of dynamic combinatorial libraries
Building blocks 1 and 2 were purchased from Cambridge Peptide (Birmingham). All libraries were prepared by dissolving the building blocks (1.0 mM) in borate buffer (25 mM B2O3 in water, pH 8.2) and stirring the solution in the presence of oxygen from the air. The pH of the solution was adjusted by the addition of 1 M KOH solution such that the final pH was 8.2.
Unless stated otherwise, all library experiments were performed at ambient temperature. A small aliquot of each sample was transferred to another vial and diluted five- or tenfold with doubly distilled water prior to HPLC or UPLC(-MS) analysis.
The buffer was prepared from boric acid (Merck Chemicals) dissolved in doubly distilled water from in-house double distillation facilities. Sodium perborate, used for the oxidation of the thiols, was purchased from Sigma Aldrich. Acetonitrile (ULC-MS grade), water (ULC-MS grade) and trifluoroacetic acid (ULC-MS grade) were obtained from Biosolve BV. Libraries were prepared in clear HPLC glass vials (12 × 32 mm) closed with Teflon-lined snap caps purchased from Jaytee. Library solutions were stirred using Teflon-coated micro stirrer bars (2 × 2 × 5 mm) obtained from VWR. Samples were stirred on a Heidolph MR Hi-Mix D magnetic stirrer at 1200 rpm.
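The buffer strength is quoted as 25 mM B2O3, whereas the buffer itself is made from boric acid. Since B2O3 + 3 H2O → 2 H3BO3, 25 mM B2O3 corresponds to 50 mM boric acid. The sketch below turns this into a weighing amount; the 100 mL batch volume and the helper name are illustrative assumptions, and the pH is subsequently adjusted with KOH as described above.

```python
BORIC_ACID_MW = 61.83  # g/mol, H3BO3

def boric_acid_mass_g(b2o3_mM: float, volume_mL: float) -> float:
    """Grams of boric acid for a buffer quoted as `b2o3_mM` mM B2O3 (hypothetical helper).

    One B2O3 contains two boron atoms, i.e. it corresponds to two H3BO3.
    """
    boric_acid_M = 2.0 * b2o3_mM / 1000.0  # mol/L of H3BO3
    return boric_acid_M * (volume_mL / 1000.0) * BORIC_ACID_MW

mass = boric_acid_mass_g(25.0, 100.0)  # 100 mL batch (illustrative volume)
print(f"Weigh {mass:.3f} g boric acid per 100 mL")
```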
In the experiments with 10 parallel repeats, a library of 6 mL ([1] = [2] = 0.5 mM) was set up and allowed to oxidize at room temperature by atmospheric oxygen until 70% of the monomers were oxidized (without agitation; ca. 12 days). This library was then split into 10 fractions (500 μL each). These 10 libraries were stirred at 45 °C and their compositions were monitored by RP-UPLC.
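How the 70% conversion was determined is not stated here. Assuming, as in the quantitative RP-UPLC analysis described below, that every building-block unit contributes equally to the 254 nm peak area, the oxidized fraction can be estimated from the monomer peaks alone. The sketch below does this with hypothetical peak areas; the function name is illustrative.

```python
def fraction_oxidized(monomer_areas, total_area):
    """Estimate the oxidized fraction of building block from 254 nm peak areas.

    Assumes every building-block unit contributes equally to the peak area, so the
    monomer share of the total area equals the share of unoxidized building block.
    """
    if total_area <= 0:
        raise ValueError("total peak area must be positive")
    return 1.0 - sum(monomer_areas) / total_area

# Hypothetical peak areas (arbitrary units) for monomers 1 and 2 and the whole trace
f_ox = fraction_oxidized([0.40e6, 0.50e6], 3.15e6)
ready_to_split = f_ox >= 0.70  # the 70% criterion used for the mother solution
print(f"oxidized: {100 * f_ox:.1f}%, split now: {ready_to_split}")
```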

UPLC Method
UPLC analyses were performed on Waters Acquity UPLC I-class, H-class or H+class systems equipped with a PDA detector. All analyses were performed using a reversed-phase UPLC column (Aeris Peptide 1.7 μm XB-C18 150 × 2.10 mm, purchased from Phenomenex). UV absorbance was monitored at 254 nm. Column temperature was kept at 35 °C. UPLC-MS was performed using a Waters Acquity UPLC H-class system coupled to a Waters Xevo-G2 TOF. The mass spectrometer was operated in the positive electrospray ionization mode. Capillary, sampling cone, and extraction cone voltages were kept at 2.5 kV, 30 V, and 4 V, respectively. Source and desolvation temperatures were set at 150 °C.
Solutions containing peptides 1 and/or 2 and their oxidation products were prepared by diluting a small aliquot of a DCL in water. These diluted samples were analyzed using the following methods (all gradients are …
Figure S1. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 3 days. In this sample, both replicators (hexamers and octamers) are present.

Analysis and characterization of dynamic combinatorial libraries
No replication when quiescent: Figure S2. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 without agitation for 5 weeks. In this sample trimers and tetramers dominate.
Figure S3. Time evolution: RP-UPLC traces (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. In this DCL, the emergence and growth of the 6mer replicators can be observed. Small shifts in retention times occur between traces.
Figure S4. Time evolution: RP-UPLC traces (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. In this sample, the emergence and growth of the 8mer replicators can be observed. Small shifts in retention times occur between traces.
Figure S5. RP-UPLC trace (monitored at 254 nm using method B) of the product mixture dominated by trimers and tetramers, obtained by oxidation of peptides 1 and 2 without agitation for 7 days.
Figure S6. RP-UPLC trace (monitored at 254 nm using method B) of the product mixture obtained by oxidation of peptides 1 and 2 for 7 days, after which it was stirred at 1200 rpm for 7 days. In this sample, both replicators (hexamers and octamers) are present. Using method B, 6mers and 8mers with different compositions can be separated from each other.
Figure S7. RP-UPLC trace (monitored at 254 nm using method B) of the product mixture obtained by oxidation of peptides 1 (15 mol%) and 2 (85 mol%) for 7 days, after which it was stirred at 1200 rpm for 7 days. In this sample, many different macrocycle sizes are present (trimers, pentamers, hexamers, heptamers and octamers).

Quantitative RP-UPLC analysis of dynamic combinatorial libraries
Quantitative analysis of the library composition was based on the total peak area obtained from RP-UPLC analysis monitored at 254 nm. This method is only valid if 1 and 2 have the same molar absorptivity at 254 nm and if this absorptivity is independent of the environment (i.e. the type of ring) in which these building blocks reside. Figures S8-S12 show that the total peak area obtained at 254 nm is constant throughout the experiment, both for a DCL that is dominated by hexamers and for one that is dominated by octamers at the end of the experiment. The total peak area also remains constant when the ratio between 1 and 2 is varied (Figures S13 and S14).
Figure S8. Time evolution of the total peak area obtained from RP-UPLC analysis (monitored at 254 nm using method A) of two product mixtures obtained by oxidation of peptides 1 and 2 for 11 days, after which they were stirred at 1200 rpm for 6 days. The total peak area remains fairly constant throughout the experiment; fluctuations can be caused by sample preparation.
Figure S9. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. This trace corresponds to the data point of the 6mer-dominated library after 0 days in Figure S8. The total peak area obtained is 3.14 × 10⁶ AU.
Figure S10. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. This trace corresponds to the data point of the 6mer-dominated library after 6 days in Figure S8. The total peak area obtained is 3.01 × 10⁶ AU.
Figure S11. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. This trace corresponds to the data point of the 8mer-dominated library after 0 days in Figure S8. The total peak area obtained is 3.21 × 10⁶ AU.
Figure S12. RP-UPLC trace (monitored at 254 nm using method A) of the product mixture obtained by oxidation of peptides 1 and 2 for 11 days, after which it was stirred at 1200 rpm for 6 days. This trace corresponds to the data point of the 8mer-dominated library after 6 days in Figure S8. The total peak area obtained is 3.26 × 10⁶ AU.
Figure S13. RP-UPLC trace (monitored at 254 nm using method B) of the product mixture obtained by oxidation of 67% peptide 1 and 33% peptide 2 for 12 days, after which it was stirred at 1200 rpm for 6 days. The total peak area obtained is 6.88 × 10⁶ AU. Note that for method B the DCL was diluted only fivefold, whereas a tenfold dilution was used for method A.
Figure S14. RP-UPLC trace (monitored at 254 nm using method B) of the product mixture obtained by oxidation of 33% peptide 1 and 67% peptide 2 for 12 days, after which it was stirred at 1200 rpm for 6 days. The total peak area obtained is 7.06 × 10⁶ AU. Note that for method B the DCL was diluted only fivefold, whereas a tenfold dilution was used for method A.
Figure S15. Mass spectrum of monomer 1 from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S16. Mass spectrum of monomer 2 from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S17. Mass spectrum of dimer 1₁2₁ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S18. Mass spectrum of dimer 2₂ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Trimers
Figure S19. Mass spectrum of trimer 1₃ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S20. Mass spectrum of trimer 1₂2₁ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S21. Mass spectrum of trimer 1₁2₂ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S22. Mass spectrum of trimer 2₃ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S23. Mass spectrum of tetramer 1₄ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S24. Mass spectrum of tetramer 1₃2₁ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S25. Mass spectrum of tetramer 1₂2₂ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S26. Mass spectrum of tetramer 1₁2₃ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Figure S27. Mass spectrum of tetramer 2₄ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S5).
Octamers
Figure S40. Mass spectrum of octamer 1₆2₂ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
Figure S41. Mass spectrum of octamer 1₅2₃ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
Figure S42. Mass spectrum of octamer 1₄2₄ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
Figure S43. Mass spectrum of octamer 1₃2₅ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
Figure S44. Mass spectrum of octamer 1₂2₆ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
Figure S45. Mass spectrum of octamer 1₁2₇ from the LC-MS analysis of a stirred library made from peptides 1 and 2 (corresponding to Figure S6).
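The constancy of the total 254 nm peak area (Figures S8-S14) can be expressed as a coefficient of variation, and the method B totals can be rescaled for their fivefold (rather than tenfold) dilution before comparison. The sketch below uses the totals quoted in the captions of Figures S9-S14. It assumes a linear detector response; because methods A and B use different gradients, the rescaled method B values are only a rough cross-check. Variable and function names are illustrative.

```python
from statistics import mean, stdev

# Total 254 nm peak areas (units of 10^6 AU) quoted in the captions of Figures S9-S14
method_a = {"S9": 3.14, "S10": 3.01, "S11": 3.21, "S12": 3.26}  # tenfold dilution
method_b = {"S13": 6.88, "S14": 7.06}                           # fivefold dilution

def to_tenfold_equivalent(area: float, dilution_factor: float) -> float:
    """Rescale an area measured at `dilution_factor` to a tenfold dilution (linear response assumed)."""
    return area * dilution_factor / 10.0

areas_a = list(method_a.values())
cv_a = stdev(areas_a) / mean(areas_a)  # coefficient of variation, sample SD
scaled_b = [to_tenfold_equivalent(a, 5) for a in method_b.values()]

print(f"method A: mean {mean(areas_a):.2f}, CV {100 * cv_a:.1f}%")
print("method B, tenfold-equivalent:", [round(a, 2) for a in scaled_b])
```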

Repeats of emergence experiments
A library of 6 mL ([1] = [2] = 0.5 mM) was set up and allowed to oxidize at room temperature by atmospheric oxygen until 70% of the monomers were oxidized (without agitation; ca. 12 days). This library was then split into 10 fractions (500 µL each). These 10 libraries were stirred at 45 °C and their compositions were monitored by RP-UPLC.
… Figure S47, expressed as percentage of total observed peak area in the RP-UPLC data at 254 nm.
… Tables S1-S4 and Table S7) made from equimolar amounts of 1 and 2.
Table S12. Average fraction of 8mer and 6mer and the corresponding standard deviation for the final compositions of libraries with varying ratios between 1 and 2, expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. All data in Tables S1-S11 were used to calculate these values, resulting in 10 entries for each ratio, except for 50 mol% 1, which has 45 entries (also including the data in Figure 2 and Figures S46-S48), and 85 mol% 1, which has 5 entries.
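Table S12 condenses the repeat experiments into a mean and standard deviation per building block ratio. A minimal sketch of that aggregation follows; the numbers are placeholders, not values from Tables S1-S11, the ratio labels are hypothetical, and the sample (n - 1) standard deviation is an assumption.

```python
from statistics import mean, stdev

# Placeholder final compositions: ratio -> list of (6mer %, 8mer %) per library,
# each expressed as % of the total observed 254 nm peak area.
final_compositions = {
    "ratio A": [(40.0, 45.0), (70.0, 15.0), (10.0, 80.0)],
    "ratio B": [(80.0, 5.0), (78.0, 6.0), (82.0, 4.0)],
}

def summarize(entries):
    """Mean and sample standard deviation of the 6mer and 8mer percentages."""
    six = [e[0] for e in entries]
    eight = [e[1] for e in entries]
    return {"6mer": (mean(six), stdev(six)), "8mer": (mean(eight), stdev(eight))}

for ratio, entries in final_compositions.items():
    s = summarize(entries)
    print(f"{ratio} (n={len(entries)}): "
          f"6mer {s['6mer'][0]:.1f} +/- {s['6mer'][1]:.1f}, "
          f"8mer {s['8mer'][0]:.1f} +/- {s['8mer'][1]:.1f}")
```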

Changing experimental conditions
Additional experiments were performed to study the influence of different parameters on the observed stochasticity in the nucleation of the different replicators (6mers and 8mers).

Effect of stirring the mother solution.
In our experimental design, the mother solution was kept at room temperature without stirring before being split over separate vials. In our experience, replicator emergence tends to be suppressed at low temperature and in the absence of stirring, so these conditions minimize replicator formation before the split. Indeed, when the mother solution was immediately stirred at 45 °C for 7 days, the observed final distribution was enriched in 8mers (see Table S13 and Figure S50).
Table S13. Final composition of 10 libraries containing 50 mol% 1 where the mother solution was stirred at 45 °C, expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mers by the total peak area of all replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
Figure S50. Variation in the fraction of the hexamer (blue) and octamer (green) replicators in the final library composition (determined by RP-UPLC) where the mother solution (50 mol% in 1) was either stirred at 45 °C or kept at room temperature without agitation. The fraction is defined as the amount of observed replicator divided by the total amount of replicators (hexamers + octamers). The data points show the average and the error bars the standard deviation. The data point corresponding to the stirred mother solution represents the average over a total of 10 samples (Table S13). The data point for the quiescent mother solution was taken from the data shown in Figure 5. The dotted red line indicates the boundary between DCLs that are rich in hexamers (below the line) and rich in octamers (above the line).
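Because the fraction 8mer is normalized to the replicator peaks only, unassigned "Others" peaks drop out of the ratio. The sketch below computes the fraction from two peak areas and labels a DCL relative to the dotted boundary line; placing that boundary at a fraction of 0.5 is an assumption (the section does not give its numerical value), and the function names are hypothetical.

```python
def fraction_8mer(area_6mer: float, area_8mer: float) -> float:
    """8mer peak area divided by the combined 6mer + 8mer peak area."""
    total = area_6mer + area_8mer
    if total <= 0:
        raise ValueError("no replicator peak area to normalize by")
    return area_8mer / total

def classify(frac_8mer: float, boundary: float = 0.5) -> str:
    """Label a DCL relative to an assumed boundary at frac_8mer = 0.5."""
    if frac_8mer > boundary:
        return "octamer-rich"
    if frac_8mer < boundary:
        return "hexamer-rich"
    return "on boundary"

# Hypothetical composition (% of total peak area); "Others" (20% here) does not enter the ratio
six, eight = 30.0, 50.0
f8 = fraction_8mer(six, eight)
print(f"fraction 8mer = {f8:.3f} ({classify(f8)})")
```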

Varying total building block concentration. We also investigated the influence of the total building block concentration on the stochastic nature of the emergence process. We therefore prepared a mother solution of 2.0 mM in building block with equimolar amounts of 1 and 2 (in 25 mM B2O3 buffer, pH 8.2). After incubation at room temperature without agitation until 80% of the material had been converted to disulfides, the mother solution was used to prepare five DCLs at each of five different concentrations: 2.0 mM, 1.0 mM, 500 µM, 100 µM and 50 µM. The observed relative abundances for each DCL after stirring for 7 days at 45 °C are shown in Tables S14-S18. The variation in the observed fractions of the 8mer and 6mer replicators is shown in Figure S51 and the total amount of observed replicators in Figure S52.
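The five concentrations follow from the 2.0 mM mother solution by simple C1V1 = C2V2 dilutions. In the sketch below, the 500 µL final volume per DCL and the use of the same borate buffer as diluent are assumptions, and the helper name is hypothetical.

```python
def dilution_volumes(c_stock_mM: float, c_target_mM: float, v_final_uL: float):
    """Return (stock volume, buffer volume) in uL from C1*V1 = C2*V2."""
    if c_target_mM > c_stock_mM:
        raise ValueError("target concentration exceeds stock concentration")
    v_stock = c_target_mM * v_final_uL / c_stock_mM
    return v_stock, v_final_uL - v_stock

STOCK_mM = 2.0      # mother solution
V_FINAL_uL = 500.0  # hypothetical final volume per DCL
for target_mM in (2.0, 1.0, 0.5, 0.1, 0.05):
    v_stock, v_buffer = dilution_volumes(STOCK_mM, target_mM, V_FINAL_uL)
    print(f"{target_mM * 1000:>6.0f} uM: {v_stock:6.1f} uL mother solution + {v_buffer:6.1f} uL buffer")
```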
In all of these DCLs the 8mers were the dominant replicators. This could be a result of the 8mer already nucleating in the mother solution before its division over different vials. Nevertheless, the degree of variability in replicator composition remained substantial, far exceeding that observed at most other building block ratios (see Figure 5). Note that at 50 µM and 100 µM the conversion of building block into replicator was less efficient than at higher building block concentrations (Figure S52). We believe this drop in total replicator formation can be explained by the reaction rates slowing down at lower concentrations. In addition, these concentrations start to approach the critical aggregation concentration of the mixture of monomers, trimers and tetramers (previously estimated to be in the 10-100 µM range).¹
Figure S51. Variation in the fraction of the hexamer (blue) and octamer (green) replicators in the final library composition (determined by RP-UPLC) as a function of the total building block concentration. The fraction is defined as the amount of observed replicator divided by the total amount of replicators (hexamers + octamers). The data points show the average and the error bars the standard deviation. The dotted red line indicates the boundary between DCLs that are rich in hexamers (below the line) and rich in octamers (above the line).
Figure S52. Variation in the conversion of building block into replicators (6mers + 8mers) in the final library composition (determined by RP-UPLC) as a function of the total building block concentration. The conversion is defined as the amount of observed replicator (6mers + 8mers) divided by the total peak area. The data points show the average and the error bars the standard deviation. Each data point is the average of a total of 5 samples (Tables S14-S18).
Table S14. Final library composition of 5 libraries containing 50 mol% 1 (2.0 mM total concentration), expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mer by the total peak area of replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
Table S15. Final library composition of 5 libraries containing 50 mol% 1 (1.0 mM total concentration), expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mer by the total peak area of replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
Fraction 8mer: 1.00, 0.60, 0.61, 0.78, 0.60
Table S16. Final library composition of 5 libraries containing 50 mol% 1 (500 µM total concentration), expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mer by the total peak area of replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
Fraction 8mer: 0.64, 1.00, 1.00, 0.81, 0.52
Table S17. Final library composition of 5 libraries containing 50 mol% 1 (100 µM total concentration), expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mer by the total peak area of replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
Others: 27.2, 6.9, 11.5, 5.3, 10.9
Fraction 8mer: 0.55, 0.29, 0.17, 0.37, 0.49
Table S18. Final library composition of 5 libraries containing 50 mol% 1 (50 µM total concentration), expressed as percentage of total observed peak area in the RP-UPLC measurements at 254 nm. The fraction 8mer is determined by dividing the peak area of the 8mer by the total peak area of replicators (6mers + 8mers). "Others" indicates the peaks in the RP-UPLC chromatograms that could not be assigned with confidence using LC-MS.
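The averages and error bars in Figure S51 summarize the per-library fractions. The sketch below applies that summary to the fraction 8mer values listed for Tables S15-S17 (the 2.0 mM and 50 µM rows are not reproduced in this section). Whether the figure uses the sample or the population standard deviation is not stated; the sample (n - 1) version is assumed here.

```python
from statistics import mean, stdev

# Fraction 8mer = 8mer area / (6mer + 8mer area), as listed in Tables S15-S17
fraction_8mer = {
    1.0: [1.00, 0.60, 0.61, 0.78, 0.60],  # 1.0 mM (Table S15)
    0.5: [0.64, 1.00, 1.00, 0.81, 0.52],  # 500 uM (Table S16)
    0.1: [0.55, 0.29, 0.17, 0.37, 0.49],  # 100 uM (Table S17)
}

summary = {}
for conc_mM, values in fraction_8mer.items():
    m, sd = mean(values), stdev(values)  # assumption: sample SD (n - 1)
    summary[conc_mM] = (m, sd)
    # the 6mer fraction is the complement, since only 6mers and 8mers enter the denominator
    print(f"{conc_mM * 1000:>5.0f} uM: 8mer {m:.2f} +/- {sd:.2f}, 6mer {1 - m:.2f}")
```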

Thioflavin T (ThT) fluorescence assay
A ThT stock solution (2.2 mM) was prepared in 10 mL phosphate buffer (50 mM phosphate, pH 8.2) and filtered through a 0.2 μm syringe filter. On the day of analysis, 50 μL of the stock solution was diluted into 5 mL phosphate buffer (50 mM phosphate, pH 8.2) to give a working solution of 22 μM. The fluorescence intensity of 450 μL of ThT solution was measured with excitation at 440 nm (slit width 5 nm) and emission between 480 and 700 nm (slit width 5 nm), averaging 3 accumulations. An 80 μL aliquot of peptide solution (100 μL in borate buffer) was added to the Hellma 10 × 2 mm quartz cuvette, incubated for 2 min, and the intensity was measured over 3 accumulations. All fluorescence measurements were performed on a JASCO FP-6200 fluorimeter equipped with a 480 nm high-pass cut-off filter on the emission channel to avoid higher-order diffraction from the excitation light.
Figure S53. Thioflavin T fluorescence emission of DCLs made from building blocks 1 and 2 (50 mol% each). Blue: library containing mostly 6mers. Green: library containing mostly 8mers.
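The 22 μM working concentration follows from the stock dilution. The sketch below assumes additive volumes; it also estimates, for illustration only, the ThT concentration in the cuvette after the 80 μL peptide aliquot is added to the 450 μL of ThT solution. The helper name is hypothetical.

```python
def diluted_concentration(c_stock: float, v_stock: float, v_diluent: float) -> float:
    """Concentration after mixing v_stock of stock with v_diluent (volumes assumed additive)."""
    return c_stock * v_stock / (v_stock + v_diluent)

# Concentrations in uM, volumes in uL
c_work = diluted_concentration(2200.0, 50.0, 5000.0)      # 50 uL stock into 5 mL buffer
c_cuvette = diluted_concentration(c_work, 450.0, 80.0)    # 450 uL ThT + 80 uL peptide solution
print(f"working: {c_work:.1f} uM, cuvette: {c_cuvette:.1f} uM")
```

If the stock is instead made up to a total of 5 mL, the working solution is exactly 22 μM; the difference from about 21.8 μM is within rounding of the quoted value.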

Circular dichroism (CD) spectroscopy
All CD spectra were recorded using a JASCO J-715 spectropolarimeter and Hellma quartz cuvettes with a path length of 1 mm. Spectra were recorded at room temperature from 190 nm to 350 nm with a 1 nm step interval and averaged over 3 scans at a scanning speed of 200 nm/min. Solvent spectra were subtracted from all reported spectra. Samples were diluted with borate buffer (50 mM, pH 8.2) to a concentration of 0.15 mM, expressed in monomer units.
Figure S54. CD spectra of DCLs made from building blocks 1 and 2 (50 mol% each). Blue: library containing mostly 6mers. Green: library containing mostly 8mers.
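Because the CD concentration is expressed in monomer units, the corresponding macrocycle concentration depends on ring size. The sketch below assumes a 1.0 mM library and a hypothetical 300 μL cuvette fill volume, and shows both the dilution and the conversion to hexamer or octamer concentration for a library consisting of a single ring size. Names are illustrative.

```python
def sample_volume_uL(c_library_mM: float, c_target_mM: float, v_final_uL: float) -> float:
    """Volume of library needed to reach c_target_mM in v_final_uL (C1*V1 = C2*V2)."""
    return c_target_mM * v_final_uL / c_library_mM

def macrocycle_conc_mM(c_monomer_units_mM: float, ring_size: int) -> float:
    """Convert a concentration in monomer units into a concentration of macrocycles."""
    return c_monomer_units_mM / ring_size

V_FINAL_uL = 300.0  # hypothetical fill volume for a 1 mm cuvette
v_lib = sample_volume_uL(1.0, 0.15, V_FINAL_uL)
v_buffer = V_FINAL_uL - v_lib
print(f"{v_lib:.0f} uL library + {v_buffer:.0f} uL borate buffer")
for n in (6, 8):
    print(f"{n}mer-only library: {1000 * macrocycle_conc_mM(0.15, n):.2f} uM macrocycle")
```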

Sample Preparation
An aliquot (5.0 μL) of the sample was deposited on a 400 mesh copper grid covered with a thin carbon film (Van Loenen Instruments). After 60 seconds, the droplet was blotted on filter paper. The sample was then stained twice (5.0 μL each time) with a solution of 2% uranyl acetate deposited on the grid and blotted on filter paper after 30 seconds each time. The grids were observed in a Philips CM120 cryo-electron microscope operating at 120 kV. Images were recorded on a slow scan CCD camera.

Analysis
The libraries that consist mostly of octamers (Figure S55, left panels) show well-defined fibers that form large supramolecular networks. The fibers are laterally associated but show no helical or twisted structures. For the libraries that contain mostly hexamers (Figure S55, right panels), the individual fibers are not easily distinguishable; they are laterally associated and have a twisted morphology.
Figure S55. TEM micrographs of libraries made from building blocks 1 and 2. Left side: samples consisting mostly of octamers, with a network-like structure formed by well-defined fibers. Right side: samples consisting mostly of hexamers, showing a twisted structure of laterally associated fibers.

Correlation between UPLC and ThT data
RP-UPLC is the only analytical technique that can distinguish between self-replicators with different macrocycle sizes. The analysis is, however, indirect: the fibers fall apart into individual macrocycles during analysis. To justify the use of UPLC data as a measure of (fibrous) replicator formation, we followed five separate DCLs containing equimolar amounts of 1 and 2 with both ThT fluorescence and UPLC. Figure S56 shows that the maximum ThT fluorescence intensities (at 490 nm) correlate well with the formation of 6mer and 8mer macrocycles followed by UPLC, which justifies using the UPLC data on 6mer and 8mer formation as a measure of replicator formation.
Figure S56. Five repeats (a-e) of replicator emergence in DCLs made from building blocks 1 and 2. Total replicator formation (6mer and 8mer), expressed as relative UPLC peak area, is shown on the left axes (black) and ThT fluorescence intensity at 490 nm on the right axes (red).
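The correlation is described qualitatively above. One way to put a number on it for a single repeat is a Pearson correlation coefficient between paired time points of UPLC replicator fraction and ThT intensity. The values below are made-up placeholders, not the data in Figure S56, and the function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder paired time points of one repeat
uplc_replicator_pct = [10, 25, 40, 55, 70]      # 6mer + 8mer, % of total peak area
tht_intensity_490 = [105, 240, 410, 530, 720]   # ThT emission at 490 nm (a.u.)
r = pearson_r(uplc_replicator_pct, tht_intensity_490)
print(f"Pearson r = {r:.3f}")
```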

Data Fitting
In our model, we have included this nucleation process as a free fitting parameter for each hexamer and octamer separately. This means that each replicator nucleation event is fitted with its own nucleation time. The distribution of nucleation times obtained by fitting this model to the UPLC traces provides strong indirect evidence that nucleation events are indeed the cause of the observed stochastic behaviour.
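Scheme S1 itself is not reproduced in this section, so the following is a hypothetical, deliberately simplified stand-in (not the authors' model) that shows how a nucleation time can enter an ODE model as a parameter: each replicator's autocatalytic growth term is switched on only after its own nucleation time. The rate law, the constant seed term and all parameter values are assumptions for illustration; SciPy's `solve_ivp` is used for integration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k_hex, k_oct, t_nuc_hex, t_nuc_oct, seed):
    """Hypothetical model: precursor P feeds two autocatalytic replicators.

    Growth of each replicator is switched on at its own nucleation time; the
    constant `seed` stands in for the nucleus that starts autocatalysis.
    """
    p, hexa, octa = y
    on_hex = 1.0 if t >= t_nuc_hex else 0.0
    on_oct = 1.0 if t >= t_nuc_oct else 0.0
    r_hex = on_hex * k_hex * p * (hexa + seed)
    r_oct = on_oct * k_oct * p * (octa + seed)
    return [-(r_hex + r_oct), r_hex, r_oct]

def simulate(k_hex, k_oct, t_nuc_hex, t_nuc_oct, seed=0.1, t_end=20.0):
    """Integrate from 100% precursor; returns times and (P, 6mer, 8mer) in %."""
    t_eval = np.linspace(0.0, t_end, 201)
    sol = solve_ivp(rhs, (0.0, t_end), [100.0, 0.0, 0.0], t_eval=t_eval,
                    args=(k_hex, k_oct, t_nuc_hex, t_nuc_oct, seed), max_step=0.05)
    if not sol.success:
        raise RuntimeError(sol.message)
    return sol.t, sol.y

t, (p, hexa, octa) = simulate(k_hex=0.05, k_oct=0.05, t_nuc_hex=3.0, t_nuc_oct=5.0)
print(f"final: precursor {p[-1]:.1f}%, 6mer {hexa[-1]:.1f}%, 8mer {octa[-1]:.2f}%")
```

With equal rate constants, the replicator that nucleates first consumes most of the precursor in this toy model, which illustrates how the nucleation time alone can decide the outcome.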
The relative concentrations of the different species (hexamers, octamers and trimers/tetramers) were fitted to a system of ordinary differential equations (ODEs; Scheme S1, see also main text) using Symfit.¹ The fit was performed 1000 times using random initial values for the parameters. The best resulting fit had a coefficient of determination (R²) of 0.933 and is depicted in Figure S57. The corresponding fit parameters are depicted in Figure 4 in the main text and can be found in Table S19. To assess the uncertainty in the fit parameters, we numerically determined the Hessian matrix using a second-order central difference scheme, constructing the matrix to be symmetric. The covariance matrix was obtained by inverting the Hessian, and the standard deviations of the fit parameters were taken as the square roots of the diagonal elements of the covariance matrix.
Table S19. Best fit parameters and associated standard deviations (StdDev). The numbering corresponds to the experiments as follows: 0-8 correspond to the experiment shown in Figure 2 and Table S4; 9-18 correspond to Repeat C (Figure S48 and Table S3); 19-28 correspond to Repeat A (Figure S46 and Table S1).
Figure S57. Fit of the ODE model shown in Scheme S1 to RP-UPLC data. The fits are depicted as solid lines, the RP-UPLC data as points. Precursors are blue, hexamers are green, octamers are red. The x axis shows time (in days), the y axis relative concentration (in %). t = 0 was defined as the time of the first RP-UPLC measurement. The coefficient of determination (R²) is 0.933.
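The uncertainty estimate described above can be sketched with NumPy as follows. This assumes the objective is a negative log-likelihood, for which the inverse Hessian at the optimum approximates the parameter covariance; for a plain sum of squared residuals, the inverse Hessian would first have to be scaled (by 2σ², with σ² the residual variance), and the objective actually used is not stated here. Function names are hypothetical, and the quadratic demo only checks that the scheme recovers a known Hessian.

```python
import numpy as np

def central_diff_hessian(f, x, h=1e-4):
    """Symmetric Hessian of scalar f at x by second-order central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.empty((n, n))
    f0 = f(x)
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        hess[i, i] = (f(x + ei) - 2.0 * f0 + f(x - ei)) / h**2
        for j in range(i + 1, n):
            ej = np.zeros(n)
            ej[j] = h
            val = (f(x + ei + ej) - f(x + ei - ej)
                   - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h**2)
            hess[i, j] = hess[j, i] = val  # fill both halves: symmetric by construction
    return hess

def parameter_std(f, x_best, h=1e-4):
    """Standard deviations from the diagonal of the inverted Hessian."""
    cov = np.linalg.inv(central_diff_hessian(f, x_best, h))
    return np.sqrt(np.diag(cov))

# Demo: quadratic objective with known Hessian A and minimum x_min
A = np.array([[4.0, 1.0], [1.0, 2.0]])
x_min = np.array([1.0, -2.0])

def objective(x):
    d = x - x_min
    return 0.5 * d @ A @ d

H = central_diff_hessian(objective, x_min)
print(np.round(H, 6))
print(parameter_std(objective, x_min))
```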

Scheme S1. Simplified model system of ODEs to which the RP-UPLC data are fitted to obtain an estimate of the nucleation times for the self-replicators. The fit was performed using Symfit.¹
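Repeating the fit from many random starting points and keeping the best result is a multistart strategy. Symfit's own interface is not reproduced here; instead, the sketch below swaps in SciPy's `least_squares` and fits a hypothetical two-parameter logistic onset curve (rate k, onset time t_nuc) to noise-free synthetic data, purely to illustrate keeping the lowest-cost fit among random restarts. The model, bounds, sampling ranges and number of starts are all assumptions.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import expit

def model(params, t):
    """Hypothetical logistic onset curve in % (rate k, onset time t_nuc)."""
    k, t_nuc = params
    return 100.0 * expit(k * (t - t_nuc))

def residuals(params, t, y):
    return model(params, t) - y

def multistart_fit(t, y, n_starts=50, seed=0):
    """Run least_squares from random starting points; keep the lowest-cost result."""
    rng = np.random.default_rng(seed)
    lower, upper = np.array([0.01, 0.0]), np.array([20.0, 20.0])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform([0.1, 0.5], [5.0, 9.5])  # random start inside the bounds
        res = least_squares(residuals, x0, bounds=(lower, upper), args=(t, y))
        if best is None or res.cost < best.cost:
            best = res
    return best

t = np.linspace(0.0, 10.0, 41)
y = model((2.0, 4.0), t)  # synthetic, noise-free data
best = multistart_fit(t, y)
print(best.x, best.cost)
```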