Identifying Physical Causes of Apparent Enhanced Cyclization of Short DNA Molecules with a Coarse-Grained Model

DNA cyclization is a powerful technique to gain insight into the nature of DNA bending. While the wormlike chain model provides a good description of small to moderate bending fluctuations, it is expected to break down for large bending. Recent cyclization experiments on strongly bent shorter molecules indeed suggest enhanced flexibility over and above that expected from the wormlike chain. Here, we use a coarse-grained model of DNA to investigate the subtle thermodynamics of DNA cyclization for molecules ranging from 30 to 210 base pairs. As the molecules get shorter, we find increasing deviations between our computed equilibrium j-factor and the classic wormlike chain predictions of Shimada and Yamakawa for a torsionally aligned looped molecule. These deviations are due to sharp kinking, first at nicks, and only subsequently in the body of the duplex. At the shortest lengths, substantial fraying at the ends of duplex domains is the dominant method of relaxation. We also estimate the dynamic j-factor measured in recent FRET experiments. We find that the dynamic j-factor is systematically larger than its equilibrium counterpart—with the deviation larger for shorter molecules—because not all the stress present in the fully cyclized state is present in the transition state. These observations are important for the interpretation of recent cyclization experiments, suggesting that measured anomalously high j-factors may not necessarily indicate non-WLC behavior in the body of duplexes.

Cyclization simulations were performed in three phases: exploratory, equilibration and production. In all cases, we use a virtual-move Monte Carlo (VMMC) algorithm [S1] in combination with umbrella sampling [S2].
In the exploratory phase, we iteratively adjusted the umbrella sampling bias for the windows associated with the open and cyclized states, yielding a flat population of states for both windows. We use discrete potentials for both dimensions of the order parameter (see main text). For ee , we use variable width distance increments: 0-1. 7036 nm, 3.4072, 5.1108, 8.518, 12.777, 17.036, 21.295, 25.554, 34.072, 42.59, 51.108 and > 51.108 nm. For bp , the increment is fixed, and is simply the number of base pairs formed (0 ≤ bp ≤ s ). As the weighting iterations were semi-automated, simulation times varied in the range 10 6 -10 7 VMMC steps per particle. Simulations were performed with one molecule in a cubic box of dimension 170.36 nm, which corresponds to a unimolecular concentration of 336 nm. To further simplify sampling, we forbid the formation of base pairs that are not intended in the design of the system (non-native base pairs).
In the equilibration phase, we equilibrated the system for 10 7 VMMC steps per particle using the aforementioned umbrella weights. Adequate sampling (number of transitions in ee and bp ) and decorrelation (via a block average decorrelation method) of potential energy, bubble size, fraying and structural kinking were checked. For the production phase, each simulation (five independent simulations per measurement) was initialized with a statistically independent starting configuration and a unique random seed. This is accomplished by randomly drawing starting configurations from the equilibration phase, one for each production trial, ignoring the first 2 × 10 6 VMMC steps per particle of equilibration. The production trial run time is 10 7 VMMC steps per particle. For reference, the characteristic decorrelation time for the potential energy is ∼ 10 4 VMMC steps per particle, while the decorrelation time for kinking is ∼ 10 5 VMMC steps per particle.
The "seed moves" used to build clusters in the VMMC algorithm were: • Rotation of a nucleotide about its backbone site, with an axis chosen uniformly on the unit sphere, and with an angle drawn from a normal distribution with a mean of zero and a standard deviation of 0.10 radians.
• Translation of a nucleotide, where the displacement along each Cartesian axis is drawn from a normal distribution with a mean of zero and a standard deviation of 0.085 18 nm.

B. Dimerization simulations
Dimerization simulations follow a very similar procedure as cyclization simulations (Section S1 A), but adapted for a bimolecular system. With the exception of the starting configuration (bimolecular), exploratory, equilibration and production procedures are identical. As discussed in the main text, each molecule has one complementary sticky end and one blunt end. This precludes the formation of multimers and circular dimers, allowing only the formation of linear dimers. Simulations were performed with these two complementary, non-palindromic, sticky-ended duplexes, in a cubic box of dimension 170.36 nm, which corresponds to a dimer concentration of 336 nm. To estimate the impact of excluded volume effects, dimerization systems were also performed in a box of half the size and therefore 8 times the concentration (85.18 nm per side, which corresponds to a dimer concentration of 2.69 µm). Non-native base pairing was explicitly disallowed for the 2.69 µm simulations, but not the 336 nm simulations. Any difference for these simulations would be marginal, as evidenced by the similarity of the equilibrium constants, since the ensemble is dominated by configurations in which domains are fully bound or fully unbound.
Analogously to the cyclization simulations, the dimerization simulations are windowed using the order parameters ee and bp , respectively the distance of closest approach and the number of base pairs formed between complementary sticky ends. The window associated with the dimerized state is given by bp ≥ 1 bp ( ee = min ee , the small separation between base sites of the base paired nucleotides), while the window associated with the undimerized state is given by bp = 0 bp.

C. Computation of equilibrium constants
The cyclization reaction is unimolecular (Figure 1 (a) of the main text): where A and B are the open and cyclized states respectively, cyc is the forward rate constant (cyclization) and uncyc is the reverse rate constant (uncyclization). The equilibrium constant for cyclization, cyc eq , can be estimated directly from simulations of a unimolecular, isolated system as cyc eq = B A . (S2) Here A and B are the probabilities with which uncyclized and cyclized systems are observed in simulation The dimerization reaction is a bimolecular association of distinct molecules: In this case, the bimolecular equilibrium constant can be inferred from a simulation of a single pair of dimerizing monomers via in which [ 0 ] is the total concentration of strand type A in simulation, and A/B and AB are the probabilities with which monomers and dimers are observed in simulation. This expression follows from Eq. 7 of Ref. [S4]. The resultant dim eq obeys the standard relation for bulk systems in equilibrium, .
Recall from the main text that we use a definition for the equilibrium -factor eq that is consistent with Vafabakhsh and Ha (V&H) [S3]: This is a measure of the effective concentration of one sticky end in the vicinity of the other when the dimerization reaction involves distinct monomers, forming a heterodimer, with only one sticky end per monomer. Researchers often estimate dim eq using the dimerization of identical monomers, forming a homodimer, with palindromic sticky ends. In this case, given the same underlying interaction strength between sticky ends (and hence the same cyc eq ), dim eq would be increased by a factor of two due to combinatorial (4 times the number of possible dimers) and symmetry effects (a factor of 1/2 for a homodimer rather than a heterodimer) [S4, S5]. As a result, with this approach the equilibrium -factor is estimated as In the case of identical monomers with two nonpalindromic sticky ends, there is no pre-factor in thefactor estimate, as the combinatorial (twice the number of possible dimers) and symmetry effects (a factor of 1/2 for a homodimer) cancel. Overall, under the approximation of ideal behaviour of separate complexes, these three approaches are equivalent.

D. Structural criterion for kink detection
While it is often visually straightforward to identify kinks, automating their detection is not without difficulty. We have developed both energetic and structural criteria for identifying kinks, which are based on disruption of stacking interactions and changes in base orientation, respectively. In this study, we rely on the structural criterion; we have discussed the differences between the two criteria in detail in the supplementary material of [S6].
Structurally, we define a kink using the relative orientation of consecutive nucleotides. In oxDNA, the orientation of each nucleotide is unequivocally determined by two orthogonal unit vectors, the base-backbone vector and the base-normal vector. The base-backbone vector connects the backbone and base interaction sites of each nucleotide. The model is designed such that, in a relaxed duplex, the base-normal vector at index ,â , and at the consecutive index + 1,â +1 , are approximately parallel; that is,â ·â +1 ≈ 1. A kink is defined to be present ifâ ·â +1 < 0, a condition implying a more than 90 ∘ change in orientation of consecutive nucleotides along one strand. Naturally, for duplex regions, we consider kinking along either strand; although, usually if a kink is present in the duplex, then both strands are kinked. To limit false positives due to fraying, we do not include the first and last 3 pairs of nucleotides in the duplex region in our analysis.
In the fully cyclized configuration there are two "nicks" where the two ends of each strand meet. Kinks in nicked regions are treated slightly differently, as our criterion is only well defined along an intact strand. A kink at a nick is detected on the opposite strand. As kinks at a nick may diffuse slightly, we define a region of 3 base pairs on either d / bp Note Cyclization sequence 20/30 dimerization only ‡   CAG AAT CCG TGC TAC ACC TCC ACC GTT TCA   57/67   CAG AAT CCG TGC TAG TAC CTC AAT ATA GAC TCC CTT TGA CCT GAC TAT CCT CAC CTC CAC CGT TTC A   CGA TCA TGG AGT TAT ATC TGA GGG AAA CTG GAC TGA TAG GAG TGG AGG TGG CAA AGT GTC TTA GGC A   59/69   CAG AAT CCG TGC TAG TAC CTC AAT ATA GAC TCC CTT TGA CCC ATG ACT ATC CTC ACC TCC ACC GTT TCA   61/71   CAG AAT CCG TGC TAG TAC CTC AAT ATA GAC TCC CTT CTA ATT GAC TGA CTA TCC TCA CCT CCA CCG TTT CA   63/73   CAG AAT CCG TGC TAG TAC CTC AAT ATA GAC TCC CTT CTA ATT GAC CAT GAC TAT CCT CAC CTC CAC CGT TTC A   64/74   CAG AAT CCG TGC TAG TAC CTC AAT ATA GAC TCC CTT TAA GTT GAC CCA TGA CTA TCC TCA CCT CCA CCG TTT   side of the nick on the intact strand. If the intact strand is kinked in this 6 base pair region, then the molecule is considered kinked at that nick.
Since distinguishing fraying from kinking at low bp is problematic, we compute kinking only for the most probable values, e.g. bp = 8 − 10 for the s = 10 molecules depicted in Figure 2 of the main text. The lower bp configurations appear in our simulations only because of biased sampling and have a negligible contribution to the equilibrium kinking probability.

A. Dimerization equilibrium
The equilibrium constant for dimerization dim eq is length-independent to within two times the standard error of the mean for the oxDNA average-base parameterization (Table S2). Varying concentration suggests that the role of excluded volume effects is small between 336 nm and 2.69 µm.
For the sequence-dependent parameterization, there is some variation in dim eq , but this does not represent length-dependence per-se, rather it is most likely due to sequence variation in the bases at the interface between the duplexes and the sticky ends.
For the computation of eq and dyn , we use average values for 336 nm. For sequence-dependent results at bp = 73, a length where the dimerization equilibrium constant was computed, we use the appropriate lengthspecific value instead of the average.

B. Initial stress in cyclized system
There is a general trend towards a larger activation free-energy barrier to cyclization (Δ ‡ cyc ) at shorter bp . This is due to the bending stress imposed upon the system by the formation of the initial base pair ( bp = 1). At the very shortest lengths bp ≈ 30 − 45 , this trend is reversed because the complementary single-stranded sticky ends (of length s = 10) are sufficiently long relative to d that an initial base pair can form with less bending stress in the duplex region. States at the top of the free-energy barrier (i.e. bp = 1) for cyclization and dimerization at bp = 30, 101 are compared in Figure S1, clearly showing bending in the cyclized molecules, but not the dimerized molecules. Importantly, at bp = 30, the relatively long single-stranded region ( s = 1 /2 × d = 10) reduces the requirement to bend the duplex ( Figure S1(c)), compared to the strong bending necessary at bp = 101 ( Figure S1(a)).

Length of complementary sticky ends
We compare oxDNA to available V&H equilibrium data [S3] for d = 91, s = 8 − 10 . Specifically, V&H report the fractional occupancy of the cyclized state cyc . Akin to the nomenclature in Section S1 C, cyc + open = 1, where open is the fractional occupancy of the open (uncyclized) state.
Fractional occupancy can be a somewhat misleading measure by which to compare systems, because the entire transition region cyc ≈ 0.25 − 0.75 is covered by a free-energy difference of ∼ 2 B , while a similar freeenergy difference at the extrema ( cyc → 0 or 1) would be indistinguishable. We therefore make our equilibrium comparison in terms of Δ cyc , where OxDNA appears to be in good agreement with experiment; although, this may be partly coincidental as V&H do not report either the temperature (presumed to be 298 K) or a salt concentration (V&H give their imaging buffer cation concentration as [Na + ] = 500 − 1000 mm or [Mg 2+ ] = 10 − 30 mm; recall that oxDNA is parameterized to [Na + ] = 500 mm). At s = 9 and s = 10, oxDNA and experiment differ by ∼ B , while at s = 8, oxDNA under-reports experiment by ∼ 3 B (cyclized state is less likely in oxDNA than experiment).
As expected from the SantaLucia model [S7, S8] of nearest-neighbour thermodynamics, the linear relationship between oxDNA results in Figure S2 (a) is reflective of the ∼ 3 B stabilization of each additional base pair. A possible explanation for the discrepancy between oxDNA and V&H [S3] is that hairpin formation in the s region could decrease the fraction of cyclized molecules; but given the sequence, hairpin formation should be negligible. In particular, we cannot explain the discrepancy between oxDNA and the V&H data for s = 8, although we note that identifying a very low yield in experiment can be challenging, for instance due to the presence of impurities inducing large changes in Δ cyc .  Table S2. Equilibrium constant for dimerization ( dim eq ) and activation free-energy barriers to dimerization (Δ ‡ dim ) and undimerization (Δ ‡ undim ) as defined in the main text for both the oxDNA average-base and sequence-dependent parameterizations. Simulations were performed at = 298 K. The first column gives the lengths of duplex regions of the two monomers. The complementary sticky ends are of identical sequence to those of length s = 10 from in Table S1. Duplex sections are taken from comparable length sequences in Table S1. For the simulations at [Conc]− −336 nm, each monomer has the same duplex section as that with d = d1 = d2 in Table S1. For example, in the 20;20 simulation,each monomer has the same duplex section as the = 20, bp = 30 cyclization system from Table S1. For the simulations at [Conc]− −2690 nm, the relevant duplex section from Table S1 is split between ther monomers; d = d1 + d2 . Thus 10;10 corresponds to splitting the duplex section from the = 20, bp = 30 entry in Table S1 between the two monomers.

Nicks and mismatches
We examine the role of structural defects, namely nicks and mismatches, on cyclization using the average-base oxDNA parameterization. Unsurprisingly, we find that nicked and mismatches molecules cyclized more readily than their intact counterparts ( Figure S3).
V&H briefly examine the impact of nicks and mismatches on the rate of cyclization, recording cyc versus time ( Figure S3 in Ref. [S3]). It does not appear that all molecules have reached an equilibrium plateau in cyc , therefore a concrete equilibrium comparison to oxDNA is difficult. We do, however, observe good agreement between the oxDNA and the experimental observations of V&H regarding the stabilizing effect of various motifs (Table S3). However, we do not reproduce the unexpected similarity between the experimental values for intact sequences at bp = 69 and bp = 97. This is consistent with our inability to reproduce V&H's -factors for their shortest sequences.

D. Comparison to ligase experiments
Comparison to the ligase experiments of Du et al. [S9] and Cloutier & Widom (C&W) [S10] highlight some interesting features of the oxDNA model ( Figure S4) Table S3.
cyc eq for oxDNA and experiment (V&H [S3]). The experimental assays were not intended to probe the equilibrium behaviour of the systems; as such, the last point of kinetic cyclization versus time assay may not represent equilibrium values. To emphasize the qualitative difference between intact molecules and those with structural defects (nicks, mismatches), oxDNA results are for the average-base parameterization. * Converted from yield cyc to cyc eq using Equation S9. † ΔΔ is computed as the difference in the Δ of cyclization between intact states and those with structural defects (nick or mismatch). their data gives a weaker torsional stiffness (2.4 × 10 −28 versus 4.75 × 10 −28 J m), longer persistence length (47 versus 41.82 nm) and longer pitch length (10.54 versus 10.36 bp/turn) than for oxDNA. The comparison also highlights how relatively small deviations in DNA mechanical properties (in particular a ∼ 10 % change in the persistence length) may shift the apparent eq by an order of magnitude. The differences in SY WLC expression parameters are reasonable given the different solution conditions (the persistence length decreases with increasing salt concentration [S11, S12] and temperatures (e.g. eq is enhanced by a factor of ∼ 10 between 5 ∘ C and 42 ∘ C [S13]). We note that the oxDNA persistence length is reasonable for the high salt conditions used by V&H [S12]. Free-energy profiles for cyclization for (a) an intact duplex compared to a duplex with a single mismatch at bp = 69, and (b) intact versus nicked duplexes at bp = 97, using the oxDNA average-base parameterization and s = 10 for all sequences (Table S1). efficiency for teardrop configurations [S14]. We also note that because of the potential role of teardrop configurations, the torsional modulus extracted from fits to the SY expression should be interpreted with caution, and it is interesting to note that the values typically obtained from these fits [S5, S9, S15] are significantly lower than those from single-molecule experiments with magnetic tweezers [S16, S17].
Since it is likely that C&W conducted their experiment at too high a ligase concentration, it is difficult to characterize their -factor in terms of either dyn or eq . To isolate high ligase concentration as the culprit behind an apparent deviation from WLC eq , Du et al. used conditions identical to C&W, namely buffer ([Mg 2+ ] = 10 mm) and temperature ( = 21 ∘ C for Du et al., = 20 ∘ C for C&W and = 25 ∘ C for oxDNA). Intriguingly, C&W's results mainly lie inbetween oxDNA dyn and oxDNA eq .

E. Capture radius
Comparing results for dyn with a WLC model requires that the finite dimensions of the sticky ends be taken into account. Defining a "capture radius", a measure of the distance within which the sticky ends can hybridize, is a common way of accounting for the finite size of DNA sticky ends. As oxDNA naturally accounts for the finitesize of the sticky ends, we have the ability to test the The ensemble average of the oxDNA capture radius ⟨ capture⟩ as a function of length bp at fixed s = 10. Error bars represent the standard deviation. capture radius assumption.
In oxDNA, we define the capture radius capture as the distance between the first and last bases of the duplex region ( d ) at bp = 1 ( Figure S1 (a,c)). Nucleotide positions are used to report distance, instead of the midpoint along the helical axis, because fraying may occur at the ends of the duplex. The choice of nucleotide strand does not impact our results.
Our ensemble average capture radius, ⟨ capture ⟩ is shown in Figure S5. ⟨ capture ⟩ is roughly constant at bp ≈ 100 − 200 , increases slowly between bp ≈ 50 − 100 , and then more rapidly for bp ≈ 40 − 50 , before decreasing for bp 37. The saturation value (⟨ capture ⟩ ≈ 4 nm) is quite close to the capture radius of 5 nm that has been used to interpret V&H's experiments with 10-base sticky ends [S3, S18].
The free-energy cost associated with forming the initial contact is partitioned between that for bending the duplex and that for stretching the single-stranded stickyends. At shorter bp , the amount of bending required for initial binding increases, as does the stretching of the single-stranded sticky-ends. At short lengths, this trend is reversed when the length of the unperturbed duplex approaches the capture radius at bp = 1. Interestingly, the distribution of capture is roughly Gaussian, with similar standard deviation for all lengths.
While we explicitly disallow misbonding, allowing misbonding may slightly increase the capture radius, an effect that should be more prominent for longer stickyends. Overall, the 5 nm value for the capture radius used in the interpretation of V&H's experiments appears not unreasonable.