Affinity-Selected Bicyclic Peptide G-Quadruplex Ligands Mimic a Protein-like Binding Mechanism

The study of G-quadruplexes (G4s) in a cellular context has demonstrated links between these nucleic acid secondary structures, gene expression, and DNA replication. Ligands that bind to the G4 structure therefore present an excellent opportunity for influencing gene expression through the targeting of a nucleic acid structure rather than sequence. Here, we explore cyclic peptides as an alternative class of G4 ligands. Specifically, we describe the development of de novo G4-binding bicyclic peptides selected by phage display. Selected bicyclic peptides display submicromolar affinity to G4 structures and high selectivity over double helix DNA. Molecular simulations of the bicyclic peptide–G4 complexes corroborate the experimental binding strengths and reveal molecular insights into G4 recognition by bicyclic peptides via the precise positioning of amino acid side chains, a binding mechanism reminiscent of endogenous G4-binding proteins. Overall, our results demonstrate that selection of (bi)cyclic peptides unlocks a valuable chemical space for targeting nucleic acid structures.

S1 Supplementary experimental data S1.1 Phage titers and library complexity across successive selection rounds indicate successful selection  Figure S5: FRET melting at 5 µM for b-G4pep2 and b-G4pep3 was carried out in the presence of increasing amounts of unlabelled dsDNA, ssDNA and G4 (ckit-1) competitor. As expected, the unlabelled G4 significantly reduces observed ∆T m , but high specificity is seen against dsDNA and ssDNA in both bicyclic peptides.

S6
Fluorescence quench equilibrium binding assay S8 S1.6 Starting library characterisation Theoretical Library  Figure S9: Amino acid enrichment at each variable position for the 3×3 ckit-1 selection, calculated by P after /P before , the amino acid proportion at each position after and before selection. Highest enrichments are shown in green. The SeqLogo after selection is shown again for reference (represents P after ). S10 S1.8 Bicyclic peptides identified by phage display affinity selection against G4s 4×4, hTelo 3×3, ckit-1 Figure S10: Enriched peptide sequences in final rounds of selection against hTelo and ckit-1 (first 10), with respective percentage representation in each pool. '-' represents deleted amino acids. S11 S1.9 Bicyclic peptide characterisation  For the displayed structures, we only focus on the lowest energy structures. However, for these rigid peptides the lowest minima are representative, and insight can be gained from a comparison between structures. In Fig. S13 to S16, we illustrate the individual peptides, highlighting a number of features. The linker is shown in orange, the N and C terminal groups in blue and red, respectively, and the positively charged residues are coloured green.
None of the peptides contains a negatively charged amino acid. In Table S1, all sequences are shown, with the naming used in our analysis. Table S1: Overview of the sequences used in the simulations. Seq4 and Seq5 constitute the additional good binders from experiment. Seq7 and Seq8 are sequences lost after the first round of selection. The parts of the sequence in green indicate common sub-sequences, with red highlighting a key difference for b-G4pep1. In early stages of the work, we considered another sequence, Seq6, but decided not to pursue it; hence the labelling of the sequences. b-G4pep1 A C G S C P I S V C G NH 2 Yes b-G4pep2 A C P P I C I K F C G NH 2 Yes b-G4pep3 A C P R L C R R F C G NH 2 Yes Figure   Figure S16: Lowest energy structures for Seq7 (left) and Seq8 (right), viewed from the side (top), and from opposite the linker (bottom). These sequences, in contrast to Seq4 and Seq5, were lost in the evolution. As for Seq4 and Seq5, the peptides are fairly spherical. For Seq8 there is no charged residue, likely reducing the interaction strength with the G-quadruplex. For Seq7, it is apparent in the view from below, that the positive charge and the N-terminus are very close, which does not allow for simultaneous interactions with multiple parts of the DNA structure. S17

S2.1 Secondary structure analysis for the bicyclic peptides
We analysed the secondary structure of the minima identified on the energy landscape, and found highly conserved features for each individual sequence, indicating a rigid backbone for these bicyclic peptides. Table S2 shows the secondary structure identified using DSSP, 1 indicating assignments found in more than 70 % of structures for each sequence. Table S2: Observed secondary structure in the database of local minima. The key secondary structure has been highlighted: blue indicates turns, orange indicated bends, and green helices. The first loop is indicated with 1, the second loop with 2, and the cysteine residues connected to the linker are marked as C. This analysis employed CPPTRAJ 2 and DSSP. 1

S2.2 Radii of gyration and solvent-accessible surface area
The elongated shape of b-G4pep3 and the rigidity of of the backbone, especially for b-G4pep3, is evident from the distribution of the radius of gyration shown in Fig. S17 and the average values for the radius of gyration (see Table S3).
Similar behaviour is seen in the solvent-accessible surface area and its distribution, as detailed in Fig. S18 and Table S4. r gyr (Å) 5 6 7 r gyr (Å) 5 6 7 r gyr (Å) Figure S17: Histograms of the radii of gyration for all structures located on the energy landscape for b-G4pep1 (left), b-G4pep2 (middle) and b-G4pep3 (right). The average value (solid line) and standard deviation (dotted lines) are shown for each peptide. We observe smaller deviations in the radius of gyration and a larger average radius of gyration for the better binding peptide.

S3 Supplementary data for binding simulations
In this section, we present structural measurements from the molecular dynamics simulations.
We separated the analysis into different measures to identify how the individual components of the bound complex behave.

S3.1 RMS deviations in structure for the MD simualtions
In   Figure S19: Root-mean square deviation for the G-quadruplex in the MD simulations relative to the starting configuration. The largest deviations observed are around 2Å, associated with movement of the free nucleotides, and the G-quadruplex architecture itself is unchanged.

S3.2 Geometric measurements of the peptide position
In Fig. S20, S21 and S22, we analyse the behaviour of the peptide relative to the Gquadruplex. As the G-quadruplex is essentially unchanged, it is a good reference for the peptide motion, and hence the stability of the bound complex.

S4.3 Synthesis of bicyclic peptides
All solvents and reagents were purified by standard procedures 3 or used as supplied from   Raw reads were then trimmed to remove adapter sequences and low quality reads using fastx_clipper and fastx_quality_filter from http://hannonlab.cshl.edu/fastx_toolkit/index.html. Trimmed reads were split by inline barcodes with fastx_barcode_splitter.
Aligned sequences were then analysed with home-made R scripts.

Summary of downstream analyses
DNA sequence plots calculated the proportion of each base (A, C, G, T or Deletion) at each aligned position. DNA sequences were translated using standard E. coli codons, except for TAG which was translated as Gln (TG1 strain). Amino acid distributions were calculated by counting all amino acids in the bicyclic peptide variable regions, and evaluating as percentages of the total. Library diversity was calculated using no. of unique peptides total no. of peptides . Mutants were counted by detecting all amino acid deletions ('---' in base sequence) and any frameshifts ('NN-', 'N-N' or '-NN'). SeqLogos were plotted using WebLogo v.2.8.2 5 (https: //weblogo.berkeley.edu/logo.cgi), using the 'Frequency plot' setting, and data was plotted by multiplying each sequence by its proportion in 10,000 (e.g. a peptide sequence with 1 % proportion would be repeated 100 times). Amino acid enrichments were calculated by first calculating the proportion of each amino acid in each variable position, and then dividing the proportions after selection by the proportions before selection (P after /P before ).
Annealing G4 structures was performed as follows: heating to 95 • C for 10 min and then held at 4 • C for 1 h. 96-well plates used were Bio-Rad Hard-Shell PCR plates; thin walled, white, skirted, low profile (HSP9655).   13 For the linker, as it is a 1,3,5-methylated benzene, we used the standard all-atom representation types CA and HA, and CT and HC for the aromatic and tertiary carbons and hydrogens, respectively. We used CYX as the state for the linking cysteine residues, using default values for the angles and bonds on either side of the sulfur atom. The charges of the linker are zero overall. While the use of CYX would generally imply a S-S bond, the small difference in electronegativity between S and C (2.58 and 2.55 on the Pauling scale) allows the use of this residue type with a C-S bond. All sequences in Table S1 were used in this part of the study.

S5.2 Sampling procedure
Initial structures were constructed from sequence using XLEAP 14 to add the linker. After a single point optimisation using OPTIM, 15 we initiated a basin-hopping [16][17][18] global optimisation run in GMIN 19 from these structures consisting of 150,000 steps for each sequence.
The moves used were group rotations 20,21 of the side chains, with conservation of chirality.
The 250 structures with the lowest energy were used to seed a database of stationary points for discrete path sampling 22,23 (DPS) to construct a kinetic transition network (KTN). 24,25 Within DPS, we used the doubly-nudged elastic band (DNEB) algorithm [26][27][28][29] to locate transition state candidates, which were subsequently refined accurately using hybrid eigenvector-following. 30  Structures were docked by hand, using PyMOL. 35 We used a crystal structure for C-kit1 avilable from the PDB databse (id: 2O3M) 36 as the template. The bicyclic peptides were then added, with the linker facing away from the G-quadruplex. The peptides were brought close enough for interactions to occur, but without introducing atom clashes. Only the five sequences highlighted in Table S1 were used in this part of the study. The C-kit1 architecture exhibits two loops of free nucleotides, one at the 5 tetrad, consisting of nucleotides 11 and 12 (CT), and a longer one at the 3 tetrad, formed by nulceotides 16 to 20 (AGGAG). The AGGAG loop allows access from two sides to the 3 tetrad, but only one of them permits easy access to the tetrad. We considered three potential binding sites for each tested sequence, namely on top of the 5 tetrad, and the two possible ways of accessing the 3 tetrad. The three sites are shown in Fig. S23.

S6.2 Global optimisation
The basin-hopping [16][17][18] global optimisation runs in GMIN 19 for the docked structures consisted of 150,000 steps for each sequence. Group rotation moves 20,21 of the peptide side chains were employed, while conserving chirality. The calculations were GPU accelerated. [37][38][39] The chosen force field for the peptide was ff14SB 9 and for the G-quadruplex we used ff99+bsc0. 40 As for the landscape explorations, an implicit solvent model was adopted, and no explicit ions included. The G-quadruplex is not stable in these conditions, and we therefore used rigid bodies [41][42][43] to constrain the tetrads and preserve the structure.
The constraints and the moves for the peptide side chains allowed us to focus on the peptide, as thge main objective of the global optimisation was to find low energy bound Figure S23: Representation of the potential binding sites probed in our simulations. The open 5 tetrad side (red) is easily accessible. The two 3 tetrad sites are smaller. Direct access to the tetrad (blue) offers a number of potential interaction sites, whereas the second way of accessing the G-quadruplex (green) at this face is more restricted.
complexes for the different binding sites and sequences, identifying whether the low energy complexes are similar across the tested sequences.
We employed four different rigidification schemes for all sequences and binding sites, resulting in 12 basin-hopping runs for every sequence. The different rigidifaction schemes allow us to identify possible errors from the rigidification, and test the stability of individual tetrads in more detail. The different schemes used are given in Table S5. The three tetrads are tetrad 1 (T1) containing nucleotides 2, 6, 10 and 13, tetrad 2 (T2) containing nucleotides 3, 7, 14 and 21, and tetrad 3 (T3) consisting of nucleotides 4, 8, 15 and 22. T1 is the 5 tetrad, T3 the 3 tetrad, and T2 is the central tetrad.
Schemes 1 and 2 allow us to assess the importance of relative motion of the tetrads, while for schemes 3 and 4 we gain insight into the effect of peptide binding on the exposed tetrads.
For the peptide moves employed, the motion of the tetrads and the G-quadruplex as a whole will be in response to interactions with the peptide. Table S5: Rigidification schemes applied in BH runs to avoid dissociation and unfolding of the G-quadruplex. The two-loop and the five-loop are flexible in all cases.

S36
Applied rigidification Scheme 1 one rigid body containing all three tetrads Scheme 2 three rigid bodies, one for every tetrad Scheme 3 two rigid bodies, one for tetrad 1 and 2 Scheme 4 two rigid bodies, one for tetrad 2 and 3

S6.3 Molecular dynamics simulations
The molecular dynamics simulations serve two purposes: (a) they provide us with a more dynamical picture of the bound complexes, and (b) we use explicit solvents and ions, overcoming some limitations of the model employed in the global optimisation runs.
The simulations were carried out using the AMBER12 14 code with GPU acceleration. 37,38 The structures were solvated in a truncated octahedral box of TIP3P water molecules with a solvent buffer of at least 14Å, on each side. The two central ions were selectively added to the G-quadruplex, along with 30 Cl − ions and neutralising K + counterions. TIP3P specific ion parameters 44 were employed throughout.
The solvated system was initially minimised for 5000 steps with a 50 kcal mol −1 harmonic restraint on all atoms in the peptide and G-quadruplex as well as the central ions. A second minimisation was then performed without restraints. Minimisation was followed by 100 ps of heating, where the temperature was increased from zero to 300 K with harmonic restraints of 50 kcal mol −1 on all atoms in all amino acid residues and a time step of 2.0 fs. The temperature was regulated using a Langevin thermostat 45 employing a collision frequency of 0.2 ps −1 . The positional restraints were systematically reduced in three cycles of NVT dynamics each of length 50 ps at 2.0 fs steps, followed by 100 ps without constraints at 300 K.
A 250 ps restrained simulation at 300 K (Langevin thermostat 45 with a collision frequency of 1.0 ps −1 ) and a constant pressure of 1 atm was carried out to locally equilibrate the density of the solvent, followed by an otherwise identical simulation without harmonic constraints of 250 ps. This procedure was followed by a 50 ns production run in the NVT ensemble, with S37