Pyrrolysine-Inspired in Cellulo Synthesis of an Unnatural Amino Acid for Facile Macrocyclization of Proteins

Macrocyclization has been touted as an effective strategy to enhance the in vivo stability and efficacy of protein therapeutics. Herein, we describe a scalable and robust system based on the endogenous biosynthesis of a noncanonical amino acid coupled to the pyrrolysine translational machinery for the generation of lasso-grafted proteins. The in cellulo biosynthesis of the noncanonical amino acid ᴅ-Cys-ε-Lys was achieved by hijacking the pyrrolysine biosynthesis pathway, and its genetic incorporation into proteins was then performed using an optimized PylRS/tRNAPyl pair and cell line. This system was applied to the structurally inspired cyclization of a 23-mer therapeutic P16 peptide engrafted on a fusion protein, resulting in near-complete cyclization of the target cyclic subunit in under 3 h. The resulting cyclic P16 peptide fusion protein possessed much higher CDK4 binding affinity than its linear counterpart. Furthermore, a bifunctional bicyclic protein harboring a cyclic cancer cell-targeting RGD motif on one end and the cyclic P16 peptide on the other was produced and shown to be a potent cell cycle arrestor with improved serum stability.

Positive clones from the kanamycin selection were re-streaked onto LB agar containing ampicillin and incubated at 37°C overnight. Fresh clones were then inoculated into wells containing 150 µL of LB medium supplemented with 100 µg/mL ampicillin and grown at 37°C with shaking at 230 rpm for 5 to 6 h. These cultures were then used in the second screening, which was based on mCherry fluorescence. To this end, 5 µL of each culture was added to a well of a 24-well plate containing 495 µL of LB medium or M9 glucose minimal medium (6 g/L Na₂HPO₄, 3 g/L KH₂PO₄, 1 g/L NH₄Cl, 0.5 g/L NaCl, 3 mg/L CaCl₂, 0.4% glucose and 1 mM MgSO₄ in MilliQ water) supplemented with 50 µM IPTG, ampicillin and 2 mM ᴅ-Cys-ε-Lys, followed by incubation at 30°C overnight. Then, 150 µL of each induced culture was transferred to a black, clear-bottom 96-well plate for measurement in a microplate reader (Tecan Infinite M1000 PRO). The normalized fluorescence intensity of each sample [measured fluorescence (Ex: 587 ± 5 nm; Em: 610 ± 5 nm) divided by the optical density at 700 nm] was used for selection. Cells that exhibited stronger mCherry fluorescence than cells carrying the parent PylRS were selected and cultured overnight in 5 mL of LB supplemented with ampicillin for plasmid DNA extraction.
To validate that the selected PylRS mutants were true positives (i.e., that they do not use any endogenous amino acid as a substrate and exhibit higher catalytic efficiency than the parent), the mCherry fluorescence screening was repeated in triplicate using freshly transformed colonies grown in LB medium with or without ᴅ-Cys-ε-Lys supplementation. A no-ncAA control was included as a negative control.
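The selection criterion described above (fluorescence divided by OD700, compared against the parent PylRS) can be sketched as follows. This is a minimal illustration only: the function names, clone IDs and plate-reader values are hypothetical and are not taken from the study.

```python
def normalized_fluorescence(fluorescence, od700):
    """mCherry fluorescence (Ex 587 nm / Em 610 nm) divided by the OD at 700 nm."""
    if od700 <= 0:
        raise ValueError("OD700 must be positive")
    return fluorescence / od700


def select_clones(readings, parent_id):
    """Return the IDs of clones whose normalized signal exceeds the parent's."""
    norm = {cid: normalized_fluorescence(f, od) for cid, (f, od) in readings.items()}
    threshold = norm[parent_id]
    return sorted(cid for cid, value in norm.items()
                  if cid != parent_id and value > threshold)


# Hypothetical readings: clone ID -> (raw fluorescence, OD700)
example = {
    "parent": (1200.0, 0.60),
    "A1": (1500.0, 0.50),
    "A2": (900.0, 0.60),
    "A3": (2600.0, 0.65),
}

print(select_clones(example, "parent"))  # ['A1', 'A3']
```

Normalizing by optical density keeps a dense but weakly fluorescent culture from being ranked above a sparse but strongly fluorescent one.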

Construction of pPylST.tL plasmid
To optimize the yield of the ncAA-containing protein, we employed the genetically recoded E. coli strain C321.ΔA.M9adapted, which has been engineered to enhance nonstandard amino acid incorporation [6], for protein overexpression. This E. coli strain, however, is incompatible with the aforementioned pETDuet-derived pPylST vectors for two reasons. Firstly, the strain is ampicillin-resistant, so ampicillin cannot be used as the selection marker for the mutant library transformants. Secondly, it does not produce the T7 RNA polymerase that recognizes the T7 promoter in pETDuet-based plasmids and is essential for transcription. Thus, in order to utilize the C321.ΔA.M9adapted cells, several modifications were made to the pETDuet-based pPylST-mCh(TAG) vector bearing the tRNA M15 and evolved PylRS EVF obtained from the directed evolution studies. In brief, these modifications involved (1) replacing the T7lac promoter located upstream of the first multiple cloning site (MCS1) (harboring the tRNA M15 and PylRS EVF genes) with the Ptac promoter; (2) substituting the T7lac promoter located upstream of the second multiple cloning site (MCS2) (harboring a gene carrying an in-frame TAG codon) with the PLlacO1 promoter; and (3) exchanging the ampicillin resistance gene (amR) for a streptomycin/spectinomycin resistance gene (smR). The primers used to amplify the gene fragments for these modifications are provided in Supplementary Table 2, and the amplicons were assembled using Gibson Assembly to build the resultant pPylST.tL-mCh(TAG) plasmid (Supplementary Fig. 2c).

mCherry readthrough assay
To evaluate the efficiency of different readthrough systems, E. coli competent cells (Rosetta 2(DE3) or C321.ΔA.M9adapted) were transformed with the relevant pPylST-mCh(TAG) or pPylST.tL-mCh(TAG) plasmid. The transformed cells were then inoculated into 50 mL of M9 medium containing appropriate antibiotics (ampicillin and/or spectinomycin) in a 96-well microtiter plate. After incubation at 30°C with shaking for 5 to 6 h, 250 μL of each culture was added into a 48-well plate. The medium was supplemented with the appropriate antibiotic, 0.5 mM IPTG and different concentrations of ᴅ-Cys-ε-Lys. The plate was then incubated at 25 °C overnight with shaking. The mCherry fluorescence intensities of the induced cultures were measured as described above. The assay was performed in triplicate.

Generation of PylC mutant library by site-saturation mutagenesis
Site-saturation mutagenesis was performed on residues S177, E179, D233 and T256 of PylC, which was fused to an N-terminal SUMO tag to facilitate purification, using two sets of degenerate primers (primers: PylC-S177E179mut_Fwd, PylC-D233mut_Rev, PylC-D233mut_Fwd and PylC-T256mut_Rev). The SUMO-PylC full-length fragment was extended by overlap extension PCR using another two sets of non-mutated primers (primers: rbs-NdeI-SUMO_Fwd, PylC-S177up_Rev, PylC-T256down_Fwd and PylC-KpnI-pDuet_Rev). The extended SUMO-PylC mutant insert was cloned into a pACYCDuet-1 vector whose promoter had been replaced with Plpp to enable constitutive expression of PylC (primer: Plpp-rbs_Fwd). The reaction product yielded a PylC mutant library that could be screened in PylST-expressing cells. Colonies that grew on the positive selection plates were transferred into 96-well microtiter plates containing 150 µL of LB medium supplemented with chloramphenicol, spectinomycin, 50 µM IPTG and 5 mM ᴅ-cysteine for quantitation of the readthrough efficiency based on mCherry fluorescence as described above. The top 50 mutants that produced mCherry fluorescence exceeding that of wild-type PylC were selected for plasmid extraction and subsequent DNA sequencing to identify the corresponding mutation(s).
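To give a sense of the theoretical size of a four-site saturation library, the sketch below counts the variants for the four targeted PylC positions. It assumes NNK degenerate codons and equally likely variants; the degeneracy scheme actually used in the primers is not stated here, so these numbers are illustrative only.

```python
import math
from itertools import product

BASES = "ACGT"
K_BASES = "GT"  # IUPAC code K = G or T (assumed NNK scheme, not confirmed by the text)


def nnk_codons():
    """Enumerate all codons encoded by the degenerate triplet NNK."""
    return ["".join(c) for c in product(BASES, BASES, K_BASES)]


def clones_for_coverage(n_variants, coverage=0.95):
    """Clones to screen so that a given variant is sampled with probability
    `coverage`, assuming equiprobable variants: P = 1 - exp(-N / V)."""
    return math.ceil(-math.log(1.0 - coverage) * n_variants)


positions = ["S177", "E179", "D233", "T256"]
codons_per_site = len(nnk_codons())               # 32 codons per NNK site
dna_variants = codons_per_site ** len(positions)  # 32**4 = 1,048,576
protein_variants = 20 ** len(positions)           # 20**4 = 160,000

print(codons_per_site, dna_variants, protein_variants)
print(clones_for_coverage(dna_variants))
```

A library of this size is far larger than what plate-based fluorescence screening can cover exhaustively, which is consistent with applying a positive selection step before the mCherry readout.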

Computational modeling
Modeling of PylRS C348V was based on the crystal structure of the MmPylRS C-terminal domain (CTD) bound to adenylated pyrrolysine (PDB: 2ZIM) [7]. The C348 residue was mutated to valine and the ligand was modified to adenylated ᴅ-Cys-ε-Lys in PyMOL [8]. Minimization of the protein structure and analysis of surface hydrophobicity were then performed using UCSF CHIMERA [9]. The MmPylRS-tRNAPyl complex model was manually built in PyMOL from two deposited structures, the MmPylRS NTD-tRNAPyl complex (PDB: 5UD5) [10] and the MmPylRS CTD bound to adenylated pyrrolysine (PDB: 2ZIM), using the D. hafniense PylRS CTD-tRNAPyl complex structure (PDB: 2ZNI) [11] as the reference for alignment. To generate the model of MmPylC bound to ᴅ-Cys-ε-Lys, a model of MmPylC bound to ᴅ-ornithine-ε-Lys was first built on the SWISS-MODEL [12] server using the crystal structure of wild-type M. barkeri PylC bound to ᴅ-ornithine-ε-Lys (PDB ID: 4FFM) [13] as a template. The ᴅ-ornithine-ε-Lys was then replaced with ᴅ-Cys-ε-Lys in PyMOL and the mutations (S177N, E179P, D233S, T256V) were incorporated into the PylC chain. The final structure was refined using the simulated annealing/molecular dynamics program from the CNS package [14,15]. Binding and surface hydrophobicity analyses were performed using UCSF CHIMERA.

Molecular dynamics simulation of P16p
MD simulation of P16p was conducted using GROMACS software [16] version 2021.4 with the opls2001 force field and the TIP3P water model. The topology file of ᴅ-Cys-ε-Lys was generated in GROMACS format using the LigParGen server [17][18][19]. The initial peptide was solvated in a dodecahedral water box with approximately 4000 water molecules and neutralized by adding Cl⁻ ions. The solvated system was minimized by the steepest descent method using a tolerance of 1000 kJ/mol/nm and a step size of 0.01 nm. The system was gradually heated from 0 to 298 K over 100 ps at a pressure of 1 bar. The production runs were carried out for 300 ns with a step size of 2 fs. The temperature was kept at 298 K by the modified Berendsen thermostat with a time constant of 1 ps. The pressure was kept at 1 bar by the Parrinello-Rahman scheme with a time constant of 2 ps and an isothermal compressibility of 4.5 × 10⁻⁵ bar⁻¹. The particle mesh Ewald (PME) method was employed to calculate long-range electrostatic interactions, and a cut-off distance of 1 nm was used to calculate the short-range electrostatic and van der Waals interactions. The LINCS algorithm was employed to constrain all covalent bonds involving hydrogen atoms. For both peptides, three independent 300-ns simulations were run from the same initial structure. Another 10 rounds of 10-ns simulation were run under the same conditions for structure comparison. RMSD and RMSF analyses were performed using the algorithms in GROMACS, with the least-squares fit calculated based on backbone atoms.
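The production-run settings listed above can be collected into a GROMACS .mdp fragment, as in the sketch below. This is not the authors' input file. Mapping the "modified Berendsen thermostat" to `V-rescale` is an assumption based on common GROMACS usage, and the fragment is incomplete (for example, the temperature-coupling groups, `tc-grps`, are omitted).

```python
def n_steps(duration_ns, dt_fs):
    """Number of integration steps for a run of duration_ns at a time step of dt_fs."""
    return round(duration_ns * 1_000_000 / dt_fs)


# Values taken from the text; parameter names are standard GROMACS .mdp options.
PRODUCTION = {
    "integrator": "md",
    "dt": 0.002,                      # ps (2 fs)
    "nsteps": n_steps(300, 2),        # 300 ns production run
    "tcoupl": "V-rescale",            # assumed equivalent of "modified Berendsen"
    "tau_t": 1.0,                     # ps
    "ref_t": 298,                     # K
    "pcoupl": "Parrinello-Rahman",
    "tau_p": 2.0,                     # ps
    "ref_p": 1.0,                     # bar
    "compressibility": 4.5e-5,        # bar^-1
    "coulombtype": "PME",
    "rcoulomb": 1.0,                  # nm
    "rvdw": 1.0,                      # nm
    "constraints": "h-bonds",
    "constraint_algorithm": "lincs",
}


def to_mdp(params):
    """Render a parameter dictionary as 'key = value' lines."""
    return "\n".join(f"{key:<22}= {value}" for key, value in params.items())


print(to_mdp(PRODUCTION))
```

With a 2 fs time step, the 300 ns production run corresponds to 150,000,000 steps and each 10 ns comparison run to 5,000,000 steps.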

Plasmid construction of proteins targeted for ᴅ-Cys-ε-Lys incorporation
The optimized plasmid pPylST.tL was used for subcloning different protein constructs for the subsequent cyclization studies. In brief, gene fragments encoding the protein of interest harboring the UAG codon and an intein-CBD-His7 tag were inserted between the KpnI and NdeI sites by Gibson Assembly (NEB). For the construct cycRGD-mCh-cycP16p, an N-terminal SUMO (small ubiquitin-like modifier protein) tag was included upstream to enhance protein expression. The linear counterparts of the ᴅ-Cys-ε-Lys-incorporated proteins under study were subcloned similarly, except that the UAG codon in the gene fragment for the cyclized proteins was replaced with a GCG codon, which encodes alanine, by mutagenesis using Pfu Turbo DNA polymerase (Agilent Technologies) following the manufacturer's instructions and the primers O-to-Ala-mutant_Fwd and O-to-Ala-mutant_Rev.
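At the sequence level, generating the linear control amounts to swapping the single in-frame amber codon (TAG in DNA) for GCG. The sketch below illustrates that swap on a made-up open reading frame. The function name and example sequences are hypothetical, and the actual constructs were made by primer-based mutagenesis rather than in silico.

```python
def replace_inframe_amber(orf, new_codon="GCG"):
    """Replace the single internal in-frame TAG codon of an ORF with new_codon.

    The final codon is ignored because it may be the genuine stop codon.
    Out-of-frame occurrences of 'TAG' are left untouched.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    hits = [i for i, codon in enumerate(codons[:-1]) if codon == "TAG"]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one internal TAG codon, found {len(hits)}")
    codons[hits[0]] = new_codon
    return "".join(codons)


# Hypothetical toy ORF: ATG GCT TAG AAA TAA
print(replace_inframe_amber("ATGGCTTAGAAATAA"))  # ATGGCTGCGAAATAA
```

Working codon by codon, rather than with a plain string replacement, avoids altering a TAG triplet that merely spans two adjacent codons.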

Protein expression, purification and cyclization of ᴅ-Cys-ε-Lys-containing proteins
For the expression of protein using chemically synthesized ᴅ-Cys-ε-Lys, the pPylST.tL plasmids harboring the protein constructs for cyclization (Supplementary Fig. 6) were transformed into E. coli At the end of the induction period, cells were harvested by centrifugation and resuspended in lysis buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 1 mM PMSF, 1 mM benzamidine). Cells were lysed by sonication on ice for 20 min, and the proteins were purified by Ni²⁺ affinity chromatography, except for SUMO-cycRGD-mCh-X-P16p-intein-CBD-His7, which was purified using chitin resin to facilitate on-column cleavage by SUMO protease to remove the N-terminal SUMO tag and subsequent on-column cyclization as described below. Prior to cyclization, the purity of the purified protein was verified by SDS-PAGE.
Cyclization of the purified protein was achieved by the addition of 100 mM sodium 2-sulfanylethanesulfonate (MESNA) to initiate intein cleavage and 2 mM tris(2-carboxyethyl)phosphine (TCEP, pH adjusted to 8.0) to maintain a reducing environment. The cyclization was performed at room temperature for 3 h, after which the reaction mixture was incubated with chitin resin (New England Biolabs) to remove the cleaved intein-CBD-His7, and the cyclized protein was collected from the mobile phase. An additional step to cyclize the N-terminal RGD of the cycRGD-mCh-X-P16p construct was performed, in which the protein was subjected to air oxidation at 4 °C with gentle shaking for 24 h.
The linear counterparts of the cyclized proteins were expressed in a transformed E. coli R2 strain cultured in LB medium supplemented with 100 μg/mL ampicillin at 37°C/220 rpm until the OD600 reached 0.6, at which point 0.5 mM IPTG was added to induce expression for 16 h at 25°C/220 rpm. The same purification procedure described above for the corresponding cyclized proteins was used to purify the linear counterparts. To remove the intein-CBD-His7, 50 mM DTT was added to the reaction mixture, after which the intein-CBD-His7 was similarly removed by chitin purification as described above.

Electrospray ionization mass spectrometry
The protein band corresponding to GFP-X-P16p was excised, cut into 1 mm³ pieces and destained by repeated wash steps using 50% MeOH/10 mM NH₄HCO₃, then dehydrated with ACN followed by vacuum drying. For protein reduction, 25 mM DTT was added and incubated at 56 °C for 1 h, followed by washing steps to remove the remaining DTT before a second dehydration. Trypsin was added to the dehydrated gel pieces and incubated at 37 °C overnight for digestion. The digested peptides were extracted from the gel by sonication, and the extracted samples were then separated by HPLC and analyzed on an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific). For mass spectrometric analysis of the intact protein, purified GFP-cycP16p was incubated with 25 mM DTT for 24 h, desalted using Bio-Gel P-30 size exclusion resin and denatured with 0.1% formic acid before being subjected to HPLC-MS analysis on an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific). The capillary voltage was set to 3500 V. Spectra were acquired at a resolution of 120,000 between 500 and 2000 m/z. Mass spectra were analyzed and deconvoluted in BioPharma Finder (Thermo Fisher Scientific) using the Xtract algorithm.

Analytical size exclusion chromatography
To evaluate whether the cyclized P16p subunit of GFP-cycP16p binds CDK4, size analysis was performed on a mixture of GST-CDK4 (0.32 mg/mL, Sino Biological) and GFP-cycP16p

MCF-7 cells were seeded at 4.0 × 10⁵ cells per well in 6-well plates and incubated in RPMI-1640 medium supplemented with 10% FBS and penicillin/streptomycin (P/S) for 20 h. Cells were washed twice with ice-cold PBS and lysed in NP-40 lysis buffer (50 mM Tris, 150 mM NaCl, 2 mM EDTA, 1% NP-40, 0.1% SDS, pH 7.5) supplemented with protease inhibitor cocktail (Roche), followed by incubation at low temperature with gentle shaking for 20 min. The cell lysate was clarified by centrifugation at 15000 rpm for 15 min at 4°C. The total protein concentration of the supernatant was measured using the Pierce™ BCA protein assay kit (Thermo Fisher Scientific), and the supernatant was diluted to 1.5 mg/mL using PBS. This bait protein solution containing CDK4/6 was then incubated with purified MBP-cycP16p immobilized on amylose resin at 4°C for 3 h with gentle mixing. The reaction mixture was washed 5 times with PBS, after which the resin was analyzed by western blotting using anti-CDK4 (1:500, Biolegend) following a standard protocol. The same bait protein solution was incubated with amylose resin only as a negative control. Samples were resolved by 10% SDS-PAGE, transferred to nitrocellulose membranes and probed using pRb (Ser780) antibody (1:1000, Cell Signaling), pRb (Ser795) antibody (1:500, Cell Signaling) and β-actin antibody (1:2500, Sigma-Aldrich). Proteins were visualized using the ECL system (Amersham).

Statistical analysis
Data from replicate experiments are presented as mean ± standard error (SD) of the mean.
Statistical significance is noted in the figure legend where appropriate. For comparison of data in different groups, ordinary one-way ANOVA with Tukey's multiple comparison test was performed using GraphPad Prism 7 software (GraphPad, San Diego, USA). A p-value less than 0.05 is considered statistically significant. *p<0.05, ***p<0.001, ****p<0.0001, ns, not significant.
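For readers unfamiliar with the test, the F statistic underlying an ordinary one-way ANOVA can be sketched in plain Python as below. This is an illustration only, with made-up data; the analysis reported here was run in GraphPad Prism. The p-value and Tukey's post hoc comparisons require the F and studentized range distributions from a statistics package and are not computed in this sketch.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for an ordinary one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate measurements for three conditions
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(data))
```

For the toy data above, the between-group mean square is 13 and the within-group mean square is 1, giving F = 13 with 2 and 6 degrees of freedom.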