Imaging-Based In Situ Analysis of 5-Methylcytosine at Low Repetitive Single Gene Loci with Transcription-Activator-Like Effector Probes

Transcription-activator-like effectors (TALEs) are programmable DNA-binding proteins that can be used for sequence-specific, imaging-based analysis of cellular 5-methylcytosine. However, this approach has so far been limited to highly repetitive satellite DNA. To extend it to the analysis of coding single gene loci, we explore a number of signal amplification strategies for increasing the imaging sensitivity of TALEs. We develop a straightforward amplification protocol and employ it to target the MUC4 gene, which features only a small cluster of repeat sequences. This enables high-sensitivity imaging of MUC4 and, in costaining experiments with pairs of one TALE selective for unmethylated cytosine and one universal control TALE, the analysis of methylation changes in the target independently of changes in target accessibility. These advancements offer prospects for 5-methylcytosine analysis at coding, nonrepetitive gene loci using collections of designed TALE probes.


Plasmid cloning
Plasmid pAnJ1861, for expression of TALEs fused to three N-terminal mCherry tags, was cloned by Gibson assembly 1 . For this, the TALE expression vector pAlM1577 2 , which contains an N-terminal mCherry, was linearized using primers o3661 and o3662. The two inserts mCherry_2 and mCherry_3 were amplified using o3657 and o3658 (mCherry_2) and o3659 and o3660 (mCherry_3). Gibson assembly was performed with a 2:1 insert:vector molar ratio.
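Insert:vector molar ratios such as the 2:1 used here translate into insert masses because, for double-stranded DNA, the number of molecules is proportional to mass divided by length. A minimal sketch follows. The 50 ng of a 7,000 bp linearized vector and the ~700 bp insert length are hypothetical values chosen for illustration. The actual fragment sizes are not given in this section.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int, ratio: float) -> float:
    """Insert mass (ng) that gives `ratio` insert molecules per vector molecule.

    Moles of dsDNA are proportional to mass / length, so
    insert_ng / insert_bp = ratio * vector_ng / vector_bp.
    """
    return ratio * vector_ng * insert_bp / vector_bp


if __name__ == "__main__":
    # Hypothetical example: 50 ng of a 7,000 bp linearized vector and
    # two ~700 bp mCherry inserts, each used at a 2:1 insert:vector ratio.
    for name in ("mCherry_2", "mCherry_3"):
        print(f"{name}: {insert_mass_ng(50, 7000, 700, 2):.1f} ng")  # 10.0 ng each
```

In multi-fragment Gibson reactions the ratio is applied to each insert separately. The same function with `ratio=3` covers the 3:1 T4 ligation used for the FLAG-array inserts.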
To generate plasmids for expression of TALEs fused to 20 or 30 N-terminal FLAG tags, AgeI and NheI restriction sites were introduced into vector pAnI521 3 via QuikChange site-directed mutagenesis (Agilent) using primers o4468 and o4469 (AgeI) and o4470 and o4471 (NheI). A pre-existing AgeI site was removed by site-directed mutagenesis using o4472 and o4473. To generate pPiB2670, containing a 10x-FLAG array, two inserts containing 5x-FLAG sequences each (see Appendix for sequences) were amplified using o4493 and o4494 (insert_1) and o4495 and o4496 (insert_2). For the subsequent ligation, the modified vector pAnI521 was digested with AgeI and NheI, insert_1 with AgeI and BamHI, and insert_2 with BamHI and XbaI. Vector and inserts were ligated with T4 DNA ligase (New England Biolabs, #M0202T) using a 3:1 insert:vector molar ratio. A start codon was then introduced by site-directed mutagenesis using o4587 and o4588. To generate pPiB2683, containing a 20x-FLAG array, the modified pPiB2670 was digested with AgeI and NheI and ligated with digested insert_1 and insert_2 as described above. To clone pPiB2700, containing a 30x-FLAG array, the same restriction and ligation steps were repeated with pPiB2683 as the vector.
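In this scheme the vector is opened with AgeI and NheI, whereas insert_2 carries an XbaI end. NheI (G^CTAGC) and XbaI (T^CTAGA) leave identical 5'-CTAG overhangs, so the XbaI end of insert_2 can ligate to the NheI end of the vector. The resulting hybrid junction is recognized by neither enzyme. The sketch below checks this for the four enzymes named in the paragraph. It models only the recognition sites and cut positions, not the actual insert sequences given in the Appendix.

```python
# Recognition sites (5'->3') and top-strand cut positions for the enzymes named above.
# All four sites are palindromic and leave 5' overhangs.
ENZYMES = {
    "AgeI": ("ACCGGT", 1),   # A^CCGGT
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "NheI": ("GCTAGC", 1),   # G^CTAGC
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
}


def overhang(enzyme: str) -> str:
    """5' overhang left by the enzyme (the stretch between the two strand cuts)."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(left_enzyme: str, right_enzyme: str) -> str:
    """Top-strand sequence formed when a fragment end cut by `left_enzyme`
    is ligated to a fragment end cut by `right_enzyme`."""
    if overhang(left_enzyme) != overhang(right_enzyme):
        raise ValueError(f"{left_enzyme} and {right_enzyme} ends are not compatible")
    left_site, left_cut = ENZYMES[left_enzyme]
    right_site, right_cut = ENZYMES[right_enzyme]
    return left_site[:left_cut] + right_site[right_cut:]


def recut_by(seq: str) -> list[str]:
    """Enzymes from ENZYMES whose (palindromic) site occurs in seq."""
    return [name for name, (site, _) in ENZYMES.items() if site in seq]


if __name__ == "__main__":
    site = junction("XbaI", "NheI")  # insert_2 XbaI end joined to the vector NheI end
    print(site, recut_by(site))      # TCTAGC []
```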
The plasmid pCrW2056, for expression of TALEs fused to 24 N-terminal GCN4 tags, was cloned by amplifying the 24x-GCN4 insert from pAlM1103 (Addgene, #60910) with primers o3935 and o3936. The vector pAnI521 was linearized with the restriction enzymes NdeI and NotI and joined to the insert by Gibson assembly with a 2:1 insert:vector molar ratio.
TALEs were assembled by Golden Gate assembly 4 as previously described (see Table X for the detailed RVD composition). To generate plasmids encoding TALE proteins in frame with a C-terminal His6 tag and different N-terminal tags, the plasmids pAnI521 (1x-GFP), pAlM1577 (1x-mCherry), pAnJ1861 (3x-mCherry), pPiB2683 (20x-FLAG), pPiB2700 (30x-FLAG) and pCrW2056 (24x-GCN4) were used as entry vectors in Golden Gate 2 reactions.
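Golden Gate assembly builds the TALE DNA-binding domain from repeat modules, one per target base, each defined by its repeat-variable diresidue (RVD). The RVD compositions actually used here are listed in Table X. The sketch below only illustrates the commonly used code (NI for A, HD for C, NN for G, NG for T). It assumes that each target begins with the 5' T (T0) bound by the TALE N-terminal domain, as all six TALE_B target sequences listed with the supplementary figures do. The universal RVD of the control TALE is not named in this section, so "N*" is a hypothetical placeholder, and restricting it to CpG cytosines is likewise an assumption.

```python
# Commonly used TALE code: one RVD per base.
RVD_CODE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str, universal_at_cpg: bool = False,
                universal_rvd: str = "N*") -> list[str]:
    """Translate a TALE target site (5'->3', starting with the T0 base) into RVDs.

    T0 is bound by the TALE N-terminal domain, so RVDs start at position 1.
    With universal_at_cpg=True, each C followed by G within the target gets
    `universal_rvd` (hypothetical default "N*") instead of the mC-sensitive HD.
    """
    target = target.upper()
    if not target.startswith("T"):
        raise ValueError("target should start with the 5' T0 base")
    rvds = []
    for i in range(1, len(target)):
        base = target[i]
        cpg_c = base == "C" and target[i + 1:i + 2] == "G"
        rvds.append(universal_rvd if universal_at_cpg and cpg_c else RVD_CODE[base])
    return rvds


if __name__ == "__main__":
    b1 = "TATCCACAGGTCACGCCA"  # TALE_B1 target sequence from the supplementary table
    print("-".join(design_rvds(b1)))                         # HD-selective variant
    print("-".join(design_rvds(b1, universal_at_cpg=True)))  # control variant
```

An 18-nt target including T0 yields 17 RVDs. Only the CpG cytosine differs between the two variants, which is what makes the pair usable for separating methylation from accessibility effects.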
To generate the active DNMT3a3L vector pCoT3181, plasmids pAlH1894 and pJaW876 2 were digested with AcsI and EcoRI and ligated with T4 DNA ligase using a 2:1 insert:vector molar ratio. The resulting plasmid pCoT3180 was linearized with NotI and joined by Gibson assembly to the CMV-EBFP2 insert, which was amplified from plasmid EBFP2-N1 (Addgene, #54595) with primers o5028 and o5029, yielding pCoT3181. To generate pAnJ3188, encoding catalytically inactive DNMT3a3L, the E756A mutation was introduced by site-directed mutagenesis using primers o2038 and o2039.

TALE expression and purification
TALEs were expressed and purified as described previously 5 . Briefly, TALE plasmids were transformed into electrocompetent E. coli BL21-Gold (DE3) cells and grown overnight at 37 °C on LB agar plates containing carbenicillin (Carb, 100 µg/mL). A 5 mL LB culture supplemented with 100 µg/mL carbenicillin was inoculated with a single colony and incubated for 4 h at 37 °C and 220 rpm. This starter culture was transferred to a flask containing 100 mL of LB + Carb and incubated under the same conditions until an OD600 of 0.6 was reached. TALE expression was induced by adding 0.4 mM IPTG. For expression of TALEs fused to mCherry tags, cultures were incubated at 18 °C and 220 rpm overnight; cultures expressing TALEs fused to the GFP tag, FLAG tags or SunTag were incubated at 37 °C and 220 rpm for 4 h. Cells were harvested by centrifugation at 3000 g and 4 °C for 20 min. The pellet was kept at -20 °C for 2 h and resuspended in 10 mL Deep Lysis Buffer (10 mM Tris-HCl, 300 mM NaCl, 2.5 mM MgCl2, 5 % DMSO, 0.2 % sodium lauroyl sarcosinate (AppliChem), 0.1 % Triton X-100, pH 9) containing 1 mM PMSF, 1 mM DTT and 50 μg/mL lysozyme (Sigma Aldrich). Cell lysis was aided by sonication on ice (3 min; 20 % amplitude; 4 s on, 2 s off). Samples were centrifuged at 14000 g and 4 °C for 20 min to remove cell debris. The supernatant was incubated with 0.5 mL HisPur™ Ni-NTA Resin (ThermoFisher Scientific, #88221) overnight at 4 °C on a rotating wheel. The beads were collected and washed with PBS, twice with Lysis Buffer (10 mM Tris-HCl, 300 mM NaCl, 2.5 mM MgCl2, 0.1 % Triton X-100, pH 9) + 20 mM imidazole + 1 mM DTT, and three times with Lysis Buffer + 50 mM imidazole + 1 mM DTT. TALEs were eluted by incubating the beads in 1 mL Lysis Buffer + 500 mM imidazole + 1 mM DTT with shaking at 800 rpm at 4 °C overnight.
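The additive volumes for the lysis and induction steps follow from C1·V1 = C2·V2. A minimal sketch, using the final concentrations stated in the text. The stock concentrations (100 mM PMSF, 1 M DTT, 10 mg/mL lysozyme, 1 M IPTG) are hypothetical, since the stocks are not specified here.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock (uL) to add for a given final concentration (C1*V1 = C2*V2).

    final_conc and stock_conc must be given in the same units.
    """
    return final_conc * final_volume_ml * 1000 / stock_conc


# Final concentrations from the text; stock concentrations are hypothetical.
LYSIS_SUPPLEMENTS = {
    # name: (final concentration, stock concentration, unit)
    "PMSF": (1, 100, "mM"),
    "DTT": (1, 1000, "mM"),
    "lysozyme": (50, 10_000, "ug/mL"),
}

if __name__ == "__main__":
    for name, (final, stock, unit) in LYSIS_SUPPLEMENTS.items():
        vol = stock_volume_ul(final, stock, final_volume_ml=10)
        print(f"{name}: {vol:g} uL of {stock} {unit} stock per 10 mL Deep Lysis Buffer")
    # IPTG induction of the 100 mL culture to 0.4 mM from a hypothetical 1 M (1000 mM) stock
    print(f"IPTG: {stock_volume_ul(0.4, 1000, 100):g} uL")
```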
Samples were centrifuged at 12000 g for 5 min, and the supernatant was concentrated and buffer-exchanged using Amicon™ Ultra-0.5 Centrifugal Filter units (Merck, MWCO 100 kDa, #UFC510024) by centrifugation at 14000 g for 10 min at 4 °C. For washing, the sample was brought to 500 µL with TALE Storage Buffer (200 mM NaCl, 20 mM Tris, 10 % glycerol, pH 7.5) + 1 mM DTT and centrifuged at 14000 g for 10 min at 4 °C. Washing was repeated three times. Samples were recovered by centrifugation at 1000 g for 2 min and brought to 500 µL with TALE Storage Buffer. After centrifugation at 14000 g and 4 °C for 5 min, aliquots were snap-frozen in liquid nitrogen and stored at -80 °C. Protein concentrations were determined with the Microplate BCA Protein Assay Kit - Reducing Agent Compatible (ThermoFisher Scientific, #23252) following the manufacturer's instructions.
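The repeated washing steps serve to remove small molecules such as imidazole from the eluate. Each spin leaves the concentration of a freely filtering molecule in the retentate unchanged, and each refill to 500 µL dilutes it by the ratio of retained to final volume. The following sketch estimates the residual imidazole under that idealized assumption. The 50 µL retained volume is hypothetical, since it depends on device and spin conditions. The number of refills is left as a parameter because "repeated three times" can be counted in more than one way.

```python
def residual_concentration(c0: float, retained_ul: float, refill_ul: float, refills: int) -> float:
    """Concentration of a freely filtering small molecule (e.g. imidazole) after
    repeated concentrate/refill cycles in a centrifugal filter.

    Spinning does not change its concentration in the retentate; each refill
    from `retained_ul` to `refill_ul` with fresh buffer dilutes it by retained_ul / refill_ul.
    """
    c = c0
    for _ in range(refills):
        c *= retained_ul / refill_ul
    return c


if __name__ == "__main__":
    # 500 mM imidazole from the elution buffer; the 50 uL retained volume is hypothetical.
    for n in range(1, 6):
        print(f"{n} refill(s): {residual_concentration(500, 50, 500, n):.3g} mM imidazole")
```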

Image processing and analysis
Image processing and analysis were performed as described previously 2 . The intensity and subcellular localization of foci were analyzed from maximum-intensity z-projections of image stacks (1344 × 1024 pixels, 12-bit) using the FIJI distribution of ImageJ. For background subtraction, the mean intensity of a region outside the region of interest was measured in each channel and subtracted from the stack. Nuclear regions were selected from DAPI images (minimum area 10 μm², circularity between 0.5 and 1.0). To analyze the intensity and size of foci in mCherry images, the "GaussFit OnSpot" plugin was applied with elliptical shape, Levenberg-Marquardt fit mode and a rectangle half-size of 10 pixels. Spots larger than 12 pixels or outside the nuclear regions were excluded, and the prominence (signal-to-noise ratio) threshold was adjusted for each condition so that only focus-like objects were selected. The mask generated from the mCherry images was applied to the EGFP images to measure the mean fluorescence intensity. Image processing was performed in batch using an ImageJ macro. For each nucleus, the number, size and intensity of the associated foci in the mCherry and EGFP images were recorded.
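The selection rules above can be sketched in plain Python: nuclei of at least 10 μm² with a circularity between 0.5 and 1.0, spots of at most 12 pixels inside accepted nuclei, a prominence cutoff, and reuse of the mCherry spot mask on the background-subtracted EGFP channel. This is a minimal illustration, not the authors' ImageJ macro. It assumes that spot size, prominence and mask pixels have already been exported from the GaussFit OnSpot fit, and it uses ImageJ's circularity definition (4π·area/perimeter²). The prominence default and the clamping of negative values at 0 are hypothetical choices, and all example measurements are invented.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Nucleus:
    label: int
    area_um2: float       # area of the DAPI-derived nuclear region
    perimeter_um: float


@dataclass
class Spot:
    nucleus_label: int | None      # nucleus containing the spot centre (None = outside)
    size_px: float                 # fitted spot size in pixels
    prominence: float              # signal-to-noise proxy reported by the fit
    pixels: list[tuple[int, int]]  # (row, col) pixels of the spot mask


def circularity(area: float, perimeter: float) -> float:
    """ImageJ circularity: 4*pi*area / perimeter**2 (1.0 = perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


def keep_nucleus(n: Nucleus, min_area_um2: float = 10.0,
                 circ_range: tuple[float, float] = (0.5, 1.0)) -> bool:
    """Nuclear selection criteria from the text."""
    c = circularity(n.area_um2, n.perimeter_um)
    return n.area_um2 >= min_area_um2 and circ_range[0] <= c <= circ_range[1]


def keep_spot(s: Spot, valid_nuclei: set[int], max_size_px: float = 12.0,
              min_prominence: float = 2.0) -> bool:
    """Spot criteria; min_prominence is a hypothetical default (tuned per condition in the text)."""
    return (s.nucleus_label in valid_nuclei
            and s.size_px <= max_size_px
            and s.prominence >= min_prominence)


def subtract_background(image: list[list[float]], bg_pixels: list[tuple[int, int]]) -> list[list[float]]:
    """Subtract the mean of a background region; negative values are clamped to 0 (assumption)."""
    bg = mean(image[r][c] for r, c in bg_pixels)
    return [[max(v - bg, 0.0) for v in row] for row in image]


def mean_under_mask(image: list[list[float]], pixels: list[tuple[int, int]]) -> float:
    """Mean intensity of one channel under a mask taken from another channel."""
    return mean(image[r][c] for r, c in pixels)


if __name__ == "__main__":
    nuclei = [Nucleus(1, 120.0, 40.0), Nucleus(2, 8.0, 11.0)]  # hypothetical measurements
    valid = {n.label for n in nuclei if keep_nucleus(n)}       # {1}: nucleus 2 is too small
    spots = [Spot(1, 4.0, 3.0, [(0, 0), (0, 1)]),   # kept
             Spot(1, 15.0, 3.0, [(1, 1)]),          # too large
             Spot(None, 4.0, 3.0, [(1, 2)])]        # outside nuclei
    egfp = subtract_background([[12.0, 14.0, 2.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]],
                               bg_pixels=[(2, 0), (2, 1), (2, 2)])
    for s in spots:
        if keep_spot(s, valid):
            print(s.nucleus_label, mean_under_mask(egfp, s.pixels))  # 1 11.0
```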

Data analysis and statistics
Data analysis and plotting were performed in R as described previously 2 . For each TALE, the log-transformed mean fluorescence intensity of each focus was normalized to the average fluorescence intensity of all foci from the associated HCT116 DKO or DNMT3a3L KO transfected sample of the same experiment. Graphs were plotted using the ggplot2 library. For statistical analysis, Student's t-test was applied in GraphPad, treating the number of independent experiments as the sample size (N ≥ 3 independent experiments in every case).

Chromosome   TALE_SatIII target sequences
1            213
2            29
3            30
4            92
5            289
6            0
7            175
8            0
9            27997
10           231
11           0
12           2
13           625
14           1127
15           4297
16           3
17           443
18           0
19           0
20           395
21           1022
22           1197
X            0
Y            451

Figure S1.

(see table). These TALEs turned out to be weak binders that could not be used for MUC4 staining with a sufficient signal-to-noise ratio. The TALEs differ in absolute nuclear background intensity, suggesting different numbers of off-target sequences. For CpG-containing target sequences, only the HD versions showed increased background intensities in DKO cells, which can be explained by the response of HD to mC in off-target sequences containing a CpG opposite the HD position. For CpG-free targets, several TALEs did not show increased background signals in DKO cells, possibly because the off-target sequences lack mC. TALE single stainings were followed by immunostaining with rabbit anti-mCherry and goat anti-rabbit Alexa Fluor Plus 594 prior to data acquisition.

TALE      Target sequence
TALE_B1   TATCCACAGGTCACGCCA
TALE_B2   TCACGCCACCCCTCTTCC
TALE_B3   TCAGTATCCACAGGTCAC
TALE_B4   TGACCTGTGGATACTGAG
TALE_B5   TTCCTCAGTATCCACAGG
TALE_B6   TGTGGATACTGAGGAAGC

Figure S12. Sanger sequencing traces of PCR products from bisulfite-converted DNA of HCT116 wt and HCT116 DKO cells. Sanger sequencing traces A and B were obtained from two different PCR products.
TALE_M3 target sequence highlighted in purple. CpG position highlighted in light blue.
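The per-focus normalization and the experiment-level t-test described under Data analysis and statistics can be sketched in Python, although the original analysis used R and GraphPad. The text does not say exactly how the log transform and the normalization were combined. This sketch assumes log2(I_focus / mean(I_reference)), which is only one reading. It also assumes an unpaired two-sample Student's test with pooled variance, because the text does not state whether a paired design was used. Each independent experiment is collapsed to one value so that N equals the number of experiments. The example data are invented.

```python
import math
from statistics import mean, variance


def normalize_foci(foci: list[float], reference_foci: list[float]) -> list[float]:
    """Assumed normalization: log2 of each focus intensity divided by the mean
    intensity of all foci in the reference (DKO or DNMT3a3L KO) sample."""
    ref = mean(reference_foci)
    return [math.log2(i / ref) for i in foci]


def experiment_level_values(experiments: list[tuple[list[float], list[float]]]) -> list[float]:
    """Collapse each independent experiment to one value (mean of its normalized foci),
    so that N equals the number of experiments, not the number of foci."""
    return [mean(normalize_foci(foci, ref)) for foci, ref in experiments]


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    # Hypothetical per-experiment data: (foci of sample A, foci of sample B, reference foci)
    experiments = [
        ([400, 380, 420], [210, 190], [200, 210, 190]),
        ([350, 360], [185, 205, 200], [180, 200, 220]),
        ([390, 410, 400], [198, 202], [205, 195]),
    ]
    a = experiment_level_values([(fa, ref) for fa, _, ref in experiments])
    b = experiment_level_values([(fb, ref) for _, fb, ref in experiments])
    t, df = student_t(a, b)
    print(f"t = {t:.2f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` returns the same pooled-variance statistic (its default is `equal_var=True`) together with a p-value.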