Modular Oxidation of Cytosine Modifications and Their Application in Direct and Quantitative Sequencing of 5-Hydroxymethylcytosine

Selective, efficient, and controllable oxidation of cytosine modifications is valuable for epigenetic analyses, yet only limited progress has been made. Here, we present two modular chemical oxidation reactions: conversion of 5-hydroxymethylcytosine (5hmC) into 5-formylcytosine (5fC) using 4-acetamido-2,2,6,6-tetramethylpiperidine-1-oxoammonium tetrafluoroborate (ACT+BF4–) and further transformation of 5fC into 5-carboxycytosine (5caC) through Pinnick oxidation. Both reactions are mild and efficient on double-stranded DNA. We integrated these two oxidations with borane reduction to develop chemical-assisted pyridine borane sequencing plus (CAPS+), for direct and quantitative mapping of 5hmC. Compared with CAPS, CAPS+ improved the conversion rate and false-positive rate. We applied CAPS+ to mouse embryonic stem cells, human normal brain, and glioblastoma DNA samples and demonstrated its superior sensitivity in analyzing the hydroxymethylome.

CAPS+ library construction and sequencing mESC, human normal brain and glioblastoma gDNA was spiked with 0.5% of CpG-methylated lambda DNA and 0.25% of unmodified 2 kb spike-in control. DNA samples were fragmented by Covaris M220 and size-selected to 300-500 bp with 0.55-0.85× Ampure XP beads according to the manufacturer's protocol. The sonicated DNA was additionally spiked with 0.05% of 144mer spike-in after size-selection with Ampure XP beads. End-repair and A-tailing reaction and ligation of NEBNext Adaptor for Illumina were prepared with KAPA HyperPrep Kit according to the manufacturer's protocol. The uracil in the loop of NEBNext Adaptor was removed by adding 6 l of USER Enzyme (New England Biolabs) to the ligation reaction and incubating at 37 °C for 45 min. The reaction was purified with 0.8× Ampure XP beads by washing twice with 80% acetonitrile/water (v/v). ACT + BF4oxidation and Pinnick oxidation were performed as described above. Borane reduction was then performed in a 50 l reaction using optimized condition at 37 °C and 850 r.p.m. for 6 h and purified by Zymo-IC column with Oligo Binding Buffer 4 . Converted DNA was amplified with NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) and KAPA HiFi HotStart Uracil+ ReadyMix PCR Kit for 4 cycles according to the manufacturer's protocols. The PCR product was purified with 0.9× Ampure XP beads and quantified with Qubit dsDNA HS Assay Kit (Thermo Fisher) according to the manufacturer's protocol. Libraries were sequenced on NextSeq 2000 (150 bp paired end) with no PhiX added.

Published data
Related published data we used were downloaded from the Gene Expression Omnibus (GEO) database: CAPS (GSE155613) 4 ; TAB-seq for mESCs (GSE36173) 12 ; ACE-seq (GSE116016) 13 ; TAB-seq for adult prefrontal cortex (GSE46710) 14 . TAB-seq data for mESCs were reprocessed as previously described to obtain modified and unmodified CpG sites 12 . Blacklisted regions and common single nucleotide variants (SNVs) were also excluded. 5hmCG methylation calls in adult prefrontal cortex were downloaded from GSE46710 and converted from hg19 to hg38 using CrossMap.py 15 .

Pairwise comparison of CAPS+
To calculate correlation between two mESC replicates, 5hmCG raw signals (T/C+T) were calculated in 10 kb genomic bins. The Pearson correlation coefficient (Pearson's r) was calculated by cor function in R. The results were visualized by SmoothScatter in R. Pairwise comparisons between CAPS+ and CAPS, TAB-seq as well as ACE-seq were performed similarly.

Coverage analysis for CAPS+, CAPS and ACE-seq
CpG islands were divided into 10 equal-sized bins, while the 4 kb flanking regions were binned into 20 windows. Average per strand CpG coverage was calculated as the sum of modified and unmodified cytosines. Reads from two replicates of CAPS+ were merged and subsampled to the similar coverage of ACE-seq using samtools view with the parameter (-s 0.60).
Statistical testing of high confidence 5hmC sites and genomic annotations As previously described, a binomial test was used to estimate high confidence 5hmCG sites 4 . A p-value for each site was calculated by comparing the average false-positive rate measured by 2 kb unmodified spike-in between replicates. 5hmCG sites whose per-strand coverage was lower than 5 were filtered. 5hmCG sites with Benjamini-Hochberg (BH) adjusted p-value less than 0.01 were identified to be high confidence. For annotations of high confidence 5hmCG sites, genomic features of mouse genome were downloaded (https://github.com/gireeshkbogu/chromatin_states_chromHMM_mm9) 16 . High confidence 5hmCG sites in each regulatory genomic region were found by bedtools intersect 17 and the counts in each category were normalized by genomic coverages of corresponding category regions. To investigate the enrichment of 5hmCG sites, all detected CpG sites were sampled 10 times to generate the background distribution in different genomic elements.

Genomic view
Methylation calls in browser extensible data (BED) format were converted to bigwig (BW) format by bedGraphToBigWig 19 . 5hmCG signals in regions of interest were visualized by Integrative Genomics Viewer (IGV) 20 .
Comparison of 5hmCG signals between human normal brain and glioblastoma samples 5hmCG raw signals (T/T+C) in both normal brain and glioblastoma samples were calculated in 1 kb genomic bins. Figure S1. ESI-MS characterization of 11mer oligonucleotide treated with ACT + BF4 -. To better distinguish the newly generated 5fC oligonucleotide from the 5hmC oligonucleotide, we labelled 5fC via hydroxylamine chemistry to enlarge mass differences 21 . Calculated mass is shown in black. Deconvoluted mass is shown in red. All experiments were performed once. Figure S2. Buffer compatibility of ACT + BF4oxidation of 5hmC. In addition to phosphate buffer (pH 7.5) (results also shown in Figure 2c), various buffers such as phosphate and acetate buffer with different pH were also compatible with ACT + BF4oxidation. Reactions were conducted on 79 bp 5hmC-containing model DNA. Conversion rate was calculated by concentrations of 5hmC and 5fC. Conversion rate = Concentration5fC / (Concentration5hmC + Concentration5fC). Data are presented as means of two independent replicates. Figure S3. Mechanistic overview of ACT + BF4oxidation of 5hmC. The oxidative species is ACT + , a positively charged ion, which could access double-stranded DNA easily and overcome the fundamental limitations of negatively charged ruthenium-based oxidants: it is relatively difficult for RuO4or RuO4 2to approach the 5hmC sites without denaturation; however, denatured DNA is more prone to damage when complete conversion of 5hmC is required, thus making it hard to find a balance between high reaction efficiency and high DNA recovery. Although some transition metal cations (such as Cu 2+ , Ag + and Fe 3+ ) also bear one or more positive charges and can be involved in alcohol oxidation as a catalyst or terminal oxidant, these compounds may result in significant DNA damage due to their potential Lewis acid or redox activity [22][23] . In comparison, ACT + BF4could work in a non-catalyzed and stoichiometric manner and thus could offer milder conditions for DNA oxidation. (a) Proposed mechanisms of ACT + BF4oxidation of 5hmC into 5fC. A direct intermolecular hydride transfer step is involved. After oxidation, the ACT + ion would be transformed into the corresponding hydroxylammonium ion, so the pH of reaction mixture would not change significantly. (b) Schematic explanation of the resistance of 5fC towards ACT + BF4oxidation. Although ACT + BF4has the potential to oxidize a hydroxyl group directly to carboxylic acid 24-25 , we did not observe the formation of 5caC, possibly due to the rather low hydration equilibrium constant of 5fC, particularly in a neutral aqueous solution at ambient temperature. Figure S4. Mechanistic overview of Pinnick oxidation of 5fC. (a) Proposed mechanisms of Pinnick oxidation. This mechanisms suggest that chlorous acid (HClO2), which is formed in situ by protonation of chlorite anion (ClO2 -), would serve as the actual oxidant and react with the aldehyde group through a five-membered ring fragmentation process to generate the corresponding carboxylic acid and hypochlorous acid (HOCl) as a by-product 26 . Instead of geminal diols, a critical intermediate (shown in blue) is believed to play an important role in Pinnick oxidation. (b) Common scavengers in Pinnick oxidation and their scavenging reactions. Since HOCl is highly reactive and oxidative, it would consume ClO2and lead to undesired chlorination or degradation of substrates 27 . To eliminate the negative impact of HOCl, scavengers which are more reactive towards HOCl than the substrates would be added into the reaction system in large excess. However, only 2-methyl-2-butene and dimethyl sulfoxide (DMSO) have been tested, because both these compounds and their scavenging products would not interfere with Pinnick oxidation or induce damage to DNA. Sulfamic acid will influence the pH of reaction mixture. Phenol-type scavengers (such as resorcinol) are not stable enough to air. Hydrogen peroxide (H2O2) could cause oxidative damage to DNA, and it would also react with HOCl to generate highly reactive singlet molecular oxygen (O2 1 Δg).    . Schematic overview of reactivity of 5fC and 5caC towards soft nucleophile. Since both borane reduction and bisulfite-mediated deamination start with a critical conjugate addition (1,4-addition) of the C 5 -C 6 double bond of the pyrimidine ring, potential 1,2-addition reactivity of 5fC may result in incomplete conversion: the 1,2-adduct is much less electron-deficient than 5fC and therefore unsuitable for a second 1,4-addition. Compared to the formyl group in 5fC, the carboxylic acid group in 5caC is much more inert and less likely to undergo 1,2-addition. As a result, 5fC could serve as a worse substrate than 5caC in borane reduction and bisulfite sequencing. Figure S9. Comparisons of efficiency of ACT + BF4oxidation, Pinnick oxidation (both used in CAPS+) and ruthenium-based oxidation (used in CAPS) on 300-500 bp fragmented mESC gDNA. In addition to complete transformation of 5hmC to 5fC by ACT + BF4oxidation, combining it with Pinnick oxidation further achieved 95.5% oxidation of 5hmC to 5caC, which was comparable to the efficiency of TET oxidation 1 . Data are shown as mean ± s.d. of four independent experiments (n = 4).      Figure S16. Pie chart shows the overlap of called 5hmCGs with putative genomic regulatory elements and the relative enrichment of 5hmCGs (blue) and random sites (white) at genomic regulatory elements in mESC. 'Random' consists of ten random samplings. The mean is shown as the bar height and the error bars denote standard deviation (n = 10 random sampling events). The ratios between observed and random are shown at the top. Figure S17. Correlation density plot between CAPS+ signals in human normal brain and TAB-seq signals in adult prefrontal cortex in 10 kb bins. The color scale represents density. Table S1. Sequencing metrics and sample information.