Genome-wide DNA Methylation Signatures Are Determined by DNMT3A/B Sequence Preferences

Cytosine methylation is an important epigenetic mark, but how the distinctive patterns of DNA methylation arise remains elusive. For the first time, we systematically investigated how these patterns can be imparted by the inherent enzymatic preferences of mammalian de novo DNA methyltransferases in vitro and the extent to which this applies in cells. In a biochemical experiment, we subjected a wide variety of DNA sequences to methylation by DNMT3A or DNMT3B and then applied deep bisulfite sequencing to quantitatively determine the sequence preferences for methylation. The data show that DNMT3A prefers CpG and non-CpG sites followed by a 3′-pyrimidine, whereas DNMT3B favors a 3′-purine. Overall, we show that DNMT3A has a sequence preference for a TNC[G/A]CC context, while DNMT3B prefers TAC[G/A]GC. We extended our finding using publicly available data from mouse Dnmt1/3a/3b triple-knockout cells in which reintroduction of either DNMT3A or DNMT3B expression results in the acquisition of the same enzyme specific signature sequences observed in vitro. Furthermore, loss of DNMT3A or DNMT3B in human embryonic stem cells leads to a loss of methylation at the corresponding enzyme specific signatures. Therefore, the global DNA methylation landscape of the mammalian genome can be fundamentally determined by the inherent sequence preference of de novo methyltransferases.

In mammals, most DNA methylation occurs at C-5 of cytosine bases. Cytosine methylation is a well-established epigenetic mark and is involved in the regulation of key biological processes, including tissue-specific gene expression patterns, X-chromosome inactivation, transposon silencing, and genomic imprinting. 1−3 There are three main DNA methyltransferase enzymes, DNMT3A, DNMT3B, and DNMT1. DNMT3A and DNMT3B are de novo methylases that operate on both unmethylated and hemimethylated DNA. 4,5 In contrast, DNMT1 is a maintenance methylase that preserves methylation patterns during replication due to an inherent requirement for hemimethylated DNA. 6−8 Cytosine methylation is often described in terms of CpG and non-CpG contexts (i.e., CH, where H = A, T, or C), with the latter extended to include CHG and CHH categories based on symmetry. 9 Overall, this leads to palindromic (CpG), partially palindromic (CHG), and nonpalindromic (CHH) methylation sites. Most CpG sites gain methylation on both DNA strands in early mammalian embryonic development 1,10 and then remain highly methylated throughout development. Cytosines in CpG islands (genomic regions with a high frequency of CpG sites) associated with promoters are dynamically regulated and closely linked with gene expression. 11−14 Different tissues have distinct profiles of non-CpG methylation, and the highest levels are found in pluripotent stem cells and in the central nervous system. 11,15−18 In human embryonic stem cells (hESCs), ∼25% of total cytosine methylation occurs in a non-CpG context with 71% at CHG and 29% at CHH sites, while in human neurons, 53% of total methylated cytosines are at non-CpGs, of which >80% are at CHH sites. 18 As the maintenance methylation enzyme DNMT1 has no reported activity at non-CpG sites, 15 this raises basic questions about how distinctive DNA methylation landscapes are established and maintained.
Several factors shape the DNA methylome, including nucleosome positioning 19,20 and histone modifications. 21,22 The methylation-deficient DNMT family member DNMT3L has been reported to stimulate DNMT3A/B activity by enhancing the stability of enzyme complex recruitment to DNA or by an increased level of cofactor S-adenosyl-L-methionine binding. 23−25 Genome engineering experiments of inserted artificial sequences in mouse stem cells have begun to uncover the contribution to methylation of the underlying genomic sequence, namely, CpG density and GC content. 26,27 Furthermore, transcription factor (TF) binding 27−29 and G-quadruplex DNA secondary structures 30 are implicated in protecting TF-bound regions and certain CpG islands from methylation, respectively. While such factors are critical for regulating DNA methyltransferases and influencing the distribution of DNA methylation, a key unanswered question is how differential methylation patterns are imparted to different genomic sequences in the first place.
The preferential methylation of unmethylated CpG by DNMT3A/B and of hemimethylated CpG sites by DNMT1 has been studied biochemically and structurally. 7,31 Early work attempted to determine the flanking sequence preferences of DNMT3A/B at CpGs using four synthetic oligos with no assessment of non-CpG methylation. 32 Notably, no studies have considered a large and unbiased pool of competing substrates as a fair test of methylation preferences. It has also not been established whether CpG methylation is installed in a sequence specific manner by DNMT3A/B in the mammalian genome under physiological conditions. Recent DNA methylation maps show non-CpG methylation in nearly all human tissues, 16,33 but the question of whether DNMT3A or DNMT3B establishes these non-CpG methylation signatures is still elusive.
Herein, we describe a novel assay and systematic analyses that quantitatively interrogate DNMT3A and -3B enzyme specificity on a large and diverse set of cytosine contexts using unmethylated Escherichia coli genomic DNA as the substrate coupled with high-depth bisulfite sequencing analysis. We find that each enzyme shows distinct target sequence signatures that are unchanged upon boosted methylation activity by the inactive cofactor DNMT3L. We find that these signatures are naturally observed within the mouse and human DNA methylomes, demonstrating that the intrinsic substrate preferences of DNMT3A/B are critical for determining the distribution of DNA methylation in mammalian genomes.

■ MATERIALS AND METHODS
In Vitro Methylation Assay. Full-length recombinant human DNMT3A (Abcam, ab170408), DNMT3B (Abcam, ab170410), and DNMT3L (Active Motif, catalog no. 31414) were purchased from commercial providers; 100 ng of unmethylated E. coli genomic DNA (D5016, Zymo Research) was incubated at 37°C with 500 ng of DNMT3A, DNMT3B, or DNMT3L and 160 μM S-adenosylmethionine (SAM, catalog no. B9003S, NEB) in reaction buffer (50 mM Tris-HCl, 1 mM EDTA, 1 mM dithiothreitol, 5% glycerol, and 100 μg/mL bovine serum albumin) for 30, 120, and 240 min. For DNMT3L stimulation experiments, 200 ng of DNMT3A or DNMT3B and 200 ng of DNMT3L were incubated with 100 ng of E. coli DNA for 120 min. For comparison, 1 unit of bacterial CpG methyltransferase M.SssI (New England Biolabs), which has high methylation activity in vitro, was also incubated with DNA for 10, 30, and 240 min. After incubation, the reaction was terminated by heating the mixture at 65°C for 20 min. DNA was then purified using a DNA Clean & Concentrator Kit (D4030, Zymo Research) and processed for high-throughput bisulfite sequencing.
Bisulfite Sequencing. Bisulfite libraries were prepared using a Pico Methyl-Seq Library Prep Kit (D5456, Zymo Research) by following the manufacturer's protocol. Briefly, DNA was treated with bisulfite conversion reagent at 98°C for 8 min and then at 54°C for 60 min. Converted DNA was purified and amplified using random priming. Amplified DNA was purified, adapted, and indexed. Libraries were pooled and sequenced on an Illumina NextSeq-500 platform using High-Output Kit ver. 2.5 (75 cycles) in single-end mode. The nonconversion rate was estimated to be 0.5% using E. coli DNA incubated with the inactive DNMT3L.
Calculating Sequence Context Occurrences Genome-wide. The observed numbers of unique sequence contexts (flanking cytosine, CG, or CA dinucleotides) present in the forward and reverse strands of the λ, E. coli, and human reference genomes were obtained using bedtools ver. 2.27.0 34 and custom Python scripts. For contexts with n flanking bases, the observed number was compared to the total number (t = 4^n) of all possible sequence contexts, e.g., NCGN (n = 2; t = 16), NNCGNN (n = 4; t = 256), and NNNCGNNN (n = 6; t = 4096), and is represented in Figure S1.
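The counting step described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names (`count_contexts`, `context_coverage`) and the toy sequence are hypothetical, and the sketch assumes that every window on both strands is counted and that coverage is the fraction of the t = 4^n possible contexts seen at least once.

```python
# Hypothetical helper names; a sketch of context counting on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_contexts(genome, core="CG", flank=1):
    """Count each flank+core+flank window on the forward and reverse strands."""
    counts = {}
    width = len(core) + 2 * flank
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            # Keep windows whose central bases match the core and that
            # contain only unambiguous bases.
            if window[flank:flank + len(core)] == core and set(window) <= set("ACGT"):
                counts[window] = counts.get(window, 0) + 1
    return counts


def context_coverage(counts, flank=1):
    """Fraction of the t = 4**(2*flank) possible contexts observed at least once."""
    return len(counts) / 4 ** (2 * flank)


toy_genome = "ACGTTCGATCGGACGC"  # toy sequence, not real data
toy_counts = count_contexts(toy_genome, core="CG", flank=1)
print(len(toy_counts), "of", 4 ** 2, "possible NCGN contexts observed")
```

With `flank=4`, the same coverage calculation corresponds to the NNNNCGNNNN contexts (t = 4^8 = 65536) reported for the E. coli genome.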
Processing and Analysis of E. coli Bisulfite Sequencing Data. The quality of raw sequencing reads was evaluated using FastQC ver. 0.11.3 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low-quality base calls were filtered, and Illumina TruSeq adapters were trimmed from the read's 3′ end using cutadapt ver. 1.123. 35 Reads shorter than 10 bp after adapter and base quality trimming were discarded. Following the read quality assessment, the first six bases of every read were also trimmed.
Bisulfite-converted reads were aligned to the E. coli K-12 MG1655 ASM584v2 reference genome (Ensembl Genomes release 41) using bismark ver. 0.19.0 36 with the options --non_directional and --unmapped, and duplicated alignments were removed using deduplicate_bismark. Methylation calls were obtained using bismark_methylation_extractor with the option --CX_context. The sequence context for each cytosine in the E. coli genome was obtained using bedtools slop and bedtools getfasta. Only cytosines with at least 10 aligned sequencing reads were considered for further analysis.
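As an illustration of how per-cytosine calls might be summarized by sequence context, the sketch below applies the 10-read coverage filter stated above and reports the median methylation level per context. The record layout (context, methylated reads, unmethylated reads), the function name, and the toy values are assumptions made for this example and do not reproduce the authors' pipeline.

```python
from collections import defaultdict
from statistics import median

MIN_READS = 10  # coverage threshold stated in the text


def median_methylation_by_context(records, min_reads=MIN_READS):
    """Median per-cytosine methylation (%) for each sequence context.

    `records` is an iterable of (context, methylated_reads, unmethylated_reads)
    tuples; this layout is an assumption for the sketch.
    """
    per_context = defaultdict(list)
    for context, meth, unmeth in records:
        depth = meth + unmeth
        if depth < min_reads:
            continue  # discard cytosines with insufficient coverage
        per_context[context].append(100.0 * meth / depth)
    return {ctx: median(levels) for ctx, levels in per_context.items()}


# Toy records, not real data.
toy_records = [
    ("ACGC", 6, 6),
    ("ACGC", 3, 9),
    ("ACGC", 9, 3),
    ("TCGG", 5, 4),   # depth 9, removed by the coverage filter
    ("TCGG", 8, 12),
]
toy_medians = median_methylation_by_context(toy_records)
print(toy_medians)
```

Ranking the resulting dictionary by value gives the kind of ordered context comparison used in the Results.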
Processing and Analysis of Mouse and Human Bisulfite Sequencing Data Sets. Public whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) data sets used in this study are listed in Table S1. Raw WGBS data sets from GEO were processed like the E. coli libraries, whereas RRBS data sets were quality trimmed and further clipped by three bases from the 5′ end using Trim Galore ver. 0.6.4_dev (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). The GENCODE reference genomes 39 used were human release 28 (GRCh38.p12) and mouse release M18 (GRCm38.p6). For WGBS data sets, cleaned-up fastq files from human and mouse were aligned against GRCh38.p12 and GRCm38.p6, respectively, and subsequent reads were processed on a chromosome-by-chromosome basis. Unless otherwise stated, methylation on chromosome 1 of both mouse and human data sets was reported. For RRBS data sets, non-deduplicated aligned reads were processed for all chromosomes simultaneously, and methylation counting and visualization were performed as in the E. coli libraries. 40 For details about the bioinformatics data analysis, see https://github.com/sblab-bioinformatics/dnmt3a-dnmt3b.
Biochemistry pubs.acs.org/biochemistry Article

DNMT3A and -3B Enzyme Sequence Preferences Revealed by a High-Throughput Biochemical Methylation Assay. For a comprehensive study of methyltransferase enzyme sequence preferences, we aimed to biochemically capture a wide range of substrates that display sufficient sequence diversity and coverage to provide a fair and systematic collection of possible sequence targets. The 4.6 million bp E. coli genome is 51% G/C rich and contains 346670 CpG sites, which represent 96.6% of all possible NNNNCGNNNN (N = A, T, C, or G) sequences (63295 of 4^8 = 65536 total combinations), 96.6% of all NNNNCANNNN, and 99% of all NNNNCNNNN (Figure S1). Thus, unlike previous studies using a limited range of CpG substrates (275 CpG sites altogether), 32 the E. coli genome has sufficient sequence context diversity to serve as an essentially unbiased substrate to investigate the sequence preferences of different methylases.
We then developed a biochemical assay to evaluate the methylation activity of recombinant full-length human DNMT3A or DNMT3B using unmethylated E. coli genomic DNA as the substrate, followed by methylation assessment through whole genome bisulfite sequencing and subsequent computational analysis. We sought assay conditions (10−60% total methylation) that avoided saturated methylation that would mask any differential activity. Either DNMT3A or DNMT3B was incubated with E. coli DNA for different times (30, 120, and 240 min) to provide a range of methylation levels for subsequent analysis. After bisulfite sequencing, average methylation at CpG and non-CpG contexts was calculated. The level of methylation at CpG sites increased with incubation time and ranged from 11% to 20% at 30 min and from 40% to 46% at 240 min for DNMT3A and DNMT3B, respectively (Figure 1A). After 240 min, the level of non-CpG methylation was 2.7% at CHGs and 2.8% at CHHs for DNMT3A and 10.7% at CHGs and 5.3% at CHHs for DNMT3B. These results show that while DNMT3A and DNMT3B show a broadly similar level (41% and 47%, respectively) of CpG methylation after incubation for 240 min, DNMT3B has a relatively greater methylation activity for non-CpG sites (∼1.9−4-fold) compared to DNMT3A. To exclude biases introduced by bisulfite conversion, we also showed that the nonconversion level was 0.5% for both CpG and non-CpG contexts after incubating inactive DNMT3L with E. coli genomic DNA for 240 min (Figure 1A). Furthermore, our results also show that DNMT3B but not DNMT3A has >2-fold higher methylation activity for CHG over CHH sequences (Figure 1A), which implies an inherent sequence-dependent preference of DNMT3B.
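The fold differences quoted above follow directly from the reported 240 min non-CpG methylation levels. The short Python sketch below reproduces that arithmetic; the dictionary layout and variable names are illustrative only, and the values are those given in the text.

```python
# Non-CpG methylation (%) after 240 min, as reported in the text.
non_cpg_240min = {
    "DNMT3A": {"CHG": 2.7, "CHH": 2.8},
    "DNMT3B": {"CHG": 10.7, "CHH": 5.3},
}

# DNMT3B relative to DNMT3A in each non-CpG context.
fold_3b_over_3a = {
    context: non_cpg_240min["DNMT3B"][context] / non_cpg_240min["DNMT3A"][context]
    for context in ("CHG", "CHH")
}

# CHG relative to CHH within each enzyme.
fold_chg_over_chh = {
    enzyme: levels["CHG"] / levels["CHH"]
    for enzyme, levels in non_cpg_240min.items()
}

for context, fold in fold_3b_over_3a.items():
    print(f"DNMT3B/DNMT3A at {context}: {fold:.1f}-fold")
for enzyme, fold in fold_chg_over_chh.items():
    print(f"{enzyme} CHG/CHH: {fold:.2f}-fold")
```

The ratios span roughly 1.9 (CHH) to 4 (CHG) between enzymes, and only DNMT3B exceeds a 2-fold CHG over CHH ratio.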
To investigate the influence of sequence context on cytosine methylation by DNMT3A and DNMT3B, we ranked the median cytosine methylation levels for trinucleotide sequences with cytosine as the middle base [i.e., NCN (Figure 1B,C)]. Both DNMT3A and DNMT3B showed a strong preference for CpG dinucleotides, resulting in more than 30% and 37% methylation, respectively. The next most methylated context was CpA; methylation at all non-CpG sites remained below 4% for DNMT3A and 15% for DNMT3B after incubation for 240 min (Figure 1B,C). DNMT3B also showed more variable methylation than DNMT3A on CpG or CpA. The preference for CpG sites is independent of incubation time (Figure S2A,B). An important control using M.SssI showed high methylation activity, and all trinucleotide sequence contexts are equally available for methylation without any preference (Figure S2C), which also rules out any biases caused by sample processing and data analysis. Altogether, the results reveal the importance of bases flanking the substrate cytosine and the already known preference for CpG over non-CpG sites.
Distinct DNMT3A and DNMT3B Sequence Preferences Are Directed by Flanking Sequences for both CpG and Non-CpG Contexts. To further investigate the sequence preferences of DNMT3A and -3B, we then explored the influence of both the 5′ and 3′ flanking bases for CpG sites by ranking the median methylation level at all NCGN sequences. Notably, DNMT3A generally favors a pyrimidine (C or T) as the 3′ adjacent base with NCGC and NCGT sequences gaining the most (44%) and second most (38%) methylation, whereas DNMT3B prefers a 3′ purine base (G or A), with NCGG and NCGA sequences being most methylated (62% and 58%, respectively).
We also observed that DNMT3B showed a preference for sequences with a T or A at the 5′ position in NCGG or NCGA contexts, respectively, whereas DNMT3A favors sites with C/A at the 5′ position (Figure 2A,B). DNMT3B also showed a greater spread in median methylation levels, from 17% to 67%, across different sequence contexts, while DNMT3A was more restricted ranging from 25% to 40% (Figure 2A,B). When longer flanking sequences (NNCGNN) were considered, a clear pattern of sequence preferences and differences between the two enzymes emerged (Figure S4). For example, DNMT3A prefers TACGCC sequences (N = 3206; median level of methylation of 66.7%) and disfavors AGCGGG sequences (N = 2585; 12.9%), whereas DNMT3B prefers GTCGGC sequences (N = 2641; 73.9%) and disfavors GCCGTG sequences (N = 2570; 8.3%) (Figure S4A,B). The differences in methylation range and sequence preference were independent of incubation time. There was no observed flanking sequence preference for the M.SssI control methylase (Figure S3 and Figure 1C).
To determine whether additional flanking bases have an influence on preference, we extended our analyses to include four bases 5′ and 3′ of the CpG (i.e., NNNNCGNNNN) by calculating the consensus sequence logo of the top 1000 most methylated sequences after incubation for 30 min (Figure 2C,D). Following from the strong enzymatic preference at the adjacent 3′ position for CpG substrates as highlighted before, DNMT3A also showed a strong preference for a T at the −2 position 5′ with NNTNCGNNNN representing 75% of all methylated sequences (Figure 2C). In contrast, DNMT3B had a preference for T or A in both the −1 and −2 positions 5′ with NN[T/A]NCGNNNN or NNN[T/A]CGNNNN sequences representing >75% (Figure 2D). Both DNMT3A and DNMT3B showed similar preferences for C at the +2 position 3′, and DNMT3A also showed a preference for A at the +3 position 3′ (Figure 2C,D). Longer incubation times ultimately led to full methylation at a wide range of sequence contexts (Figure S4), obscuring intrinsic sequence preferences (Figure S5A,B).
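The sequence logo analysis of the most methylated 10-mers rests on per-position base frequencies. A minimal Python sketch of that underlying calculation is given below; the function names and the four toy 10-mers are hypothetical, and the rendering of an actual logo (information content, plotting) is not shown.

```python
from collections import Counter


def top_sequences(methylation, n):
    """Return the n sequence contexts with the highest methylation level."""
    ranked = sorted(methylation.items(), key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked[:n]]


def position_frequencies(sequences):
    """Per-position base frequencies for equal-length sequences."""
    matrix = []
    for pos in range(len(sequences[0])):
        column = Counter(seq[pos] for seq in sequences)
        total = sum(column.values())
        matrix.append({base: column.get(base, 0) / total for base in "ACGT"})
    return matrix


# Toy NNNNCGNNNN contexts with made-up methylation levels (%), not real data.
toy_levels = {
    "AATACGCCAA": 80.0,
    "GGTTCGCCGG": 75.0,
    "CCTGCGTCAT": 70.0,
    "AGCACGGGTT": 10.0,
}
toy_top = top_sequences(toy_levels, 3)
toy_matrix = position_frequencies(toy_top)

# Index 4 is the substrate cytosine, so index 2 is the -2 position
# and index 6 is the +1 (3' adjacent) position.
print("T at -2:", toy_matrix[2]["T"])
print("C at +1:", toy_matrix[6]["C"])
```

In the analysis described above, `n` would be 1000 and the input would be the measured methylation level of every observed 10-mer context.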
For non-CpG dinucleotides, DNMT3A and DNMT3B showed higher activity at CpA than at CpC or CpT sites, with NNNNCANNNN representing 97% of all methylated sequences (Figure 2E,F). The sequence preference of DNMT3A/B at non-CpG sites is similar to that at CpG sites, which was also independent of the incubation time before reaching the saturation level (Figure S5C,D). Furthermore, DNMT3A and -3B each showed a similar preference for flanking sequences at the less methylated CT or CC dinucleotide sites compared to that of CA or CG sequences (Figure S6). Overall, these in vitro methylation analyses unveil distinctive methylation signatures for human de novo methyltransferases in both CpG and non-CpG contexts, which reveals intrinsic enzymatic substrate specificities.
To examine the possible asymmetry of sequence preferences within a duplex context, we identified the 10-mer CpG sites that were >60% methylated at both the C (forward strand) and G (reverse strand) positions. This identified 1748 heavily methylated duplex sites after incubation with DNMT3A and 18062 sites after incubation with DNMT3B. Sequence logo analysis reveals a core [A/G]CG[T/C] signature for DNMT3A and a [C/T]CG[G/A] signature for DNMT3B (Figure S7). These signatures were self-complementary, in concordance with the flanking sequence signature of DNMT3A/B, and we found no evidence to support asymmetry in sequence preference.
DNMT3L Stimulates DNMT3A/B Activity without Altering Sequence Preference. DNMT3L is highly related to DNA methyltransferases, and though it does not have any methyltransferase activity per se, it is a key factor that stimulates de novo methylation. 23−25 Early work on DNMT3L suggested that it can modulate DNMT3A/B activity without changing the sequence preferences of DNMT3A/B. 32 However, this study focused on only a limited number of CpG sites and used near-saturation levels of methylation; thus, an unbiased and accurate assessment of the effects of DNMT3L remains open.
To further investigate how DNMT3L may affect DNMT3A/B sequence preferences, we added full-length human recombinant DNMT3L to the methylation reaction together with DNMT3A or DNMT3B. To avoid methylation saturation due to increased overall methylation levels, 200 ng instead of 500 ng of DNMT3A or DNMT3B was used, which resulted in 14% of CG methylation for DNMT3A and 3.2% for DNMT3B (Figure 3A). DNMT3L increased DNMT3A methylation activity by 3-fold and DNMT3B methylation activity by 11-fold in a CpG context (Figure 3A), which is consistent with previous reports. 23−25 Methylation at non-CpG sites was also enhanced (Figure 3B). Sequence logo analysis shows an [...] (Figures S5 and S8). This suggests that the stimulatory effect of DNMT3L does not alter the flanking sequence preference for DNMT3A/B, which is consistent with the absence of any direct interaction between DNMT3L and DNA within a DNMT3A−DNMT3L tetramer complex. 41,42
Methylation Signatures of DNMT3A and DNMT3B in Mammalian Cells. To further expand our in vitro findings that revealed DNMT3A/B sequence preferences, we asked if the observed patterns hold true in cellular and physiological conditions. Subsequently, we explored the extent to which endogenous mammalian DNA methylomes are explained by the distinct specificities of DNMT3A and DNMT3B.
Mammalian DNMT3 protein sequences are highly conserved with 96% of amino acids (875 of 912) being identical between mouse and human DNMT3A, including 100% identical C-terminal residues and catalytic domains (508−912). Human and mouse DNMT3B protein sequences are 88% identical (717 of 817). Due to this level of conservation, we anticipate that human and mouse DNMT3 enzymes will show equivalent sequence preferences, and therefore, we used human or mouse methylation data sets interchangeably in the following analyses.
We examined the patterns of highly methylated sequences in wild-type (WT) J1 mouse embryonic stem cells (mESCs) compared to mESCs in which Dnmt1, -3a, and -3b have been genetically deleted (Dnmt triple-knockout or TKO cells) 43 with either Dnmt3a or Dnmt3b subsequently reintroduced ectopically. 44 In WT stem cells, no obvious pattern was observed in the top methylated 10-mer sequences (i.e., CpG ± 4 bases), which most likely reflects the close-to-saturation levels of CG methylation (81.7%). In contrast, in TKO cells with reintroduced Dnmt3a or Dnmt3b, there is an average methylation level of 7.1% and 2.8%, respectively, in CpG contexts, which are nonsaturating and therefore allow the further analysis of sequence preferences (see Materials and Methods). The preferred sequence contexts were readily apparent at the methylated sites in TKO cells expressing either Dnmt3a or Dnmt3b (Figure 4A,B). These signatures correspond closely to the patterns identified in the in vitro assay (Figure 2C−F); namely, CpG and non-CpG sites followed by a 3′ pyrimidine gained more methylation when Dnmt3a was expressed, and those followed by a 3′ purine gained more when Dnmt3b was expressed (Figure 4A,B).
To explore if the same preferences persist in human cells, we then profiled methylation signatures in HUES64 human ESCs (hESCs) with either wild-type or DNMT knockout genotypes. 45 WT hESCs displayed a mixture of DNMT3A-type and DNMT3B-type methylation signatures (Figure 4C), which was not observed in mouse WT cells. We attributed this to the higher level of expression of DNMT3A/B in human HUES64 cells compared to mouse cells (Figure S9A). Moreover, we observed that the DNMT3B-type signature emerges when DNMT3A is depleted, with later cell culture passages leading to a more prominent effect (Figure 4D). Similarly, removal of DNMT3B leads to the loss of the DNMT3B signature in early passages with the subsequent appearance of the DNMT3A signature, which suggests the slow dilution of DNMT3B-type methylation and accumulation of DNMT3A-type methylation over a period of 15 passages (Figure 4E). Finally, DNMT3A and DNMT3B double knockout leads to a substantial loss of CA methylation (from 1.8% to 0.2%) and loss of DNMT3 signatures (Figure 4F).
The clear difference in sequence preferences between DNMT3A and DNMT3B is at the 3′ base directly adjacent to the substrate dinucleotides. To further infer whether the methylation levels in CAC and CAG contexts are a good representation of the DNMT3A and DNMT3B methylation signatures, we calculated the average methylation at trinucleotides CAN (N = A, T, C, or G) in mouse and human stem cells and found that CAC gained more methylation compared to other trinucleotides when Dnmt3a was introduced. On the contrary, more methylation at CAG was observed when Dnmt3b was reintroduced, which is consistent with the preferences discovered before (Figure S9B). In both human and mouse WT ESCs, the ratio between CAC and CAG methylation is close to 1, suggesting a balancing act between DNMT3A and -3B (Figure 4G). Additionally, introduction of DNMT3A into mouse TKO cells (or removal of DNMT3B in human WT cells) led to ∼2−3-fold more CAC methylation; however, introduction of DNMT3B into mouse TKO cells (or removal of DNMT3A in human WT cells) led to more CAG methylation. In line with the inherent sequence preferences flanking CpG sites revealed in vitro, we also noted that TKO cells gain more CGC/CGT methylation after reintroduction of Dnmt3a and more CGG/CGA methylation after reintroduction of Dnmt3b (Figure S9C,D). No significant change was observed in WT or DNMT3A- and DNMT3B-depleted hESCs in sequence contexts adjacent to CpG dinucleotides, which may be due to saturation levels (Figure S9C,D).
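The CAC to CAG comparison used above as a readout of DNMT3A-type versus DNMT3B-type activity can be sketched as follows. This is an illustrative Python example only: the function names and toy read counts are hypothetical, and the sketch assumes that average methylation is computed by pooling reads across cytosines in a context, which may differ from the averaging used in the original analysis.

```python
def average_methylation(calls):
    """Pooled methylation (%) from (methylated_reads, total_reads) pairs.

    Pooling reads across sites is an assumption of this sketch.
    """
    methylated = sum(m for m, _ in calls)
    total = sum(t for _, t in calls)
    return 100.0 * methylated / total if total else float("nan")


def cac_cag_ratio(calls_by_context):
    """Ratio of average CAC methylation to average CAG methylation."""
    cac = average_methylation(calls_by_context["CAC"])
    cag = average_methylation(calls_by_context["CAG"])
    return cac / cag


# Toy read counts per cytosine, not real data.
toy_calls = {
    "CAC": [(3, 100), (5, 100)],
    "CAG": [(2, 100), (2, 100)],
}
toy_ratio = cac_cag_ratio(toy_calls)
print(f"CAC/CAG methylation ratio: {toy_ratio:.1f}")
```

Under this readout, a ratio above 1 would point toward a DNMT3A-type signature, a ratio below 1 toward a DNMT3B-type signature, and a ratio near 1 toward the balanced state described for WT ESCs.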
The DNMT3A N-Terminal Domain Imparts Sequence Preferences. The distinct patterns of flanking sequence preferences for DNMT3A or DNMT3B at both CpG and non-CpG sites suggest that there are intrinsic enzyme structural features determining their specificity. To determine whether the N-terminal or the catalytic domain is a determinant for the sequence preferences of DNMT3A, we analyzed publicly available RRBS data sets generated in the Dnmt3a/b double knockout and Dnmt1 knocked down mESCs (DKO-zero) expressing either full-length (FL) or the catalytic domain (CD) of Dnmt3a. 46 The expression of either FL- or CD-Dnmt3a reinstated CpG methylation levels similar to that of WT cells. 46 The most methylated non-CpG sites in WT cells revealed a TNCA[C/G]C methylation signature combining the DNMT3A and DNMT3B methylation signatures observed in vitro (Figure 5A, and also Figures 2E and 4B). The knockout of Dnmt3a/b and knockdown of Dnmt1 abrogated methylation in DKO-zero cells, which resulted in no methylation signature (Figure 5B). The reintroduction of full-length DNMT3A (but not the DNMT3A catalytic domain) restored the characteristic DNMT3A methylation signature observed in WT cells (Figure 5C,D). Overall, this suggests that the N-terminal domain is a determinant for the sequence preference of DNMT3A.

■ DISCUSSION
A key challenge is to build an understanding of how the de novo methyltransferases DNMT3A and DNMT3B cooperate to establish the mammalian DNA methylome in early embryonic development. Evidence suggests that the underlying primary genomic sequence could be involved in the dynamic and recurring deposition of cytosine methylation in regulatory regions by sequence-specific recruitment of transcription factors. 29 [...] ectopically. Furthermore, depletion of DNMT3A in human HUES64 cells enhances a DNMT3B-type methylation pattern, especially in a CA context, while removal of DNMT3B leads to the appearance of a DNMT3A-type signature. Taken together, we propose that the intrinsic sequence preferences of DNMT3A/B should be taken into consideration when studying the establishment of tissue-specific methylation patterns. From our analysis of mouse TKO stem cells and human DNMT3 knockout cells, it is evident that DNMT3A and DNMT3B impose methylation patterns in cells that resemble those seen in vitro from the corresponding purified recombinant enzymes in the absence of additional factors. This suggests that while the interaction with DNMT3L, 48−50 histone modifications, 21,22,44 or transcription factors 29 could modulate or guide the methylation capacity of DNMT3s at certain regions, the inherent enzyme sequence preferences shape a substantial part of the underlying methylation patterns globally.
While human DNMT3A and DNMT3B share ∼45% conservation across the whole protein, ∼80% of amino acids are conserved in the catalytic domain. This points to regulatory features outside the catalytic domain having evolved to provide each protein selectivity to methylate distinct genomic loci in different tissues and developmental stages. Epigenetic enzymes such as DNMTs and TETs are being deployed in a range of epigenetic engineering and biotechnological setups with potential clinical utility, and our examination of the intrinsic sequence preference of these enzymes could help guide the selection of DNMT3s for optimal activity.

■ CONCLUSIONS
In summary, we provide a comprehensive and robust quantitative analysis of the intrinsic sequence preferences for the enzymatic activities of de novo DNA methyltransferases on CpG and non-CpG target sites in vitro and in mammalian stem cells. The accurate determination of sequence preferences of de novo methyltransferases provides a new understanding of the origin of specific DNA methylation patterns in different cell lineages and regulatory regions.