Evolutionary Trajectories for the Functional Diversification of Anthracycline Methyltransferases

Microbial natural products are an important source of chemical entities for drug discovery. Recent advances in understanding the biosynthesis of secondary metabolites have revealed how this rich chemical diversity is generated through functional differentiation of biosynthetic enzymes. For instance, investigations into anthracycline anticancer agents have uncovered distinct S-adenosyl methionine (SAM)-dependent proteins: DnrK is a 4-O-methyltransferase involved in daunorubicin biosynthesis, whereas RdmB (52% sequence identity) from the rhodomycin pathway catalyzes 10-hydroxylation. Here, we have mined previously uncharacterized anthracycline gene clusters and discovered a third protein subclass catalyzing 10-decarboxylation. Subsequent isolation of komodoquinone B from two Streptomyces strains verified the biological relevance of the decarboxylation activity. Phylogenetic analysis suggested two independent routes for the conversion of methyltransferases into hydroxylases, with a two-step process involving loss-of-methylation and gain-of-hydroxylation presented here. Finally, we show that, alongside the functional differentiation, the evolutionary process has led to alterations in substrate specificities.

Anthracyclines are microbial natural products that harbor significant antiproliferative activities, and several metabolites, such as doxorubicin (1, Figure 1A) and aclacinomycin A (2, Figure 1A), have been widely used in cancer chemotherapy. 1 The biological activities are complex and mediated through numerous interactions in human cells, which include poisoning of topoisomerases and intercalation into DNA, formation of reactive oxygen species, the ability to evict histones from chromatin, and proteolytic activation of transcription factors. 2−6 Anthracyclines consist of a common 7,8,9,10-tetrahydro-tetracene-5,12-quinone carbon skeleton, which is further modified in tailoring reactions and typically decorated with carbohydrate units. 1 These compounds are mainly produced by actinobacteria, and to date 408 bacterial-derived anthracyclines have been described. 7 However, the diversity is likely to be much larger, since in recent years a rapidly growing number of cryptic anthracycline gene clusters, which may code for novel metabolites with improved anticancer properties, have been revealed by next-generation sequencing.
The complex chemical structures of anthracyclines are reflected in the compositions of the metabolic pathways, and a typical gene cluster encodes around 30 enzymes responsible for the biosynthesis. 1 The polyaromatic aglycones are synthesized via canonical type II polyketide pathways through Claisen condensations of malonyl-CoA molecules, whereas the starting material for the saccharide units is typically D-glucose-1-phosphate. In general, the various gene clusters are surprisingly similar, and it would appear that a limited pool of gene sets is utilized for generation of the great chemical diversity of anthracyclines. 1 The key to this process lies in the tailoring steps, where evolution of the substrate specificities and catalytic properties of the biosynthetic enzymes leads to varied modifications. The evolution of enzymes associated with secondary metabolism is exceptional; as these proteins are not essential for the host, they are not bound by the constraints imposed on enzymes found in primary metabolism. 8 Consequently, proteins involved in secondary metabolism are able to rapidly acquire novel functionalities even without a gene duplication event. Examples related to anthracycline biosynthesis, where the functions of homologous enzymes have drastically changed, include polyketide cyclases acting as mono-oxygenases 9 and vice versa 10 and conversion of methyltransferases to hydroxylases. 11
One of the final steps in daunorubicin biosynthesis in Streptomyces peucetius is S-adenosyl-L-methionine (SAM)-dependent 4-O-methylation by DnrK. 12 Recent studies have shown that DnrK is, in effect, bifunctional and is able to catalyze an atypical 10-decarboxylation reaction as a secondary moonlighting activity. 13 The enzyme harbors relaxed substrate specificity in regard to modifications in the anthracycline ring system, but it is quite specific with respect to the length of the carbohydrate chain at C-7, accepting only monoglycosides. 12 Conversely, the evolutionarily related RdmB (52% sequence identity) from the β-rhodomycin (3, Figure 1A) pathway in Streptomyces purpurascens lacks methyltransferase activity, and it is instead an anthracycline 10-hydroxylase requiring SAM, molecular oxygen, and a thiol reducing agent for activity. 14,15 The 10-decarboxylation and 10-hydroxylation activities have been proposed to be mechanistically related (Figure 1B), with the latter depending on exclusion of water molecules from the active site cavity. 13 In addition, RdmB has been shown to utilize both mono- and triglycosylated anthracyclines as substrates but, unlike DnrK, requires a 10-carboxy functional group for activity.
Here, we have traced the evolution of anthracycline methyltransferase-like proteins and discovered a third protein subtype catalyzing only 10-decarboxylation. The phylogenetic analysis suggests that the functional divergence of these proteins has occurred in situ in their respective gene clusters. Detection of komodoquinone B from cultures of S. erythrochromogenes NRRL B-2112 and Streptomyces sp. NRRL S-378 confirmed the biological relevance of the 10-decarboxylation activity and led to the identification of two gene clusters responsible for the production of komodoquinones.
We initiated the study by mining public sequence databases to identify additional SAM-dependent methyltransferases that might be involved in anthracycline biosynthesis and harbor novel activities. In the first step, putative anthracycline gene clusters (Figure 2A) were identified in published Streptomyces genomes by the NCBI Blast server using the conserved anthracycline fourth ring cyclase SnoaL as a query. 16 Subsequently, the number of clusters was narrowed down to 12 by probing for the presence of genes homologous to the aclacinomycin 15-methylesterase rdmC 17 and 10-hydroxylase rdmB. 15 Phylogenetic analysis of the putative SAM-dependent methyltransferases revealed four distinct clades, which were composed of DnrK and RdmB-type proteins and two new groups of sequences (Figure 2B). The evolution of these methyltransferases appeared to follow closely the evolution of the anthracycline gene clusters, since the phylogenetic tree mirrored exceptionally well a second phylogenetic tree (Figure 2C) constructed from concatenated sequences of four conserved proteins involved in the assembly of the anthracycline carbon skeleton. The result excluded the possibility of horizontal gene transfer, a frequently observed phenomenon in secondary metabolism, 18 and indicated that these methyltransferases have evolved in situ in their respective gene clusters.
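The cluster-narrowing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the homology searches have already been run and summarized as a set of hits per candidate cluster, and that a cluster is kept only when homologs of both rdmC and rdmB are present. The function name, cluster names, and hit sets are hypothetical placeholders, not data from this study.

```python
# Assumption: a cluster is retained only if it contains homologs of BOTH probes.
REQUIRED_HOMOLOGS = {"rdmC", "rdmB"}  # 15-methylesterase and 10-hydroxylase probes


def filter_clusters(hits_by_cluster, required=REQUIRED_HOMOLOGS):
    """Return the sorted names of clusters containing every required homolog."""
    return sorted(
        name
        for name, hits in hits_by_cluster.items()
        if required <= set(hits)  # subset test: all required genes were found
    )


# Placeholder hit sets for illustration only (not from the paper).
example_hits = {
    "cluster_A": {"snoaL", "rdmC", "rdmB"},
    "cluster_B": {"snoaL", "rdmC"},
    "cluster_C": {"snoaL", "rdmB", "rdmC", "other"},
}

selected = filter_clusters(example_hits)
print(selected)
```

In this toy input, the cluster lacking an rdmB homolog is dropped while the other two are retained.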

Letters
In order to experimentally probe the activities of the newly discovered methyltransferase-like enzymes, we selected four proteins denoted as ZamB, EamK, TamK, and CalMB originating from S. zinciresistens K2, 19 S. erythrochromogenes NRRL B-2112, 20 S. tsukubaensis NRRL 18488, 21 and Streptomyces sp. CcalMP8W, respectively. These methyltransferases were produced as N-terminally histidine-tagged proteins from synthetic genes codon-optimized for expression in Escherichia coli. The proteins were purified to near homogeneity in a single step utilizing affinity chromatography.
The activities of the enzymes were tested with three different substrates, which included the nonglycosylated aklavinone (5), the monoglycosylated aclacinomycin T (4), and the triglycosylated aclacinomycin A (2). The activities were measured in a two-step assay. First, the 15-methylesterase DnrP was used to generate intermediates with 10-carboxylic acid functional groups required for RdmB-type activity. These compounds were extracted from the reaction mixtures and utilized as substrates in a second reaction to probe the activities of the methyltransferase-like proteins. All of the enzymes were able to utilize 4 as a substrate (Figure 3, Figure S1), but surprisingly only DnrK catalyzed the canonical reaction of the protein family, 4-O-methylation (79% of substrate converted). RdmB (82%), ZamB (70%), and CalMB (71%) appeared to harbor relatively efficient 10-hydroxylation activity. The sole product detected in the TamK (86%) and EamK (94%) reactions was the 10-decarboxylated anthracycline derivative.
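The conversions reported above for the assays with aclacinomycin T (4) can be collected into a small lookup table and grouped by reaction type. The following Python sketch uses only the percentages stated in the text; the variable and function names are illustrative choices, not part of the original analysis.

```python
from collections import defaultdict

# Conversions reported in the text for the assays starting from aclacinomycin T (4):
# enzyme -> (observed reaction, percent of substrate converted)
CONVERSION_WITH_4 = {
    "DnrK": ("4-O-methylation", 79),
    "RdmB": ("10-hydroxylation", 82),
    "ZamB": ("10-hydroxylation", 70),
    "CalMB": ("10-hydroxylation", 71),
    "TamK": ("10-decarboxylation", 86),
    "EamK": ("10-decarboxylation", 94),
}


def group_by_reaction(conversions):
    """Group enzyme names by the reaction they catalyze (names sorted alphabetically)."""
    groups = defaultdict(list)
    for enzyme, (reaction, _percent) in conversions.items():
        groups[reaction].append(enzyme)
    return {reaction: sorted(names) for reaction, names in groups.items()}


groups = group_by_reaction(CONVERSION_WITH_4)
for reaction, enzymes in groups.items():
    print(f"{reaction}: {', '.join(enzymes)}")
```

Grouping the data this way makes the central observation explicit: with this substrate, only one of the six enzymes acts as a methyltransferase.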
In contrast, the only enzymes able to turn over the triglycosylated 2 were the three 10-hydroxylases RdmB, ZamB, and CalMB, which in effect displayed their highest relative activities with this particular substrate (Figure 3, Figure S2). In order to verify the stereochemistry of 10-hydroxylation by ZamB and CalMB, the reactions with 2 were scaled up, followed by acid hydrolysis and preparative HPLC to measure circular dichroism (CD) spectra of the product aglycones. The CD spectra of ZamB and CalMB reaction products were highly similar to the one recorded for the RdmB product (Figure S3), where the (10R)-stereochemistry has been confirmed in the ternary complex structure with SAM and 11-deoxy-β-rhodomycin. 15
Since the 15-methylesterase DnrP was not able to utilize the aglycone 5 as a substrate in a satisfactory manner, we proceeded to test the equivalent gene product from S. erythrochromogenes NRRL B-2112, denoted EamC, as a replacement enzyme for the initial 15-demethylation reaction. The highly improved activity of EamC with 5 allowed us to probe the methyltransferases with this substrate (Figure 3, Figure S4). The assays revealed that the 10-decarboxylases TamK (99%) and EamK (98%) converted the substrate essentially completely, and the 10-hydroxylases RdmB (75%), ZamB (55%), and CalMB (73%) also displayed moderate activities. To the best of our knowledge, 10-hydroxylation of polyketide aglycones has not been observed previously. In contrast, only trace amounts of 4-O-methylated product could be detected from the DnrK reaction (2%), as reported previously. 12
One conceivable explanation for the lack of 4-O-methylation and 10-hydroxylation activity in TamK and EamK was that the observed 10-decarboxylation activity might be due to the use of unnatural noncognate anthracycline substrates, which bind incorrectly in the active sites and influence catalysis.
The bifunctional DnrK has been shown to readily catalyze 10-decarboxylation as a moonlighting activity, whereas the 4-O-methylation activity, which is based on SN2 chemistry, has strict geometric constraints in regard to the positioning of the substrate and SAM cosubstrate. 12 Similarly, mechanistic studies have indicated that the 10-hydroxylation activity of RdmB relies on exclusion of water molecules from the active site cavity, and if this requirement is not met, the reaction leads to 10-decarboxylation. 13
In order to demonstrate that 10-decarboxylation is a natural activity present also in vivo, we proceeded to investigate culture extracts of S. erythrochromogenes NRRL B-2112 and Streptomyces sp. NRRL S-378 (Figure 2) in several production media followed by metabolic profiling. Gratifyingly, S. erythrochromogenes NRRL B-2112 was found to produce a red pigmented metabolite with a typical UV/vis spectrum of anthracyclines after prolonged cultivation for 7 days in E10 medium. The molecular formula of C19 … (Figure S5). The data correlated well to komodoquinone B (6, Figure 1), which has previously been characterized from Streptomyces sp. KS3. 22 The experiments verified that S. erythrochromogenes NRRL B-2112 had the ability to produce an aglycone metabolite that has gone through 10-decarboxylation but does not contain 4-O-methyl or 10-hydroxyl functional groups. Interestingly, Streptomyces sp. NRRL S-378 was also found to produce 6 on solid medium on ISP4 plates (Figure S6).

ACS Chemical Biology

The anthracycline gene clusters residing in S. erythrochromogenes NRRL B-2112 and Streptomyces sp. NRRL S-378 were highly similar with sequence identities ranging between 82 and 100% (Figure 4, Table S2) and complete conservation of gene order. It is notable that although both gene clusters encode several glycosyltransferases, only the aglycone product was observed in our cultures (Figure 4). The fact that EamC and EamK were able to fully utilize aglycone substrates, unlike the enzymes from known glycoside producers, 1 suggests that both nonglycosylated and glycosylated compounds may be produced depending on environmental conditions. Bioinformatic analysis suggested that the glycosylated komodoquinones may contain L-rhodinose and L-rhodosamine units (Figure S7), which are carbohydrates frequently found attached to anthracyclines. 1
In conclusion, here we have characterized SAM-dependent methyltransferase-like proteins situated in anthracycline gene clusters. We identified a third protein subtype catalyzing 10-decarboxylation and demonstrated that only a minority of these enzymes are, in effect, methyltransferases. It would appear that 10-hydroxylation in particular provides some evolutionary advantage for the producing organism, since the phylogenetic analysis (Figure 2) points toward two independent routes for the appearance of this feature. Previous studies have shown that RdmB-type enzymes may be engineered from the DnrK scaffold by insertion of a single amino acid, which leads to closure of the active site and enables hydroxylation in a solvent-free environment. 13 In contrast, here we have identified a second two-step route, which putatively involves an initial loss of 4-O-methyltransferase activity as in TamK and EamK-like proteins, followed by a gain of 10-hydroxylation in the CalMB clade (Figure 2). In RdmB, the α16 helix adjacent to the active site is important for the gain-of-hydroxylation activity, 13 and since this region is highly divergent also in the newly discovered proteins (green, Figure S8), it may present an evolutionary hot-spot for the functional diversification of these enzymes. Even though the molecular basis for the determination of substrate specificities is not yet fully understood, structural analysis of DnrK and RdmB highlights the importance of two loop regions in the recognition of carbohydrate units. These segments are notably different in the EamK and TamK-type proteins (orange, Figure S8).
The discovery of four distinct protein subfamilies presented here paves the way for detailed examination of the protein regions that determine the functional diversification and the carbohydrate binding pockets of these enzymes.

■ MATERIALS AND METHODS
Bacterial Strains and Reagents. All reagents were bought from Sigma or VWR unless stated otherwise. Aclacinomycin A (2) was obtained in a two-step fermentation process by first cultivating Streptomyces galilaeus ATCC 31615 mutant HO42 for production of aclacinomycin B, followed by biotransformation to 2 using Streptomyces galilaeus ATCC 31615 mutant HO26. 23 Aclacinomycin T (4) was obtained from a fermentation of strain Streptomyces galilaeus ATCC 31615 mutant H038. 23 Aklavinone (5) was obtained through hydrolysis of 2, and komodoquinone B (6) was obtained from S. erythrochromogenes and Streptomyces sp. S-378. Production and purification of the compounds are described in the Supporting Information. All plasmid isolations were made using a GeneJET Plasmid Miniprep Kit (Thermo Scientific). TALON SuperFlow resin and PD-10 desalting columns were bought from GE Healthcare. Enzymes were concentrated using Amicon Ultra 0.5 mL centrifugation filters (10 000 nominal molecular weight limit).
Phylogenetic Analysis. The multiple sequence alignments (MSA) were done using Jalview (2.10.5) 25 and ClustalO with default settings. The phylogenetic trees were created using FastTree (2.1.9), 26 and the trees were visualized with Dendroscope (3.5.8) 27 using a midpoint root. The MSA was also used to correlate sequence similarities and secondary structure data with ESPript (3.0) 28 using PDB structure 1tw2 as a reference.
[Figure caption, truncated: … Figure 2 include the ketosynthase Eam1, the 9-ketoreductase EamA, the aromatase EamD, and the 15-methylesterase EamC. The order of genes corresponds to Table S2.]
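The second tree discussed in the text was built from concatenated sequences of four conserved proteins per gene cluster. A minimal Python sketch of such a concatenation step is given below, assuming the sequences for each cluster are already available as strings. The helper names, cluster names, and short toy sequences are hypothetical, and whether concatenation was done before or after alignment in the original work is not stated, so this should be read as one possible arrangement only.

```python
def concatenate_proteins(proteins_by_cluster, order):
    """Join each cluster's protein sequences in a fixed order.

    A KeyError is raised if a cluster lacks one of the proteins, so that
    incomplete clusters are not silently included.
    """
    return {
        cluster: "".join(proteins[name] for name in order)
        for cluster, proteins in proteins_by_cluster.items()
    }


def to_fasta(records):
    """Format a {name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Placeholder sequences for illustration only (not real protein sequences).
ORDER = ["ketosynthase", "ketoreductase", "aromatase", "methylesterase"]
toy = {
    "cluster_1": {"ketosynthase": "MKT", "ketoreductase": "MSA",
                  "aromatase": "MTQ", "methylesterase": "MSE"},
    "cluster_2": {"ketosynthase": "MRT", "ketoreductase": "MSG",
                  "aromatase": "MTE", "methylesterase": "MAE"},
}

fasta_text = to_fasta(concatenate_proteins(toy, ORDER))
print(fasta_text)
```

The resulting FASTA text could then be passed to an alignment and tree-building tool of choice; using the same protein order for every cluster keeps homologous positions comparable across the concatenated sequences.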

Enzyme Activity Measurements. The proteins were expressed and purified as described in the Supporting Information Text and Figure S9. The enzymatic activity measurements were conducted in two steps. First, the 15-methyl groups were removed from 2, 4, or 5 (120 μM) with an excess of the 15-methylesterases DnrP (130 μM) and EamC (9 μM), and the reaction products were isolated as described previously for RdmC. 13 The activity measurements with DnrK, RdmB, TamK, ZamB, EamK, and CalMB were then performed with the 15-demethylated compounds under the following conditions: 100 mM Tris·HCl (pH 7.5), 10 mM DTT, and 400 μM SAM. The concentration of all other enzymes was set to 6.0 μM. All reactions were monitored by HPLC (SCL-10Avp/SpdM10Avp system with a diode array detector, Shimadzu) using an SB-C18 column (5 μm, 4.6 × 150 mm Zorbax column, Agilent). All compounds reported were confirmed by low-resolution MS (Agilent 6120 Quadrupole LCMS system linked to an Agilent Technologies 1260 Infinity HPLC system) with identical columns, gradient, and buffer systems as described previously. 13 A Kinetex (2.6 μm, 4.6 × 150 mm) C18 column (Phenomenex) was used for reactions with 2 and 5 as substrates.
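As a worked illustration of the reaction conditions given above, the following Python sketch computes pipetting volumes from the relation C1 · V1 = C2 · V2. The final concentrations are taken from the text; the total reaction volume and all stock concentrations are assumptions chosen purely for the example and are not stated in the source.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2.

    Both concentrations must be given in the same unit.
    """
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc / stock_conc * total_volume_ul


TOTAL_UL = 100.0  # assumed reaction volume; not stated in the text

# Final concentrations are from the text; stock concentrations are assumptions.
# All values are in micromolar: name -> (final, stock)
components = {
    "Tris-HCl pH 7.5": (100_000.0, 1_000_000.0),  # 100 mM from an assumed 1 M stock
    "DTT": (10_000.0, 1_000_000.0),               # 10 mM from an assumed 1 M stock
    "SAM": (400.0, 10_000.0),                     # 400 uM from an assumed 10 mM stock
    "enzyme": (6.0, 100.0),                       # 6.0 uM from an assumed 100 uM stock
}

volumes = {
    name: stock_volume_ul(final, stock, TOTAL_UL)
    for name, (final, stock) in components.items()
}
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} uL")
```

With different stock solutions or reaction volumes, only the assumed numbers in `components` and `TOTAL_UL` need to change.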
NMR Experimental. All NMR spectra were measured with a Bruker Avance III 600 NMR spectrometer (Bruker BioSpin, Fällanden) operating at 600.16 MHz for 1H and 150.92 MHz for 13C. The spectrometer was equipped with a TCI Prodigy nitrogen-cooled cryoprobe. Deuterated chloroform (CDCl3) was used as a solvent, and the chemical shifts were calibrated internally to tetramethylsilane (TMS, 0.00 ppm for both 1H and 13C). The temperature used in experiments was 25 °C. To achieve the full assignment of signals (Figures S10−S15), DQF-COSY, NOESY, CH2-edited HSQC, and HMBC spectra were measured in addition to the proton spectrum. Key HMBC correlations are shown in Figure S5.

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acschembio.9b00238.

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
We thank S. Kankaanpää for assistance with protein production and purification, P. Rosenqvist for measuring CD spectra, and V. Siitonen for high resolution mass spectrometry analysis.