Mining Natural Products for Macrocycles to Drug Difficult Targets

Lead generation for difficult-to-drug targets that have large, featureless, and highly lipophilic or highly polar and/or flexible binding sites is highly challenging. Here, we describe how cores of macrocyclic natural products can serve as a high-quality in silico screening library that provides leads for difficult-to-drug targets. Two iterative rounds of docking of a carefully selected set of natural-product-derived cores led to the discovery of an uncharged macrocyclic inhibitor of the Keap1-Nrf2 protein–protein interaction, a particularly challenging target due to its highly polar binding site. The inhibitor displays cellular efficacy and is well-positioned for further optimization based on the structure of its complex with Keap1 and synthetic access. We believe that our work will spur interest in using macrocyclic cores for in silico-based lead generation and also inspire the design of future macrocycle screening collections.


■ INTRODUCTION
More than half of all targets predicted to be involved in human disease are considered to be difficult to modulate with traditional small-molecule drugs that obey Lipinski's rule of 5 (Ro5). 1 Protein−protein interactions (PPIs), proteases, some kinases as well as transferases and isomerases are important examples. 2,3 These difficult-to-drug targets often have binding sites that are large, featureless, highly lipophilic or highly polar, and/or flexible. 2−4 Finding orally bioavailable drugs that reach intracellular difficult-to-drug targets is a daunting task that often requires discovery of ligands in the uncharted chemical space beyond the Ro5 (bRo5). 2,5,6 Macrocycles 7 are enriched among oral drugs in the bRo5 space 5 because they offer superior binding to targets that have large and featureless binding sites as compared to traditional small molecules. 2,8 In addition, macrocyclization may improve plasma stability, cell permeability, and oral absorption. 9 However, macrocycles are often under-represented in screening collections, limiting their use for lead generation. For instance, the AstraZeneca collection contains less than 17 000 macrocycles out of a total of more than 3.8 million compounds.
Despite a growing number of bRo5 compounds entering clinical trials, lead generation for difficult-to-drug targets has not been systematically investigated and is challenging as the chemical space expands dramatically with increasing molecular size. 10 Natural products are an important source of drugs, 11 and we hypothesized that the cores of macrocycles may be seen as nature's privileged substructures and could play a similar role in lead generation for difficult-to-drug targets as fragments do for traditional targets. 12,13 A collection of lead-like macrocyclic cores would then be useful for in silico screening against targets such as PPIs, with subsequent optimization providing novel leads. In addition, such a collection could inspire the design of natural-product-like macrocycle screening libraries.
Here, we report how two sets of macrocyclic cores that may be used to discover leads for difficult-to-drug targets were generated by a comprehensive investigation of macrocyclic natural product chemical space. Docking of the smaller and more lead-like set into the highly charged binding site for Nrf2 on Keap1 identified the core of cyclothialidine (1) as a potential inhibitor of Keap1. A second round of docking studies using structurally simplified analogues of the core, followed by synthesis, identified a weak inhibitor of the Keap1-Nrf2 PPI. Optimization of the hit via synthesis of an additional nine compounds led to an inhibitor (14), which has a potency in the low μM range and displays cellular activity. The crystal structure of the complex of Keap1 and 14 reveals how the uncharged macrocycle 14 binds to the charged binding site of Keap1 and provides a platform for its further optimization.

■ RESULTS
Mining the Dictionary of Natural Products. The Dictionary of Natural Products (DNP) 14 contains >150 000 compounds that cover a vast chemical space. This space includes that of approved oral and parenteral drugs but also extends far beyond (Figures 1a and S1). Therefore, we considered the DNP as an attractive source for the identification of macrocyclic cores as promising starting points for difficult-to-drug targets. First, duplicates and natural products containing known toxicophores and/or very reactive groups were removed from the DNP using the HTS filter implemented in Pipeline Pilot, 12 which removes several functionalities including acyl halides, anhydrides, diazo groups, and hydrazides. Then, the remaining compounds were filtered to contain at least one macrocyclic ring, not more than ten smaller rings, and to have a molecular weight (MW) <2500 Da (Figures 1b and S2). This removed all nonmacrocycles and large macrocycles, e.g., of polysaccharidic or polypeptidic nature, which are unlikely to reach intracellular targets, and provided a data set of 3657 macrocycles. The side chains of this set were pruned to attachment points consisting of at most two heavy atoms or functional groups connected to the macrocycle using the first two steps of the fragment generation algorithm reported previously. 12 In this way, fragmentation of the macrocycle ring was avoided. The resulting extended Murcko scaffolds, 15 herein denoted macrocycle cores, were filtered to contain at least a total of three oxygen and nitrogen atoms and <15 rotatable bonds, thereby removing highly lipophilic and/or flexible cores unlikely to be developable into drugs. Close to 2200 structurally different cores were identified in this macrocycle collection, which were clustered into a set of 217 representative cores.
Cores that contained a few remaining reactive groups (i.e., disulfides, peroxides, and alkyl halides), had an overwhelming synthetic complexity, or that were oligomeric (oligopeptides and oligosaccharides) were removed by visual inspection. This provided a smaller set of 41 more lead-like cores that to a large extent adhered to Lipinski's rule of 5 (Table S1 and Figure S3). As molecular weight and complexity may be expected to increase during optimization, they were judged to be suitable starting points for development into cell-permeable inhibitors of difficult-to-drug targets. This set of cores originates from 35 natural products for which the absolute stereochemistry has been reported and from three natural products for which only the relative stereochemistry is known, which were included as enantiomeric pairs. The two sets of cores constitute unique in silico collections that can be used in efforts to find novel starting points in drug discovery (Supporting Excel Sheet). The structural complexity varies between the cores in the two sets; some can be readily synthesized, while simplified analogues may be more attractive starting points for other macrocycles.
Novel Class of Inhibitors of the Keap1-Nrf2 PPI. We chose the PPI between Kelch-like ECH-associated protein 1 (Keap1) and nuclear factor erythroid 2-related factor 2 (Nrf2) to evaluate our sets of macrocyclic cores. The Keap1-Nrf2 system is an important cellular response mechanism for oxidative stress, which is involved in many chronic, age-related, or inflammatory diseases. 17 Inhibitors of this PPI are of major interest for drug discovery as Nrf2 controls the expression of several cytoprotective genes but is negatively regulated through complexation with Keap1, leading to its ubiquitination and degradation. The Keap1-Nrf2 PPI involves a highly polar pocket on Keap1 that binds the peptide motif DEETGE on Nrf2, as well as other negatively charged ligands, with high affinity. 18,19 This renders it challenging to find cell-permeable and orally absorbed ligands that reach and inhibit the Keap1-Nrf2 PPI, 19−21 just as for other targets that have polar binding sites. 22,23 Targeting the Keap1-Nrf2 PPI in CNS disorders is particularly challenging. 19,20 We hypothesized that macrocycles, due to their conformational restriction, may provide an optimal fit in the polar binding site of Keap1. Thus, they may be ideal for the discovery of novel uncharged inhibitors with improved cell permeability and oral absorption as compared to current Keap1 inhibitors, the majority of which are acidic. 20,21 The binding site on Keap1 for Nrf2 shows a certain degree of conformational mobility. 21 In particular, the side chain of Arg415 adapts its conformation so that the binding site is more open in the apo form and more closed when bound to small-molecule inhibitors. To avoid bias toward a particular binding site conformation when docking the macrocyclic cores, we selected four high-resolution crystal structures of Keap1 that showed variation in the binding site: two with a bound small-molecule ligand (PDB ID: 4IQK 24 and 3VNG) and two apo structures (PDB ID: 4IFJ and 1ZGK 25) (Figure S4).
Docking of the 41 cores of the lead-like set into the binding site was then performed using flexible docking in Glide. 26 The ten cores that docked best into each Keap1 structure were identified by their GlideScores (Table S2). Then, the cores that had docked into three or four of the Keap1 structures were identified from the combined top-ten lists and selected as potential hits, as we anticipated them to be more likely to bind to Keap1 than cores that docked well into only one or two crystal structures (Table S3). Interestingly, the resulting top-five macrocycles displayed a large structural diversity (Figure 1c). Inspection of the docked poses revealed that the core of cyclothialidine (1), which docked well into all four Keap1 structures, was able to bind deep in the binding site and reached into the Kelch channel (Figure 1d). The cores of piperazinomycin (2), 8-ethoxy-3-oxo-1,2-dehydroretrorsine (3), numismine (4), and iriomoteolide 3a (5), which docked well into only three of the Keap1 structures, bound closer to the periphery, providing fewer opportunities for interactions with Keap1 (Figure S5). To the best of our knowledge, the natural product precursors of cores 1−5 have not been reported to bind to Keap1. However, cytotoxicity has been reported for the natural product precursor of core 5, which may make 5 less suitable for development of an inhibitor of the Keap1-Nrf2 PPI (Table S4). No synthetic routes have been reported for 3 and 4, and the routes to 2 and 5 are long, 27,28 while a route suitable for synthesis of analogues has been reported for the core of 1. 29 Overall, these considerations revealed the cyclothialidine core as the most promising starting point for discovery of a novel Keap1 inhibitor.
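The consensus step described above (take the top-ranked cores for each Keap1 structure, then keep those that recur in three or four of the lists) can be illustrated with a minimal Python sketch. It is not the workflow used in this work: the function name `consensus_hits`, the core labels, and the shortened example lists are hypothetical, and only the four PDB IDs come from the text.

```python
from collections import Counter

def consensus_hits(top_lists, min_structures=3):
    """Return cores present in at least `min_structures` of the top lists.

    `top_lists` maps a structure label (here a PDB ID) to the core
    identifiers that scored best against that structure.
    """
    counts = Counter()
    for cores in top_lists.values():
        # Count each core at most once per structure.
        counts.update(set(cores))
    # Most widely recurring cores first; ties broken alphabetically.
    return sorted(
        (core for core, n in counts.items() if n >= min_structures),
        key=lambda core: (-counts[core], core),
    )

# Hypothetical, shortened lists (three entries instead of ten).
example = {
    "4IQK": ["core_A", "core_B", "core_C"],
    "3VNG": ["core_A", "core_B", "core_D"],
    "4IFJ": ["core_A", "core_C", "core_B"],
    "1ZGK": ["core_A", "core_D", "core_E"],
}
hits = consensus_hits(example)
```

In this toy input, `core_A` recurs in all four lists and `core_B` in three, so both pass the default threshold, mirroring the distinction drawn between core 1 and cores 2−5.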
Induced-fit docking suggested that simplified analogues of the cyclothialidine core, such as stereoisomers 6−9 where the hydroxyl and methyl groups of the phenylene group and the hydroxyl of the proline have been removed (Figure 2a), would also bind to Keap1 (Figure S6). In addition, the docking indicated that 8 and 9 were likely to bind more strongly than 6 and 7. Gratifyingly, 9 was found to inhibit the binding of Keap1 to an immobilized peptide derived from Nrf2 (KD = 237 μM) in a surface plasmon resonance (SPR)-based inhibition in solution assay (ISA) 30 (Figure 2a). Inversion of the stereochemistry of the L-proline moiety to give 10 led to a reduced binding affinity that revealed its importance for inhibition of Keap1, while hydroxylation of L-proline (11) maintained the KD (Figures 2a and S7). Importantly, replacement of the methyl ester with different amides (12−15) provided dimethylamide 14, which is close to 2 orders of magnitude more potent than 9. Ring-opened 16 was drastically less active than 14; similarly, macrocycles ring expanded by one and three atoms, respectively (17 and 18), also lost binding affinity, confirming the critical role of the macrocycle for inhibition of Keap1. The activity of 14 in the ISA was confirmed in a direct SPR binding assay 31 and by isothermal titration calorimetry (ITC), which provided almost identical affinities for Keap1 (Figure 2b,c). Macrocycle 14 displayed cellular activity as it induced Nrf2 translocation into the nucleus by inhibiting binding to Keap1 (Figure 2d). Moreover, 14 has high aqueous solubility, low-to-moderate permeability across Caco-2 cell monolayers, and a medium in vitro microsomal clearance. It therefore constitutes a promising lead compound for further optimization into an uncharged, nonacidic Keap1 advanced lead.
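To put the potency gain from 9 to 14 in numbers, the following Python sketch converts dissociation constants into a fold change and an approximate difference in binding free energy via ΔG = RT ln(KD). It is an illustration only: the 4 μM value for 14 is the one quoted in the Discussion, the temperature of 298.15 K and the 1 M standard state are assumptions, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Approximate binding free energy (kcal/mol) from KD in mol/L.

    Uses dG = RT ln(KD) with a 1 M standard state; the default
    temperature is an assumption, not a value taken from the assays.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)

kd_9 = 237e-6   # KD of compound 9 (ISA), in mol/L
kd_14 = 4e-6    # KD of compound 14 as quoted in the Discussion, in mol/L

fold = kd_9 / kd_14                # potency ratio of 14 over 9
log_units = math.log10(fold)       # the same ratio in orders of magnitude
ddg = binding_free_energy(kd_14) - binding_free_energy(kd_9)
```

Under these assumptions the ratio is roughly 60-fold, i.e., somewhat below two full orders of magnitude, and corresponds to a gain of about 2.4 kcal/mol in binding free energy.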
Syntheses of Compounds 6−18. Compounds 6−18 were synthesized in five to seven steps from commercially available starting materials, using a modified version of the route reported for cyclothialidine (Scheme 1). 29 The synthesis of macrocycles 6−9 (Scheme 1A) was initiated with the protection of 2-(bromomethyl)benzoic acid as a 2,2,2-trichloroethyl (Tce) ester to give 19. Benzyl bromide 19 was used to alkylate the thiol of L- and D-cysteine methyl ester, respectively, providing 20a and 20b, which were then coupled with L- and D-Boc-serine, affording dipeptides 21a−d. The Tce protecting group was cleaved using zinc powder to obtain acids 22a−d, which were subjected to an intramolecular Mitsunobu reaction to obtain macrocycles 23a−d. To our delight, the macrocyclization conditions proved robust, providing satisfactory yields for all four diastereoisomers 23a−d. Removal of the Boc group using acidic conditions, followed by coupling of the resulting free amine with Ac-L-Pro-OH, provided compounds 6−9.
Macrocycle 23d, the key intermediate in the synthetic sequence, was used to access compounds 10−15. After cleavage of its Boc group, coupling with Ac-D-Pro-OH or Ac-L-Hyp-OH provided compounds 10 and 11, respectively (Scheme 1B). The methyl ester of compound 23d was cleaved using Me3SnOH, as the use of traditional alkali metal hydroxides led to partial opening of the lactone ring. The resulting crude acid was directly coupled with four different amines to give amides 24−27 (Scheme 1C). In a similar fashion as for compounds 6−9, cleavage of the Boc group from 24−27 and coupling with Ac-L-Pro-OH provided compounds 12−15.
Figure 2. Characterization of cyclothialidine analogues. (a) Synthesized analogues of the cyclothialidine core that were evaluated as inhibitors of binding of Keap1 to an immobilized peptide derived from Nrf2 by surface plasmon resonance using an inhibition in solution assay (ISA) format. Dissociation constants, reported as mean values ± standard deviation, from three measurements on three distinct samples are given for each analogue. (b) Interaction kinetic analysis of a dilution series of macrocycle 14 in a direct binding assay using immobilized Keap1 (left). Determination of the dissociation constant (KD) for 14 by fitting of the data to a two-parametric sigmoidal equation (right). The dissociation constant was obtained from three measurements on three distinct samples and is reported as the mean value ± standard deviation. (c) Determination of KD for the binding of 14 to Keap1 by isothermal titration calorimetry. The raw heat signals from the exothermic binding reaction (left) have been integrated to yield a binding isotherm (right) from which the thermodynamic parameters were extracted (insert). The dissociation constant was obtained from three measurements on three distinct samples and is reported as the mean value ± standard deviation. (d) Characterization of macrocycle 14 by calculated descriptors (MW and TPSA), solubility in phosphate-buffered saline at 25 °C and pH 7.4, efflux-inhibited permeability across a Caco-2 cell monolayer (Papp AB + inh), human microsomal metabolism (Clint), and induction of Nrf2 translocation into the nucleus (Nrf2 transl) at 256 μM. The values for solubility, cell permeability, and human microsomal metabolism are mean values ± standard deviation from three measurements on three distinct samples. The Nrf2 translocation into the nucleus is the mean from two measurements on two distinct samples.
The synthesis of ring-opened compound 16 (Scheme 1D) started with the methylation of the sulfur of D-cysteine methyl ester, followed by coupling with Boc-D-Ser to provide dipeptide 28, the hydroxyl group of which was acylated with benzoyl chloride in the presence of DMAP leading to compound 29. After Boc removal and coupling with Ac-L-Pro-OH tripeptide 30 was obtained. Cleavage of the methyl ester followed by coupling of the resulting free acid with dimethylamine afforded the linear control 16.
The preparation of the ring-expanded macrocycles 17 and 18 (Scheme 1E,F) is based on the common intermediate 31, which was prepared by alkylation of the thiol moiety of Boc-D-Cys with commercially available 2-bromomethyl methyl benzoate, followed by coupling of the crude carboxylic acid with dimethylamine. The Boc protecting group was then cleaved and the liberated amine was coupled with Boc-D-homoserine to give compound 32, followed by saponification of the methyl ester to give compound 33. The same conditions employed for the Mitsunobu reaction with acids 22a−d were successfully applied to form the 13-membered lactone of 34. Finally, Boc removal and coupling with Ac-L-Pro-OH provided macrocycle 17. For the preparation of compound 18 (Scheme 1F), the Boc group of 31 was cleaved, followed by coupling with Boc-glycine to obtain dipeptide 35. The newly introduced Boc group was cleaved and the liberated amine was coupled with Boc-D-serine to afford tripeptide 36. After cleavage of the methyl ester, the obtained acid 37 was subjected to a Mitsunobu reaction to form the 15-membered core of compound 38, followed by Boc cleavage and coupling with Ac-L-Pro-OH to give macrocycle 18.
Structure of Inhibitor 14 in Complex with Keap1. We determined the structure of the complex of macrocycle 14 and Keap1 at 2.4 Å resolution to understand how uncharged 14 binds in the charged and polar binding site of Keap1. The crystal structure confirms that 14 binds to the same polar binding site in Keap1 as Nrf2 (Figure 3a, Table S5, and Figure S8). Inspection of the structure shows that the phenylene group and parts of the macrocycle are wedged between R415 and A556 in a fairly hydrophobic pocket that reaches toward the Kelch channel, confirming that 14 is bound similarly to the docked poses of analogue 9 (Figure S9). The complex is stabilized only by a few polar interactions in addition to the cation−π interaction between R415 and 14. The carbonyl group of Ser in 14 forms a hydrogen bond to the side chain of S602 in Keap1, similar to that of the main-chain carbonyl group of T80 in Nrf2. 32 In addition, a chloride ion bridges the NH of Ser in 14 with three residues in Keap1, in a similar manner as in the complex with a highly potent inhibitor identified by fragment-based drug discovery. 33 As physiological salt conditions were used in the SPR and ITC assays, as well as in the cellular Nrf2 translocation assay, we believe that formation of this complex also occurs in biologically relevant environments. The ability of macrocycle 14 to adopt a compact and well-defined conformation in the binding site explains why the ring-opened and ring-expanded analogues 16−18 have a significantly lower affinity for Keap1. The solvent exposure of part of Pro and its N-acetyl group is in line with the fact that replacement with hydroxyproline (compound 11) does not affect the inhibitory potency.
The C-terminal dimethylamide of 14 is well-defined by the electron density of the X-ray structure and stacks against Y572, but inspection of the complex does not explain the large increase in potency as compared to ester 9 and amides 12, 13, and 15.
Ligand binding affinity calculations using Prime MM-GBSA, 34 which is frequently used to estimate the free energy of protein−ligand complexes, 35 provided insights both into the underlying reasons for the potency differences and into what forces stabilize the complexes (Figure S10). In agreement with the experimental dissociation constants, macrocycle 14 was predicted to have the highest affinity for Keap1, with 9 and 13 being predicted as intermediate and 12 and 15 as weak. Nonpolar van der Waals and lipophilic interactions were found to be the main contributors to the binding affinity of 14 and to its higher affinity compared with inhibitors 9 and 13. According to the calculations, the complex of the inactive 12 is stabilized by stronger polar (Coulomb) interactions, but this is offset by a larger desolvation penalty, while inactive 15 forms the weakest polar (Coulomb) and van der Waals interactions among the five compounds investigated. Interestingly, the distances between the two N-methyl groups of 14 and the carbonyl groups of Pro and the N-acetyl group show that two intramolecular nonclassical hydrogen bonds 36 are formed between these residues (Figure 3a), an observation that was supported by quantum mechanical calculations (see Supporting Information, Computational Procedure 2). Nonclassical hydrogen bonds are weaker than classical hydrogen bonds, 37 but the two intramolecular bonds are likely to provide additional conformational restriction and stabilization of the Keap1-bound conformation of 14.
Comparison of 14 to Reported Keap1 Inhibitors. We assembled two sets of inhibitors of the Keap1-Nrf2 PPI to generate an overview of the diversity of the Keap1 inhibitor space and to investigate whether macrocycle 14 occupies a unique position in this space. The first set, termed "PubChem", was obtained from the 528 unique compounds reported as active in the PubChem Bioassay database after testing of >337 000 compounds. 38 Removal of compounds that did not pass a pan assay interference compounds (PAINS) filter 39 from the actives gave the PubChem set consisting of 375 inhibitors (Table S8). A different set of "validated" inhibitors was assembled from two sources. First, nine compounds reported 20 to show reproducible activity in a triad of orthogonal assays after resynthesis were included. Then, compounds bound in the binding site of Keap1 for Nrf2 were retrieved from the Protein Data Bank (PDB). Peptides were excluded from this set, providing a "validated" set of 35 inhibitors, including six fragments that had a MW < 205 Da (Table S9).
The Tanimoto coefficient, 40 used to describe structural diversity, was calculated from seven fingerprints and revealed that 14 had a low similarity to the compounds in both the PubChem and the validated set (Figures 4a and S11). The diversity of the Keap1 inhibitor space was investigated by network-like similarity graphs derived from substructure fragment fingerprints 41 calculated for 14, the combined validated and PubChem sets, and the validated set alone. The graph for the combined set, just as the one for the validated set, confirmed that macrocycle 14 occupies a unique position in chemical space with no structurally similar neighbors (cf. color and size of spheres and the background in Figure 4b,c). The overview of the combined set also revealed the Keap1 inhibitor space to be fairly diverse and that most inhibitors have few structural neighbors, with the exception of a large cluster located in the left center of the graph. This cluster contains a series of phthalimides represented by the validated inhibitor 20 V1 (cf. structure in Figure 4c and Table S9). 1,4-Diaminonaphthalene V3, cyclic sulfonamide V7, and diazole V8 (Figure 4c) are the most potent reported Keap1 inhibitors and were obtained after medicinal chemistry optimization. In contrast to 14, they all contain carboxylic acids. In summary, both Tanimoto coefficients and the network-like similarity graphs highlight the potential of using natural-product-derived cores as a source of structurally unique Keap1 inhibitors, belonging to a chemical space very different from that of previously reported inhibitors.
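For readers less familiar with the metric, the Tanimoto coefficient between two binary fingerprints is the number of shared on-bits divided by the number of on-bits present in either fingerprint. The sketch below is a generic Python illustration on toy bit sets, not the fingerprint workflow used here; the function names and the example bit indices are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union

def max_similarity(query, reference_set):
    """Highest Tanimoto coefficient between a query and any reference fingerprint."""
    return max((tanimoto(query, ref) for ref in reference_set), default=0.0)

# Toy fingerprints; the bit indices carry no chemical meaning.
query = {1, 4, 9, 16}
library = [{1, 4, 25}, {2, 3, 5}, {4, 9, 16, 36, 49}]
nearest = max_similarity(query, library)
```

A compound whose maximum similarity to every member of a reference set stays low, as reported for 14, has no close structural neighbor in that set under the chosen fingerprint.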

■ DISCUSSION AND CONCLUSIONS
Lead generation is very challenging for targets that have difficult-to-drug binding sites, 2−4 such as protein−protein interactions (PPIs). 42 High-throughput screening (HTS) has often failed to identify useful hits for such targets as screening collections do not provide sufficient coverage of relevant chemical space due to their limited size and history of assembly. 42 Fragment-based lead discovery allows a more comprehensive search of chemical space and has shown success in some cases where HTS has failed. 13 Structure-based docking constitutes an alternative that may be particularly appealing for difficult-to-drug targets, as platforms for docking of ultralarge libraries are now being developed. 43,44 Difficult-to-drug targets that have flat and featureless binding sites are often modulated by macrocyclic drugs, many of which originate from natural products. 2,8 We therefore mined the macrocycles in the Dictionary of Natural Products to facilitate the identification of novel chemical matter for modulation of difficult-to-drug targets. This provided a smaller set of 41 lead-like macrocyclic cores and a larger set of 217 cores with more complex structures (Supporting Excel Sheet). Docking of the smaller set of cores into the positively charged and polar binding site of Keap1 led to the discovery of the 4 μM inhibitor 14, which originates from the core of cyclothialidine (1), after synthesis of only 13 compounds. Macrocycle 14 constitutes an uncharged inhibitor of Keap1, indicating that macrocycles may be used to discover promising starting points for difficult-to-drug targets that have polar binding sites, in addition to targets with flat and featureless sites. In contrast to 14, the majority of the reported Keap1 inhibitors contain acidic groups, which may be detrimental to cell permeability and oral absorption, 21 and are expected to prevent CNS permeability. 19
Compound 14 has druglike properties and shows cellular potency in an Nrf2 translocation assay, making it suitable for further lead optimization. The structure of its complex with Keap1 illustrates how cation−π interactions may allow an uncharged ligand to bind in a charged binding site and provides a first indication of how an uncharged inhibitor can be tailored to fit the charged binding site.
It is interesting to contrast our results with those obtained in a virtual screen of 1.3 billion compounds from Enamine's REAL space library and the ZINC library, 44 as well as with fragment-based lead generation. 33,45 The ultralarge screen identified Keap1 inhibitors with 100 nM potencies as determined by surface plasmon resonance, 44 but these contain potentially reactive or toxic groups. In addition, the structures were similar to previously reported inhibitors and appeared to provide limited opportunities for optimization into drug candidates. Screening of large libraries clearly provides immense opportunities for drug discovery, but our results reiterate the importance of the structural diversity and quality of the libraries, just as for regular HTS compound collections. One of the most advanced Keap1 inhibitors was obtained by merging information from three fragments that bound in different subsites of the Keap1 binding site. 33,45 While it was highly potent in vitro (KD 1.3 nM) and active in cellular and in vivo models, its oral bioavailability in rats was low (7%), potentially because of the presence of a carboxylic acid moiety originating from one of the fragments.
Recently, natural products have also been utilized as a source of three-dimensional (3D) fragments 12 and as inspiration for design of compound collections prepared by diversity-oriented synthesis (DOS). 46 In addition, pseudo-natural products have been designed by combination and fusion of natural-product-derived fragments. 47 However, natural products and their derivatives often have complex structures. We demonstrated how two rounds of docking of our smaller set of lead-like cores first identified the cyclothialidine core and then tailored it to its target Keap1 via docking of 6−9 while simultaneously reducing the structural complexity. In this manner, one of the major shortcomings of natural products, i.e., that they are often prepared via long synthetic routes, was mitigated. Similarly, the larger set of 217 structurally more complex macrocyclic cores may provide inspiration for design of collections having simplified structures that can be prepared through shorter routes than the original natural products. In conclusion, the cores reported herein provide another example of how nature's privileged structures and their diversity can be capitalized on as a rich source of quality leads for drug discovery. 11

■ EXPERIMENTAL SECTION
Chemistry. General Methods. All reagents were purchased from Sigma-Aldrich, Fluorochem, and VWR International. DCM, DMSO, hexane, DMF, and acetonitrile were purchased from VWR International, while 1,2-DCE, toluene, and THF were purchased from Sigma-Aldrich. All nonaqueous reactions were performed in oven-dried glassware under an argon atmosphere. The Buchi rotary evaporator R-114 was used to remove solvents in vacuo.
Reactions were generally monitored by liquid chromatography−mass spectrometry (LC-MS) with an Agilent 1100 series high-performance liquid chromatography (HPLC) system with a C18 Atlantis T3 column (3.0 mm × 50 mm, 5 μm) using acetonitrile−water (flow rate 0.75 mL/min over 6 min) as the mobile phase and a Waters Micromass ZQ (model code: MM1) mass spectrometer with the electrospray ionization mode as the detector. Alternatively, TLC silica gel 60 F254 plates from VWR International were used and visualization was done using UV light (254 nm) or by staining with a KMnO4 solution (2% m/v in water). Silica gel (43−63 μm, VWR International) was used for purification of compounds with flash column chromatography. Preparative reversed-phase HPLC was performed on a Kromasil C8 column (250 mm × 21.2 mm, 5 μm) on a Gilson HPLC equipped with a Gilson 322 pump, a UV−visible-156 detector, and a 202 collector using acetonitrile−water gradients as eluents with a flow rate of 15 mL/min and detection at 214 or 254 nm. 1H, 13C, COSY, HSQC, and HMBC NMR spectra for synthesized compounds were recorded at 298 K on an Agilent Technologies 400 MR spectrometer at 400 or 100 MHz or on an OXFORD AS500 spectrometer at 500 or 126 MHz or on a Bruker Avance III spectrometer at 600 MHz or at 151 MHz. The residual peak of the respective solvent was used as the internal standard [CDCl3 (CHCl3 δH 7.26 ppm, CDCl3 δC 77.0 ppm), DMSO-d6 (CD2HSOCD3 δH 2.50 ppm, CD3SOCD3 δC 39.5 ppm), CD3OD (CD2HOD δH 3.31 ppm, CD3OD δC 49.0 ppm), CD3CN (CD2HCN δH 1.94 ppm, CD3CN δC 1.3 and 118.3 ppm), acetone-d6 (CD2HCOCD3 δH 2.05 ppm, CD3COCD3 δC 29.8 and 206.3 ppm)]. HRMS data for all new compounds were recorded in the electrospray ionization (ESI) mode on an S3 LCT Premier connected to a Waters Acquity UPLC I-class with acetonitrile−water used as the mobile phase (1:1, with a flow rate of 0.25 mL/min).
The purity of compounds 6−18 is ≥95% as determined using a Waters LCT Premier mass spectrometer coupled to a Waters Acquity UPLC. The Waters Acquity UPLC was equipped with either a BEH C18 column (1.7 μm, 2.1 mm × 50 mm, at 45 °C using a gradient from 5 to 90% acetonitrile modified with 40 mM ammonia and 5 mM H2CO3, pH 10, within 2.5 or 3 min, detection at 210 nm) or a CSH C18 column (1.7 μm, 2.1 mm × 50 mm, at 45 °C using a gradient from 5 to 90% acetonitrile modified with 10 mM formic acid and 1 mM ammonium formate, pH 3, within 2.5 or 3 min, detection at 230 nm).
Extraction of Macrocyclic Cores from the DNP. The structures of the natural products contained in the sd-file version of the Dictionary of Natural Products (DNP) were cleaned as described previously, whereby natural products containing toxic or reactive groups are removed using the knowledge-based HTS filter in Pipeline Pilot. 12 Further filtering in Pipeline Pilot was performed to retain macrocycles (≥12 and ≤30 atoms in the ring) having <10 rings in total and a MW < 2500 Da to provide 3657 macrocyclic natural products.
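The ring-size, ring-count, and molecular-weight criteria above amount to a simple predicate over precomputed descriptors. The following Python sketch illustrates that logic only; it is not the Pipeline Pilot protocol. The `Descriptors` record, its field names, and the example entries are hypothetical, and the sketch assumes that a single ring within the 12 to 30 atom window is enough to qualify a compound.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Descriptors:
    """Hypothetical precomputed descriptors for one natural product."""
    name: str
    ring_sizes: Tuple[int, ...]  # size of every ring in the molecule
    mol_weight: float            # molecular weight in Da

def is_macrocycle_candidate(d, min_ring=12, max_ring=30,
                            max_rings=10, max_mw=2500.0):
    """Apply the three criteria: macrocyclic ring, <10 rings, MW < 2500 Da."""
    # Assumption: one ring in the size window makes the compound a macrocycle.
    has_macrocycle = any(min_ring <= size <= max_ring for size in d.ring_sizes)
    return (has_macrocycle
            and len(d.ring_sizes) < max_rings
            and d.mol_weight < max_mw)

# Hypothetical records covering each reason for rejection.
records = [
    Descriptors("macrocycle_ok", (14, 6), 700.0),
    Descriptors("no_macrocycle", (6, 5), 300.0),
    Descriptors("too_heavy", (18,), 3100.0),
    Descriptors("too_many_rings", (16,) + (6,) * 9, 1200.0),
]
kept = [d.name for d in records if is_macrocycle_candidate(d)]
```

Keeping the thresholds as keyword arguments makes it easy to see how the size of the retained set would change under stricter or looser cutoffs.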
The side chains of the macrocyclic natural products were then pruned to provide macrocycle cores using the first two steps of the fragment generation algorithm reported previously, 12 i.e., deglycosylation and extraction of extended Murcko frameworks 15 decorated with attachment points (Figure S2). In these two steps, macrocycles were not fragmented and all ring systems and linker atoms were kept while linear side chains were pruned, but not removed. This provided macrocycle cores, i.e., macrocycles decorated with attachment points, which are functional groups linked directly to the ring system or pruned side chains. Thus, the algorithm was designed to retain functional groups and only cut side chains at carbon−carbon bonds at a distance of >2 atoms from the ring.
The algorithm defines an attachment point by traversing the side chain of the macrocycle through two heavy atoms, starting from the ring-based atom. For each heavy atom thereafter it decides, based on a set of rules, 12 whether the atom is kept, whether that atom's neighboring atoms are examined, or whether the side chain is cut after the second heavy atom. The algorithm keeps aliphatic carbon atoms as long as they are within two atoms from the ring, and longer aliphatic side chains are pruned to ethyl groups. If the algorithm encounters a carbon atom linked to a non-carbon and non-hydrogen atom, it will keep the heteroatom and examine the immediate neighbor atom, irrespective of its distance from the ring. The algorithm prunes after the next upcoming carbon atom as long as this is not double-bonded to a heteroatom.
In branched side chains, the algorithm traverses the branches separately, within the distance limit. If the branching carbon is connected to a heteroatom by a double bond, the immediate next neighbor atom in the other branch will be investigated using the same rules as for a linear side chain, irrespective of its distance from the ring. The next upcoming carbon atom is kept, as the terminating carbon, but the rest of the side chain is pruned. If the branching atom is a heteroatom, the immediate next neighbor atoms will be investigated in the same manner, irrespective of their distance from the ring. As a result of this pruning process, the original functional group motif of the macrocyclic part of the natural product remains intact in the core. In addition, functional groups close to the macrocycle ring in side chains remain intact. All generated macrocycle cores are stored in a database for further processing.
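The pruning rules can be made more concrete with a small Python sketch for unbranched side chains. This is a deliberately simplified reading of the description above, not the published rule set (ref 12): branching is ignored, a doubly bonded heteroatom is represented only by a flag on its carbon, and the names `ChainAtom` and `prune_linear_side_chain` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ChainAtom:
    """One heavy atom of a linear side chain, listed from the ring outward."""
    element: str
    carbonyl: bool = False  # True if this carbon is double-bonded to a heteroatom


def prune_linear_side_chain(chain: Sequence[ChainAtom], limit: int = 2) -> List[ChainAtom]:
    """Return the atoms retained on the core (simplified, linear chains only)."""
    kept: List[ChainAtom] = []
    for depth, atom in enumerate(chain, start=1):
        if depth <= limit:
            kept.append(atom)      # always keep the first two heavy atoms
            continue
        previous = kept[-1]
        if atom.element != "C":
            kept.append(atom)      # heteroatoms are retained and traversal continues
            continue
        if previous.element == "C" and not previous.carbonyl:
            break                  # plain carbon-carbon bond beyond the limit: cut here
        kept.append(atom)          # carbon after a heteroatom or a C=X carbon is kept
        if not atom.carbonyl:
            break                  # it acts as the terminating carbon
    return kept
```

Under these assumptions a pentyl chain is reduced to an ethyl group, while a chain such as C-C-O-C-C-C is cut at the first carbon−carbon bond after the ether oxygen, giving C-C-O-C.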
If identical macrocyclic cores were obtained after side-chain pruning of different natural products, only one was kept. Cores with ≤3 O or N atoms or >15 rotatable bonds, without a heteroatom in the macrocycle, and/or having more than one macrocycle were also removed in Pipeline Pilot. This resulted in 2175 natural-product-derived macrocycle cores, which are based on 764 different Murcko scaffolds. The 2175 cores were clustered in Pipeline Pilot using an FCFP6 pharmacophore fingerprint 52 into 217 clusters (Supporting Excel Sheet). The cluster centers were visually inspected by three or more experienced medicinal chemists, and 41 cluster centers or near neighbors were selected by exclusion of cores containing reactive or potentially redox-sensitive groups, excessively complex structures including >6 rings, or oligomeric/pseudo-oligomeric structures (Supporting Excel Sheet).
Docking  54 Structure refinement included adding hydrogen atoms, assigning bond orders, building disulfide bonds, and removing water molecules beyond 5 Å from the ligand atoms. The PROPKA tool from Protein Preparation Wizard was used to predict the protonation states of the ionizable residues at pH 7.0. 55 Subsequently, the positions of the hydrogen atoms in the Keap1 structure were energy-minimized using the OPLS3 force field. 56 The receptor grid generation module of Glide 26 was used to define the active site for the docking experiments. The active site of Keap1 was defined either by the bound ligand or by key residues in the active site (R415, R483, Y525, Y572, A556, S603), which were used as the centroid of the grid box (the radius of the active site was 15 Å from the centroid). The docking protocol was first validated by comparing the conformation (the pose) of the bound ligands as obtained from docking with the one determined by X-ray crystallography for the two ligand-bound structures (PDB ID: 4IQK 24 and 3VNG; see the Supporting Information, Computational Procedure 1). Subsequently, the set of 41 cores was preprocessed using LigPrep, which included generating possible ionization states (at pH 7) using the Epik tool 57 and structure minimization using the OPLS3 force field. 56 During preprocessing, input chirality was retained.
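A comparison of a docked pose with a crystallographic pose is commonly summarized as a heavy-atom root-mean-square deviation (RMSD). The text does not state which metric was used for the validation, so the sketch below is an assumption that illustrates the general idea only. It presumes that both poses list the same atoms in the same order and share the receptor coordinate frame, so no superposition is performed; `pose_rmsd` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coordinate], crystal: Sequence[Coordinate]) -> float:
    """Heavy-atom RMSD in Å between two poses with identical atom ordering."""
    if len(docked) != len(crystal) or len(docked) == 0:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared_sum = 0.0
    for atom_docked, atom_crystal in zip(docked, crystal):
        # Sum the squared displacement over x, y, and z for this atom pair
        squared_sum += sum((a - b) ** 2 for a, b in zip(atom_docked, atom_crystal))
    return math.sqrt(squared_sum / len(docked))
```

Symmetry-equivalent atom mappings, which matter for groups such as phenyl rings, are not handled in this sketch.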
The 41 lead-like macrocyclic cores (Table S1) were then docked into each of the four refined Keap1 structures using Glide, which treats the cores as fully flexible while Keap1 remains rigid during the docking. Docking poses for each macrocycle core were ranked according to the GlideScore from the standard precision mode, and for each core, the docked pose that had the lowest GlideScore for binding to each of the Keap1 structures was identified (Table S2). The resulting combined list of the top-10 cores for each Keap1 structure was then used to rank the cores by the number of Keap1 structures each core bound to (Table S3), after which the docked poses of the top-five cores in each of the four crystal structures were inspected visually (Figure S5).
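The consensus ranking described above (top-10 lists per Keap1 structure, followed by counting how many lists each core appears in) can be sketched in a few lines of Python. The function name `consensus_ranking`, the input layout, and the alphabetical tie-break are assumptions made for illustration; the scores stand for the best (lowest) GlideScore of each core in each structure.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_ranking(
    best_scores: Dict[str, Dict[str, float]], top_n: int = 10
) -> List[Tuple[str, int]]:
    """Rank cores by the number of receptor structures in which they reach the top N.

    best_scores maps structure ID -> {core ID -> best (lowest) docking score}.
    """
    counts: Counter = Counter()
    for scores in best_scores.values():
        # Lower scores are better, so an ascending sort puts the best cores first
        top_cores = sorted(scores, key=scores.get)[:top_n]
        counts.update(top_cores)
    # Most frequent first; ties are broken alphabetically (an arbitrary choice)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

A core that appears in the top-N list of all four structures is ranked first, which reflects robustness of the predicted binding to variations in receptor conformation.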
Induced-fit docking (IFD) 58 of stereoisomeric macrocycles 6−9 was performed as implemented in the Schrödinger suite. 58,59 Three-dimensional structures of compounds 6−9 were built with Maestro from the Schrödinger suite 60 and energy-minimized using the OPLS3 force field. A grid centered at the bound ligand (PDB ID: 4IQK 24) with a box size of 15 Å was defined. Subsequently, 6−9 were docked and scored using the standard precision mode with an extended sampling protocol, which generates up to 80 poses per ligand from several docking runs. The cyclothialidine core was included for comparison. A total of 286 binding poses were analyzed with regard to whether the phenyl ring of 1 and 6−9 reached deep into the Kelch channel of the Keap1 binding pocket (Figure S6).
Binding Affinity Calculations. The atomic coordinates of Keap1 in complex with compound 14 were preprocessed as described above in the docking section. The chloride ion in the ligand binding pocket was included in the binding affinity calculations as it is coordinated with residues in Keap1 (N382, N414, and R415) and with 14. Methyl ester (9), amide (12), monomethyl amide (13), and diethylamide (15) were modeled into the Keap1 binding site using the bound conformation of 14 as a template. For compounds 9 and 13, two alternative conformations of the methyl ester and methyl amide (cis and trans forms) were explored, while five alternative conformations were explored for the diethylamide of 15 (Tables S6 and S7).
Prior to affinity calculations, the ligand binding site was defined using the receptor grid generation module available in Glide as described in the docking section above. Subsequently, the initial input ligand coordinates for 9 and 12−15 were optimized in the receptor, the ligand binding pose was scored, and poses were used for ligand binding affinity calculations in Prime MM-GBSA 34 as implemented in the Schrödinger suite. The ligand and a 5 Å region of the protein around the ligand were treated as flexible in the binding affinity calculation. All other settings were left at their default values.
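For orientation, an MM-GBSA binding free energy estimate generally takes the form below, in which each free energy combines a molecular mechanics energy with generalized Born and surface-area solvation terms. This is the generic expression, given here as an aid to the reader; the exact energy terms and minimization steps applied by Prime are not specified in the text and may differ in detail.

```latex
\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left( G_{\mathrm{receptor}} + G_{\mathrm{ligand}} \right),
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{GB}} + G_{\mathrm{SA}}
```

More negative values of the estimate indicate stronger predicted binding, so the values for 9 and 12−15 are best interpreted relative to one another rather than as absolute affinities.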
Similarity to Reported Inhibitors of the Keap1-Nrf2 PPI. A set of inhibitors of the Keap1-Nrf2 PPI, termed "PubChem", was retrieved from the PubChem Bioassay database (PubChem Bioassay AIDs: 504523; 588683; 651798; 651801; 651806; 651807; 651823; 651829; 651833; 651834). 38 Duplicate entries, compounds that lacked complete stereochemical information, and solvents were removed from the 1127 compounds reported as active in the 10 listed bioassays. Compounds that did not pass the PAINS filter 39 were also removed, providing a PubChem set consisting of 375 inhibitors (Table S8). Another set of "validated" inhibitors was assembled from two sources. First, nine compounds, reported to show reproducible activity in a triad of orthogonal assays by Tran et al., 20 were selected. Second, crystal structures that had a compound bound in the binding site for Nrf2 of Keap1 were retrieved from the Protein Data Bank (PDB). Peptides were excluded, while fragments were retained, providing a validated set of 35 compounds (Table S9).
The Tanimoto coefficient 40 was calculated to determine the similarity between compound 14 and the Keap1 inhibitors in the validated and PubChem sets, respectively, using seven commonly used structural fingerprints (Figure S11). The similarity was also calculated based on all fingerprints. All substructure similarity analyses were performed with the Canvas tool (version 3.1.011). 61 The molecular similarity landscape (a so-called network-like similarity graph) was investigated for the merged validated and PubChem sets and for the validated set alone, using compound 14 as a reference. The structures of all compounds were imported, and a substructure fragment fingerprint (FragFp)-based similarity landscape was computed using DataWarrior (version 5.2.1). 41 FragFp contains more than 500 predefined binary structural fingerprints. DataWarrior calculates a similarity matrix for each compound in the data set and allocates each compound to the most similar neighbor space using the Rubberbanding force-field approach. 41 Compounds with a structural similarity >0.95 were clustered for the analysis.
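The Tanimoto coefficient used above is the number of fingerprint bits shared by two compounds divided by the number of bits set in either. The sketch below shows the calculation on fingerprints represented as sets of on-bit indices. It is independent of Canvas and DataWarrior; `tanimoto` and `most_similar` are hypothetical helper names, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
from typing import Dict, Iterable, Tuple


def tanimoto(fp_a: Iterable[int], fp_b: Iterable[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as on-bit indices."""
    bits_a, bits_b = set(fp_a), set(fp_b)
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(bits_a & bits_b) / union


def most_similar(reference: Iterable[int], library: Dict[str, Iterable[int]]) -> Tuple[str, float]:
    """Return the library compound most similar to the reference fingerprint."""
    reference_bits = set(reference)
    scored = ((name, tanimoto(reference_bits, fp)) for name, fp in library.items())
    return max(scored, key=lambda pair: pair[1])
```

For instance, fingerprints with on-bits {1, 2, 3} and {2, 3, 4} share two of four distinct bits and give a coefficient of 0.5.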
SPR Inhibition in Solution Assay (ISA). The SPR ISA was performed in analogy to the protocol developed by Chen et al. 30 with the following differences. Instead of a biotinylated peptide, a lysine-tagged version of the Nrf2-peptide (KKKKAFFAQLQLDEETGEFL) was utilized for tethering to the sensor surface. For the covalent tethering, a CM5 biosensor (GE Healthcare, research grade) was employed using HBS-P [10 mM HEPES, 150 mM NaCl, 0.05% (v/v) Tween 20, 1.0 mM TCEP, pH 7.40] as the continuous flow buffer at a rate of 10 μL/min at 20 °C on a BIAcore 3000 optical biosensor unit (GE Healthcare). The surface was activated by injecting a mixture of 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC) and N-hydroxysuccinimide (NHS) for 7 min, followed by an injection of 50 μM Nrf2-peptide in 10 mM Na-acetate, pH 4.0 for 2 min. Any reactive groups still present on the surface were deactivated by a 7 min injection of 0.1 M ethanolamine hydrochloride-NaOH, pH 8.5. Peptide immobilization levels were typically between 400 and 600 RU to ensure mass transport limitation and thus a protein-concentration-dependent response.
For the detection of active compounds, a solution of 25 nM Keap1 (Kelch domain = aa321−aa609) was prepared in running buffer and preincubated with compounds at either constant or varying concentrations. These mixtures were subsequently injected over the peptide-modified biosensor, and control samples devoid of compounds were used to determine the remaining free protein concentration of Keap1 in response to compound binding. For this, the initial association slopes of the binding sensorgrams were measured (interval 5−15 s after sample injection) for each sample, followed by a regeneration of the sensor for the next cycle through a 45 s injection of 50 mM Tris/Cl, 0.25% SDS, 5 mM TCEP, pH 7.5. Results were reported as KD values for concentration-response experiments. This was achieved by a soaked by incubating crystals for 1 h with 1 mM compound in 5 M ammonium acetate and 0.1 M sodium acetate at pH 4.6. Crystals were subsequently frozen in liquid nitrogen using a soaking solution supplemented with 20% glycerol as the cryoprotectant prior to data collection.
Data Collection, Structure Solution, and Refinement. X-ray diffraction data were collected at the European Synchrotron Radiation Facility, beamline ID29, using a Pilatus 6M-F pixel detector to a resolution of 2.37 Å. The data were indexed and integrated with MOSFLM 65 and scaled with SCALA 66 in the space group P2₁2₁2₁ with cell dimensions of 75.7, 75.8, and 203.1 Å. Two Keap1 molecules were identified by PHASER 67 using a published Keap1 structure (PDB ID: 1ZGK 25) as a search model. The Keap1−macrocycle complex was further refined by alternating cycles of model rebuilding in Coot 68 and refinement in AutoBuster 2.11.6 (Global Phasing Ltd, Cambridge, U.K.). The final model was refined to an R/Rfree of 19/21%. Full data collection and refinement statistics can be found in Table S5, and the 2Fo-Fc electron density of compound 14 is shown in Figure S8. Coordinates and structure factors for the complex of Keap1 with compound 14 have been deposited in the Protein Data Bank with accession code 6Z6A. The authors will release the atomic coordinates and experimental data upon article publication.