C–N Coupling of DNA-Conjugated (Hetero)aryl Bromides and Chlorides for DNA-Encoded Chemical Library Synthesis

DNA-encoded chemical library (DECL) screens are a rapid and economical tool to identify chemical starting points for drug discovery. As a robust transformation for drug discovery, palladium-catalyzed C–N coupling is a valuable synthetic method for the construction of DECL chemical matter; however, currently disclosed methods have only been demonstrated on DNA-attached (hetero)aromatic iodide and bromide electrophiles. We developed conditions utilizing an N-heterocyclic carbene–palladium catalyst that extends this reaction to the coupling of DNA-conjugated (hetero)aromatic chlorides with (hetero)aromatic and select aliphatic amine nucleophiles. In addition, we evaluated steric and electronic effects within this catalyst series, carried out a large substrate scope study on two representative (hetero)aryl bromides, and applied this newly developed method within the construction of a 63 million-membered DECL.


General Information
Some of the general materials, equipment, and procedures used in this study are adapted from those our group has reported previously [1][2][3][4][5][6] or from other DNA-encoded library publications [7][8].

1a. Materials and equipment used for the synthesis and analysis of oligonucleotides and DNA-encoded chemical libraries. The central dsDNA oligonucleotide with chemically modified phosphates ending in an amine terminus (DEC-Tec Starting Unit/DTSU, S1, Figure S1) and the encoding 5'-phosphorylated oligonucleotides were purchased from LGC Biosearch Technologies. All DNA oligonucleotides were assessed for purity using the general analytical procedure. DNA sequences were designed to maximize sequence reads and to minimize close similarity between sequences during sequencing. The two strands of each duplexed codon pair were designed to differ in mass by more than 5 Da for efficient quality-control analysis. A 10-mer DNA oligomer bearing a cholesterol tag and an amine terminus (spike-in) was obtained from Sigma-Aldrich and added to the library pool to monitor chemical reactions. High-concentration T4 DNA ligase was obtained from Enzymatics (Qiagen), and its activity was confirmed before use with an oligomer ligation test on DTSU. DNA working solutions were prepared using DNase/RNase-free ultrapure water (Invitrogen), HPLC-grade acetonitrile (Fisher), or high-purity absolute ethanol (Koptec). LC/MS running solvents were made from Optima LC/MS grade water (Fisher), Optima LC/MS grade methanol (Fisher), hexafluoroisopropanol (99+% purity, Sigma-Aldrich), and HPLC-grade triethylamine (Fisher). All listed buffers and ionic solutions were prepared in-house. These included HEPES 10X ligation buffer (300 mM 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid, 100 mM MgCl2, 100 mM dithiothreitol, 10 mM adenosine triphosphate, pH 7.8), aq. NaOH, aq. NaCl (5 M), and basic borate buffer (250 mM sodium borate/boric acid, pH 9.5).

Figure S1. Structure of DTSU S1 (5'-Phos-CTGCAT-Spacer 9-Amino C7-Spacer 9-ATGCAGGT 3').
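The two codon design rules above are a strand-mass gap above 5 Da within each duplexed pair and low similarity between codons. The sketch below shows one way to check them. It makes several assumptions. It uses standard average residue masses for a 5'-phosphorylated, 3'-OH strand. Each partner strand is treated as the plain reverse complement, ignoring overhangs. The minimum Hamming distance of 3 is a hypothetical threshold, and the function names are illustrative.

```python
from itertools import combinations

# Standard average residue masses (Da) of DNA nucleotides within a chain
# (deoxynucleoside 5'-monophosphate minus water).
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# One water is added back for the termini of a 5'-phosphorylated, 3'-OH strand.
WATER = 18.02
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def average_mass(seq: str) -> float:
    """Approximate average mass of a 5'-phosphorylated, 3'-OH DNA strand."""
    return sum(RESIDUE_MASS[base] for base in seq) + WATER


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length codons."""
    return sum(x != y for x, y in zip(a, b))


def check_codons(codons: list[str], min_mass_gap: float = 5.0, min_distance: int = 3) -> list[str]:
    """List design problems; an empty list means every check passed.

    min_mass_gap follows the >5 Da rule in the text; min_distance is a
    hypothetical threshold for "close similarity".
    """
    problems = []
    for codon in codons:
        # Mass gap between this strand and its overhang-free partner strand.
        gap = abs(average_mass(codon) - average_mass(reverse_complement(codon)))
        if gap <= min_mass_gap:
            problems.append(f"{codon}: strand mass gap {gap:.2f} Da")
    for a, b in combinations(codons, 2):
        distance = hamming(a, b)
        if distance < min_distance:
            problems.append(f"{a}/{b}: Hamming distance {distance}")
    return problems
```

Strand masses depend only on base composition, so two strands can be nearly isobaric even when their sequences differ. For example, AAAAACCCGGT and its complement differ by only about 4 Da, so this check flags that 11-mer.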
Chemical building blocks and reagents were purchased from various vendors and used without further purification. Building blocks were generally used as aliquots dissolved in acetonitrile (MeCN), methoxyisopropanol (MIPO), dimethylacetamide (DMA), or mixed aqueous acetonitrile, and were stored at −80 °C in 2D-barcoded tubes (Phenix) with septum caps. Barcoded tubes were read using a SampleScan 96 scanner (BiomicroLab) and decoded using Vortex software (Dotmatics). Solutions were transferred using Biotix or Fisherbrand pipette tips and Biotix reservoirs. Reactions and library transformations were generally performed in polypropylene PCR tubes (Genemate), polypropylene tubes (Eppendorf), 96-well polypropylene PCR plates (Phenix or ThermoFisher), or 96-well deep-well plates (USA Scientific). Plates were sealed for incubation with AlumaSeal II sealing films (Excel Scientific). Large-volume DNA precipitations were performed in polypropylene 250 mL screw-cap bottles (various vendors) or centrifuge tubes. Heated reactions were performed in benchtop heating blocks (ThermoFisher), a Mastercycler nexus gradient (Eppendorf), a TS-DW deep-well plate thermoshaker (Grant), or laboratory ovens (Fisher). Solutions were centrifuged in 5424R (Eppendorf), Allegra X-15R (Beckman-Coulter), or Lynx 4000 (ThermoFisher) centrifuges. Optical density measurements were made using a Biophotometer (Eppendorf). A Vanquish UHPLC system integrated with an LTQ XL ion trap mass spectrometer (ThermoFisher Scientific) was used for LC/MS analysis of oligonucleotides. DNA was visualized with a Molecular Imager Gel Doc XR system (BIO-RAD) after staining in ethidium bromide solution.
1b. General procedure for the analysis of oligonucleotide compositions. Diluted samples of DNA stocks or reaction mixtures were injected on the Vanquish/LTQ system, in volumes of 5–10 µL containing 50–200 pmol of DNA analyte. The system was a Thermo Vanquish UHPLC coupled to an electrospray LTQ ion trap mass spectrometer. All separations used an oligonucleotide column (Thermo DNAPac RP, 2.1 × 50 mm, 4 µm) with an ion-pairing mobile phase (15 mM TEA/100 mM HFIP in a water/methanol solvent system). All mass spectra were acquired in full-scan negative-ion mode over an m/z range of 500–2000. For data analysis, the raw instrument data (.RAW) were exported to automated biomolecule deconvolution and reporting software (ProMass). ProMass uses the ZNova algorithm to produce artifact-free deconvoluted mass spectra. Deconvoluted mass spectra were standardized against concurrently run samples of DTSU S1 and HP S2 to account for any drift from theoretical mass during deconvolution.
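Deconvolution itself was done in ProMass/ZNova, and the sketch below is not that algorithm. It shows the underlying arithmetic for negative-ion electrospray. An oligonucleotide of neutral mass M appears as [M − zH]z− ions at m/z = (M − z·1.00728)/z. The charge can be inferred from two adjacent peaks in the ladder, and a co-analyzed standard (such as DTSU S1) gives a drift offset. The function names are hypothetical, and the ladder is assumed to contain consecutive charge states without gaps.

```python
PROTON = 1.007276  # proton mass, Da


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M of an [M - zH]^z- ion observed at m/z (negative-ion mode)."""
    return z * (mz + PROTON)


def charge_of_higher_mz_peak(mz_high: float, mz_low: float) -> int:
    """Charge z of the peak at mz_high, assuming mz_low is the adjacent z+1 peak.

    From z*(mz_high + H) = (z+1)*(mz_low + H):  z = (mz_low + H) / (mz_high - mz_low).
    """
    return round((mz_low + PROTON) / (mz_high - mz_low))


def deconvolute_ladder(peaks_mz: list[float]) -> float:
    """Average neutral mass from a gap-free ladder of consecutive charge states."""
    peaks = sorted(peaks_mz, reverse=True)  # highest m/z carries the lowest charge
    z0 = charge_of_higher_mz_peak(peaks[0], peaks[1])
    masses = [neutral_mass(mz, z0 + i) for i, mz in enumerate(peaks)]
    return sum(masses) / len(masses)


def drift_corrected(mass: float, standard_observed: float, standard_theoretical: float) -> float:
    """Shift a deconvoluted mass by the offset measured on a co-analyzed standard."""
    return mass - (standard_observed - standard_theoretical)
```

For a 5000 Da oligonucleotide, the 5−, 6−, and 7− ions fall near m/z 999, 832, and 713, inside the 500–2000 acquisition window.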

LC/MS Parameters for Thermo
1c. General procedure for ethanol precipitation and DNA reconstitution. To an aqueous DNA reaction mixture were added 5 M NaCl solution (4% v/v of the reaction volume) and absolute ethanol (3 times the reaction volume). The mixture was mixed thoroughly and stored at −20 °C overnight to precipitate the DNA. The slurry was then centrifuged at 4000 × g for 1 h, and the supernatant was decanted. The pellet was washed with chilled 75% ethanol and centrifuged at 4000 × g for another hour. After the supernatant was decanted, the DNA pellet was air-dried and reconstituted in water to the required concentration. Ethanol precipitation was generally performed after each chemical reaction and ligation. An additional precipitation can be applied if residual reagents are observed after the first purification. When a reaction contains a high percentage of a water-miscible solvent such as DMSO, it may be diluted with 2–4 reaction volumes of water before precipitation.
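The precipitation recipe scales with the reaction volume. The small calculator below shows the additions and the resulting final NaCl and ethanol levels. It assumes two things not stated above: the optional water dilution is applied first, and the NaCl and ethanol are then scaled to the diluted volume. The function name is illustrative.

```python
def precipitation_volumes(reaction_ul: float, water_dilution: float = 0.0) -> dict[str, float]:
    """Additions for the NaCl/EtOH precipitation described above (volumes in µL).

    water_dilution: optional pre-dilution in reaction volumes (2-4 in the text) for
    mixtures rich in a water-miscible solvent such as DMSO. Scaling the NaCl and
    ethanol to the diluted volume is an assumption.
    """
    water_ul = water_dilution * reaction_ul
    base_ul = reaction_ul + water_ul
    nacl_ul = 0.04 * base_ul      # 4% (v/v) of 5 M NaCl
    ethanol_ul = 3.0 * base_ul    # 3 volumes of absolute ethanol
    total_ul = base_ul + nacl_ul + ethanol_ul
    return {
        "water_ul": water_ul,
        "nacl_ul": nacl_ul,
        "ethanol_ul": ethanol_ul,
        "total_ul": total_ul,
        "final_nacl_mM": 5000.0 * nacl_ul / total_ul,
        "final_ethanol_pct": 100.0 * ethanol_ul / total_ul,
    }
```

For a 45 µL reaction, the calculator gives 1.8 µL of 5 M NaCl and 135 µL of ethanol. The final mixture is roughly 50 mM NaCl and 74% ethanol.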

Acylation (DMTMM):
To a solution of on-DNA amine (

Nitro Reduction (sodium dithionite)[1]: To a solution of nitro-containing DNA conjugate (10 µL, 1.0 mM in water) were added sequentially 250 equivalents of pH 9.5 borate buffer (10 µL, 250 mM in water), 10 equivalents of methyl viologen (1 µL, 100 mM in water), and 100 equivalents of Na2S2O4 (5 µL, 200 mM in water). The reaction mixture was heated at 80 °C for 15 minutes, cooled to room temperature, and then quenched and purified by EtOH precipitation.
Hydroxycarbonylation[6]: To a solution of aryl bromide DNA conjugate (10 µL, 1 mM in water) were added 400 equivalents of CsOH (10 µL, 400 mM in water) and 100 equivalents of molybdenum hexacarbonyl (5 µL, 200 mM in MIPO, sonicated for 20 minutes before use). The mixture was thoroughly pipette-mixed, and then 12 equivalents of sSPhos Pd G2 catalyst (6 µL, 20 mM in MIPO) were added. The reaction was heated at 80 °C for 15 minutes, cooled to room temperature, and then quenched and purified by EtOH precipitation.

were prepared by the general nucleophilic aromatic substitution procedure, and 28 was further subjected to nitro reduction with sodium dithionite. Compound 7 was prepared by reductive alkylation with sodium borohydride, and 10 by reductive alkylation with sodium cyanoborohydride.
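Every equivalent in these procedures follows from the same rule: 1 µL of a 1 mM stock contains 1 nmol, so a 10 µL, 1.0 mM DNA conjugate is 10 nmol. The sketch below recomputes the stated equivalents for the nitro reduction and the hydroxycarbonylation from their volumes and stock concentrations. The helper names are hypothetical.

```python
def nmol(volume_ul: float, conc_mM: float) -> float:
    """Amount in nmol: 1 µL of a 1 mM solution contains 1 nmol."""
    return volume_ul * conc_mM


def equivalents(recipe: dict[str, tuple[float, float]], substrate: tuple[float, float]) -> dict[str, float]:
    """Equivalents of each (volume µL, concentration mM) addition relative to the DNA substrate."""
    dna_nmol = nmol(*substrate)
    return {name: nmol(v, c) / dna_nmol for name, (v, c) in recipe.items()}


def total_volume_ul(recipe: dict[str, tuple[float, float]], substrate: tuple[float, float]) -> float:
    return substrate[0] + sum(v for v, _ in recipe.values())


DNA = (10.0, 1.0)  # 10 µL of a 1.0 mM DNA conjugate = 10 nmol

NITRO_REDUCTION = {
    "borate buffer, pH 9.5": (10.0, 250.0),
    "methyl viologen": (1.0, 100.0),
    "Na2S2O4": (5.0, 200.0),
}
HYDROXYCARBONYLATION = {
    "CsOH": (10.0, 400.0),
    "Mo(CO)6": (5.0, 200.0),
    "sSPhos Pd G2": (6.0, 20.0),
}
```

Applied to the two recipes, the helpers reproduce 250/10/100 equivalents in 26 µL for the nitro reduction and 400/100/12 equivalents in 31 µL for the hydroxycarbonylation.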

C–N coupling
2b) General procedure for C–N coupling. To a solution of DNA-conjugated aryl halide substrate (10 µL, 10 nmol, 1.0 mM in water) were added DMA (3.5 µL), 1000 equivalents of CsOH (5 µL, 2000 mM in water), 500 equivalents of amine (12.5 µL, 400 mM in DMA), and 20 equivalents of freshly prepared sodium ascorbate (5 µL, 40 mM in water). The mixture was thoroughly pipette-mixed, and then 2 equivalents of freshly prepared Pd-PEPPSI-iPentCl-pyr (4 µL, 5 mM in DMA) were added. The reaction mixture was heated at 95 °C for 15 minutes. Then 100 equivalents of a sodium L-cysteine scavenger (5 µL, 200 mM in water) were added, and heating continued at 80 °C for another 15 minutes. The mixture was then cooled to room temperature before LC-MS analysis or EtOH precipitation. PEPPSI catalysts used in this study were synthesized via known procedures [3] and/or purchased from Total Synthesis Ltd. or Sigma-Aldrich, and were used without further purification. When a ligation follows, residual building blocks and salts must be removed as completely as possible, because large amounts of residue can reduce ligation efficiency. For best results, run reactions in a PCR plate. Transfer the plate to a thermocycler preheated to 95 °C immediately after pipette mixing and foil sealing.

Figure S2. Deconvoluted mass spectrum of the result in entry 1, Table S1.
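The C–N coupling recipe defines both the reagent equivalents and the composition of the reaction medium. The sketch below tabulates the additions as written and derives the total volume, the DMA fraction, and the final concentrations before the cysteine quench. It relies on one assumption: every DMA-based stock counts fully toward the DMA volume. The data layout and function name are illustrative.

```python
# (name, volume µL, stock mM, solvent) in order of addition, from the general procedure.
CN_COUPLING = [
    ("DNA aryl halide", 10.0, 1.0, "water"),
    ("DMA (neat)", 3.5, 0.0, "DMA"),
    ("CsOH", 5.0, 2000.0, "water"),
    ("amine", 12.5, 400.0, "DMA"),
    ("sodium ascorbate", 5.0, 40.0, "water"),
    ("Pd-PEPPSI-iPentCl-pyr", 4.0, 5.0, "DMA"),
]


def summarize(additions):
    """Totals for a recipe whose first entry is the DNA substrate."""
    total_ul = sum(v for _, v, _, _ in additions)
    dma_ul = sum(v for _, v, _, solvent in additions if solvent == "DMA")
    dna_nmol = additions[0][1] * additions[0][2]  # µL x mM = nmol
    reagents = {}
    for name, v, c, _ in additions[1:]:
        amount = v * c  # nmol
        if amount:
            # nmol per µL is numerically equal to mM
            reagents[name] = {"equiv": amount / dna_nmol, "final_mM": amount / total_ul}
    return {
        "total_ul": total_ul,
        "dma_fraction": dma_ul / total_ul,
        "dna_final_mM": dna_nmol / total_ul,
        "reagents": reagents,
    }
```

With these inputs the coupling runs in 40 µL of 50% (v/v) aqueous DMA. The DNA is at 0.25 mM and the Pd catalyst at 0.5 mM. The cysteine quench (5 µL, 200 mM, 100 equivalents) then brings the volume to 45 µL.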

Deconvoluted Mass Spectra of C–N Coupling Optimization Conditions
a Conversion was determined by LC-MS.
b PEPPSI-SIPr has a saturated NHC ring.

Synthesis of a DNA-Encoded Chemical Library (DECL) using C-N coupling condition
7a. Architecture of the main library build. The DECL was produced as a three-cycle library. Each of the three iterative cycles comprised three stages: a chemical transformation, ligation of the corresponding encoding DNA oligonucleotide (codon), and pooling and splitting of material for the subsequent cycle. The library was constructed on HP S2, which had been further diversified on the small-molecule end with various amino- or carboxy-terminating linkers. Figure S374 shows HP S2 as the combination of DTSU, the first overhang, the forward primer unit, and the second overhang.
Overhangs between codons are two base pairs long, and the encoding region within each codon is eleven base pairs long. Specific details and principles of the overall oligonucleotide sequence design used in our DECL production pipeline have been discussed previously [3].

Figure S374. Architecture of the main library build. Separately assembled/ligated oligonucleotides (codons) are shown in different colors.

7b. Building block diversity analysis.
To identify a suitable building block set for a library build, we first scanned a database of commercially available compounds for the requested chemical functionality. The scan used an in-house substructure-matching script based on SMARTS patterns. During this scan, we also limited the molecular weight of building blocks to 350 Da. We filtered out building blocks containing functional groups that might interfere with the planned reactions, including functionalities that would lead to multiple, ambiguous products at any stage of the library synthesis. The database used for this purpose was the most up-to-date version of the Aldrich Market Select Full Release. This first pass of filtering significantly reduced the size of the building block pool, which was reduced further by cost, minimum available amount, and delivery time filters. The remaining building blocks were examined for functional groups that might pose liabilities for on-DNA chemistry or medicinal chemistry optimization, and these groups were eliminated from the pool. Additional physical property filters were then imposed: the numbers of rings and aromatic rings, the fraction of sp3-hybridized carbons, and the number of chiral centers. The library chemists reviewed the set during development and gave feedback on its compatibility with the design and its potential for built-in structure-activity relationships. Any building blocks they specifically requested were retained. The set was analyzed for diversity by generating histograms of the examined physical and other computed properties, including calculated logP and Murcko scaffolds. This filtered set of building blocks was then reviewed by the library chemists for compatibility with the library design.
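The filtering cascade above can be expressed as a single pass over precomputed records. The sketch below makes that explicit. Only the 350 Da molecular-weight cap comes from the text. The cost, stock, lead-time, chirality, and Fsp3 thresholds are hypothetical placeholders. The SMARTS match results are assumed to be precomputed boolean flags, for example from a cheminformatics toolkit such as RDKit, rather than computed here. Chemist-requested building blocks bypass the property filters but not the chemistry-compatibility checks.

```python
from dataclasses import dataclass


@dataclass
class BuildingBlock:
    bb_id: str
    mw: float                # Da
    has_target_group: bool   # precomputed SMARTS match for the requested functionality
    has_liability: bool      # precomputed match to interfering/liability substructures
    price_per_g: float       # hypothetical cost field
    stock_mg: float          # minimum available amount
    lead_time_days: int
    fsp3: float
    chiral_centers: int


@dataclass
class Thresholds:
    max_mw: float = 350.0            # from the text
    max_price_per_g: float = 500.0   # hypothetical
    min_stock_mg: float = 250.0      # hypothetical
    max_lead_time_days: int = 21     # hypothetical
    max_chiral_centers: int = 2      # hypothetical
    min_fsp3: float = 0.0            # hypothetical


def select_building_blocks(candidates, limits=None, must_keep=frozenset()):
    """Return IDs that pass all filters; chemist-requested IDs skip the property filters."""
    limits = limits or Thresholds()
    selected = []
    for bb in candidates:
        if not bb.has_target_group or bb.has_liability:
            continue  # chemistry compatibility is never waived
        if bb.bb_id in must_keep:
            selected.append(bb.bb_id)
            continue
        if (bb.mw <= limits.max_mw
                and bb.price_per_g <= limits.max_price_per_g
                and bb.stock_mg >= limits.min_stock_mg
                and bb.lead_time_days <= limits.max_lead_time_days
                and bb.chiral_centers <= limits.max_chiral_centers
                and bb.fsp3 >= limits.min_fsp3):
            selected.append(bb.bb_id)
    return selected
```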
Bifunctional building blocks, in which both functional groups take part in separate reactions within the library build, were also evaluated for the variety of their shapes via exit vector analysis [12][13][14][15]. Each exit vector was assumed to lie along a functional group, pivoting from the atom at which that group is attached to the main chemical scaffold (e.g., a ring). Four quantities were calculated for each building block:
- the distance between the two pivot points;
- the angle between the first functional group vector and the vector connecting the two pivot points;
- the angle between the second functional group vector and the vector connecting the two pivot points;
- the dihedral angle describing the relative alignment of the two functional group vectors.
These properties were binned and their distributions assessed with histograms, to ensure that the functional groups were presented to the reactions in diverse ways.
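The four exit-vector descriptors can be computed from 3D coordinates of the two pivot atoms and the two functional-group atoms. The direction conventions below are assumptions in the style of exit-vector-plot analyses, since the text does not fix them. θ1 is measured from pivot 1 toward pivot 2, θ2 from pivot 2 back toward pivot 1, and the dihedral is the signed torsion f1–p1–p2–f2. Conformer generation is out of scope here, and the function names are illustrative.

```python
import math
from collections import Counter

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _angle_deg(a: Vec, b: Vec) -> float:
    cos_t = _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def exit_vector_params(p1: Vec, f1: Vec, p2: Vec, f2: Vec) -> tuple[float, float, float, float]:
    """(r, theta1, theta2, dihedral) for pivots p1, p2 and functional-group atoms f1, f2.

    theta1: angle between p1->f1 and p1->p2; theta2: angle between p2->f2 and p2->p1;
    dihedral: signed torsion f1-p1-p2-f2 in degrees (atan2 form).
    """
    p1p2 = _sub(p2, p1)
    r = math.sqrt(_dot(p1p2, p1p2))
    theta1 = _angle_deg(_sub(f1, p1), p1p2)
    theta2 = _angle_deg(_sub(f2, p2), _sub(p1, p2))
    b1, b2, b3 = _sub(p1, f1), p1p2, _sub(f2, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    dihedral = math.degrees(math.atan2(r * _dot(b1, n2), _dot(n1, n2)))
    return r, theta1, theta2, dihedral


def bin_counts(values: list[float], width: float) -> Counter:
    """Histogram counts keyed by the lower edge of each bin."""
    return Counter(math.floor(v / width) * width for v in values)
```

Running `exit_vector_params` over a building-block set and passing each descriptor column to `bin_counts` gives the distributions that the histograms summarize.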
7c. General procedures utilized in the DECL build. The general procedures described above for material preparation, oligonucleotide analysis, ethanol precipitation, and ligation (sections 1a–1e) were applied in the library build. The cholesterol-tagged DNA oligomer ("spike-in") was included in cycles 2 and 3 to monitor reactions in the pooled library. Its cholesterol tag gives it a distinct LC-MS retention time. Other general procedures related to chemical transformations were performed as follows:

(14) were attached through reductive amination onto four different amino-terminating DNA substrates in separate wells. N-Boc diamines (183) and nitroanilines (23) were attached by reverse acylation to two different carboxylic acid-terminated DNA substrates in separate wells. In addition, blanks were included as building block-free and/or reagent-free controls. After precipitation, each chemical transformation was encoded in its own well by ligation with a unique pair of duplexed 13-mer DNA oligonucleotides (codon 1). Finally, in-well treatment with the general N-Boc deprotection and nitro reduction procedures deprotected the N-Boc carbamates and reduced the nitroarenes, respectively. Each well was carefully analyzed by LC-MS before being quenched. After pooling and an additional ethanol precipitation, approximately 24 µmol of the cycle 1 library pool was recovered.
Procedure for Cycle 2. A portion of the cycle 1 pool was split into 336 wells (47.6 nmol/well), and each well underwent codon 2 ligation by the general procedure. Three series of building blocks were then attached:
- dihaloarenes (33), through nucleophilic substitution using both the pH 9.5 heating and DABCO-promoted methods;
- carboxyl aryl halides (99), by acylation using both the DEPBT and DMTMM methods;
- aldehyde aryl halides (33), by reductive amination using both the NaCNBH3 and NaBH4 methods.
The conditions for these methods are described in the preceding sections, and each method was encoded with its own unique codons. In addition, blanks encoding the no-reaction and reagent-related side products were included. After pooling and an additional precipitation, approximately 14.6 µmol of the Cycle 2 pool was recovered.

and additional precipitation, approximately 3.9 µmol of the Cycle 3 pool was recovered (72% recovery yield after cycle 3 chemistry and codon 3 ligation).
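The cycle 1 and cycle 2 descriptions imply some simple bookkeeping, worked through in the sketch below. The first cycle 1 building-block class is not named in this excerpt, so it is labeled generically. The cycle 2 blank count assumes that each of the 336 wells received exactly one encoded condition. The recovery figures are plain arithmetic on the reported amounts, not additional data.

```python
# (number of building blocks, number of encoded substrate/method variants)
CYCLE1 = {
    "reductive amination set (class not named in this excerpt)": (14, 4),  # 4 amine-DNA substrates
    "N-Boc diamines, reverse acylation": (183, 2),                        # 2 acid-DNA substrates
    "nitroanilines, reverse acylation": (23, 2),
}
CYCLE2 = {
    "dihaloarenes, nucleophilic substitution (pH 9.5 heating / DABCO)": (33, 2),
    "carboxyl aryl halides, acylation (DEPBT / DMTMM)": (99, 2),
    "aldehyde aryl halides, reductive amination (NaCNBH3 / NaBH4)": (33, 2),
}
CYCLE2_WELLS = 336
NMOL_PER_WELL = 47.6


def encoded_reactions(split: dict[str, tuple[int, int]]) -> int:
    """Building-block x substrate/method combinations, each encoded in its own well."""
    return sum(n_bb * n_variants for n_bb, n_variants in split.values())


cycle1_reactions = encoded_reactions(CYCLE1)            # blanks not included
cycle2_reactions = encoded_reactions(CYCLE2)
cycle2_blanks = CYCLE2_WELLS - cycle2_reactions         # assumes one encoded condition per well
cycle2_input_umol = CYCLE2_WELLS * NMOL_PER_WELL / 1000
cycle2_recovery = 14.6 / cycle2_input_umol              # fraction recovered after cycle 2
cycle3_input_umol = 3.9 / 0.72                          # implied by the stated 72% recovery
```

On these numbers, cycle 1 comprises 468 encoded reactions plus blanks. Cycle 2 comprises 330 reactions, leaving six blank wells. About 16.0 µmol of the 24 µmol cycle 1 pool was used for cycle 2, with roughly 91% recovered, and the 72% cycle 3 recovery implies an input of about 5.4 µmol.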

Preparation of amplifiable DECL samples ("shots") for experiments.
After completion of the main library build, the entire library material was ligated with a duplexed pair of 12-mer DNA oligonucleotides encoding the overall library structure/design. After EtOH precipitation, a portion of the material underwent sequential ligation with two DNA oligonucleotides. Together these contain a region encoding the selection experiment, a degenerate region serving as an amplification control, a segment that increases sequencing base diversity, and a reverse primer region for post-selection PCR amplification. The purposes and design of these components are discussed in our previous publication [3]. To ensure the integrity of the DNA barcodes in our libraries, we routinely sequence "naive" (unscreened) library material. We then compare the perfect-read rate with data from previous libraries and examine the distribution of codon populations for each synthetic cycle. No anomalies were observed in this analysis, and the observed perfect-read rate (69%) was close to our historical average (76%). Note that this analysis was performed on non-HPLC-purified library material, so it reflects the integrity of all library material in the build.

Figure S375. Distributions of observed codon populations from sequencing an unscreened library sample. For each synthetic cycle, the total count for each codon sequence was normalized by the mean count for that cycle and reported as a percentage. Each histogram therefore shows the variation in codon populations about the mean count per cycle.
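The normalization in the Figure S375 caption can be sketched in a few lines: each codon's count is divided by the mean count for its cycle and expressed as a percentage. Passing the list of expected codons is an assumption added here, so that codons with zero reads appear at 0% instead of silently dropping out of the mean. The function names are hypothetical.

```python
from collections import Counter


def normalized_codon_populations(codon_reads, expected_codons=()):
    """Count of each codon in one cycle as a percentage of that cycle's mean count.

    expected_codons lets codons with zero reads appear (as 0%) instead of vanishing.
    """
    counts = Counter({codon: 0 for codon in expected_codons})
    counts.update(codon_reads)
    mean = sum(counts.values()) / len(counts)
    return {codon: 100.0 * n / mean for codon, n in counts.items()}


def histogram(percentages, bin_width=10.0):
    """Bin the normalized populations, keyed by the lower edge of each bin."""
    return Counter(int(p // bin_width) * bin_width for p in percentages.values())
```

A tight histogram centered on 100% indicates evenly represented codons. Codons far below 100% would point to a poorly performing ligation or building block in that cycle.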

Sequencing analysis:
Raw DNA sequence reads (FASTQ files), quality metrics, and sequencing index-to-sample attribute-value pairs were obtained from Illumina BaseSpace at the conclusion of sequencing. Samples were linked to their respective FASTQ files by their sequencing index (DTSU). Samples that were part of a larger pool were expanded into individual experiments. Each sample was then decoded by perfectly matching the individual oligonucleotide substructures, without gaps and in the order defined by the known DNA encoding structure (main library build). Valid DNA barcodes were annotated using the oligonucleotide sequence-to-building block lookup for each of the three codon cycles. Together, the three codons represent a distinct small molecule within a specific DECL. For each unique codon tuple, the degenerate UMI (unique molecular identifier) portions of the DNA barcodes were collected into a list of UMIs, to distinguish experimental events from amplification events. Unique molecule counts were then evaluated using a directed-graph counting model as described previously [3]. The unique codon tuples and their unique molecule counts were then aggregated across all possible combinations of codons (all n-synthons). Enrichment was evaluated independently for each n-synthon using a normalized z-score metric that accounts for sampling and library diversity [3].
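The decoding and aggregation steps can be outlined as follows. This is a simplified sketch, not the published pipeline. The read layout, codon lookups, and UMI length in the example are hypothetical. The directed-graph counting model is replaced by plain counting of distinct UMIs. The normalized z-score is assumed to take the form (p_o − p_i)/√(p_i(1 − p_i)), with p_i = 1/diversity for the n-synthon being scored.

```python
import math
from collections import defaultdict
from itertools import combinations


def decode(read, layout):
    """Walk the read left to right, requiring exact, gap-free matches to each segment.

    layout items: ("const", seq) | ("codon", {seq: building_block}) | ("umi", length)
    Returns (codon_tuple, umi), or None if any segment fails to match.
    """
    pos, building_blocks, umi = 0, [], ""
    for kind, spec in layout:
        if kind == "const":
            if not read.startswith(spec, pos):
                return None
            pos += len(spec)
        elif kind == "codon":
            width = len(next(iter(spec)))
            bb = spec.get(read[pos:pos + width])
            if bb is None:
                return None
            building_blocks.append(bb)
            pos += width
        elif kind == "umi":
            umi = read[pos:pos + spec]
            if len(umi) != spec:
                return None
            pos += spec
    return tuple(building_blocks), umi


def unique_molecule_counts(reads, layout):
    """Distinct UMIs per codon tuple (a simplification of the graph-based model)."""
    umis = defaultdict(set)
    for read in reads:
        hit = decode(read, layout)
        if hit is not None:
            codons, umi = hit
            umis[codons].add(umi)
    return {codons: len(found) for codons, found in umis.items()}


def n_synthon_counts(counts):
    """Aggregate counts over every subset of cycles; None marks an unspecified cycle."""
    totals = defaultdict(int)
    for codons, n in counts.items():
        cycles = range(len(codons))
        for k in range(1, len(codons) + 1):
            for keep in combinations(cycles, k):
                key = tuple(codons[i] if i in keep else None for i in cycles)
                totals[key] += n
    return dict(totals)


def normalized_z(observed, total, diversity):
    """(p_o - p_i) / sqrt(p_i * (1 - p_i)), with p_i = 1 / diversity (assumed form)."""
    p_i = 1.0 / diversity
    p_o = observed / total
    return (p_o - p_i) / math.sqrt(p_i * (1.0 - p_i))
```

Duplicate reads that share a UMI collapse to one molecule, so PCR amplification alone does not inflate a codon tuple's count.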

Representative procedures for the preparation of Pd-PEPPSI-iPentCl-pyr
iPentCl·HCl S5. The procedure used for the preparation of S5 was adapted from the work of Pompeo et al. and Arduengo et al. [16][17] A flask charged with potassium tert-butoxide (0.251 g, 2.24 mmol, 1.2 equiv) and iPent·HCl (1.000 g, 1.86 mmol, 1 equiv) was dried overnight under high vacuum. An N2 atmosphere was introduced, dry 1,4-dioxane (7.44 mL, 0.25 M) was added, and the solution was stirred for 2.5 h at 21 °C. Dry CCl4 (7 mL) was then added, and the solution was heated to 80 °C for 2.5 h. After cooling, HCl (~1 mL, 4 M in 1,4-dioxane, ~3.72 mmol, ~2 equiv) was added slowly, and the resulting slurry was stirred for 30 min at 21 °C. The reaction mixture was diluted with CH2Cl2 (30 mL) and filtered through a Celite pad with additional CH2Cl2 washes. The filtrate was concentrated under reduced pressure, and the resulting solids were triturated with small amounts of hexanes and diethyl ether. This gave nearly pure S5 (1.082 g, 1.79 mmol, 96% yield), which was spectroscopically consistent with previously reported characterization data.
Pd-PEPPSI-iPentCl-pyr S6. The procedure used for the preparation of S6 was adapted from the work of Pompeo et al. [9] An oven-dried flask was charged with iPentCl·HCl (0.200 g, 0.330 mmol, 1 equiv), PdCl2 (0.058 g, 0.330 mmol, 1 equiv), and K2CO3 (0.228 g, 1.65 mmol, 5 equiv, crushed with a mortar and pestle). The flask was placed under high vacuum for several hours to dry all components. After backfilling with N2, dry pyridine (2.2 mL, 0.15 M) was added, and the reaction was heated to 110 °C for 3 h. After cooling, the reaction slurry was filtered through a silica plug with additional CH2Cl2 washes, and the filtrate was concentrated. Residual pyridine was removed by trituration with hexanes. Purification by flash column chromatography (silica, 10:0 → 4:6 hexanes:CH2Cl2) provided S6 (0.220 g, 0.266 mmol, 81% yield) as a solid that was spectroscopically consistent with previously reported characterization data [9].
S6: Rf = 0.2 (silica, 1:1 hexanes:CH2Cl2); 1
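The millimole amounts, molarities, and yields reported for S5 and S6 can be cross-checked from the weighed masses. The molecular formulas in the sketch below are assumptions derived from the structures and are not stated in the text. iPent is taken as 1,3-bis[2,6-bis(3-pentyl)phenyl]imidazol-2-ylidene, and the CCl4 step is taken to install chlorines at C4 and C5 of the imidazole backbone. With these formulas the calculation reproduces the reported values. Standard average atomic masses are used.

```python
# Standard average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
               "Cl": 35.45, "K": 39.098, "Pd": 106.42}


def molar_mass(formula: dict[str, int]) -> float:
    return sum(ATOMIC_MASS[element] * n for element, n in formula.items())


# Assumed formulas (see lead-in).
MW = {
    "iPent.HCl": molar_mass({"C": 35, "H": 53, "N": 2, "Cl": 1}),
    "S5": molar_mass({"C": 35, "H": 51, "N": 2, "Cl": 3}),           # iPentCl·HCl
    "S6": molar_mass({"C": 40, "H": 55, "N": 3, "Cl": 4, "Pd": 1}),  # Pd-PEPPSI-iPentCl-pyr
    "KOtBu": molar_mass({"C": 4, "H": 9, "O": 1, "K": 1}),
    "K2CO3": molar_mass({"K": 2, "C": 1, "O": 3}),
}


def mmol(grams: float, species: str) -> float:
    return 1000.0 * grams / MW[species]


def percent_yield(product_g: float, product: str, limiting_g: float, limiting: str) -> float:
    return 100.0 * mmol(product_g, product) / mmol(limiting_g, limiting)
```

The same check shows that 0.330 mmol of S5 in 2.2 mL of pyridine is 0.15 M. Note that 0.058 g of PdCl2 corresponds to about 0.327 mmol, which agrees with the stated 0.330 mmol within weighing precision.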

Tables of C–N coupling screening results
General description: C–N coupling screens on substrates 4 and 12 were carried out on a 5 nmol scale in 96-well plates. The conversions reported in Table S1 were determined by LC/MS. Products from ester-containing building blocks were generally observed as the hydrolyzed carboxylic acids.
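How conversion is computed from the LC/MS data is not specified here. One common approach is sketched below: the product fraction is taken from relative peak areas of the DNA-containing species. Because ester products were generally observed as hydrolyzed acids, both forms are counted as product. The species names and peak areas are hypothetical.

```python
def conversion(peak_areas: dict[str, float], product_species: set[str]) -> float:
    """Fraction of DNA-containing signal assigned to product species (assumed metric)."""
    total = sum(peak_areas.values())
    product = sum(area for species, area in peak_areas.items() if species in product_species)
    return product / total if total else 0.0
```

With hypothetical areas of 20 (aryl halide), 5 (coupled ester), 65 (coupled acid), and 10 (a side product), the conversion is 70%. Counting only the intact ester would report 5% and understate the coupling.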