Design of Heme Enzymes with a Tunable Substrate Binding Pocket Adjacent to an Open Metal Coordination Site

The catalytic versatility of pentacoordinated iron is highlighted by the broad range of natural and engineered activities of heme enzymes such as cytochrome P450s, which position a porphyrin cofactor coordinating a central iron atom below an open substrate binding pocket. This catalytic prowess has inspired efforts to design de novo helical bundle scaffolds that bind porphyrin cofactors. However, such designs lack the large open substrate binding pocket of P450s, and hence, the range of chemical transformations accessible is limited. Here, with the goal of combining the advantages of the P450 catalytic site geometry with the almost unlimited customizability of de novo protein design, we design a high-affinity heme-binding protein, dnHEM1, with an axial histidine ligand, a vacant coordination site for generating reactive intermediates, and a tunable distal pocket for substrate binding. A 1.6 Å X-ray crystal structure of dnHEM1 reveals excellent agreement to the design model with key features programmed as intended. The incorporation of distal pocket substitutions converted dnHEM1 into a proficient peroxidase with a stable neutral ferryl intermediate. In parallel, dnHEM1 was redesigned to generate enantiocomplementary carbene transferases for styrene cyclopropanation (up to 93% isolated yield, 5000 turnovers, 97:3 e.r.) by reconfiguring the distal pocket to accommodate calculated transition state models. Our approach now enables the custom design of enzymes containing cofactors adjacent to binding pockets with an almost unlimited variety of shapes and functionalities.


Experimental procedures
Materials
All materials were obtained from commercial suppliers and used as received, unless stated otherwise. Chemicals were sourced from Sigma-Aldrich unless otherwise stated. Flash column chromatography was performed with Merck silica gel 60 (35-70 mesh). Polymyxin B sulfate was purchased from Alfa Aesar; LB agar Miller, LB Miller media and 2×YT media from Formedium; Terrific Broth II (TB-II) from MP Biomedical; Escherichia coli (E. coli) BL21(DE3) (#C2527I; NEB), E. coli 5-alpha (DH5α), Q5 DNA polymerase, Phusion polymerase, Gibson Master Mix, T4 DNA ligase and restriction enzymes from New England BioLabs (NEB). Oligonucleotides and genes were synthesized by Integrated DNA Technologies (IDT). ¹H and ¹³C NMR spectra were recorded on a Bruker Avance spectrometer (¹H at 400 MHz, ¹³C at 100 MHz). ¹H and ¹³C spectra are referenced to residual solvent signals: CDCl3, 7.26 ppm for ¹H and 77.0 ppm for ¹³C. Coupling constants (J) are reported in Hz and coupling patterns are described as d = doublet, q = quartet, m = multiplet.

Construction of pET29b(+)_dnHEM1 and variants
Genes encoding the 22 designs (denoted with the prefix 'IK_HC015_') were purchased as subcloned genes in pET29b(+) vector with an additional 19-residue C-terminal sequence containing a His-tag and SNAC cleavage site 1 between the NdeI and XhoI restriction sites (full sequence: M<design>GGSGGSHHWGSGSHHHHHH). The genes were codon optimized for E. coli expression. Plasmids were purchased from Integrated DNA Technologies (IDT).
Cloning DNA sequences with Gibson assembly
Plasmids encoding the carbene transferase designs and dnHEM1 (pI=6) were generated by Gibson assembly. Double-stranded DNA fragments encoding the designs were purchased from Integrated DNA Technologies (IDT) as eBlocks™ Gene Fragments. For carbene transferase designs, the DNA sequence was assembled from two fragments through an overlapping region encoding residues A111-L116, with the 5' (N-terminal) fragment containing mutations introduced through Rosetta redesign, and the second fragment encoding the second half of the protein sequence. Overhang sequences complementary to those found in the 3' and 5' ends of the linear vector were added to the 5' end of the first and 3' end of the second fragment. An overlapping sequence of 18 nucleotides (GCGGCTTTAGCCCTGCTG) was used to assemble the two fragments (full sequence: M<design>GGSGGSHHWGSGSLEHHHHHH). The following DNA sequence was used as the second fragment (two-fragment assembly overhang highlighted in bold; vector assembly overhang highlighted in italics):

SNAC-tag cleavage
Adapted from a published protocol. 1 The following protocol was applied to proteins obtained from 50-100 mL expressions, loaded onto 1-1.5 mL Ni-NTA resin. The resin was loaded with protein by incubating the lysis supernatant with the resin for 30 minutes on a nutating platform. The protein-loaded resin was subjected to 5 washing steps: 1) 20 mL lysis buffer (25 mM Tris-HCl, 300 mM NaCl, 25 mM imidazole); 2) 20 mL lysis buffer with 1 M NaCl; 3) 20 mL lysis buffer; 4) 20 mL TBS (25 mM Tris-HCl, 300 mM NaCl); 5) 20 mL SNAC buffer (100 mM CHES-NaOH, 100 mM acetone oxime, 100 mM NaCl, pH 8.6; without NiCl2). Thereafter, the column was capped, 15 mL of SNAC buffer containing 2 mM NiCl2 was added, and the column was incubated on a nutating platform for 18 hours. SDS-PAGE was used to monitor the completion of the cleavage reaction. The flowthrough fraction (containing the cleaved protein) was collected, and the resin was washed with 10 mL of lysis buffer. These fractions were combined and concentrated to 1 mL using Amicon™ Ultra-15 10K centrifugal filters.

Size-exclusion chromatography
Following IMAC purification, designs were further purified by SEC on an ÄKTAxpress (GE Healthcare) using a Superdex Increase 75 10/300 GL column (GE Healthcare) in TBS buffer at a flow rate of 0.8 mL/min. The monomeric or smallest oligomeric fractions of each run (eluting at approximately 13.5 mL) were collected. The resulting samples were generally > 95% homogeneous on SDS-PAGE gels. SEC retention volume to molecular weight equivalencies were calibrated with protein standards (Cytiva LMW, #28403841). For selected proteins, further comparisons were made between running buffers consisting of either 25 mM Tris, 300 mM NaCl, pH 8.2 or 50 mM KPi, 200 mM NaCl, pH 7.2.

Protein production and purification of in vitro loaded dnHEM1
Chemically competent E. coli BL21(DE3) cells were transformed with the appropriate pET29b(+)_dnHEM1 construct. A single colony of freshly transformed cells was cultured for 18 h in 5 mL of LB Miller medium containing 50 μg/mL kanamycin. 500 μL of the culture was used to inoculate 50 mL of 2×YT medium supplemented with 50 μg/mL kanamycin. The culture was incubated for ~2 h at 37 °C with shaking at 180 r.p.m. to an optical density at 600 nm (OD600) of ~0.5. Protein expression was induced with the addition of IPTG to a final concentration of 0.1 mM. The induced cultures were incubated for ~20 h at 25 °C, and the cells were subsequently harvested by centrifugation at 3,220 g for 10 min. The pelleted bacterial cells were suspended in lysis buffer (50 mM KPi, 300 mM NaCl, 20 mM imidazole, pH 7.5) supplemented with lysozyme (1 mg mL⁻¹), DNase (1 µg mL⁻¹) and a Complete EDTA-free protease inhibitor cocktail tablet (Roche), and subjected to sonication (13 mm probe, 15 min, 20 s on, 40 s off, 40% amplitude). Cell lysates were cleared by centrifugation (10,000 g, 30 min). To maximize the heme occupancy, the clarified lysates were mixed with hemin to a final concentration of 20 μM (from a 400 µM stock solution in assay buffer) and incubated for 30 min at room temperature. The heme-loaded clarified lysates were subjected to affinity chromatography using Ni-NTA Agarose (Qiagen). His-tagged variants were eluted using elution buffer (50 mM KPi, 300 mM NaCl, 250 mM imidazole, pH 7.5). The eluate containing purified protein was buffer exchanged into assay buffer (50 mM KPi, 200 mM NaCl, pH 7.2) using a 10DG column (Bio-Rad) and analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and mass spectrometry. The concentration of heme-loaded protein was determined using the extinction coefficient of the Soret peak, as determined by a pyridine hemochromogen assay (Supplementary Table 1).

Apo dnHEM1 production
For the expression of apo dnHEM1 protein, a modified M9 minimal medium (1x M9 salts, 0.2% glucose, 0.1 mM CaCl2, 2 mM MgSO4, 4 mg mL⁻¹ casamino acids, 10 µg mL⁻¹ thiamine chloride) was used to minimize heme contamination. A single colony of freshly transformed cells was cultured for 18 h in 5 mL modified M9 medium containing 50 µg mL⁻¹ kanamycin. The starter culture (500 µL) was used to inoculate 50 mL modified M9 medium supplemented with 50 µg mL⁻¹ kanamycin. The culture was grown for 3 h at 37 °C, 180 r.p.m. to an OD600 of 0.6. Protein expression was induced with the addition of IPTG to a final concentration of 0.1 mM. The induced culture was incubated for 20 h at 25 °C (180 r.p.m.), and the cells were subsequently harvested by centrifugation (3,220 g for 10 min). Pelleted cells were resuspended in lysis buffer (see above) and subjected to sonication. Cell lysates were cleared by centrifugation (10,000 g for 30 min) and the supernatant was subjected to affinity chromatography using Ni-NTA Agarose (Qiagen). The His-tagged protein was eluted using 10 mL of elution buffer (see above) and buffer exchanged into assay buffer with a 10DG desalting column (Bio-Rad). The purified apo protein was aliquoted, immediately flash-frozen in liquid nitrogen and stored at -80 °C. The protein concentration was determined spectrophotometrically on a NanoDrop (Thermo Fisher) with an extinction coefficient of 12,490 M⁻¹ cm⁻¹ at 280 nm (calculated using Expasy ProtParam).

Hemoprotein extinction coefficient calculations
The pyridine hemochromagen assay was used to determine the extinction coefficient of the Soret maximum, according to the method of Berry

MS analysis
MS data for dnHEM1 variants were acquired on a 1200 series LC QTOF 6510 MS (Agilent). The final protein concentrations were adjusted to 10 µM in 50 mM KPi, 200 mM NaCl, pH 7.2. A 5 μL sample was injected onto the LC-MS system and desalted inline with 5% acetonitrile (0.1% formic acid) at 1 mL min⁻¹. Protein was eluted over 1 minute with 95% acetonitrile. The resulting multiply charged spectrum was analyzed by a QTOF 6510 (Agilent) in ESI positive ion mode, and deconvoluted using MassHunter software (Agilent). The instrument was tuned and calibrated with reference solution.
Alternatively, an Agilent 1200 series LC G6230B TOF LC-MS with an AdvanceBio RP-Desalting column was used (A: H2O with 0.1% formic acid, B: acetonitrile with 0.1% formic acid). Final protein concentrations were adjusted to 1-2 mg mL⁻¹ in 30 mM Tris-HCl, 300 mM NaCl, pH 8.2. Subsequent data deconvolution was performed in BioConfirm using a total entropy algorithm. All data are presented in the Supplementary Information.

Spectrophotometric heme binding assay
To qualitatively determine the heme binding ability of the de novo designed proteins, UV−Vis spectra of the protein and hemin mixture were measured using an Agilent Cary 8454 or Jasco V-750 spectrophotometer with a 10 mm pathlength cuvette. Spectra in the 230-700 nm range were collected of solutions containing 30 μM of purified protein and 10 μM of hemin (unless stated otherwise). Samples were prepared by mixing 5 μL of hemin solution (200 μM stock solution in DMF) into a protein solution in TBS buffer, to a total volume of 100 μL. Data are presented in Supplementary Fig. 3.

Spectrophotometric heme titration and determination of dissociation constant
To determine the affinity of dnHEM1 for ferric heme B, we performed a binding titration following methods reported previously. 5 Briefly, a heme stock solution was prepared in DMSO with a concentration of 150 to 400 μM heme as determined by pyridine hemochromagen assay. 4 We prepared 2.5 mL of 0.4 to 1.5 μM apo-dnHEM1 in aqueous buffer with 200 mM NaCl, 50 mM potassium phosphates, pH 7.3, and 0.5% w/v octyl-β-glucoside to minimize aggregation. The protein solution was added to a 1-cm pathlength quartz cuvette with a stir bar, and an absorbance spectrum was recorded with a Jasco V-750 UV−vis spectrophotometer. Aliquots of heme stock solution were added to the protein sample with stirring at 25 °C. After each aliquot was added, the protein-heme mixture was allowed to equilibrate for at least 10 minutes, at which point another absorbance spectrum was recorded. Heme aliquots were added and spectra recorded until a 2.5-fold excess of heme had been added in total. Absorbance values at 402 nm (the Soret maximum in the heme-bound state) were plotted against heme concentration (Figure 2B), and the data were fitted using Origin 8.1 to a one-site binding equation. In this equation, the total absorbance at 402 nm is given as A, Htot is the total heme concentration, Ptot is the total protein concentration, Kd is the dissociation constant, and ε is the extinction coefficient of heme when it is free in solution or bound to protein. The collected spectra and absorbance-[heme] curves are depicted in Supplementary Fig. 10.
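As an illustration of the symbols defined above, the following minimal sketch evaluates a one-site binding model in its common quadratic (tight-binding) form. This form is an assumption: it is the standard expression for 1:1 binding with ligand depletion and is not confirmed to be the exact equation fitted in Origin. The function names and the use of separate ε values for bound and free heme are illustrative choices; the example numbers are the mean fitted values quoted in the caption of Supplementary Fig. 10.

```python
import math


def bound_complex(h_tot, p_tot, kd):
    """Concentration of the 1:1 protein-heme complex.

    Solves Kd = (Ptot - PH)(Htot - PH) / PH for PH (physical root).
    All concentrations must share the same unit (here: mol/L).
    """
    s = h_tot + p_tot + kd
    return (s - math.sqrt(s * s - 4.0 * h_tot * p_tot)) / 2.0


def absorbance_402(h_tot, p_tot, kd, eps_bound, eps_free, path_cm=1.0):
    """Total absorbance A as the sum of bound and free heme contributions."""
    ph = bound_complex(h_tot, p_tot, kd)
    return path_cm * (eps_bound * ph + eps_free * (h_tot - ph))


if __name__ == "__main__":
    # Assumed example: 1 uM protein, Kd = 2.5 nM, mean fitted extinction
    # coefficients from Supplementary Fig. 10 (M^-1 cm^-1).
    p_tot, kd = 1.0e-6, 2.5e-9
    for h_tot in (0.5e-6, 1.0e-6, 2.0e-6):
        a = absorbance_402(h_tot, p_tot, kd, 116_000.0, 53_000.0)
        print(f"Htot = {h_tot * 1e6:.1f} uM  A402 = {a:.4f}")
```

With Kd far below Ptot, the simulated curve changes slope sharply at a 1:1 heme:protein ratio, which is the qualitative feature the titration relies on.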

Variable temperature spectrophotometric measurements
To observe changes in the spectral properties of bound heme at increasing temperatures, UV−Vis spectra of in vitro loaded holo-proteins were measured using the Jasco V-750 spectrophotometer and a 10 mm pathlength cuvette. Spectra in the 230-800 nm range were collected at 10 °C intervals between 25 °C and 95 °C. The temperature was increased at a rate of 5 °C min⁻¹, and spectra were acquired after the temperature had stabilized to within 0.5 °C of the target temperature for 5 seconds. Measurements were performed with 20 μM solutions of purified holoprotein in TBS buffer (25 mM Tris-HCl, 300 mM NaCl, pH 8). Data are presented in Supplementary Fig. 16 and Fig. 26.

pH-dependent spectrophotometric measurements
To observe changes in the spectral properties of bound heme at various pH values, UV−Vis spectra were measured of the mixture of 10 μM apo-dnHEM1 and 2 μM hemin, the in vitro loaded holo-proteins (7.5 μM dnHEM1, dnHEM1-RR2, dnHEM1-SS19), and 2 μM free hemin using the Jasco V-750 spectrophotometer and a 10 mm pathlength cuvette. Spectra in the 230-800 nm range were collected at pH 3, 4, 5, 6, 7, 8, 9 and 10. The universal Britton-Robinson buffer system was used across the entire pH range to ensure comparable buffer conditions. The buffer consists of 150 mM NaCl and equimolar quantities (40 mM) of H3PO4, B(OH)3 and acetic acid, with the pH adjusted using NaOH. Any particulate was removed by filtration through a 0.22 μm filter before use.
The samples were prepared by mixing 151 μL of the pH buffer with 9 μL of the protein solution (137 μM, in 50 mM KPi buffer containing 200 mM NaCl). Hemin samples were prepared by mixing 158 μL of the pH buffer with 2 μL of a 150 μM solution of hemin in DMSO. Data are presented in Supplementary Fig. 13.

Circular dichroism
To determine secondary structure and thermostability of the designs, far-ultraviolet circular dichroism (CD) measurements were carried out on a JASCO J-1500 instrument. Wavelength scans from 200 to 260 nm were measured at 10 °C intervals from 25 °C to 95 °C. The temperature was increased at a rate of 2 °C min⁻¹, and spectra were acquired after the temperature had stabilized to within 0.1 °C of the target temperature for 5 seconds. Wavelength scans and temperature melts were performed using 0.40 mg mL⁻¹ protein in 25 mM Tris-HCl, 30 mM NaCl buffer at pH 8.2 with a 1 mm path length cuvette. Protein concentrations were determined by absorbance at 280 nm, measured using a NanoDrop spectrophotometer (Thermo Scientific) using predicted extinction coefficients. 6 Data are presented in Supplementary Fig. 7 and Fig. 15.
Supplementary Fig. 1. SDS-PAGE analysis of selected designed heme binding proteins before and after SNAC cleavage reaction. I = insoluble pellet; S = soluble fraction; F = flow-through after SNAC cleavage; W = wash fraction after SNAC cleavage; B = sample from Ni-NTA beads after SNAC cleavage reaction.
Supplementary Fig. 2. Size-exclusion chromatograms of designed heme binding proteins after SNAC cleavage reaction. Data were collected using a Superdex Increase 75 10/300 GL column (GE Healthcare) in a buffer containing 25 mM Tris-HCl and 300 mM NaCl at pH 8.2. Void volume of the column is 8.5 mL.

Analysis of dnHEM1 mutants
The pI of dnHEM1 was lowered from 10.0 to 6.0 to bring it closer to that of most naturally occurring proteins, and to determine how this affects its ability to bind heme. This was achieved by mutating 12 arginine and lysine residues on the surface of the protein to Glu, Asn or Gln: K25Q, K60Q, R61E, K64Q, K95N, K99N, K130Q, R131E, K134Q, K200Q, R201E, K204Q. The H148A and H148F mutants of dnHEM1, as well as the low pI variant, were expressed following the standard protocol described above (including the SNAC-tag cleavage). Size-exclusion chromatography indicated that mutating His148 or the surface Arg and Lys residues had no effect on the oligomerization state (Supplementary Fig. 6 and 8A).
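To make the effect of these substitutions on charge explicit, the short sketch below parses the twelve mutation labels listed above and sums the nominal change in side-chain charge. The charge table (Lys and Arg as +1, Asp and Glu as -1, all other residues neutral) is a simplifying assumption for near-neutral pH and is not a pI calculation; the function names are illustrative.

```python
import re

# Mutation labels copied from the text above.
MUTATIONS = ("K25Q, K60Q, R61E, K64Q, K95N, K99N, "
             "K130Q, R131E, K134Q, K200Q, R201E, K204Q")

# Assumed nominal side-chain charges near neutral pH (His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def parse_mutation(label):
    """Split a label such as 'K25Q' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def net_charge_change(labels):
    """Sum of (mutant charge - wild-type charge) over all substitutions."""
    total = 0
    for label in labels:
        wild_type, _position, mutant = parse_mutation(label)
        total += CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0)
    return total


labels = MUTATIONS.split(",")
print(len(labels), net_charge_change(labels))  # prints: 12 -15
```

Under this simple model, each Lys to Gln/Asn substitution removes one positive charge and each Arg to Glu substitution shifts the charge by two units, giving a nominal change of -15.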
Supplementary Fig. 4. SDS-PAGE analysis of dnHEM1 mutants after SNAC cleavage reaction. FT = flowthrough fraction containing cleaved protein; B = cleaved and uncleaved protein remaining bound to the Ni-NTA resin.
Supplementary Fig. 6. Size-exclusion chromatograms of dnHEM1 H148 mutants after SNAC cleavage reaction. Data were collected using a Superdex Increase 75 10/300 GL column (GE Healthcare) in a buffer containing 25 mM Tris-HCl and 300 mM NaCl at pH = 8.2. Void volume of the column is 8.5 mL.
Supplementary Fig. 7. Top: circular dichroism (CD) spectra of holo- and apo-dnHEM1, measured at 15 μM (0.4 mg mL⁻¹) protein concentration in 25 mM Tris-HCl, 30 mM NaCl buffer at pH = 8.2 while increasing the temperature from 25 to 95 °C. Bottom: changes in the molar ellipticity at 222 nm while increasing the temperature.
Supplementary Fig. 8. (a) Size-exclusion chromatogram of SNAC-cleaved dnHEM1 pI=6 mutant. Data were collected using a Superdex Increase 75 10/300 GL column (GE Healthcare) in a buffer containing 25 mM Tris-HCl and 300 mM NaCl at pH = 8.2. Void volume of the column is 8.5 mL. (b) UV−Vis spectra of purified holo dnHEM1 (pI=6) in assay buffer at room temperature.
Supplementary Fig. 9. Size-exclusion chromatograms of apo- and holo-dnHEM1 (a, d), holo dnHEM1-RR2 (b) and holo dnHEM1-SS19 (c), eluted in 50 mM KPi buffer containing 200 mM NaCl at pH 7.2. Data were collected using a Superdex Increase 75 10/300 GL column (GE Healthcare). Some degree of oligomerization can be observed under these buffer conditions, as indicated by the peak at 9.5 mL elution volume and the shoulder at 12.5 mL elution volume.
Supplementary Fig. 10. Protein-heme binding titrations using dnHEM1 protein at different concentrations. (a-c) Heme was titrated into samples with the indicated concentrations of dnHEM1 protein as described in the Supplemental Experimental Procedures. The absorbance at 402 nm was plotted against heme concentration and fitted to a one-site binding equation. The fitted dissociation constants (Kd) are given in the plots with the standard errors of the fits, along with the adjusted R² values from curve fitting in Origin 8.1. For the three titrations, the mean of the fitted extinction coefficients at 402 nm was 116,000 ± 7,600 M⁻¹ cm⁻¹ in the heme-bound state and 53,000 ± 5,500 M⁻¹ cm⁻¹ in the unbound state. The mean Kd for the three titrations was 2.5 ± 1.2 nM (see Supplementary Fig. 11 for spectra of bound and unbound heme). Despite the agreement in the fitted Kd values between the three titrations and the high goodness-of-fit values, we note that because the titrations were carried out at concentrations substantially higher than the Kd, the accuracy of the Kd measurement may be poorer than the standard errors would suggest. Significantly lower concentrations would have had low signal-to-noise ratios. Nevertheless, the sharp change in slope we observe at a stoichiometric heme:protein ratio clearly indicates high-affinity binding with a Kd significantly below the protein concentration, likely <10 nM. (d-f) The full absorbance spectra that correspond to the data shown in (a-c). Buffer conditions were 200 mM NaCl, 50 mM potassium phosphate, pH 7.3, and 0.5% w/v octyl-β-glucoside.
Supplementary Fig. 11. UV/vis absorbance spectra of heme with and without dnHEM1 protein present. The buffer conditions are the same as for the heme binding titration shown in Supplementary Fig. 10: 200 mM NaCl, 50 mM potassium phosphates, pH 7.3, and 0.5% w/v octyl-β-glucoside. The absorbance feature at 280 nm in the protein-containing trace (orange) is attributable to Trp and Tyr residues in the protein. The Soret band at 402 nm originates from heme. The heme extinction coefficient at 402 nm is significantly higher in the protein-bound state (orange) than in the unbound state (blue); this change in extinction coefficient allows binding to be monitored spectroscopically as in Supplementary Fig. 10.
Supplementary Fig. 12. The spectra of the ferric (black), ferrous (blue) and CO-bound (red) states of the dnHEM1 variants. Experimental procedure: in a N2 glovebox, 5 µM heme protein was reduced by 100 µM dithionite. The spectra of both ferric and ferrous states were recorded on a UV spectrometer inside the glovebox. The ferrous solution was transferred into an air-tight cuvette with a rubber cap. The cuvette was removed from the glovebox and gently flushed with CO for 1 min in a fume hood. The UV spectrum of the resulting solution was recorded immediately.
Supplementary Fig. 13. dnHEM1 retains its ability to bind heme at pH values ranging from 3 to 10. All spectra were recorded in pH-adjusted 40 mM Britton-Robinson buffer containing 150 mM NaCl. UV/Vis spectra at different pH values of: (a) the mixture of 10 µM dnHEM1 and 2 µM hemin; (c) 7.5 µM holo dnHEM1; (e) 7.5 µM holo dnHEM1-RR2; (g) 7.5 µM holo-dnHEM1-SS19; (i) 2 µM hemin. Insets show the changes in the Q band region at 5x magnification. Aggregation is observed at pH 9 and 10 with holo-dnHEM1. (b, d, f, h, j) Relative changes of the absorbance of the Soret maximum.

Library construction
Round 1: saturation mutagenesis. 18 positions were randomized independently using pET29b(+)_dnHEM1 as a template and primers with degenerate NNK codons (primer sequences shown in Supplementary Table 3). DNA libraries were constructed by overlap extension polymerase chain reaction (PCR). The linear library fragments and the pET29b(+) vector were digested using NdeI and XhoI endonucleases, gel-purified and subsequently ligated using T4 DNA ligase in a 5:1 ratio respectively. Round 2: divergent saturation mutagenesis. The two most active clones from the first round of mutagenesis and screening (dnHEM1.1 and dnHEM1.1B) served as the templates for a second round of divergent evolution pathways. 6 positions were randomized independently by overlap extension PCR (primer sequences shown in Supplementary Table 3) and cloned as described above. The two most active clones of Round 2 were dnHEM1.2 and dnHEM1.2B.
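The degeneracy of the NNK codons used for saturation mutagenesis can be checked with a few lines of code. The sketch below enumerates all NNK codons (N = A/C/G/T, K = G/T), translates them with the standard genetic code, and estimates the expected coverage of a library of equally likely codons for a given number of screened clones. The figure of 88 clones is an assumption: it is what remains on a 96-well plate after the six parent and two empty-vector controls described under Library screening, and the text does not state how many plates were screened per position.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nnk_codons():
    """All codons matching the degenerate pattern NNK."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def expected_coverage(n_clones, n_variants=32):
    """Expected fraction of equally likely variants sampled at least once."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_clones


codons = nnk_codons()
encoded = [CODON_TABLE[c] for c in codons]
amino_acids = set(encoded) - {"*"}

print(len(codons), "codons,", len(amino_acids), "amino acids,",
      encoded.count("*"), "stop codon")
print(f"expected codon coverage with 88 clones: {expected_coverage(88):.2f}")
```

NNK encodes all 20 amino acids with 32 codons and a single stop codon (TAG), which is why it is a common choice for single-site saturation libraries.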

Shuffling by overlap extension PCR
After each round of evolution, beneficial diversity was combined by DNA shuffling of fragments generated by overlap extension PCR. Primers were designed to encode either the parent amino acid or the identified mutation. These primers were used to generate short fragments (up to six), which were gel-purified and mixed appropriately in overlap extension PCR to generate genes containing all possible combinations of mutations. Genes were cloned as described above.
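The size of such a shuffled library follows directly from the number of sites: if each site carries either the parent residue or one identified mutation, n sites give 2^n possible genes. The sketch below enumerates these combinations. The three sites and residues shown are hypothetical placeholders, not mutations reported in this work.

```python
from itertools import product


def all_combinations(sites):
    """Enumerate every parent/mutant combination.

    `sites` maps a residue position to (parent_residue, mutant_residue).
    Each variant is returned as a tuple of mutation labels; the empty
    tuple corresponds to the unmodified parent.
    """
    positions = sorted(sites)
    variants = []
    for choice in product((False, True), repeat=len(positions)):
        variants.append(tuple(
            f"{sites[pos][0]}{pos}{sites[pos][1]}"
            for pos, mutated in zip(positions, choice)
            if mutated
        ))
    return variants


# Hypothetical example sites, for illustration only.
example_sites = {10: ("A", "V"), 45: ("L", "F"), 112: ("S", "T")}
variants = all_combinations(example_sites)
print(len(variants))  # 2**3 = 8
```

The count doubles with every additional site, so six binary sites would already require 64 genes to cover all combinations.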

Library screening
For protein expression and screening, all transfer and aliquoting steps were performed using Hamilton liquid-handling robots. Chemically competent E. coli BL21(DE3) cells were transformed with the ligated libraries described above. Freshly transformed clones were used to inoculate 150 μL of 2×YT medium supplemented with 50 μg mL⁻¹ kanamycin in Corning® Costar® 96-well microtiter round bottom plates. For reference, each plate contained six freshly transformed clones of the parent template and two clones containing an empty pET29b(+) vector. Plates were incubated overnight at 30 °C, 80% humidity in a shaking incubator (Infors) at 850 r.p.m. 20 μL of overnight culture was used to inoculate 480 μL 2×YT medium supplemented with 50 μg mL⁻¹ kanamycin. The cultures were incubated for ~2 h at 30 °C, 80% humidity with shaking at 850 r.p.m. At approximately OD600 = 0.5, IPTG was added to a final concentration of 0.1 mM, and plates were incubated for 20 h at 30 °C. Subsequently, cells were collected by centrifugation at 2,900 g for 10 min. The supernatant was discarded and the pelleted cells were resuspended in 400 µL lysis buffer (50 mM KPi, 200 mM NaCl, pH 7.2 buffer supplemented with 1.0 mg mL⁻¹ lysozyme, 0.5 mg mL⁻¹ polymyxin B and 10 µg mL⁻¹ DNase I) and incubated for 1 h at 30 °C, 80% humidity with shaking at 850 r.p.m., followed by a 60 °C heat shock for 1 h at 850 r.p.m. Precipitates were removed by centrifugation at 2,900 g for 20 min. 20 µL of clarified lysate was transferred to Corning® Costar® 96-well microtiter round bottom plates, followed by the addition of 20 μL hemin (final assay concentration 1 μM, from a 10 µM stock in assay buffer), and incubated at room temperature for 20 min. Subsequently, 140 µL of assay buffer containing Amplex™ Red substrate (final concentration 50 µM, from a 71.5 µM stock in assay buffer) was transferred to the heme-loaded lysate. Reactions were initiated by the addition of 20 µL H2O2 (final concentration 500 µM, from a 5 mM stock in assay buffer).
Resorufin formation was monitored by the absorbance change at 571 nm over 20 min using a CLARIOstar plate reader (BMG Labtech).
The most active clones from each round were rescreened in lysate in triplicate. Expression and screening were performed as described above, but cultures were inoculated from glycerol stocks prepared from the original library cultures. Following each round, the most active variants were rescreened as purified proteins. Proteins were expressed and purified as described above with the exception that starter cultures were inoculated from glycerol stocks prepared from the original library plate overnight cultures.
Steady-state kinetic assays to determine the total turnover numbers
Steady-state kinetic assays were performed on a Cary UV-50 spectrophotometer (Varian) with a 1 cm path length quartz cuvette. Amplex™ Red substrate (50 µM) and enzyme (0.1 µM dnHEM1.2 or dnHEM1.2 H148A) were mixed in assay buffer (50 mM KPi, 200 mM NaCl, pH 7.2, total volume 1 mL). The reaction was initiated by the addition of 20 µL H2O2 (final concentration 500 µM) and the absorbance at 571 nm was recorded immediately as a function of time. The absorbance of the product was converted to concentration using the extinction coefficient (ε571) of 58,000 M⁻¹ cm⁻¹. Assays were performed in triplicate (Supplementary Fig. 18b).
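The conversion from absorbance to product concentration and turnover number is a direct application of the Beer-Lambert law. The sketch below uses the ε571 value and 1 cm path length given above; the absorbance reading of 0.58 is a hypothetical example, and the function names are illustrative. It assumes the reading has already been corrected for any background absorbance.

```python
EPSILON_571 = 58_000.0  # M^-1 cm^-1, resorufin at 571 nm (value from the text)


def product_conc_uM(a571, path_cm=1.0):
    """Resorufin concentration in uM from background-corrected A571."""
    return a571 / (EPSILON_571 * path_cm) * 1e6


def turnovers(a571, enzyme_uM, path_cm=1.0):
    """Moles of product formed per mole of enzyme."""
    return product_conc_uM(a571, path_cm) / enzyme_uM


# Hypothetical reading: A571 = 0.58 with 0.1 uM enzyme.
print(round(product_conc_uM(0.58), 2))  # 10.0 uM product
print(round(turnovers(0.58, 0.1)))      # 100 turnovers
```

With 50 µM substrate and 0.1 µM enzyme, complete conversion would correspond to 500 turnovers, which sets the upper bound measurable under these conditions.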

Pseudo first-order rate constants for the formation of the ferryl intermediate of dnHEM1.2B were obtained as follows. In a stopped-flow UV−Vis spectrometer equilibrated to 25 °C, one syringe containing 5 μM enzyme in assay buffer was mixed with the other syringe containing at least a 10-fold excess of H2O2 (50, 100, 200, and 400 μM) in the same buffer. Ferryl species formation was monitored by a decrease in absorbance (Supplementary Fig. 19). The decay rate of the ferryl species was determined using double-mixing stopped-flow experiments according to a previous procedure. 7 To this end, 12 μM dnHEM1.2B was mixed 1:1 with 400 μM H2O2 in assay buffer at 25 °C. The mixture was aged until the protein reached full conversion to the ferryl state (8 s) before being mixed 1:1 with 500 nM bovine liver catalase to degrade any excess H2O2. Spectra were recorded with a photodiode array, the decay of the ferryl intermediate was monitored by an increase in absorbance at 401 nm, and the resulting time traces were fitted to a single exponential (Pro-Data Viewer software) to derive autoreduction rates (Supplementary Fig. 21).
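For the formation kinetics described above, pseudo first-order rate constants measured at several H2O2 concentrations are commonly plotted against [H2O2], with the slope of a straight-line fit giving an apparent second-order rate constant. Whether this secondary analysis was performed here is an assumption, as it is not stated in the text. The sketch below shows the least-squares fit; the H2O2 concentrations are those listed above, while the k_obs values are hypothetical numbers chosen only to illustrate the calculation.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# H2O2 concentrations from the text, in mol/L.
h2o2_M = [50e-6, 100e-6, 200e-6, 400e-6]
# Hypothetical pseudo first-order rate constants (s^-1), for illustration only.
k_obs = [0.5, 1.0, 2.0, 4.0]

slope, intercept = linear_fit(h2o2_M, k_obs)
print(f"apparent second-order rate constant: {slope:.3g} M^-1 s^-1")
print(f"intercept: {intercept:.3g} s^-1")
```

A non-zero intercept in such a plot would indicate a contribution that does not depend on peroxide concentration, for example a reverse or competing process.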
The reactions were quenched with the addition of 30 µL HCl (3 M). 500 µL of 1 mM 1,3,5-trimethoxybenzene in ethyl acetate was added as an internal standard. Following vortexing, the top organic layer was passed through MgSO4 (supported by a piece of cotton in a glass Pasteur pipette) and was analyzed by chiral and achiral GC as described below.

Chiral GC analysis
To determine the reaction enantioselectivity, chiral GC analysis was carried out using an Agilent 7890A GC system, an FID detector, and an Agilent J&W GC column (CP-Chirasil-Dex CB, 25 m x 0.25 mm, 0.25 μm film). A 1 µL sample was injected with a detector temperature of 200 °C. The temperature gradient started from 80 °C, then increased to 200 °C (5 °C per min) and was held for 2 min. The total run time was 30 min. The absolute configuration of the main product enantiomer was determined by comparison with the main (S, S) enantiomer generated from a Mb (H64V-V68A) catalyzed biotransformation as previously reported. 8 The results of all dnHEM1 redesigns are reported in Supplementary Table 4.
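Enantioselectivity is reported in this work both as an enantiomeric ratio (e.r.) and as an enantiomeric excess (e.e.). The sketch below shows how the two quantities follow from the integrated peak areas of the two enantiomers and how they interconvert. It assumes equal detector response for both enantiomers; the function names and example areas are illustrative.

```python
def enantiomeric_ratio(area_major, area_minor):
    """Return the e.r. as a pair of percentages summing to 100."""
    total = area_major + area_minor
    return 100.0 * area_major / total, 100.0 * area_minor / total


def ee_percent(area_major, area_minor):
    """Enantiomeric excess in percent from two peak areas."""
    return 100.0 * (area_major - area_minor) / (area_major + area_minor)


def er_from_ee(ee):
    """Convert an e.e. (percent) into the corresponding e.r. pair."""
    return (100.0 + ee) / 2.0, (100.0 - ee) / 2.0


print(ee_percent(97.0, 3.0))  # 97:3 e.r. corresponds to 94% e.e.
print(er_from_ee(93.0))       # 93% e.e. corresponds to 96.5:3.5 e.r.
```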

Achiral GC analysis
To determine the reaction yield, achiral GC analysis was carried out using an Agilent 7890A GC system, equipped with an Agilent GC column (Vf5, 25 m x 0.25 mm, 0.25 μm film). A 1 µL sample was injected with a detector temperature of 250 °C. The temperature gradient started from 50 °C for 2 min, then increased to 320 °C (20 °C per min) and was held for 2 min. The total run time was 20 min. The product yield was calculated from the product calibration curve (Supplementary Fig. 23b).

Preparative scale cyclopropanation under anaerobic conditions
In an anaerobic N2 glove box, holo dnHEM1-SS19 (167.6 µL, from a 238.7 µM stock in degassed assay buffer) was added to 37.5 mL degassed assay buffer in a 100 mL round bottom flask equipped with a stirrer bar. Styrene (1 mL, from a 200 mM stock in MeCN) was added, followed by slow addition of EDA (1 mL, from a 400 mM stock in MeCN) over 1 h by a syringe pump. Dithionite (400 µL, from a 10 mM stock in degassed assay buffer) was added, and the reaction mixture was stirred for 2 h at 25 °C. The final concentrations of reagents were: 5 mM styrene, 10 mM EDA, 100 µM dithionite and 1 µM heme protein.
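The stated final concentrations can be reproduced from the stock concentrations and volumes given above. The sketch below performs this check, assuming that the volumes are additive and that no other components contribute to the final volume; the function and variable names are illustrative.

```python
def final_conc(stock_conc, stock_vol_mL, total_vol_mL):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_vol_mL / total_vol_mL


# Volumes from the text, in mL.
volumes_mL = {
    "buffer": 37.5,
    "enzyme": 0.1676,
    "styrene": 1.0,
    "EDA": 1.0,
    "dithionite": 0.4,
}
total_mL = sum(volumes_mL.values())

enzyme_uM = final_conc(238.7, volumes_mL["enzyme"], total_mL)            # uM stock
styrene_mM = final_conc(200.0, volumes_mL["styrene"], total_mL)          # mM stock
eda_mM = final_conc(400.0, volumes_mL["EDA"], total_mL)                  # mM stock
dithionite_uM = final_conc(10_000.0, volumes_mL["dithionite"], total_mL) # 10 mM stock

print(f"total volume: {total_mL:.2f} mL")
print(f"enzyme {enzyme_uM:.2f} uM, styrene {styrene_mM:.2f} mM, "
      f"EDA {eda_mM:.2f} mM, dithionite {dithionite_uM:.1f} uM")
print(f"styrene:enzyme ratio: {styrene_mM * 1000.0 / enzyme_uM:.0f}")
```

The styrene:enzyme ratio of roughly 5000 appears consistent with the turnover number quoted in the abstract, since it is the maximum number of turnovers available at full conversion.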
The reaction was quenched inside the glove box with HCl (3 M) and extracted with ethyl acetate (3 x 30 mL) in a separating funnel under ambient conditions. The organic layers were combined, dried over MgSO4 and concentrated in vacuo. The resulting crude mixture was analyzed by ¹H NMR, then further purified by silica chromatography (diethyl ether:cyclohexane = 1:10), resulting in an isolated product yield of 93% with 93% e.e. for the trans (S, S) enantiomer.

Crystallographic data
Crystallization, refinement and model building
The protein sample for crystallography was prepared following the procedure outlined in the section "Protein production and purification of in vitro loaded dnHEM1". The holoprotein was purified using Ni-affinity and size-exclusion chromatography. The C-terminal hexahistidine tag was left intact. Holo dnHEM1 was crystallized at 7 mg mL⁻¹ in assay buffer (50 mM KPi, 200 mM NaCl, pH 7.2). Crystallization conditions for dnHEM1 were identified using the LMB screen (Molecular Dimensions). Crystals suitable for diffraction experiments were obtained by sitting drop vapor diffusion at 4 °C in 200 nL drops containing equal volumes of protein and crystallization solution. For dnHEM1 this contained 0.1 M HEPES pH 7.7, 70% (4S)-2-methyl-2,4-pentanediol. The crystals were cryoprotected using paraffin oil and flash-cooled in liquid nitrogen. Data were collected on beamline I03 (wavelength 0.9763 Å) at the Diamond Light Source facility and reduced and scaled with Xia2. The resolution limit of 1.6 Å was determined via paired refinement in PDBREDO. 9 The dnHEM1 crystal structure was solved by molecular replacement using the PHASER program in the CCP4 suite 10 using the initial left-handed closed α-solenoid design as the starting model (PDB code: 4YXX). 11 The dnHEM1 models were completed by iterative cycles of manual model building and real space refinement using the program COOT and crystallographic refinement using PHENIX.refine. 12 The processing and final refinement statistics are presented in the Supplementary Information.
Supplementary Fig. 28. Fe-methyl heme model used in heme binding site matching and design. Atoms used for defining the heme-histidine constraints are labelled and shown as spheres. The added methyl group carbon atom is shown in magenta, heme carbon atoms are shown in green, and the coordinating histidine carbon atoms in white. Nitrogen = blue, oxygen = red, iron atom = orange, and hydrogen atoms are shown as small white sticks.
The heme model was built by adding a methyl group to the iron atom, trans to the coordinating imidazole ligand (acting as a mimic for histidine), and is depicted in Supplementary Fig. 28. The Rosetta params file, together with the rotamer library, was created following the procedures described in section 3.2. The imidazole ligand was removed before creating the params file. A constraint file describing the heme-His interaction geometry was constructed based on the geometries found in native heme enzymes (peroxidases and globins), while allowing flexible sampling of the torsional angle around the Fe-N bond. The unprotonated histidine nitrogen atom, with Rosetta atom type 'NHis', was matched against the heme iron atom. For steric reasons, only the δ-protonated tautomer of histidine was used, and thus the three histidine atoms used for defining the constraints were NE2, CE1 and ND1 (Supplementary Fig. 28). Optionally, an additional hydrogen-bonding residue (Glu or Asp) was matched downstream of His, to the protonated nitrogen atom, and constrained to an idealized hydrogen bond geometry. A crystal structure of a toroidal repeat protein (PDB id: 4YXX) was relaxed using Rosetta FastRelax in multiple trajectories, and the resulting models were used as input scaffolds for Rosetta Matcher.
Matching was restricted to positions inside the pore of the toroid: 4, 7, 8, 11, 12, 15, 39, 42, 43, 46, 47, 50, 74, 77, 78, 81, 82, 85, 109, 112, 113, 116, 117, 120, 144, 147, 148, 151, 152, 155, 179, 182, 183, 186, 187, 190.
Contents of the file match.flags:
-extra_res_fa /path/to/theozyme/HMM/HMM.params
-match::lig_name HMM
-match::dynamic_grid_refinement true
-match::enumerate_ligand_rotamers true
-match::consolidate_matches true
-match::output_matches_per_group 10
-in:ignore_unrecognized_res
-ex1
-ex2
-match:geometric_constraint_file /path/to/theozyme/HMM/HMM_onlyH.cst
Models obtained by running the Rosetta 'match' application 13 with the aforementioned constraints and scaffolds were lastly evaluated for how solvent-exposed the placed heme molecule is. Structures in which more than 20% of the heme is exposed, based on the calculated solvent-accessible surface area (SASA) of the heme model, were discarded. Matches were filtered using the Python script 'analyze_matches_Heme.py', available for download on GitHub. 14
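The exposure filter described above can be sketched as follows. This is a minimal illustration and not the published 'analyze_matches_Heme.py' script: it assumes that the SASA of the heme in the protein context and the SASA of the isolated heme have already been calculated with an external tool, and the function names and the example values are hypothetical.

```python
def heme_exposed_fraction(sasa_in_protein, sasa_free):
    """Fraction of the heme surface that remains solvent-accessible in a match."""
    if sasa_free <= 0:
        raise ValueError("free-ligand SASA must be positive")
    return sasa_in_protein / sasa_free


def filter_matches(matches, max_exposed=0.20):
    """Return the names of matches whose heme is at most `max_exposed` exposed.

    `matches` maps a match name to (heme SASA in protein, SASA of free heme),
    both in square angstroms.
    """
    kept = []
    for name, (bound, free) in sorted(matches.items()):
        if heme_exposed_fraction(bound, free) <= max_exposed:
            kept.append(name)
    return kept


# Hypothetical SASA values, for illustration only
example = {"match_A": (95.0, 800.0), "match_B": (240.0, 800.0)}
print(filter_matches(example))
```

In this example, match_A (about 12% exposed) is kept and match_B (30% exposed) is discarded.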

Heme binding site design
The models obtained from the matching step were subjected to binding site sequence optimization using the Rosetta 'enzyme_design' application with the command and flags described below. 15 Positions within 8 Å of any ligand heavy atom were set to be designable, while positions within 12 Å were allowed to be repacked. Iterative application of repacking, design and minimization steps, while applying the same constraints used in matching, yielded models that were then scored by metrics describing the His-Fe interaction geometry, shape complementarity, and preorganization of the heme binding pocket (assessed by side chain packing calculations in the absence of the heme). The constraint score and no-ligand-repack RMSD metrics were calculated within the design protocol. Thereafter, designs passing the thresholds for these metrics were evaluated for how solvent-exposed the heme is, the shape complementarity between the heme and the binding pocket, and whether both carboxylate groups of the heme have at least one hydrogen bond partner. The latter three metrics, as well as the preliminary filtering of the designs, were implemented in the Python script 'analyze_scores_heme_enzdes_pdb.py', available for download on GitHub. 14 The thresholds for filtering are summarized in Supplementary Table 7 and the distribution of scores is shown in Supplementary Fig. 30.
Design jobs were run using the following command and flags:

Supplementary Fig. 30. Distribution of metrics used for filtering designed heme binder models. Red: all scores; green: scores passing the filters. The three metrics in the last column were calculated only for the designs passing the first three metrics.
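The distance-based assignment of designable and repackable positions used in this section can be illustrated with a short sketch. It is not the logic of the 'enzyme_design' application itself: measuring the distance from any atom of a residue to the nearest ligand heavy atom is an assumption made here for simplicity, and the function names and coordinates are hypothetical.

```python
import math


def min_distance(point, ligand_atoms):
    """Shortest distance (in angstroms) from one point to any ligand heavy atom."""
    return min(math.dist(point, atom) for atom in ligand_atoms)


def classify_positions(residue_atoms, ligand_atoms, design_cut=8.0, repack_cut=12.0):
    """Assign each residue to 'design', 'repack' or 'fixed'.

    residue_atoms: {residue number: [(x, y, z), ...]} atom coordinates per residue
    ligand_atoms:  [(x, y, z), ...] ligand heavy-atom coordinates
    """
    shells = {}
    for resno, atoms in residue_atoms.items():
        d = min(min_distance(a, ligand_atoms) for a in atoms)
        if d <= design_cut:
            shells[resno] = "design"
        elif d <= repack_cut:
            shells[resno] = "repack"
        else:
            shells[resno] = "fixed"
    return shells


# Hypothetical one-atom residues at 5, 10 and 15 angstroms from a one-atom ligand
residues = {11: [(5.0, 0.0, 0.0)], 50: [(10.0, 0.0, 0.0)], 120: [(15.0, 0.0, 0.0)]}
ligand = [(0.0, 0.0, 0.0)]
print(classify_positions(residues, ligand))
```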

Ligand docking
Docking of heme into the design models was performed using the Rosetta GALigandDock 16 mover and the 'beta_genpot' scorefunction. The methyl-containing heme model was replaced with a model having an open coordination site, and 20 docking trajectories were seeded from the designed heme orientation. Each trajectory identified the 20 best docks, which were minimized and scored. Structures from all trajectories of a given design were combined, and their Rosetta total score and ligand_rmsd (relative to the design model) values were analyzed. A pnear score was calculated from the score-rmsd relationship to describe numerically how similar the lowest scoring docks are to the design model. 17 A pnear cutoff of 0.75 was used to filter most designs. The forward docking funnels obtained for each of the ordered designs are shown in Supplementary Fig. 31, together with the calculated pnear values.
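A pnear value can be computed from the combined score and ligand_rmsd values as a Boltzmann-weighted average of a Gaussian "nearness" term. The sketch below uses the commonly used form of this metric; the values of the parameters `lam` (RMSD scale, in Å) and `kbt` (energy scale, in Rosetta energy units) are assumptions, because the settings used in this work are not stated here, and the example docks are hypothetical.

```python
import math


def pnear(scores, rmsds, lam=2.0, kbt=1.0):
    """Boltzmann-weighted probability that the docks lie near the design model.

    Pnear = sum_i exp(-rmsd_i^2 / lam^2) * exp(-E_i / kbt) / sum_i exp(-E_i / kbt)
    """
    if len(scores) != len(rmsds) or not scores:
        raise ValueError("scores and rmsds must be non-empty and of equal length")
    e_min = min(scores)  # shifting the energies cancels in the ratio and avoids overflow
    weights = [math.exp(-(e - e_min) / kbt) for e in scores]
    nearness = [math.exp(-((r / lam) ** 2)) for r in rmsds]
    return sum(w * n for w, n in zip(weights, nearness)) / sum(weights)


# Hypothetical docks: a funnel (the lowest score is close to the design model)
funnel = pnear([-10.0, -5.0, -4.0], [0.2, 5.0, 6.0])
# Hypothetical docks: no funnel (the lowest score is far from the design model)
no_funnel = pnear([-4.0, -5.0, -10.0], [0.2, 5.0, 6.0])
print(round(funnel, 2), round(no_funnel, 3))
```

With a cutoff of 0.75, the first hypothetical design would pass the filter and the second would not.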

DFT optimization of transition states
All calculations were performed using the Gaussian 16 software. 18 Structural optimizations and frequency calculations were performed with the B3LYP-D3 method, the 6-31G(d) basis set and the SDD ECP on the Fe atom. Single point energy calculations were performed with the M06L, M06 and B3LYP-D3 methods and the def2-TZVP basis set. The D3 dispersion correction was applied using the Becke-Johnson damping function. 19 Solvent effects of water and diethyl ether were included using the CPCM solvation model during optimization and single point energy calculations. This method has previously been shown to be appropriate for modeling similar systems. 20 Frequency calculations were performed to confirm whether each structure is a minimum or a transition state. Intrinsic reaction coordinate (IRC) analysis was used to confirm that the obtained transition states connect the correct minima.
Transition states leading to the formation of the R,R and S,S enantiomers of ethyl 2-phenylcyclopropanecarboxylate were located in five different conformations resulting from rotation around the Fe-C bond (Supplementary Fig. 32). The charge of the system was kept at -2 and the singlet electron configuration was considered. Imidazole was coordinated to the proximal site of heme to mimic histidine coordination. To verify that all of these transition states are energetically relevant, their single point energies were calculated with multiple DFT functionals and the resulting corrected free energies were compared (Supplementary Table 8). Implicit solvation by water and diethyl ether was used to mimic aqueous and protein pocket environments (the dielectric constant of diethyl ether has been reported to approximate that of a protein pocket 21 ). As judged by the relative free energies at all tested levels of theory, all of the conformers are energetically feasible, in particular when considering potential perturbations to the energies in a specific protein pocket environment.
Supplementary Fig. 32. Conformations of the pro-S,S and pro-R,R cyclopropanation transition states.
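One common way to relate relative free energies of competing transition states to their expected contributions is through Boltzmann factors. The sketch below is only an illustration of that arithmetic and is not stated to be part of the analysis performed in this work; the relative free energies are hypothetical and are not taken from Supplementary Table 8.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(rel_free_energies, temperature=298.15):
    """Normalized Boltzmann factors for relative free energies given in kcal/mol."""
    rt = R_KCAL * temperature
    g_min = min(rel_free_energies)
    factors = [math.exp(-(g - g_min) / rt) for g in rel_free_energies]
    total = sum(factors)
    return [f / total for f in factors]


# Hypothetical relative free energies (kcal/mol) of five transition state conformers
weights = boltzmann_weights([0.0, 0.5, 1.0, 1.5, 2.0])
print([round(w, 2) for w in weights])
```

Conformers within 1 to 2 kcal/mol of the lowest one retain non-negligible weights, and shifts of this size could plausibly arise from a specific protein pocket environment.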

Rotamer library creation
Conformational diversity of each of these transition states was further increased by sampling the rotamers of the flexible propionic acid groups of heme, as well as the conformers of the vinyl groups. We aimed to find a variety of low-energy conformers that do not necessarily correspond exactly to the lowest-energy local minima but are still sufficiently close to them. To achieve that, dihedral angles were randomly sampled for the 8 rotatable bonds corresponding to the two propionic acid and two vinyl groups. To ensure conformational diversity, each saved conformer differs from every other saved structure by at least 20° in at least one of the dihedral angles. Lastly, the conformers were sampled as frozen coordinates with no geometry optimization, in order not to affect the transition state geometries. Energies of the conformers were evaluated using the GFN2-xTB semiempirical QM method. 22 This procedure was performed using a Python script that packages together conformer sampling, energy evaluation and analysis, and is available on GitHub. 23 For each transition state rotamer, 100 conformers were saved from 2000 randomly sampled configurations.
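The diversity criterion described above can be sketched as follows. This is a simplified illustration and not the published script: it only draws random dihedral angle sets and applies the 20° criterion, and it does not build coordinates, evaluate GFN2-xTB energies or select the final 100 conformers. The function names and the random seed are hypothetical.

```python
import random


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_distinct(candidate, saved, min_diff=20.0):
    """True if the candidate differs from every saved dihedral set by at least
    `min_diff` degrees in at least one dihedral angle."""
    return all(
        any(angle_diff(c, s) >= min_diff for c, s in zip(candidate, other))
        for other in saved
    )


def sample_dihedral_sets(n_bonds=8, n_trials=2000, min_diff=20.0, seed=0):
    """Randomly sample dihedral sets and keep only mutually distinct ones."""
    rng = random.Random(seed)
    saved = []
    for _ in range(n_trials):
        candidate = [rng.uniform(-180.0, 180.0) for _ in range(n_bonds)]
        if is_distinct(candidate, saved, min_diff):
            saved.append(candidate)
    return saved


conformers = sample_dihedral_sets(n_trials=200)
print(len(conformers))
```

The periodic difference in `angle_diff` matters here: dihedral angles of 170° and -170° differ by 20°, not by 340°.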

Rosetta params file creation
The generated conformers were initially saved as XYZ files that were subsequently converted to MOLfiles using OpenBabel. 24 The bonding information in the MOLfile was manually inspected to ensure that the entire structure is represented as a single fragment, and edited if necessary. Thereafter, the mol2params.py script, available within Rosetta, was used to convert the MOLfile to a Rosetta-compatible .params file. The partial charges of the carboxylate oxygen atoms of the propionate groups were adjusted in the params files from -0.74 to -1.24 to increase the likelihood of H-bonds being created with these atoms during Rosetta design. The conformers described above were included in the params file via an accompanying PDB file. All created params files are available for download on GitHub. 14
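The partial charge adjustment can be scripted rather than done by hand. The sketch below assumes that each ATOM record in the params file has the layout "ATOM <atom name> <Rosetta atom type> <MM atom type> <partial charge>"; the atom names and the example records are hypothetical and do not reproduce the actual HMM params file.

```python
import re

# Captures everything up to the charge, the atom name, the charge and any remainder
ATOM_RE = re.compile(r"^(ATOM\s+(\S+)\s+\S+\s+\S+\s+)(-?\d+\.\d+)(.*)$")


def adjust_charges(params_text, atom_names, new_charge):
    """Return the params text with the partial charge of the named atoms replaced."""
    out = []
    for line in params_text.splitlines():
        m = ATOM_RE.match(line)
        if m and m.group(2) in atom_names:
            # Keep the original spacing; only the charge field is rewritten
            line = f"{m.group(1)}{new_charge:.2f}{m.group(4)}"
        out.append(line)
    return "\n".join(out)


# Hypothetical fragment of a params file
EXAMPLE = """NAME HMM
ATOM  O1  OOC   X   -0.74
ATOM  O2  OOC   X   -0.74
ATOM  C1  CH1   X   0.10"""

print(adjust_charges(EXAMPLE, {"O1", "O2"}, -1.24))
```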

Carbene transferase active site design
The transition state models of the S,S and R,R addition of styrene to the iron carbenoid, the Rosetta params files and the accompanying rotamer libraries were created following the procedures described in section 3.2. The imidazole ligand was removed before creating the params file. The corresponding theozyme models HSS and HRR are available on GitHub. 14 Transition state models were aligned to the heme molecule in the dnHEM1 design model based on the positions of the corresponding porphyrin nitrogen atoms. A subset of designs used, as the starting point, a structural model obtained by predicting the structure of dnHEM1 from a single sequence with AlphaFold2 25 model 4 and relaxing it with FastRelax together with heme. Design was performed using the Rosetta FastDesign 26 mover and the 'beta_nov_16' scorefunction, implemented through the PyRosetta script 'replace_HMM_HXX_design.py', available for download on GitHub. 14 A constraint defining the interaction geometry between His148 and the heme model was applied during design using the Rosetta 'AddOrRemoveMatchCsts' mover. Design trajectories were seeded by selecting a random ligand rotamer in each iteration. Positions 5, 7, 8, 11, 12, 39, 42, 43, 46, 74, 75, 78, 109 and 183 were set to be designable (Supplementary Fig. 33), and other positions within 12 Å of any ligand heavy atom were allowed to be repacked. The designed models were scored by metrics describing the His-Fe interaction geometry, shape complementarity, and preorganization of the heme binding pocket (assessed by side chain packing calculations in the absence of the heme), with the cutoff criteria and distributions of scores given in Supplementary Table 7 and Supplementary Fig. 34, respectively.
Supplementary Fig. 33. Positions in the heme binding pocket that were considered for redesigning for the olefin cyclopropanation active site, indicated as orange spheres.
Supplementary Fig. 36. Distribution of Rosetta score and ddG differences between the designed enantiomer and the opposite enantiomer. Datapoints corresponding to designs selected for experimental testing are highlighted in red.