DszA Catalyzes C–S Bond Cleavage through N5–Hydroperoxyl Formation

Due to its detrimental impact on human health and the environment, regulations demand ultralow sulfur levels on fossil fuels, in particular in diesel. However, current desulfurization techniques are expensive and cannot efficiently remove heteroaromatic sulfur compounds, which are abundant in crude oil and concentrate in the diesel fraction after distillation. Biodesulfurization via the four enzymes of the metabolic 4S pathway of the bacterium Rhodococcus erythropolis (DszA-D) is a possible solution. However, the 4S pathway needs to operate at least 500 times faster for industrial applicability, a goal currently pursued through enzyme engineering. In this work, we unveil the catalytic mechanism of the flavin monooxygenase DszA. Surprisingly, we found that this enzyme follows a recently proposed atypical mechanism that passes through the formation of an N5OOH intermediate at the re side of the cofactor, aided by a well-defined, predominantly hydrophobic O2 pocket. Besides clarifying the unusual chemical mechanism of the complex DszA enzyme, with obvious implications for understanding the puzzling chemistry of flavin-mediated catalysis, the result is crucial for the rational engineering of DszA, contributing to making biodesulfurization attractive for the oil refining industry.


Modeling of DszA:FMN
The DszA model was obtained as a tetramer from the Swiss-Model webserver, after a BdsA homolog with a sequence identity of 79.28%, with a QMEAN value of -0.62 and a GMQE value of -0.62.The Ramachandran plot the model demonstrates that its secondary structure is comparable to the ones commonly found in nature, with a percentage of outliers of only 0.40% (Val371 of chain A, B and D, Gln338 of chain D, Ser355 of chain A and B, and His354 of chain D).As for the comparison of the model to other naturally existing structures, the QMean Z-Score is less than one, thus falling into the most populated structure area, demonstrating its likelihood of existing in nature, although there are few tetramer structures available for statistical validation.To further demonstrate that the attained model is similar to other structures existent in nature, each chain was compared to experimentally determined structures in ProSA-web, 1   The superimposition of the model DszA with its template, BdsA, exhibited an RMSd value of 0.134 Å over all backbone atoms, which indicates that the structures are very similar.The residues at the active center of BdsA that, according to Su et al., are essential residues for BdsA activity are Phe12, Phe56, Phe246, His20, His316, Val248 and Val372.
Observing the active sites from both systems (seen in  Based on the comparison between similar active homodimers, namely nitrilotriacetate monooxygenase (PDB ID: 3SDO), alkane monooxygenase (PDB ID: 3B9N) and riboflavin lyase (PDB IDs: 5W4Y; 5W4Z; 5W48), from which the RMSd variations among backbone atoms varied between 1.6 and 1.8 Å, DszA was finally modelled as a homodimer including chains A and B, in complex with FMN, in a total 6940 atoms.
The DszA:FMN complex was then modelled as DszA:C 4a OOH, which corresponds to the oxygen-activated form of the FMN cofactor in agreement with most literature, and underwent MD simulations.The RMSd of a 50 ns production showed that the RMSd for chain A was slightly lower than for chain B, both for the backbone and the active site atoms (Figure SI 4).Nevertheless, a visual analysis indicated that the differences in the RMSd of backbone atoms were mostly due to the dynamics of a terminal loop with no relevance for the activity of DszA.Through all the steps, the system has shown to be of quality as the initial model obtained from homology modeling should be reliable and upon energy minimizations and MD simulations, it became equilibrated and maintained most of the interactions thought to be important for cofactor binding.Moreover, comparison with the X-ray structure of its homolog, BdsA, shows no major structural modifications at the active site, which further renders our model as viable to proceed for substrate modelling.

Modeling DBT-sulfone binding
Docking of DBT-sulfone to the active site of DszA was conducted on both DszA:C 4a OOH and DszA:N 5 OOH complexes previously subject to an energy minimization.
The docking of DBT-sulfone was performed with AutoDock Vina, using the Lamarckian genetic algorithm to explore a search space encompassing the FMN cofactor and the residues composing the si face of the active site of DszA (Ser10, Thr15, His16, Gln76, Asn137, Leu244, Thr308,

Mechanistic attempts through the C 4a OOH model
The QM/MM model consists of a truncated model including DszA residues within 12 Å of the active site of chain A (total of 4299 atoms and -1 charge), where an outer shell of   Instead the single-point energy calculations suggested that the resulting 2'hydroperoxybiphenyl-2-sulfinate intermediate might be more stable in the triplet spin configuration.Nevertheless, resulting energy results suggest that this pathway is also unlikely.The energy profiles obtained were similar, although the activation free energies varied from 16.8 kcal.mol - and 22.6 kcal.mol - .The predicted rate-limiting step also varied between step 2 (predicted by PBE0, PW6B95, PWPB95 and DSD-PBEB95) and step 4
We chose PWPB95 since the combination of DFT and wavefunction-based methods should result on a more accurate description of electron correlation effects compared to single-hybrid functionals while the long-range dispersion (van der Waals) interactions are corrected with the addition of the D3BJ dispersion correction term.This notion is supported by the excellent performance of PWPB95 in extensive benchmarks 3 .

Energy Decomposition Analysis
We conducted activation strain model 4 calculations on the QM region of the catalytic step that corresponds to the hydroxyl transfer from the cofactor to the substrate to further elucidate the reactivity of both C 4a OOH and N 5 OOH intermediates and to elucidate why in DszA, the infrequent pathway through N5OOH is the most favorable one.The system was divided in three fragments, viz., the substrate, the flavin cofactor and the remainder of the protein environment present on the QM layer of the QM/MM claculations.The strain was then evaluated with respect to the geometries of these fragments in the reactants of the step, and the binding energy was further decomposed using the Morokuma-Ziegler Energy Decomposition Analysis. 5These calculations were performed using the AMS 2023 suite 6 , employing the PBE0 functional with the MBD 7 combined with the TZ2P basis set.
The strain contribution dominates overwhelmingly the differences observed (Table SI 3).
For the C4a pathway, the strain contributes 50 kcal.mol - to the activation energy barrier while only 16.61 kcal.mol - for the N5 pathway.As expected, the strain contribution is bigger on the flavin cofactor fragment (35.24 for the C4a pathway vs. 7.18 for the N5 pathway).This difference in strain of the flavin fragment can be explained by the fact that on the N5 pathway, N5 can donate its lone pair to facilitate the cleavage of the Od-Op bond while this is much more difficult on the C4 pathway.
The differences in the binding energies between the fragments on the two different pathways is much smaller; the contribution to the calculated activation energy is -6.56 and -5.09 for the C4 and N5 pathway respectively.The individual components to this binding energy are very different for the two pathways, but that is a natural consequence of the very different charge distributions for the two pathways: in the N5 pathway, the flavin fragment is negatively charged, while the environment is positively charged, whereas in the C4a pathway all fragments are neutral.

Figure SI 1 .
Figure SI 1. Representation of the two forms of flavin hydroperoxide modeled to study the catalytic mechanism of DszA.On the left it is represented the C4aOOH form, with the OOH at the si face of the cofactor.On the right it is represented the N5OOH form with the OOH at the re face of the cofactor.The location of the C4a and N5 atoms are marked in yellow.
further supporting the obtained homology model.These good quality results indicate that the obtained model should be of high quality, thus representing a reliable DszA structure.Summary of the results follows in Figure SI 2.

Figure SI 2 .
Figure SI 2. A) Quality values for DszA model, where darker blue colors corresponds to better quality; B) Ramachandran plot for DszA model; C) Comparison of DszA model with non-redundant set of PDB Structures; D) Comparison of chain A, B, C, and D of DszA model with set of X-ray and NMR structures (from ProSA-web).

Figure SI 3 )
, it is noticeable that the position and orientation of the mentioned residues on the active site of the modelled DszA and of BdsA are aligned and indeed very similar.Nonetheless, there is a residue deletion, as Val372 of BdsA is correspondent to Val371 of the model, and Val248 is substituted by Leu248.However, this latter substitution should not be a problem as both residues have similar sizes and hydrophobic characteristics.Thus, since BdsA and DszA are similar, these residues are also probably essential for DszA activity.Nonetheless, the vast resemblances between the model and FMN-bounded BdsA, further indicates that the attained model is in fact trustworthy and of quality.

Figure SI 3 .
Figure SI 3. Active center of BdsA (blue) vs active center of model DszA (orange).

Figure SI 4 .
Figure SI 4. RMSd values for backbone (solid lines) and active site heavy atoms (dotted lines) in chain A and B, throughout the 50 ns NPT production.Chain A is colored blue and chain B is colored red.

Figure SI 5 .
Figure SI 5. Superimposition of the active sites A and B from the dominant cluster of DszA (colored in pink for chain A and in orange for chain B), with those of BdsA (colored in purple).Not all residues that establish hydrogen bonds with the cofactor are depicted for clarity.Dashed blue and red lines highlight hydrogen bonds from nitrogen or oxygen donor atoms, respectively.
, His312, Val367), and the built-in AutoDock Vina scoring function.Selection of the final poses to follow for validation with MD simulations included additional criteria motivated by the available mechanistic hypotheses for DszA: distance between 3-5 Å between the distal oxygen (Od) of the cofactor and either the Cα1 or Cα2 of DBT-sulfone, lower accessibility to solvent by the DBT-sulfone, and number of poses binding a similar region.While one solution obeying all criteria was found for the DszA:N 5 OOH, we struggled to obtain solutions in which the Od-Cα1 or Od-Cα2 distances were within the defined 3-5 Å criteria for the DszA:C 4a OOH complex.Hence, we carried an alternative protocol using the GOLD software using the genetic algorithm and a search space of 10 Å within the C 4a OOH, in which a biasing harmonic restraint of 5 N•m -1 between the Od and the topologically equivalent Cα1 and Cα2 of DBT-sulfone was considering during the scoring step, where the CHEMPLP function was used.The most favorable poses of both protocols are shown in Figure SI 6.

Figure SI 6 .
Figure SI 6.Most promising poses for DBT-sulfone for the DszA:C 4a OOH complex, using the A. Gold with a bias on the Od-Cα1/2 distance during the scoring step; B. AutoDock Vina.Noticeable polar interactions are represented with dashed lines.

Figure SI 7 .Figure SI 8 . 2 Figure SI 9 .
Figure SI 7. RMSd values for backbone (solid lines) and active site heavy atoms (dotted lines) for the DszA:C 4a OOH:DBTO2 models built from the DBTO2 poses in Figure SI 6 (A in black and B in gray) , throughout the 100 ns NPT production.

Figure SI 10 .
Figure SI 10.Representation of the minimized structure of DszA in complex with the best docking poses of DBTO2 considering the C 4a OOH (A) and the N 5 OOH (B) cofactor.H-bond distances between the substrate and the active site are shown.

Figure SI 12 .
Figure SI 12. 2D-RMSd plots of the backbone (lower triangle) and active site (upper triangle) heavy atoms for chains A and B of the DszA:C 4a OOH:DBTO2 and the DszA:N 5 OOH:DBTO2 complexes.

Figure SI 13 .
Figure SI 13.Active site representation of the centroid structure for each of the chains A and B in the DszA:C 4a OOH:DBTO2 (top) and the DszA:N 5 OOH:DBTO2 (bottom) complexes, along the 10 ns molecular dynamics simulations where DszA residues beyond the active site were kept restrained..The minimized structure used as reference was colored in light blue (chain A) and light pink (chain B), and the representative structures from the clustering are colord in gray; the modelled FMN cofactor and the DBTO2 substrate are shown in ball-and-stick representation.Clustering was perfomed over the RMSd of the heavy atoms in the selection defined as the active site, with the GROMOS algorithm and a 1.0 Å cutoff.Native residues suggested by the docking poses are labelled.

Figure SI 14 .Table SI 1 .Figure SI 15 .Figure SI 16 .
Figure SI 14. Frequency of contacts between the heavy atoms of C 4a OOH (left) and N 5 OOH (right) and the DszA residues within 4 Å, for both chain A (in blue) and B (in red), along the 100 ns molecular dynamics simulations.

4 Å
was kept frozen.All calculations in the model were ran with Gaussian 16.The QM layer consisted of selected part of residues His20, Asp59, Ser139, Val371, Phe373 and the C4aOOH cofactor, the whole DBTO2 substrate and two water molecules (137 atoms with total zero charge and singlet multiplicity).The summary of the resulting linear transit scans of the mechanistic attempts ran is presented in Figure SI 17.

Figure SI 17 .
Figure SI 17. Linear transit scans for the hypothesis tested on the truncated QM/MM model for the DszA:C 4a OOH:DBTO2 complex: on the left, the formation of the peroxyhemiacetal by decreasing the Od-Cα1 distance; on the right, the transfer of the hydroperoxyl group by decreasing the Op-Cα1 distance.

Figure SI 18 .
Figure SI 18.The reactant and product states for the mechanistic attempts to form the peroxyhemiacetal intermediate: top and bottom react correspond to the attempt without and with His316 as an assisting base, respectively; bottom right panel depict the Mulliken charges along the His316-assisted attempt for the atoms possibly involved in redox reactions.

Figure SI 19 .
Figure SI 19.The reactant and product states for the mechanistic attempts to transfer the hydroperoxyl group.

Figure SI. 20 .Analysis of the electronic configuration of step 2 Figure
Figure SI.20.Optimized geometries and energy profile obtained for step 1 of the pathway through C 4a OOH with the complete QM/MM model.

Figure SI 22 .
Figure SI 22.Free energy profiles obtained from single-point energy calculations with seven different DFT functionals using the optimized geometries obtained at the PBE0-D3BJ/def2-SVP:ff10 level of theory.For all cases we used the def2-TZVPP basis set.

Table SI 3
. ASM and EDA analysis for the rate limiting step of the reaction on both possible pathways.All energy components are in kcal.mol - .