Asparagine Tautomerization in Glycosyltransferase Catalysis. The Molecular Mechanism of Protein O-Fucosyltransferase 1

O-glycosylation is a post-translational protein modification essential to life. One of the enzymes involved in this process is protein O-fucosyltransferase 1 (POFUT1), which fucosylates threonine or serine residues within a specific sequence context of epidermal growth factor-like domains (EGF-LD). Unlike most inverting glycosyltransferases, POFUT1 lacks a basic residue in the active site that could act as a catalytic base to deprotonate the Thr/Ser residue of the EGF-LD acceptor during the chemical reaction. Using quantum mechanics/molecular mechanics (QM/MM) methods on recent crystal structures, as well as mutagenesis experiments, we uncover the enzyme catalytic mechanism, revealing that it involves proton shuttling through an active site asparagine, conserved among species, which undergoes tautomerization. This mechanism is consistent with experimental kinetic analysis of Caenorhabditis elegans POFUT1 Asn43 mutants, which ablate enzyme activity even if mutated to Asp, the canonical catalytic base in inverting glycosyltransferases. These results will aid inhibitor development for Notch-associated O-glycosylation disorders.


Model building
The starting model structure for the simulations was build from the available crystal structures. A ternary complex of wild type POFUT1 with intact donor (GDP-Fuc) and acceptor protein (EGF-LD) is not available. However, there are two high resolution structures 1 that can be used to build such a complete ternary complex. On one hand, there is a structure of MmPOFUT1 in complex with GDP-Fuc and a mutant of the EGF-LD acceptor (the nucleophile Thr101 was replaced by Ala) (PDB 5KY3, at 1.53 Å resolution). On another hand, there is a structure of MmPOFUT1 in complex with GDP and the intact EGF-LD Factor VII acceptor (PDB 5KXH, at 1.33 Å resolution). We used the former structure as a template and reverted the Thr101Ala mutation by taking the Thr101 coordinates from the latter structure. The structure obtained is shown in Figure S2.

Classical MD simulations
The protonation states of the charged amino acid residues (Asp, Glu and His) were taken according to protein environment and predictions made by the MolProbity and propKa softwares considering pH 8.5. 2-3 A total number of 23001 water molecules were present within a radius of 15 Å from the protein and a Na + ion was added to neutralize the charge of the system. The simulation box dimensions were 81.384 x 102.753 x 90.406 Å 3 . Molecular dynamics (MD) simulations were performed using Amber18 software. 4 The protein was modeled using the FF14SB force field. 5 The carbohydrate substrate, water molecules and GDP were described with the GLYCAM06,TIP3P and GAFF force fields, [6][7] respectively.
The MD simulation was carried out in several steps. First, the system was minimized, holding the protein and substrate fixed, followed by energy minimization on the entire system. To gradually reach the desired temperature, weak spatial constraints were initially added to the protein and substrate, while water molecules and the sodium ion were allowed to move freely at 100 K. The constraints were then removed and the working temperature of 300K was reached after two more 100 K heating steps in the NVT ensemble. Afterwards, the density was converged up to water density at 300 K in the NPT ensemble and the simulation was extended to 100 ns in the NPT ensemble. Three MD replicas were considered to increase the sampling of the configurational space.

QM/MM MD simulations
The QM/MM simulations were started from a suitable frame extracted from the equilibrated classical MD trajectory. The simulations were performed using the method developed by Laio et al., 8 which combines Car-Parrinello MD, 9 based on DFT, with force-field MD. 10 The electrostatic interactions between the QM and MM regions were handled by a fully Hamiltonian coupling scheme, 8 where the short-range electrostatic interactions between the QM and the MM regions are explicitly taken into account for all atoms. Kohn-Sham orbitals were expanded in a plane wave basis set with a kinetic energy cutoff of 70 Ry. Troullier-Martins ab initio pseudopotentials 11 were used for all elements. The PBE functional 12 in the generalized gradient-corrected approximation of DFT was used, in consistency with previous works on glycosyltransferase reaction mechanisms. [13][14][15] A constant temperature of 300 K was reached by coupling the system to a Nosé-Hoover thermostat. 16

QM/MM metadynamics simulations
QM/MM metadynamics [17][18][19] simulations were performed to model the molecular mechanism and free energy profile of the glycosyltransfer reaction. QM/MM MD simulations were coupled with the metadynamics algorithm provided by the Plumed 2 plugin. 20 The system was re-equilibrated by QM/MM MD for 6 ps before the metadynamics algorithm was switched on. One collective variable (CV), taken as the difference of C1-OThr101 and C1-OP distances, was considered, using a hill height of 1.0 kcal/mol. The Gaussian width and deposition time were set at 0.1 Å and 30 fs (250 MD steps), respectively, according to the oscillations of the CV in an unbiased dynamics. The simulations were stopped when recrossing over the TS occurred at least once, following literature recommendations. 21 The Gaussian height was greatly diminished (0.1 kcal/mol) upon crossing/recrossing along the TS, for a better convergence and accuracy. The energy profile was completed after 884 S4 deposited Gaussians. The computed free energy barrier was compared to the one derived from the experimentally determined catalytic rate using TS theory. 22 Structure analysis was done with VMD 23 , cpptraj from the AmberTools suite, and in-house python3 programs. The computed free energy barrier was compared to the one derived from the experimentally determined catalytic rate using transition state theory (TST) and the Eyring-Polanyi expression: Where k is the experimental rate constant, kB, h and R are the Bolzmann, Planck and gas constants, respectively. The formula assumes that the TST transmission coefficient of the is 1, i.e. TS recrossing, tunneling and nonequilibrium contributions are neglected. 22 According to the above formula, a rate constant of 0. To check whether there could be an alternative mechanism that does not involve Asn51, a metadynamics simulation was designed in which Asn51 was not part of the QM region, and thus cannot react. The results obtained showed that the reaction can still take place via transfer of the Thr101 proton to the β-phosphate via the fucose 2-OH group. However, the free energy barrier of this process was much higher than the one previously obtained. This indicates that the only plausible reaction mechanism is the one involving Asn51 tautomerization.

Site-directed mutagenesis
The CePOFUT1 Asn43Asp mutant was generated following a standard site-directed mutagenesis protocol by GenScript (quick change), and using the vector pPICZαAcepofut1 (T26-A382). 26

Purification of wild type CePOFUT1 and the Asn43Ala and Asn43Asp mutants
The wild type CePOFUT1 and the Asn43Ala and Asn43Asp mutants were purified as previously described for the wild type enzyme. 26

Enzyme kinetics
As most GT-B glycosyltransferases, POFUT1 does not require a metal for activity. However, and depending on the species, Mn +2 might have a stabilizing effect on the Chinese hamster ovary POFUT1 by increasing its glycosyltransfer activity, or a slightly negative effect on the CePOFUT1 GDP-Fuc hydrolysis experiment. 24,26 The concentrations of CePOFUT1 proteins were quantitated by 10% SDS-PAGE followed by Coomassie staining using bovine serum albumin as standards. POFUT1 enzyme assays were performed as previously described. 27

Thermal shift analysis
Analysis of the protein stability of CePoFUT1 wild type, the Asn43Ala mutant and the where I is the fluorescence intensity at temperature T, A and B are pre-transitional and posttransitional fluorescence intensities, respectively, and C is a slope factor.  Figure S1.

Cartesian coordinates of states R, TS and P
The Cartesian coordinates (PDB format) of the atoms included in the QM region in states R, TS and P (representative structures shown in Figure 3C) are listed below. The TS coordinates were obtained from committor analysis, whereas those of R and P are representative snapshot at the minimum of the free energy profile.