Identification and Modeling of a GT-A Fold in the α-Dystroglycan Glycosylating Enzyme LARGE1

The acetylglucosaminyltransferase-like protein LARGE1 is an enzyme that is responsible for the final steps of the post-translational modifications of dystroglycan (DG), a membrane receptor that links the cytoskeleton with the extracellular matrix in the skeletal muscle and in a variety of other tissues. LARGE1 acts by adding the repeating disaccharide unit [-3Xyl-α1,3GlcAβ1-] to the extracellular portion of the DG complex (α-DG); defects in the LARGE1 gene result in an aberrant glycosylation of α-DG and consequent impairment of its binding to laminin, eventually affecting the connection between the cell and the extracellular environment. In the skeletal muscle, this leads to degeneration of the muscular tissue and muscular dystrophy. So far, a few missense mutations have been identified within the LARGE1 protein and linked to congenital muscular dystrophy, and because no structural information is available on this enzyme, our understanding of the molecular mechanisms underlying these pathologies is still very limited. Here, we generated a 3D model structure of the two catalytic domains of LARGE1, combining different molecular modeling approaches. Furthermore, by using molecular dynamics simulations, we analyzed the effect on the structure and stability of the first catalytic domain of the pathological missense mutation S331F that gives rise to a severe form of muscle–eye–brain disease.


■ INTRODUCTION
Glycosylation is one of the most common post-translational modifications of proteins in the cell and plays a key role in physiological and pathological cellular functions. 1−3 Glycosylation patterns are altered in a number of diseases including cancer and neurodegenerative and autoimmune disorders. 4 Dystroglycan (DG) is a highly glycosylated membrane receptor, formed by the two subunits αand β-dystroglycan (hereinafter α-DG and β-DG), that connects the extracellular matrix with the cytoskeleton in many tissues, such as muscle and brain. 5 α-DG is a peripheral membrane protein formed by two globular domains, the N-and C-terminal domains, separated by an elongated mucin-like region. 6 α-DG binds to a set of extracellular matrix molecules that harbor one or more laminin globular domains (LG-domains), 7 such as laminin, 8 agrin, 9 perlecan, 10 pikachurin, 11 neurexin, 12 and slit. 13 The binding properties of α-DG depend on a complex O-mannosyl glycosylation, a multi-step process that involves at least 17 different enzymes. 14 Mutations in any of these enzymes are linked to a number of muscular dystrophies, collectively known as secondary dystroglycanopathies, whose clinical presentations can vary from severe congenital muscular dystrophies to milder limb-girdle muscular dystrophy with manifestation in adulthood. 15 The hallmark of secondary dystroglycanopathies is the absence or the significant reduction of α-DG glycosylation. 16 The α-DG O-mannosyl modification starts with the attachment of a trisaccharide molecule GalNAc-β3-GlcNAc-β4-Man to Ser/Thr residues in the α-DG mucin-like region. 17 The mannose is then phosphorylated by POMK to allow further glycan elongation and phosphorylation by enzymes including fukutin, FKRP, TMEM5, B4GAT1, and LARGE1. 18−21 In particular, LARGE1 is a type II transmembrane protein, localized in the Golgi apparatus, 22 characterized by two distinct domains, that can be found within the Golgi lumen, with glycosyltransferase activities: a xylosyltransferase activity (domain 1) and a glucuronyltransferase activity (domain 2) 18 ( Figure 1).
Indeed, LARGE1 catalyzes the addition to its substrate of several units of the disaccharide [-3Xyl-α1,3GlcAβ1-], known as matriglycan, which is the functional motif that ensures the binding of α-DG to the extracellular matrix proteins. 14,23 During DG maturation, LARGE1 binds to the N-terminal domain of α-DG, and this binding is an essential anchor for the enzyme to specifically modify the mucin-like region. 19,24 Different mutations in LARGE1 have been identified in patients affected by severe congenital muscular dystrophy with central nervous system involvement. 25−29 These mutations include missense and frameshift mutations in both domain 1 and domain 2, as well as intragenic deletions/insertions. Overexpression of LARGE1 in cells from patients affected by different α-dystroglycanopathies restored α-DG laminin binding, indicating that the modulation of LARGE1 activity may represent a therapeutic strategy for the treatment of secondary dystroglycanopathies. 30,31 In this context, the effort to reach a fundamental understanding of the molecular details of the glycosylation process is undermined by the lack of high-resolution structural data on LARGE1. In order to start filling this gap, molecular modelling can be used to construct a working threedimensional model structure of the α-DG glycosylating enzyme. When structural homologs are not available, as in the case of LARGE1, ab initio modelling may guide a conformational search based on a designed energy function, but, in spite of recent advances, such an approach has shown low accuracy in prediction. 32 Traditional physics-based potentials for molecular mechanics and conformational searches, first used for organic compounds, 33,34 have not yet been developed to a point where reliable model structures can be generated except for a limited number of proteins. 32 Knowledge-based energy functions were demonstrated to perform better, 35 and among them, the TASSER approach ranked as the top method for automated protein structure prediction in the latest CASP experiments for three-dimensional structure prediction. 36,37 TASSER and its development, I-TASSER gateway, which involves template threading, structural fragment assembly, model refinement, and structure-based protein function annotation, indicate that ab initio and template-based modelling may successfully combine. 38 To date, the only in silico analysis of LARGE1, confined to the study of LARGE1 domain 2, is that of Bhattacharya et al. 39 In this study, we present a three-dimensional model for both domain 1 and 2 of LARGE1 and analyze their structural features using the best performing methods for fold recognition and domain boundary prediction. 40,41 The generated three-dimensional structures were refined, and the best reliable models were selected and subjected to molecular dynamics (MD) simulations. MD calculations are useful tools for evaluating the stability of predicted domains as well as the effect of a pathological mutation, 42−44 and we employed them in order to provide insight into the outcome of the S331F mutation found in a patient affected by congenital muscular dystrophy with eye and brain involvement. 26

Functional Domains Prediction and Models Building.
The 756 amino acid long sequence of LARGE1 from Mus musculus was retrieved from the UniProt database (http:// www.uniprot.org/) 45 (entry code Q9Z1M7), and the domain analysis was performed using the Conserved Domain Database (CDD) server. 46 Following a successful modelling strategy, 47 two different protein-modelling approaches were employed to build the molecular models of the identified LARGE1 protein domains: HHPRED 48 in combination with MODELLER 49 and the I-TASSER server. 50 The web-service HHpred, available at https://toolkit.tuebingen.mpg.de/tools/hhpred, offers a threading approach that uses the hidden Markov models 51 and was employed to align the sequences and search for suitable template structures. The PDB70 profile database was used for the template search, and the crystallographic structure showing the best HHpred probability score, the largest coverage, and the best resolution was chosen as a template. The resulting alignment was submitted to the program MODELLER v 9.15 as implemented in Discovery Studio 2016 (Dassault Systemes BIOVIA, Discovery Studio Modelling Environment, Release 2016, San Diego: Dassault Systemes, 2016) to build the three-dimensional structure of the identified LARGE1 domains. Fifty models with a degree of optimization scored as "high" were generated by the "Build Homology Model" protocol of Discovery Studio and ranked by the PDF total energy. The best-ranked models were then submitted to the "Loop Refinement" protocol of the program generating five models for each loop, with a "high" optimization level. The refined models with the lowest PDF total energy score were chosen as the final models. For comparison, the I-TASSER web-service was also used, a metaserver that automatically employs ten threading algorithms in combination with ab initio modelling to build the tertiary structure of a protein as well as replica-exchange Monte Carlo dynamics (REMD) simulations for the atomic-level refinement. PROCHECK 52 and ProSA-Web 53 were employed to evaluate the quality of the model. The structure of the mutant S331F of LARGE1 domain 1 was carried out with the "Build Mutant" protocol of the BIOVIA Discovery Studio suite (Dassault Systemes BIOVIA, Discovery Studio Modelling Environment, Release 2016, San Diego: Dassault Systemes, 2016). Fifty models were generated with a "high" optimization level and with no restraints and evaluated based on their PDF total energy and DOPE score.
Molecular Dynamics Simulations of WT and of the S331F Mutant. Molecular dynamics simulations were executed using the version 4.8 of Desmond (D. E. Shaw Research, New York) employing the OPLS2005 force field. 54 An all-hydrogen model of the proteins was first generated using the Protein Preparation Wizard tool 55 available in the Maestro software (Schrodinger, LLC, New York, NY, 2017), for hydrogen insertion and force field atom-types and partial charges assignment. The systems for the simulations were built with the "System Builder" tool of the Maestro suite, drawing a triclinic box around the domain 1 of LARGE1 with a distance buffer of 10 Å between the protein and the side of the box, and filling it with SPC water molecules. A 0.15 mol/L NaCl salt concentration was added, and additional Cl − ions were added to neutralize the system ( Table 1). The resulting systems both consisted of a 71 × 76 × 71 Å-sized box containing a total of about 30000 atoms, among which ∼8800 were water molecules and ∼4590 were protein atoms ( Table 1).
The two systems were first minimized in order to relax the molecules into a local energy minimum and to remove the steric clashes, using the default protocol consisting of an initial steepest descent phase followed by a minimization with the LBFGS method, until convergence. The systems were then equilibrated using the Desmond standard NPT relaxation protocol with default parameters. We performed duplicate MD simulations of 500 ns each, using different random seeds for the assignment of the initial velocities, at a constant pressure of 1 atm and 300 K, saving the energies and the trajectories every 20 ps. The Martyna−Tobias−Klein method was used for pressure coupling, and the Nose−Hoover thermostat was employed for the temperature bath, using the default settings for both, while a cut-off of 9 Å was used for the non-bond interactions. The analysis of trajectories was performed with the "simulation event analysis" tool of Maestro (Maestro-Desmond Interoperability Tools, Schrodinger, New York, NY, 2017), and the solvent accessible surface area (SASA) over the simulation time was calculated with VMD. 56 Discovery Studio (Dassault Systemes BIOVIA, Discovery Studio, 2018, San Diego: Dassault Systemes, 2018), Maestro, and VMD were used for the visual inspection of the three-dimensional structures and for the simulations results. The convergence of the MD simulations was assessed by calculating the root mean square inner product (RMSIP) using Bio3D library as implemented in the version 3.5.2 of the R package. 57,58 The protein motion essential for the biological function of LARGE1 was analyzed by principal component analysis (PCA)-based dimensionality reduction of the atomic fluctuations of the Cα atoms in the simulated trajectory. The Bio3D library, 58 as implemented in the version 3.5.2 of the R package, was employed to perform and graphically represent the PCA of the WT and the S331F mutant trajectories. The conformations from the last 210 ns replica MD simulations were used for the wild-type (WT) and mutant enzymes. All the calculations were carried out employing an NVIDIA M4000 GPU and an Intel Xeon X5660 processor, on a HPZ820 workstation running Linux Centos 7 operating system. FoldX v5.0 was used to estimate the impact of the S331F mutation on protein stability (http://foldxsuite.crg.eu). 59 The algorithm calculates the free energy of folding of the WT and the mutated protein and estimates whether the mutation has a destabilizing (ΔΔG > 0) or stabilizing (ΔΔG < 0) effect.
Cloning and Expression of LARGE1 Domain 1 in Escherichia coli. The nucleotide sequence of LARGE1 domain 1 (residues 138−413) from M. musculus was synthesized and optimized for expression in E. coli by Invitrogen (GeneArt gene synthesis). The synthetic gene was cloned with extremities 5′BamHI-3′EcoRI downstream the thioredoxin gene in the vector pHisTrx, for expression in E. coli as a fusion product; the expression vector thus obtained was transformed into E. coli BL21 (DE3). A single colony from the transformants was picked and grown o.n. in LB + Amp 100 mg/L, then diluted 100-fold in fresh LB + Amp 100 mg/mL, and grown at 30°C shaking at 220 rpm until OD600 ≈ 0.8; the expression was induced by adding IPTG to 0.5 mM final concentration. Protein production was checked on single aliquots collected at specific times after induction by lysing the cell pellets and running them on sodium dodecyl sulfatepolyacrylamide gel electrophoresis (SDS-PAGE). The bulk of cells was collected 3 h after induction, and the cell pellet was resuspended and lysed; the soluble proteins were separated from the insoluble ones by high-speed centrifugation and finally checked on SDS-PAGE in order to identify which fraction contained the fusion product. To increase the solubility of LARGE1 domain 1, the same construct was used to transform Arctic Xpress E. coli BL21 (DE3) competent cells (Agilent Technologies) following the manufacturer instructions. A single colony from the transformants was picked and grown o.n. in LB containing ampicillin 100 mg/L and gentamicin 20 mg/L, then diluted 200× in fresh LB containing no selection antibiotic, and grown at 30°C shaking at 220 rpm for 3 h. The expression was then induced by adding IPTG to 1 mM final concentration and growing the cells for additional 24 h at 12°C shaking at 200 rpm. Cells were harvested, resuspended in binding buffer (20 mM Tris HCl, 0.5 M NaCl, 15 mM imidazole, 2 mM Mn 2+ ), and lysed in a cell disruptor (Constant Systems, model Z plus 1.1 KW). The soluble and insoluble fractions were separated by centrifugation, and the supernatant was loaded on a 5 mL His trap Ni 2+ column equilibrated in binding buffer, for isolation of the Histagged fusion product by binding and imidazole elution. ■ RESULTS AND DISCUSSION Molecular Modelling. A schematic summarizing the workflow of our molecular modelling approach is provided in Figure 2.
Molecular Modelling of LARGE1 Domain 1. We decided to use the murine LARGE1 sequence for molecular modelling due to the very high degree of sequence similarity between murine and human variants (100% sequence identity for domain 1 and 98.5 for domain 2, Supporting Information, Figure S1). It is worth noting that we have recently shown that the X-ray structures of murine and human N-terminal of αdystroglycan (displaying the same degree of sequence similarity found between human and murine LARGE1) are identical. 60 As for the human point mutation of LARGE1 that we have introduced and analyzed in the model and in agreement with the above, all the residues in the region specifically perturbed by the mutation S331F in murine LARGE1 are identical to those in its human counterpart.
The xylosyltransferase and glucuronyltransferase activities of LARGE1 have been attributed to two different domains of the protein. 18 Indeed, the identification of functional domains by the CDD server revealed the presence of two domains, namely, domain 1 (residues 138−413) and domain 2 (residues 473− 742), both predicted to belong to two glycosyltransferase families (family 8 and 49, respectively), in agreement with the data reported in the literature. 18,61 The absence of a suitable template structure in the Protein Data Bank for homology modelling of the whole-length protein led us to generate threedimensional model structures of the two identified domains individually. The HHpred analysis on domain 1 (residues 138−413) identified three potential templates in the Protein Data Bank: the crystal structure of a protein belonging to the glycosyltransferase family 8 from Anaerococcus prevotii (PDB code 3TZT 62 ), the crystal structure of the galactosyltransferase LgtC from Neisseria meningitidis (PDB code 1G9R 63 ), and the crystal structure of the xyloside xylosyltransferase 1 from M. musculus (PDB code 4WMA 64 ). Amongst these, 1G9R (2 Å resolution, 19% sequence identity) was selected as the template structure because it exhibited the lowest number of missing residues. The HHpred alignment was then used by MODELLER for model building. Structural models were generated, and the highest-scoring model (lowest PDF total energy) was selected for further analysis. The quality of the model was evaluated considering the overall stereochemistry by PROCHECK, and the resulting Ramachandran plot showed that 95.7% of residues was located in the most favorable regions (core and allowed), whereas the 3.9% was located within the generously allowed region and only the 0.4%, confined to Lys 386, in the disallowed region. The interaction energy of each residue with the structural surrounding environment, calculated with PROSA-Web, gave a global Zscore of −5.6 (276 residues), which falls well within the accepted range for proteins of the same size and indicates an overall good model quality (Supporting Information, Figure  S2A). The PROSA-Web local quality analysis of the model, represented in the energy profile plot, shows that all the residues, except those in regions 265−285, have negative scores, where the more negative is the score, the more correctly modeled is the region (Supporting Information, Figure S2B).
For comparison, an alternative approach was used and the model of domain 1 was also generated on the I-TASSER server, which is based on a combination of multiple-threading alignments, iterative template fragment assembly simulations, and ab initio modelling algorithms. The template search stage of the I-TASSER procedure identified the crystal structures of the galactosyltransferase LgtC from N. meningitidis (PDB entry codes: 1G9R, 63 1GA8, 63 and 1SS9 65 ) of a glycosyltransferase family 8 from A. prevotii (PDB code 3TZT 62 ) and of the xyloside-α-1,3-xylosyltransferase 1 (XXYLT1) from Homo sapiens (PDB code 4WM0 64 ) as the best templates. In agreement with the HHpred results, all the templates detected by I-TASSER displayed the typical GT-A fold, consisting of two α/β/α domains with a continuous central β-sheet, 66 all belonging to family 8. Identified templates share sequence identity with the target sequence ranging from 19 to 24% (Supporting Information, Table S1). Five I-TASSER models were obtained, and the best scoring model displayed a TMscore of 0.79 ± 0.09, an estimated root-mean-square deviation (rmsd) of 4.8 ± 3.1 Å and, on a scale of accuracy ranging from −5 (lowest) to 2 (highest), a C-score value of 0.57. The model structure obtained from the I-TASSER procedure featured a lower percentage of residues (87%) in the core and allowed regions of the Ramachandran plot and 6.7 and 5.9% residues in the generously allowed and disallowed regions, respectively. The ProSA-web local quality analysis of this model reported negative energy values for all the residues, with the exception of the region 271−296, and the global quality analysis (276 residues) gave a Z-score of −6.92, which falls within the range of the experimentally resolved structures of the same size (Supporting Information, Figure S3A,B). The models generated by I-TASSER and HHpred were submitted to the COFACTOR software, 67 which identifies the most structurally similar enzymes in the Protein Data Bank, performing a local and global structure match. This analysis showed that the two generated models share the highest structural similarity with the crystallographic structure of the glycosyltransferase LgtC from N. meningitidis (PDB code number 1SS9), displaying a TM-score of 0.89 and 0.92 for the I-TASSER and HHpred model, respectively, as well as a Cα-rmsd lower than 2 and a sequence coverage of 94.6% for both the model structures. Given that a TM-score greater than 0.7 indicates that the structures have a 90% probability of being in the same topology family, 68 the two models of domain 1 can thus confidently be assigned to the GT-A glycosyltransferase family. For comparison, the calculated Cα-rmsd values between the MODELLER and I-TASSER LARGE1 models are 2.3 Å (domain 1) and 4.1 Å (domain 2), indicative of similar models. According to the validation results, the model structure obtained by the combined HHpred/MODELLER approach was selected as the best model for LARGE1 domain 1 for further analysis. The model structure of domain 1 consists of a central seven-stranded mainly parallel β-sheet ( Figure 3A), sandwiched between 14 α-helices and a small β-sheet, which is conserved in all the proteins with a GT-A fold ( Figure 3A,B).
On one side of the β-sheet, helix-C contacts the β1 and β2 strands, whereas helices G, J, and K contact the rest of the βsheet. On the other side, a small β-sheet (strands β5 and β9) is located between helices A, B, and N that are in contact with the β1, β2, and β3 strands, whereas helices D, L, M, and E are in contact with the β6 and β8 strands ( Figure 3). As found in other GT-A folded proteins, the small β-sheet formed by two antiparallel strands (β5 and β9) is linked to the β4 strand by a Journal of Chemical Information and Modeling pubs.acs.org/jcim Article loop (L4) that contains a conserved DXD motif likely to bind the manganese cation (Mn 2+ ) 69 responsible for the catalytic activity of domain 1. 18,22 The Mn 2+ ion has a crucial role in the catalytic activity of LARGE1, and it is coordinated by the DXD motif residues as well as by the αand β-phosphate moieties of the substrate. 63 In the predicted model structure of LARGE1 domain 1, the Mn 2+ ion is coordinated by the NE2 atom of His380 of the β9 strand and by the carboxylate oxygen atoms of Asp242 and Asp244 belonging to the β4−β5 linker loop L4 ( Figure 3B). Analogous interactions are established by the Mn 2+ ion in the neighbor crystallographic structure of LgtC from N. meningitidis (1G9R) where the cation is coordinated by His244, Asp103, and Asp105, respectively (ref 63, Figure  4). Full inspection of the structural alignment reveals that all major secondary structure elements of the model of LARGE1 domain 1 superimpose almost exactly with those of the LgtC glycosyl transferase, with only few exceptions related to the helical content (Figure 4).
Molecular Modelling of LARGE1 Domain 2. An analogous analysis was performed for LARGE1 domain 2. The top-ranking templates identified with a 99.7% probability by the template search methods of HHpred were all Nacetylgalactosaminyl transferases, namely, 1XHB, 70 6E4R, 71 6H0B, 72 6IWR, 73 and 2D7I. 74 1XHB, the crystal structure of the UDP-GalNAc: polypeptide α-N-acetylgalactosaminyltransferase-T1 from M. musculus, was selected as the best template because it exhibited the highest P-value (2.0 × 10 −19 ), highest probability (99.7%), and a sequence identity of 10% with the target. The stereochemical quality of the model with the lowest PDF value, generated by MODELLER on the basis of the HHpred alignment, was evaluated using PROCHECK. The Ramachandran plot calculation revealed that 91.5% of the residues are found in the most favorable regions (core and   63 The figure was prepared using the ESPript server (http:// espript.ibcp.fr). Identical residues are shown in white on red, and homologous residues are in red letters. Secondary structure details are represented (helices: squiggles, beta strands: arrows, and turns: TT letters).

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article allowed), 3.7% in the generously allowed region, and the remaining 4.9% in the disallowed region. ProSA-Web analysis of the model (270 residues) revealed a Z-score value of the target protein of −2.69, that is, within the range of the experimental protein structures of the same size, which indicates an acceptable overall quality of the model structure. However, the ProSA-Web analysis revealed that about 50% of the residues display positive values of interaction energy, suggesting errors in some regions of the model (Supporting Information, Figure S4A,B). The crystallographic templates found by I-TASSER to be suitable for a threading-ab initio model were the crystal structure of chondroitin polymerase from E. coli (PDB code 2Z86 75 ), the structure of the human UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2 (PDB codes 2FFU, 76 61 Å), where a TM-score higher than 0.5 generally indicates the same fold in SCOP/CATH. In agreement with the study of Bhattacharya and colleagues, 39 the crystal structure with PDB code 4D0T, that is, the human glycosyltransferase GalNAc-T2 in complex with UDP-GalNAc, the EA2 peptide, and manganese ion, was also found to have a close structural similarity to the model of LARGE1 domain 2 (TM-score of 0.84 and rmsd of 1.66 Å). The analysis of the stereochemical quality of this model, assessed with PRO-CHECK, showed that 95.5%, 1.6, and 2.8% of the residues are located within the core and allowed, generously allowed, and disallowed regions of the Ramachandran plot, respectively. The overall model quality shows a ProSA-web Z-score of −7.2, in agreement with the Z-scores calculated for the experimentally resolved structures in the Protein Data Bank (Supporting Information Figure S5A). The ProSA-web server showed that the interaction energy of each residue with the rest of the protein has negative values, apart from the residues within the two small regions 584−590 and 597−601 (Supporting Information, Figure S5B). Comparison of the results obtained using the two modelling approaches led to the selection of the structural model built with I-TASSER as the best model of LARGE1 domain 2 ( Figure 5). Although the I-TASSER server performed better than HHpred in terms of quality of the final model, the results from both methods agree on pointing toward domain 2 to be classified as an enzyme of the GT-A family, with a Rossmanlike fold consisting of a two sandwiched β-sheets interposed between four α-helices on both sides ( Figure 5A). Interestingly, although the fold, especially the arrangement of the βsheets, is conserved between the two domains of LARGE1, the three-dimensional model of domain 2 shows a lower number of strands with respect to domain 1 (eight vs nine) (Figures 3A  and 5A). Strand β8 undergoes a distortion at the level of Pro699 that makes its C-terminal portion parallel to the β6 strand and its N-terminal portion antiparallel to the β5 strand, thus creating the double β-sheet structure shown in Figure 5B.
Interestingly, the third DXD motif, that in domain 2 is DID (563−565), essential for the glucuronyltransferase activity of LARGE1, 18,22 is located in a small unstructured turn between the strands β4 and β5, and it is structurally aligned with the DXH motif of the structural neighbor UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2 from H. sapiens. The turn region is located deep into the binding cleft of the enzyme, with the two aspartates (Asp563 and Asp565) pointing toward the solvent and the isoleucine (Ile564) buried in the protein core, indicating an orientation that could favor the coordination of the carboxyl groups of the aspartates to a metal ion cofactor such as Mn 2+79 ( Figure 5B). It is noteworthy that His705, in the β8 strand, is conserved in 2FFU (His359, see sequence alignment in Figure 6) and that in the crystal structure, this residue coordinates the metal ion.
His705 in the model of domain 2 is in close contact with the aspartates 563 and 565 (distance shorter than 4 Å), suggesting an involvement of this basic residue in the catalytic cluster ( Figure 5B).
Effects of the S331F Mutation on the Structure and Dynamics of LARGE1 Domain 1. The mutation of the LARGE1 residue Ser331 to Phe has been known for some time to cause a congenital muscular dystrophy phenotype characterized by eye malformations and mental retardation. 26 According to our in silico generated structure of domain 1, Ser331 is located in a random coil region (residues 325−333) between helices I and J, on the rim of the active-site cleft and close to Asn213, which belongs to the coil region between strand β3 and helix C (residues 200−219) ( Figure 3B). Molecular dynamics simulations of the WT and of the S331F mutant were carried out to explore the conformational changes that occur upon S331F replacement and to assess its structural and dynamics effects on the enzyme function, with the final aim of understanding the molecular mechanism at the basis of Although several residues were found to display significant differences in RMSF upon mutation, the important catalytic residues such as the two DXD motifs and His380 were not affected in their atomic fluctuations. An alteration of total, hydrophobic, and hydrophilic SASAs in the mutant protein with respect to the WT was also observed. A representative plot of SASA for all-atom, hydrophobic, and polar residues against time is shown in Figure 8.
Compared to the WT, the S331F mutant showed higher allatom SASAs, which converged to an average value of 15145 ± 248 Å 2 (14777 ± 327 Å 2 in the WT protein). Notably, the hydrophobic SASA of the mutant increased about 12.7% over the simulation time with respect to the WT protein (5115 ± 265 and 4538 ± 166 Å 2 , respectively) (Figure 8), a feature that could significantly affect the interactions at the protein surface.
In order to check the convergence of the MD simulations, we have performed the RMSIP analysis on two equilibrated  A weak destabilizing effect of the S331F mutation (ΔΔG = 0.21 kcal/mol) was also calculated by FoldX. This ΔΔG value, which cannot be directly related to a change in enzyme function, 80 suggests that a phenylalanine in position 331 is not fully compatible with the surrounding chemical environment, as already observed in MD. Time evolution of the secondary structure elements along the MD simulation was also analyzed, showing the stability of the β-strand elements during the entire MD simulations and the structural transition of few α-helices into 3 10 -helices (Supporting Information, Figure S9).
In order to probe the changes in conformational dynamics induced by the S331F mutation, analysis of the MD trajectory by PCA using the Ca atoms was also performed. Results indicate that the first twenty principal components (PCs) account for 90.7% (WT) and 92.8% (mutant) of the motions observed in the last 400 ns of the trajectories, whereas the first three PCs account for the 74.5% (WT) and the 82.3% (mutant) motions (Supporting Information, Figure S10A). The trace values of the diagonalized covariance matrix were found to be 1706.9 Å 2 (WT) and 2145.8 Å 2 (mutant), indicating an increased flexibility of S331F in the collective motion as compared to the WT (Supporting Information, Figure S10A,B). Analysis of the contribution of each residue to the first principal component revealed that the Ser 331 to Phe mutation alters the protein motion in the loop regions surrounding the residue 331. Notably, the peaks corresponding to residues 211−216 and 282−291 are shifted backward (to 201−210 and 276−288, respectively) in the mutated protein (Supporting Information, Figure S10C). This analysis suggests that the S331F mutation introduces local differences in the motion of the active site rim of domain 1, which might affect the interaction with the substrate.
Cloning and Expression of LARGE1 Domain 1 in E. coli. A series of first attempts at heterologous expression of domain 1 as revealed by our modelling results were carried out. The DNA sequence coding for the region comprising residues 138−413 of M. musculus LARGE1 was custom-synthesized and optimized for the expression in E. coli. The synthetic gene thus  Journal of Chemical Information and Modeling pubs.acs.org/jcim Article obtained was cloned into the pHisTrx vector for expression downstream the fusion partner His 6 -tagged thioredoxin and subsequently transformed into E. coli strain BL21 (DE3). Although preliminary, the evidence that the protein can be recovered in the soluble form upon heterologous expression in the presence of helper proteins such as the chaperonin GroEL (see Supporting Information, Figures S11−S13) reinforce the notion that the subdomain 1 represents an autonomous folding unit, as suggested by our modelling results. Further experiments are granted aimed at producing quantitative amounts of domain 1 for future biochemical and structural studies. Recombinant expression of individual LARGE1 domains 1 and 2, as well of the two domains together in a full-length LARGE1 construct, would allow a thorough analysis of the enzymatic proprieties and activity of this glycosyltransferase. The two catalytic domains of LARGE1 are connected by a flexible linker (ranging from amino acid 414 to 472) that due to its nature could not be accurately modeled. A very interesting question is whether the two domains establish long-range interactions and work somehow cooperatively during the sequential synthesis of each added disaccharide block (i.e. the final product of every full LARGE1 enzymatic cycle) or rather function in an independent fashion. Whether the binding of each substrate to each catalytic domain induces any allosteric long-range effects influencing the enzymatic behavior of the other catalytic domain remains to be determined at the current stage. Noteworthy, also the Nterminal region of α-dystroglycan 81 harbors two different domains (IG1 and S6), and it is intriguing to foresee a scenario in which the two DG domains would be involved reciprocally in chaperoning the activity of the two catalytic domains of LARGE1.

■ CONCLUSIONS
Our combined use of molecular modelling approaches allowed for a three-dimensional description of the two domains respectively responsible for the xylosyltransferase and glucuronyltransferase activities of LARGE1. This work characterized the three DXD motifs, each likely to bind a manganese cation (Mn 2+ ), that constitute the active sites for the glycosyltransferase activities of the two domains, 22,69,82 thus offering some initial insight into the possible mechanism of binding of the sugar donor and acceptor. Molecular dynamic simulations of the S331F mutant, carried out in comparison with the WT counterpart, points to a more flexible and less stable enzyme, as suggested by an increased RMSF and all-atom and hydrophobic SASA parameters, compared to the WT protein.
Indeed, a reduced stability could be at the basis of the poorer enzymatic activity of this variant of LARGE. This would eventually lead to the hypoglycosylation of α-DG observed in a patient affected by a severe form of congenital muscular dystrophy with anomalies of eye and brain linked to this missense mutation. 26 Our study demonstrates that in the absence of experimental structural data on LARGE1, template-based three-dimensional modelling of the enzyme can offer some insight into its overall structural features, and the analysis of its dynamic behavior by in silico methods can help to elucidate the molecular mechanism underneath the diseases linked to missense mutations of the enzyme. In addition, our modelling study might pave the road for the analysis of potential complexes formed by LARGE1 and the α-dystroglycan N-terminal domain (αDGN), whose structure we have previously solved. 81 ■ ASSOCIATED CONTENT
Sequence alignment of human and murine LARGE1 proteins; structural analysis of the LARGE1 domain 1 model structure generated by the I-TASSER server; sequence identity between LARGE domain 1 and the templates used by I-TASSER to build the model; structural analysis of the LARGE1 domain 2 model generated by HHPRED/MODELLER approach; structural analysis of the LARGE1 domain 2 model generated by the I-TASSER server; analysis of MD trajectories; time evolution of the secondary structural elements along the MD simulations generated by VMD [78]; projection of the enzyme conformations onto the first two PCs for the wild-type (in blue) and mutant (in

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The funders played no role in the design of the study, in the collection, analysis, and interpretation of data, or in the decision to submit the manuscript for publication.