Modeling of Oligosaccharides within Glycoproteins from Free-Energy Landscapes

In spite of the abundance of glycoproteins in biological processes, relatively little three-dimensional structural data is available for glycan structures. Here, we study the structure and flexibility of the vast majority of mammalian oligosaccharides appearing in N- and O-glycosylated proteins using a bottom up approach. We report the conformational free-energy landscapes of all relevant glycosidic linkages as obtained from local elevation simulations and subsequent umbrella sampling. To the best of our knowledge, this represents the first complete conformational library for the construction of N- and O-glycan structures. Next, we systematically study the effect of neighboring residues, by extensively simulating all relevant trisaccharides and one tetrasaccharide. This allows for an unprecedented comparison of disaccharide linkages in large oligosaccharides. With a small number of exceptions, the conformational preferences in the larger structures are very similar as in the disaccharides. This, finally, allows us to suggest several efficient approaches to construct complete N- and O-glycans on glycoproteins, as exemplified on two relevant examples.


■ INTRODUCTION
Most of the processes in living cells take place via some form of carbohydrate interaction. 1 Therefore, understanding the effect of glycosylation, which occurs on more than 50% of the eukaryotic proteins, is crucial to reveal biological mysteries. Glycans, the carbohydrate moieties of glycoproteins, serve as biological markers at various stages of cellular differentiation and proliferation and also play significant roles in cell−cell recognition, cell adhesion, and signal transduction. This broad range of functionality is attributed to their structural diversity in terms of size, sequence, branching, and linkage types that can be formed from a variety of possible monomeric units. However, there is a lack of reliable information on the structure of glycoproteins for several reasons. Most of the experimentally solved biomolecular structures have undergone extensive manipulation of oligosaccharides before X-ray crystallography or NMR spectroscopy due to their inherent flexibility and high degree of coordination with water. The available X-ray structures either have different types of glycoforms than the physiologically relevant forms or show only short sequences of glycan units. Also, structures from X-ray crystallography offer the crystalline form of glycoproteins, lacking a more diverse solution representation. NMR can offer structures in solution; however, these represent averages of simultaneously occurring conformers, limiting the amount of structural information. Furthermore, glycan structures in the databases often contain wrong or missing information about stereochemistry and nomenclature. 2 Due to its detailed spatial and temporal resolution, molecular dynamics simulation emerges as a powerful tool for the modeling of glycoproteins. Some progress has been made in search of the conformational preferences of glycan units with different carbohydrate forcefields; including AMBER, 3 GLYCAM, 4 CHARMM, 5 and GROMOS. 6,7 In the current work, we will focus on the conformational preferences of carbohydrate structures that are relevant for glycoproteins.
It has become increasingly accepted that the tertiary structure of a protein around a glycosylation site plays a more crucial role than the primary structure in determining the occurrence and level of glycosylation. 8 Even though glycosylation is a cotranslational process, processing of glycan units and their variation also depends on the accessibility of the glycosylation site and the glycans to the related processing enzymes. 9 Therefore, modeling of the glycoproteins not only contributes a three-dimensional visualization of the glycosylated proteins but also offers explanations for variations in the glycan units as characterized by experimental techniques. 10 Depending on the variations of the 1,3-or 1,6-branches, Nlinked oligosaccharides are classified into three groups: high mannose, hybrid, and complex types (see Figure 1). All types start with two N-acetylglucosamine (β-D-GlcpNAc) units that are linked to an asparagine side chain. The high mannose type subsequently consists of only α-D-Manp residues. In the complex type there are different types of monosaccharides present (for example β-D-GlcpNAc, β-D-Galp, α-D-Neup5Ac, β-D-GalpNAc, and α-L-Fucp) occurring in multiple levels of branching or antennas. In hybrid types, the 1,6-branch may be typically high mannose while the 1,3-branch is complex.
We will use a bottom up approach, investigating the conformational preferences of all relevant disaccharides, then moving on to trisaccharides and larger structures. In particular, we will focus on the conformational preferences of the glycosidic linkages and the influence of additional substituents on the central units.
We start by analyzing 17 disaccharide or amino acidmonosaccharide units that are fragments of the vast majority of   mammalian glycan trees belonging to mature glycoproteins, involving N-and O-glycans. Furthermore, for the first time, we systematically study the effect of neighboring residues on the conformational preferences of the disaccharides by extensively sampling all relevant trisaccharides. The observed conformational preferences of the disaccharides can be subsequently used to build the common major classes of N-and O-glycans, including typical modifications (like core fucosylation). With the selected disaccharide units, typical N-glycans found in mature glycoproteins can be created. Additionally, for Oglycans, the Tn and T antigen, blood group related antigens Lewis x, and their Sialyl versions can be build. The full list of the studied fragments is presented in Table 1.
The most relevant conformations of the dimers are characterized through the glycosidic ϕ and ψ angles. For a (1 → X) linkage, where X is 2, 3, 4, or 6 for a reducing disaccharide, the glycosidic dihedral angles, ϕ and ψ, are defined as O5−C1−OX−CX′ and C1−OX−CX′−C(X − 1)′, respectively. For α-D-Neup5Ac-(2 → X) linkage, ϕ and ψ, are defined as O6−C2−OX−CX′ and C2−OX−CX′−C(X − 1)′, respectively. See Figure 2 for the numbering in the β-D-Manp-(1 → 4)-β-D-GlcpNAc dimer as an example. This figure also shows the subsequent steps of our analysis. To investigate the effect of neighboring residues on the conformational preferences of the dimer, all relevant trimers are studied as well, in the case of the dimer in Figure 2; three different dimers are studied.
Previously, similar simulations investigating the conformational preferences of disaccharides as well as N-glycans have been described. Fernandes et al. 11 simulated some of the related linkages using the GROMOS 43a1 force field with plain MD. Initial structures were taken from a systematic search in vacuum and used in 100 ns explicit solvent simulation. Peric-Hassler et al. 12 investigated glucose-based disaccharides using the same enhanced sampling method as the current work. Yang et al. 13 applied a protocol based on replica exchange to sample the conformations of N-glycans. The same technique was used to extensively sample complex glycans 14 and high mannose oligosaccharides. 15 Even though these studies provide understanding about conformational preferences of several disaccharides, either the studied systems and techniques are different or the sampling of all relevant conformations is insufficient. To the best of our knowledge, the current work represents a first description of a consistent conformational library for the vast majority of relevant linkages in mammalian glycans.
The organization of this paper is as follows. In the methods section, the theory of the enhanced sampling method used is presented, followed by the computational details for its application. Then, the procedure for the analysis of the simulations is introduced. In the Results and Discussion section, a comparison of the conformational preferences as obtained by the simulations with the available experimental data and literature are reported. Furthermore, the effect of neighboring residues on these preferences are systematically studied. We subsequently propose protocols for the construction of the higher oligosaccharides and describe how our approach may be used in future work. Finally, the main findings are summarized in the Conclusion.  Table 2 for the rest of the set.

■ METHODS
This work aims to suggest protocols to build carbohydrate moieties of glycoproteins in the absence of previous structural information on carbohydrates. As outlined above, we start with disaccharides as they carry all the rotational degrees of freedom of the glycosidic linkage. The conformational characteristics of a disaccharide can be represented by a plot of ϕ against ψ, in analogy to the Ramachandran plot for amino acids. As the relative orientations of the successive monosaccharides differ in each linkage type, the ϕ, ψ map of a disaccharide is distinct. Therefore, an analysis of all possible conformations of a disaccharide is a key for creating higher oligosaccharides. Molecular dynamics (MD) simulations are capable of sampling broad ranges of conformations and are therefore the method of choice. However, it turns out that relevant, distinct conformations, represented by different regions on the ϕ, ψ map are often separated by high barriers, hampering the computational efficiency of the method. Similarly, some conformations may be thermodynamically accessible, but only observed very rarely in plain MD simulations, even if they are experimentally observed. Therefore, we apply an enhanced sampling method to ensure a near-to-complete sampling of the ϕ, ψ maps, local elevation umbrella sampling (LEUS). 16,17 Application of this method offers enhanced coverage of those regions in the ϕ, ψ maps which are sterically allowed. It relies on first building up a memory-based biasing potential in the desired subspace and a subsequent umbrella sampling phase to sample the conformational space using the "frozen" biased potentials. We used the adaptable nature of the LEUS method by creating disaccharide biasing potential libraries in the glycosidic torsional angle subspace (ϕ and ψ) and construct the larger carbohydrate moiety of glycoproteins using the relevant biasing potentials from the glycosidic linkages of all disaccharide fragments. A similar kind of approach has been proposed for proteins with peptide ϕ and ψ dihedral angles. 18,19 MD Simulation Settings. All MD simulations were performed using the GROMOS11 biomolecular simulation package (http://www.gromos.net) 20 with the 53A6glyc parameter set 7,21 of the GROMOS force field for carbohydrates. Minor modifications of this parameter set were applied to ensure compatibility with the protein force field and to ensure the stereochemistry of sialic acid. These modifications are listed in Table S2 of the Supporting Information. In particular, we have updated the N-Acetyl linkage in GlcNAc and Neu5Ac to the current protein force field and adjusted the GlcNAc-Asn linkage accordingly.
Initial structures of the disaccharide units of complex glycans were modeled in the molecular operating environment (MOE; Chemical Computing Group, Inc. 2014) from the crystal structure of IgG Fc carrying a sialylated complex type of glycan (PDB ID 4BYH 22 ). The rest of the initial structures of units were modeled using the glycosidic dihedral angle preferences according to the Glycosciences portal. 23 Hydrogen atoms were added according to geometric criteria followed by a short energy minimization using the steepest-descent algorithm. The compounds were placed in a periodic cubic water box with simple point charge (SPC) water molecules 24 and initialized with a 1.4 nm minimum distance of the solute to the box walls. The system was further relaxed by a steepest descent minimization with position restraints on the solute atoms. Prior to the production simulations, the systems were equilibrated with initial random velocities generated from a Maxwell−Boltzmann distribution at 60 K. The systems were heated up to 300 K in five discrete steps with a simulation length of 20 ps, while simultaneously, position restrains on the solute atoms were reduced from 2.5 × 10 4 to 0.0 kJ mol −1 nm −2 .
The simulations were performed at a constant temperature of 300 K and a constant pressure of 1 atm using a weak coupling scheme for both temperature and pressure with coupling times τ T = 0.1 ps and τ P = 0.5 ps and an isothermal compressibility of 4.575 × 10 −4 kJ −1 mol nm 3 . Newton's equations of motion were integrated using the leapfrog scheme with a time step of 2 fs. The SHAKE algorithm 25 was used to maintain the bond distances at their desired values. Long-range electrostatic interactions beyond a cutoff of 1.4 nm were truncated and approximated by a generalized reaction field with a dielectric permittivity of 61. 26 Nonbonded interactions up to a distance of 0.8 nm were computed at every time step using a pairlist that was updated every 5 steps. Interactions up to 1.4 nm were computed at pairlist updates and kept constant in between.
Creating Biased Potential Libraries with LE and Sampling with US. The local elevation (LE) biasing potential is defined as the sum of grid-based functions along the LE coordinates Q, that is (eq 1) In the original implementation g is defined as a onedimensional truncated Gaussian function; 17 in this study, a polynomial type is used instead (eq 2), following its advantages discussed in ref 16.
N g is the number of grid points along the N l dimensional subspace and n k is the number of visits to the corresponding grid cell. MI is the minimum image function ensuring the choice of Q i when it is periodic. H is the Heaviside step function. The force constant, c, and width, σ, are the free parameters which need to be tuned for the building up procedure. After the LE phase comes an equilibrium umbrella sampling (US) phase during which the biased Hamiltonian is timeindependent. That is the values of n k are no longer increased The real (unbiased) probability to find the system at a value Q can be recovered from the LEUS (biased) simulations by reweighting: where δ is the Dirac delta function, and β is 1/k B T with k B as the Boltzmann constant and T, the temperature. The angular brackets indicate an ensemble average over the biased US simulation. The corresponding free-energy profile can be obtained as

Journal of Chemical Information and Modeling
In this study, the LEUS subspace is defined by the glycosidic dihedral angles ϕ and ψ. After testing a number of different possible combinations, parameters for the LEUS method are chosen as N g = 36, σ = 360°/N g , c = 0.005 kJ mol −1 , N l = 2 for the ϕ and ψ angle of a glycosidic linkage. The LE phase is run for t LE = 100 ns and the US phase for t US = 100 ns. During the . Free-energy maps G(ϕ, ψ) in the glycosidic dihedral angle subspace from unbiased LEUS simulations compared with available crystal and NMR data of the related linkages taken from glycanstructure.org. 27 The first column represents the disaccharide simulation, and the second and/or third column corresponds to the disaccharide within a trisaccharide simulation (for trimer codes see Table 2). The division into distinct conformational states (A, B, C, and D) is illustrated at right top panel with colorbar used for free-energy maps with 5 kJ mol −1 spacing starting from global minimum. Regions that were never visited are shown in dark blue. For the rest of the free-energy maps, see Figures 4,6, and 7 as well as Figure S1 in the Supporting Information.

Journal of Chemical Information and Modeling
Article US phase, the trajectory is saved for every 0.1 ps resulting in 10 6 frames for 100 ns to meet the statistical efficiency as discussed in ref 12. For each of the 17 disaccharides, plain MD simulations were carried out for 50 ns after equilibration. Glycosidic linkage libraries were created by LE and free-energy maps were constructed after US. To assess the effect of neighboring linkages on consecutive units, we used disaccharide frozen potentials and applied umbrella sampling on trisaccharides. Thereby, proving our concept that disaccharide motion libraries based on glycosidic linkages can be taken as a model for oligosaccharides. For each of the 10 trisaccharides simulated, the initial structures were modeled based on the free-energy maps of the disaccharides by setting their relevant glycosidic linkage angles to the global minimum value in the disaccharide. After equilibration, US sampling was applied on two glycosidic linkages with four ϕ and ψ dihedral angles of the trimer unit by using the respective frozen LE potentials evaluated from the corresponding disaccharides.
Analysis. Free-energy maps G(ϕ, ψ) were created from the LEUS simulations after reweighting of the biased energy following eqs 4 and 5. The global minimum of each map was set to G = 0 kJ mol −1 and the value for grid points that were never visited set to G max .
The free-energy maps were divided into distinct conformational regions (A, B, C, up to D) with different ϕ, ψ cutoff values for different disaccharides as illustrated in Figure 3. This scheme has previously been used in refs 6 and12 for glucosebased disaccharides. After the unbiasing procedure, the free energies of these states (G A , G B , G C , and G D ) are calculated by integration over regions enclosed by the cutoff values. Subsequently, the values were rescaled by setting the state having the minimum free energy to 0.   Figure 3).

Journal of Chemical Information and Modeling
Article ■ RESULTS AND DISCUSSION The conformational landscapes of 17 dissaccharides (see Table  1), 10 trisaccharides, and 1 tetramer were studied (see Table 2). The analyzed disaccharides constitute the most typical mammalian N-and O-glycans (see Figure 1). Glycoproteins  . Free-energy maps G(ϕ, ψ) in the glycosidic dihedral angle subspace from unbiased LEUS simulations compared with available crystal and NMR data of the related linkages taken from glycanstructure.org (last column). 27 The first column represents disaccharide simulation, and the second and/or third column corresponds to the disaccharide within a trisaccharide simulation and one for tetramer (for trimer/tetramer codes, see Table 2). Color coding as in Figure 3.

Journal of Chemical Information and Modeling
Article the conformational preferences. Our approach is to examine this effect through comparing the free-energy landscape of trisaccharides with their relevant disaccharide units.
Comparison of Disaccharide Free-Energy Maps with Experimental Data. The glycosidic linkage conformations of 17 disaccharides and 10 trisaccharides evaluated from LEUS simulations are compared with available experimental data in Figures 3−7. In Figures 3 and 4, the first column represents the free-energy map of the disaccharide from LEUS simulation, while the second and third column corresponds to the conformation of the same linkage as observed in the trisaccharide LEUS simulations. In the last column, available crystal and NMR structure data of the corresponding linkage gathered from glycanstructure.org are plotted. 27 In the freeenergy maps, conformational landscapes are divided into states (A, B, C, up to D) which is illustrated in the right top panel of  Figure 3. Contour maps were drawn with 5 kJ mol −1 spacing starting from the global minimum which is set to 0 kJ mol −1 . The regions that were never visited are shown in dark blue in all free-energy maps. In Figure 5, the remaining disaccharide freeenergy maps are reported and compared to ϕ, ψ maps from experimental data. Comparison of the free-energy maps shows that the simulations of disaccharides are in good agreement with the available crystal and NMR structures. Not only the most populated regions but also the shape of the conformational maps and the density of observations was captured when there were enough available structures. Besides the most stable states, there are other thermally accessible states present. Some of the disaccharide free-energy landscapes which are previously reported from plain MD simulations are also in agreement considering the most visited energy basin. 11 We applied twodimensional local elevation along ϕ and ψ dihedral angle; however, in the 1 → 6, 2 → 6, and 2 → 8-linked disaccharides, sampling along the extra dihedral angles (ω for 1 → 6, 2 → 6; χ1 and χ2 for 2 → 8) was free (see Figures S1 and S2) as was also seen in the work of Peric-Hassler et al. 12 For the β-Nglycosidic linkage, a one-dimensional local elevation potential was built along O5−C1−Nδ2−Cγ torsional angle.
Analysis of Disaccharides. The free energies of the individual conformational states is given in Table 1, and the respective minimum energy conformations in terms of the glycosidic angles is presented in Table S1. We consider states within 25 kJ mol −1 (∼10k B T) to be thermally accessible. 300°] ranges, with the latter one being the most populated. These observations of the ϕ dihedral angle are in agreement with conformations predicted by the exoanomeric effect. The minima of the ϕ dihedral angle in g + and g − conformations are preferred, whereby the latter one is disfavored for the nonreducing residue involved in αlinkage due to steric considerations. 8 When we compare X-(1 → 4)-β-D-GlcpNAc where X is β-D-GlcpNAc, β-D-Manp, and β-D-Galp, we observe that for all disaccharides conformation C is preferred, with conformation D representing the second energetically favored state at 12−14 kJ mol −1 above the minimum. However, when β-D-Manp is the nonreducing residue (X), the difference between state D (second energetically favored) and state A (third energetically favored) is only 1.2 kJ mol −1 , significantly smaller than the rest, for which state A is around 7 kJ mol −1 higher.
Turning our attention to the ψ dihedral angle, we observe for disaccharides with a R-configuration at CX′ (1 → 4 linkages having D-GlcpNAc at the reducing end) preferred values in the [60°; 120°] range and a secondary visited state with a ψ dihedral angle of about 300°. The ones with a S-configuration at CX′ (1 → 3 or 2 → 3 linkages having D-Manp, D-Galp, D-GalpNAc, and D-GlcpNAc at the reducing end; 1 → 2 linkage having D-Manp at the reducing end), however, prefer conformations in the [220°; 300°] range. For (1 → 6)-linked disaccharides, preferred values of the ψ dihedral angle lie within the [160°; 180°] range. These results are also compatible with the previous studies from the glucose-based disaccharide simulations 12 and also with MM3 maps. 28 Disaccharides containing neuraminic (sialic) acid at the nonreducing end (α-D-Neup5Ac-(2 → 6)-β-D-Galp and its (2 → 6)-α-D-Galp isomer) are commonly found in the carbohydrate moiety of glycoproteins and glycolipids, typically found at outermost ends of glycan units. It is a key component of receptors in molecular recognition for many viruses and bacterial toxins 29 and the receptor specificity of influenza viruses is defined in terms of their recognition of the type of sialic acid glycosidic linkage. 30 Ligabue-Braun et al. evaluated the conformational ring puckering properties of sialic acid along with 53A6glyc compatible parameters using metadynamics indicating 2 C 5 as preferred and confirmed this behavior by 1000 ns of unbiased MD simulations. 31 Free-energy maps of the disaccharides show three distinct allowed regions for the ϕ dihedral angle (60°, 180°, and 300°) unlike other α linked disaccharides. The related states (A, B, C) have very close freeenergy values of 0.0, 4.4, and 3.2 kJ mol −1 , respectively. The additional degree of freedom, ω, in these linkages is almost freely rotating as can be seen in Figure S1.
The α-D-Neup5Ac-(2 → 8)-α-D-Neup5Ac disaccharide commonly occurs in gangliosides. They are mostly found at the end of the oligosaccharides of glycoproteins and they also form polysialic acid chains which have important function in cell adhesion and differentiation events. There are only few structures in the pdb carrying this linkage (see Figure 5); therefore, the conformational preferences of this linkage could not be compared with X-ray structural data. However, our results having the lowest energy for ϕ in g + and for ψ taking a broader range is compatible with existing NMR data. 32,33 Our simulations showed extensive sampling along the χ1, while χ2 remains at a value of 120°throughout (see Figure S2). The predominant state for χ1 was at g + conformation which is in agreement with NMR studies. 32,33 Since the sampling is rather complex because of the additional two dihedral angles (χ1, χ2) in the glycosidic linkage, we also built a local elevation potential along the ϕ, χ2 space, thereby including χ2 in the LE biasing stage. Again the predominant χ2 region was 120°; however, we observed a preference for χ2 at 300°when ϕ is in g + . Forcing χ2 angle to take 300°with a ϕ value of 60°required 20 kJ mol −1 . Our results suggest that sialic acid containing 2 → 8 and 2 → 6 linkages are more flexible and span a larger area than the 2 → 3 linkage, because of the additional degrees of freedom.
The α-D-Manp-(1 → 2)-α-D-Manp linkage is highly abundant in high-mannosidic glycans. Saẅeń et al. 34 studied this disaccharide extensively with MD and NMR, derived its most probable conformation with glycosidic angles ϕ = 80°and ψ = 273°. Our results for this disaccharide shows the most energetically favored state as B and minimum conformation at this state has ϕ = 96°and ψ = 276°(see Table S1) showing strong agreement with their results. α-D-GalpNAc → Thr and α-D-GalpNAc → Ser have biologically important functions and also are of interest for therapeutics, especially in the development of vaccines for cancer treatment. 35 Also, this glycosylation has been known to force the peptide backbone to an extended conformation. 35  Although pyranose rings can adapt a number of ring conformations (chair, half-chair, envelope, boat, and skewboat) for saturated rings, a chair is preferred since it relieves much of the torsional strain. Since there are high energy barriers between ring conformational states, they often adopt their minimum energy puckerings; for D-sugars, it is 4 C 1 , and for L-sugars, 1 C 4 . According to metadynamic studies of Pol-Fachin et al. 7 with the 53A6glyc force field, the free energies of transition from 4 C 1 to 1 C 4 is −77, −54, and −59 kJ/mol for β-D-Glcp, β-D-Galp, and β-D-Manp, respectively. During our LEUS simulations, each monosaccharide accordingly remained in their expected ring puckering conformation, and no transitions were observed toward minor states. All the simulations started from their expected ring puckering conformations which were 1 C 4 for L-Fucp, 2 C 5 for D-Neup5Ac and 4 C 1 for D-sugars. While the transition free energies 7 between differently puckered states is probably too high for this force field, restricting the simulations to the most relevant conformations avoids significant complications when analyzing the preferences along the glycosidic linkages.
Comparison of the Conformational Preferences of Trisaccharides. The trisaccharides are compared to their respective dimer units in terms of their free-energy landscapes in Figures 3 and 4. While the unbiased free-energy landscapes are slightly more noisy in the trisaccharides the sampling using the disaccharide LE potentials is clearly sufficient to reconstruct the conformational preferences. The relative free-energies of the states are compared in Table 2. As can be seen from this table, the most populated state in the trisaccharides always remains the same as the ones observed for the corresponding disaccharide. Not only the global minimum state, but also the preferred order of the states did not change. The preference for second, third, and fourth energetically favored states remained the same in the presence of an additional linkage. Overall, the relative free-energy values of the linkages as observed in the trisaccharides deviated by around 2 kJ mol −1 up to a maximum of 7.2 kJ mol −1 from their disaccharide simulations. This maximum deviation was seen in the trisaccharides where the two flanking monosaccharides were on adjacent carbon atoms of the central one (trimer4 and trimer10). Besides from these two cases, the maximum deviation in the thermally accessible states was around 3 kJ mol −1 . Subsequently, we studied the occurrence of intramolecular hydrogen bonds in the di-and trisaccharides. The presence of a H-bond was determined using a geometric criterion, requiring a hydrogen−acceptor distance shorter than 0.25 nm and a donor hydrogen−acceptor angle larger than 135°. The occurrences as observed in the LEUS simulations were unbiased using umbrella sampling, to obtain average occurrences for the unbiased ensembles. In Table S3 in the Supporting Information, we report the occurrences of the most prominent (P1) and the second most relevant (P2) hydrogen bonds, using a cutoff of 2%.
Strong H-bonds are observed in the β(1 → 4) linkage between the hydroxyl group vicinal to the linkage to the ring oxygen atom of the nonreducing residue (HO3′−O5), with occurrences between 80 and 90%. The highest probability was observed in the β-D-GlcpNAc-(1 → 4)-β-D-GlcpNAc linkages, with 93%, which may be attributed to its rigidity. Strikingly, for the β-D-Galp-(1 → 4)-β-D-GlcpNAc linkage, this hydrogen bond was observed in the dimer for only 12%, which was increased to 70% in the trimer. Note that, according to Table 2, this was not resulting in a significant shift in the conformational preferences. For the α(1 → 3) linkage, the corresponding Hbonds were much weaker, with occurrences of 2−9% (HO4′− O5 and HO2′−O5). Similarly, the HO3′−O5 hydrogen bond was populated for about 5% in the β(1 → 2) linkage, for which an additional HO4′−O6 H-bond was seen in the trimers. No H-bonds with a probability higher than 2% were observed for the (1 → 6) linkages, as also reported by Peric-Hassler et al. 12 and Patel et al. 36 in their studies. Finally, sialic acid showed intramonomer H-bonding between its HO8 hydroxyl group and either one of the carboxyl oxygens O1A/O1B. This Hbond also seems to be more pronounced in the trisaccharide than in the disaccharide.

Journal of Chemical Information and Modeling
Article From the similarity between conformational preferences of the linkages in disaccharides and trisaccharides, one can conclude that the effect of consecutive linkages on the previous linkage is limited and taking the disaccharides as a model for oligosaccharides is applicable. However, in the exceptional case β-D-GlcpNAc-(1 → 4)-[α-L-Fucp-(1 → 3)]-β-D-GlcpNAc there were regions that were not sampled in the trisaccharide simulations which are sampled in the disaccharide. We further analyzed those regions which were sterically not allowed in the trisaccharide case, as illustrated in Figure 7. This was expectable since the linkages are introduced right next to each other. However, when α-L-Fucp was 1 → 6 linked to the β-D-GlcpNAc-(1 → 4)-β-D-GlcpNAc disaccharide (trimer9), all the states that were sampled in the disaccharide LEUS simulations are covered (see Figure 4). This comparison also confirms the additional conformational freedom in 1 → 6 linkages. In the other trimer simulation where the linkage is connected to carbon atoms right next to each other, β-D-GlcpNAc-(1 → 2)α-D-Manp-(1 → 3)-β-D-Manp (trimer4) the conformational preferences compared to their disaccharide behavior were not effected as much as for trimer10. See Figure S3 for a detailed comparison of the conformational preferences of the dimer linkages in trimer4. The reduced effect for this trimer may be explained in terms of the stereochemical orientation in this  3)]-β-D-GlcpNAc trimer. A hypothetical molecule was constructure using nonobserved conformations of ϕ, ψ, leading to an obvious steric clash. The color coding is as in Figure 3.

Journal of Chemical Information and Modeling
Article trimer as shown in Figure 8. Trimer10 was the case where the maximum deviation of 7.2 kJ/mol in free energy from its dimers were seen; at the α-L-Fucp-(1 → 3)-β-D-GlcpNAc linkage, state C. This result emphasizes the role of the N-acetyl group in the monosaccharides, restricting the conformational space in the trisaccharides where the linkages are on the adjacent carbon atoms (see Figure 8). Besides from the presence of the extra N-acetyl group in trimer10, additional strain comes from the position of the linkages where the β(1 → 4) linkage stays between the α(1 → 3) linkage at C3 and the methyl group at C5 (illustrated in Figure 8), thereby, affecting the free-energy landscape of β(1 → 4) linkage as can be seen from Figure 7. An additional trimer unit α-D-Manp-(1 → 2)-α-D-Manp-(1 → 2)-α-D-Manp commonly seen in high-mannose glycans was simulated to further study the effect of close-by linkages. The free-energy landscape of its dimer units showed similar conformational preferences compared to its dimer simulation (see Figure S3). The relative free energy values of both linkages showed ∼3 kJ/mol difference in its conformational states compared to the disaccharide simulation (data not shown). For the relative free energy values of conformational states of α-D-Manp-(1 → 2)-α-D-Manp disaccharide refer to Table 1.
Construction of Oligosaccharides. While the analysis in the previous paragraph implies that for most linkages, the local effect of neighboring monosaccharides will be limited, longer range influences in larger structure are still to be expected. 33,37,38 We developed several approaches to construct glycoproteins without any prior structural knowledge of its carbohydrate moiety. First, the glycan unit of a desired glycoprotein may be built according to the minimum freeenergy conformations of its individual glycosidic linkages. This offers an initial model that can be used as it is, or refined further ( Figure 9A). Alternatively, optimal trees of a glycan can subsequently be constructed either by randomly picking states for each glycosidic linkage according to their probabilities (computed from the relative free energies) or by applying US for each linkage with a frozen disaccharide LE potential energy library. This generates a broad bundle of glycan structures, which can be filtered in a protein environment after alignment with the glycosylation site of the protein (either Asn or Ser/ Thr). Longer range effects of the conformational preferences, most explicitly steric clashes, will be reflected in this bundle of feasible glycan structures. In a filtering step all structures from the bundle are discarded that have a nonbonded energy (intraglycan and glycan−protein) above a given cutoff value. The remaining glycoprotein structures are minimized in vacuum and optimal trees may be selected according to the minimum energy values.
As an example we demonstrate this procedure by constructing a sialylated complex-type N-glycan tree on the human β-galactoside α2,6-sialyltransferase-I (ST6GAL-I, PDB code 4JS1) see Figure 9. Human ST6GAL-I has two glycosylation sites on ASN149 and ASN161 which were suggested to be critical for its folding 39 and have an impact on in its enzymatic function. 40 Here the glycan tree is built only on ASN149 for representation. The initial structure of the complex type of a glycan tree was created from the global minimum of each glycosidic linkages (after unbiasing of disaccharide LEUS simulations). For the construction of the initial glycan structure, Fernandes et al. used the most populated conformations of the disaccharides obtained from 100 ns plain MD simulations. In their work, they are building the glycoprotein only with these units and subsequently apply plain MD to refine the carbohydrate environment. However, since plain MD simulations may get caught in one energy minimum, we rather apply the US stage on all linkages simultaneously to generate a diverse trajectory. Such a fragment-based approach has been described before. 19 Construction of the glycan tree through enhanced methods is the crucial step, since there are evidence of experimentally obtained N-glycans taking highly flexible dynamic conformations. A fold-back conformer of highmannose-type oligosaccharides explored by combination of NMR and REMD is one of the examples. 15 Use of LEUS enables us to capture such conformers as well (see Figure S4). The difference between plain MD simulation and the LEUS simulation can be seen from Figure 9B and C where 50 frames of the trajectories from plain MD and LEUS were randomly selected and aligned on the Asn residue. In the plain MD simulation, the oligosaccharide was only fluctuating around its initial structure, while the LEUS trajectories carried more diversity. After LEUS simulation of the oligosaccharide, these diverse structures were fitted on a protein environment, by alignment of the Asn residue. Using an energy cutoff of +10 4 kJ mol −1 , these trajectories were filtered. Next the protein (PDB ID 4JS1, 41 ST6GAL-1) carrying different possible conformations of its oligosaccharide unit are minimized in vacuum. In Figure 9D, one can see 2100 minimized glycan structures of the protein with its glycan. For clarity, only a single protein structure is shown.
In most of the glycoproteins, glycans are exposed to the surface and they are relatively free to move, as seen in Figure  9D. However, there are therapeutically important glycoproteins where glycans are not on the surface but rather embedded

Journal of Chemical Information and Modeling
Article within the glycoprotein environment. In a second application, such a complicated system was chosen; the IgG Fc fragment carrying two complex-type biantennary N-glycans attached to ASN297 of the dimeric Fc fragment, buried in the cavity between the two chains. The Fc glycans are heterogeneous leading to diverse glycoforms which correlates with certain diseases. 42 In this application, we have studied the IgG Fc fragment containing a fucosylated complex-type N-glycan, terminated with galactose, due to its high occurrence in healthy humans. 43 Since not only the two glycan units are interacting with each other but they are also embedded within the core of glycoprotein, the number of optimal trees that was obtained was significantly reduced compared to fitting of glycans on the exposed protein surface, but some diversity remained among the structures after filtering (see Figure 10A). In addition to the conformation observed in the crystal structure (PDB ID 4BYH 22 ), we have found other energetically possible conformers for the glycan units that are more exposed to the surface and more accessible, as has also been suggested by a NMR study. 44 The conformations obtained after minimization of the fitted structures showed that glycan units can take fully exposed conformations, either having the 1,6-branch ( Figure  10B) or the 1,3-branch exposed (Figure 10C), and a more dynamic glycan conformation with both ends exposed ( Figure  10D and E). Furthermore, conformations of glycan units having contacts with the protein and with the other glycan chain were seen and illustrated in Figure 10F and G, respectively. From previous NMR analysis of the Fc glycan it was concluded that the 1,6-branch terminus of the glycan is anchored to the protein surface while the 1,3-branch is exposed and readily accessible for enzymatic addition of sialic acid. 45,46 However, in the recent NMR spin relaxation study of Barb and Prestegard, 44 it was proposed that not only the 1,3-branch but also the 1,6-branch is highly dynamic and can take exposed conformations as well.
Note that the presampled bundle of glycan trees for a given chemical composition be reused time and again for different protein systems. By generating a library of typically occurring glycan tree bundles, the model building can be done rather efficiently. These approaches are readily applicable using the GROMOS package for molecular simulations. The analysis programs to filter unfavorable glycan trees from the larger conformational bundle will also be made available in the GROMOS++ library of analysis programs. Finally, all local elevation biasing potentials of the disaccharides in Table 1 are made available in the Supporting Information.

■ CONCLUSION
The conformations of glycans in conjugation with glycoproteins form a challenge both for experimental and theoretical methods. Their complexity is the result of the variety of possible monomeric units which are linked in a branched way and have differently populated conformational states. There is a pronounced lack of spatial information about them causing an obstacle to grasp the full picture about their biological functions. Especially toward the end of the glycan chains, the amount of available structural data is scarce; see for instance the rightmost panels for the α-D-Neup5NAc-(2 → 6)-β-D-Galp linkage in Figure 3 and for various O-glycan core units in Figure  5.
In this article, we studied the glycosidic linkages that occur in virtually all of the biologically relevant N-and O-linked glycan units. The individual free-energy landscapes were derived from extensive local elevation and umbrella sampling molecular dynamics simulations. For the first time, we reported a complete conformational library in the form of the free-energy landscapes of all disaccharide linkages needed to construct biologically relevant glycans. Since the glycan structures have a great flexibility, not only the lowest-energy conformation is relevant, but also energetically less favorable conformations can be expected to be accessible. For linkages for which structural data is available, our simulations are in excellent agreement with observed conformational preferences, while linkages for which experimental data is sparse, such as the α-D-Neup5NAc-(2 → 6)-α-D-Galp, α-D-Neup5NAc-(2 → 8)-α-D-Neup5NAc linkage Figure 10. (A) Representation of a fitting of the galactosylated glycoform on IgG Fc protein carrying glycans on both chains at its core. Protein and glycans units are represented in cartoon and stick form, respectively. N-Acetylglucosamine, mannose, and fucose units are shown in blue, green, and red, respectively. Galactose in the 1,6-branch is shown in yellow and in the 1,3-branch in orange. (B−G) Related views with different levels of glycan exposure: (B) the 1,6-branch is exposed; (C) the 1,3-branch is exposed; (D, E) both ends of the glycan are exposed; (F, G) examples of conformations where the glycan interacts with the protein or with the other glycan tree.

Journal of Chemical Information and Modeling
Article and various O-glycan linkages, we provide conformational preferences that complement the experimental data. Many of these linkages have not been studied thoroughly before, other than using vacuum conformational searches or very short MD simulations. All relevant initial structures, topologies, and localelevation biasing potentials of the disaccharides are made available in the Supporting Information.
In a next step, we systematically studied the effect of an additional linkage on the free-energy landscape of the disaccharide. All relevant trisaccharides and one tetrasaccharide were extensively simulated and the free-energy landscapes were compared. We show that the conformational preferences are largely independent of the consecutive linkages. However, we identified branching linkages, for which neighboring molecules do limit the accessible conformational space (see Figure 7).
The conformational library for the disaccharides, combined with the LEUS method, enables us to build models of higher oligosaccharides without extensive additional sampling of the conformational space. At the hand of two realistic examples, we demonstrate possible modeling methods for glycans that are conjugated to glycoproteins. Overall, our work offers important handles to realistically model the three-dimensional structures and flexibility of glycans in the absence of experimental data. As such, our work opens the way to more realistically study the effect of protein glycosylation on its structure, dynamics, and function.

* S Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00351.
Glycosidic linkage conformations at free-energy minima of the corresponding states (Table S1). Overview of changes in the force field that was used in the current work as compared to the 53A6glyc parameters (Table  S2). Average occurrence of intramolecular hydrogen bonds (Table S3). Free-energy maps along the additional degrees of freedom in (1−6) and (2−8) linkages ( Figures S1 and S2). Detailed comparison of dimer conformational preferences to those in trimer4 ( Figure  S3). Examples of fold-back conformations observed in the conformational bundle of a complex glycan tree ( Figure S4) (PDF) Topologies, input structures, and local elevation biasing potentials for all dimers listed in