Understanding Ring Puckering in Small Molecules and Cyclic Peptides

The geometry of a molecule plays a significant role in determining its physical and chemical properties. Despite its importance, there are relatively few studies on ring puckering and conformations, and these have often focused on small cycloalkanes, 5- and 6-membered carbohydrate rings, and specific macrocycle families. We lack a general understanding of the puckering preferences of medium-sized rings and macrocycles. To address this, we provide an extensive conformational analysis of a diverse set of rings. We used Cremer–Pople puckering coordinates to study trends in ring conformation across a set of 140 000 diverse small molecules, including small rings, macrocycles, and cyclic peptides. By standardizing the ring atom ordering using key atoms, we show that ring conformations can be classified into relatively few conformational clusters based on their canonical forms. The number of such canonical clusters increases slowly with ring size. Ring puckering motions, especially pseudo-rotations, are generally restricted and differ between clusters. More importantly, we propose models that map puckering preferences to torsion space, which allows us to understand the interrelated changes in torsion angles during pseudo-rotation and other puckering motions. Beyond ring puckers, our models also explain the change in substituent orientation upon puckering. We also present a novel knowledge-based sampling method that uses the puckering preferences and coupled substituent motion to generate ring conformations efficiently. In summary, this work provides an improved understanding of general ring puckering preferences, which will in turn accelerate the identification of low-energy ring conformations for applications from polymeric materials to drug binding.


■ INTRODUCTION
Molecular rings play an important role in chemistry and biology, and their shapes are intimately linked to their physical and chemical properties. For instance, glycosidase-catalyzed reactions depend heavily on the conformations of the sugar rings involved. 1 Beyond small rings, macrocycle conformations are crucial in host−guest chemistry and drug design. In host−guest chemistry, the conformational preferences of macrocyclic rings lead to selective complexation of organic ligands. 2−4 In drug design, macrocycles including cyclic peptides (CPs) have recently demonstrated their potential for modulating traditionally less druggable targets, e.g., by mimicking protein−protein interactions. 5−8 The flexibility of cyclic molecules improves their chance of adopting favorable conformations that bind to targets with flat surfaces. Despite the importance of ring conformations, most studies focus on small subsets, for example, carbohydrate rings, 9−12 cycloalkanes, 13−15 and families of macrocycles involved in host−guest chemistry, 16−18 resulting in a lack of general understanding of ring conformational preferences, especially for medium-sized rings and macrocycles. We have therefore carried out an extensive conformational analysis on a wide range of ring molecules, including cyclic peptides.
Flexible rings can adopt different conformations through out-of-plane bending motions caused by rotations about the ring bonds, a phenomenon known as ring puckering. Ring puckers can typically be classified into canonical forms, which are usually low-energy conformations; a classic example is the chair and boat conformations of 6-membered rings such as cyclohexane. These canonical forms are not "unique", as pseudo-rotation leads to multiple equivalent conformations, for example, the ⁴C₁ and ¹C₄ chair conformations in cyclohexane. 10,19 Pseudo-rotation and the coupled change in substituent orientation sometimes lead to diverse geometries, i.e., a large root-mean-square deviation (RMSD) between overall three-dimensional (3D) conformations, as illustrated in Figure 1. It is therefore necessary to sample ring conformations adequately to generate physically and biologically relevant conformational ensembles. In addition, several factors control the conformational flexibility of rings, including endocyclic double bonds, 20 the nature of substituents, and the presence of intramolecular interactions such as hydrogen bonds. 21 Lyu et al. recently showed that intramolecular hydrogen bonds restrict the pseudo-rotation path in deoxyribonucleosides and that the path characteristics depend on the strength of the intramolecular interactions. In macrocycles, small structural modifications, e.g., changes in exocyclic functionality, may lead to significant changes in conformation through emergent hydrogen bonds and other intramolecular interactions. 22 Such conformational changes are difficult to predict because the coupled ring bond rotations are not well understood.
A variety of coordinate systems have been developed to characterize ring puckers quantitatively. These techniques fall into three general approaches. The first approach measures the perpendicular displacement of the ring atoms from a mean plane of the ring, 19,24 while the second uses a triangular tessellation of the ring and measures the angles between the reference plane and the triangular planes. 25 The last approach simply measures the ring torsion angles, 26 but this representation does not lend itself well to identifying pseudo-rotation. Methods based on the perpendicular displacements of ring atoms, such as Cremer−Pople puckering coordinates, 19 are widely used in the community. 12,27 This representation has the advantage of using a reduced number of parameters, N − 3, to describe the geometry of an N-membered monocyclic ring. Hence, only two parameters are required to describe the conformational space of 5-membered rings and just three for 6-membered rings. Cremer−Pople coordinates have also been used as collective variables for the enhanced sampling of 6-membered ring conformations in molecular dynamics studies. 28 To better understand ring conformational preferences, we have extended this analysis to more complex ring systems, including larger rings and bicyclic and polycyclic rings. We not only study their puckering preferences using Cremer−Pople puckering coordinates but also identify the underlying constraints on their geometry and the change in substituent orientations upon puckering. More importantly, we build quantitative models to convert Cremer−Pople puckering coordinates to ring torsion angles, which allows us to understand the torsional changes upon pseudo-rotation. A novel knowledge-based conformational sampling scheme based on puckering parameters is also proposed.
Unlike existing knowledge-based sampling methods such as OMEGA, 29 which rely on a set of discrete, prespecified ring templates and heuristic rules, our method efficiently explores conformational space, including the dominant canonical conformations and their associated pseudo-rotation. We show that our sampling method can generate low-energy ring conformations effectively.

■ METHODS AND DATA
Cremer−Pople Puckering Parameters for N-Membered Rings. The out-of-plane deviations of a puckered N-membered ring can be measured by the z-coordinates of the ring atoms relative to a mean plane cutting through the ring. These z-coordinates contain information about the overall motion and shape of the puckered ring. Translation and overall rotation of the reference plane around the x- and y-axes can be removed by imposing three constraints (see Appendix 1, eqs S1−S3).
Let R_j be the position vector of ring atom j, with the origin at the geometrical center of the puckered ring. Two vectors, R′ and R″, define the mean plane (see Appendix 1, eqs S4 and S5), and n = (R′ × R″)/|R′ × R″| is the unit normal vector to this mean plane. The displacement z_j of atom j from the mean plane is then given by the scalar product

$$ z_j = \mathbf{R}_j \cdot \mathbf{n} \tag{1} $$

Using the mean plane and the full set of displacements, we can compute the Cremer−Pople ring puckering parameters 19 as follows.
For odd values of N (N > 3), the puckering amplitudes, q_m, and phase angles, ϕ_m, are defined as follows:

$$ q_m \cos\phi_m = \sqrt{\frac{2}{N}} \sum_{j=1}^{N} z_j \cos\left[\frac{2\pi m (j-1)}{N}\right] \tag{2} $$

$$ q_m \sin\phi_m = -\sqrt{\frac{2}{N}} \sum_{j=1}^{N} z_j \sin\left[\frac{2\pi m (j-1)}{N}\right] \tag{3} $$

Eqs 2 and 3 apply for m = 2, 3, ..., (N − 1)/2. The amplitudes, q_m, are positive-valued, while the phase angles, ϕ_m, range from −π to π radians. For even values of N, eqs 2 and 3 apply for m = 2, 3, ..., (N/2 − 1), but an additional puckering amplitude is required, with the following form:

$$ q_{N/2} = \sqrt{\frac{1}{N}} \sum_{j=1}^{N} (-1)^{j-1} z_j \tag{4} $$

Note that q_{N/2} in eq 4 can take either sign. The Cremer−Pople representation is only applicable to monocyclic ring systems. To extend it to more complex ring systems such as fused and spiro rings, we first decompose the ring systems into smaller rings and calculate the puckering parameters for each ring. In particular, we adopt the concept of unique ring families (URFs) 30 for this decomposition, with the resultant Cremer−Pople parameters being calculated for all relevant cycles, i.e., minimum cycle bases.
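As an illustration, eqs 1−4 can be condensed into a short NumPy routine. This is a minimal sketch, not the released RING implementation: the function name `cremer_pople` is ours, the ring atoms are assumed to be supplied already in the standardized order described next, and only a single monocyclic ring is handled (for fused or spiro systems it would be applied to each URF cycle separately).

```python
import numpy as np


def cremer_pople(coords):
    """Cremer-Pople puckering parameters of a monocyclic N-membered ring.

    coords: (N, 3) array of ring-atom positions, listed in ring order.
    Returns (amps, phases, q_half): dicts {m: q_m} and {m: phi_m}, plus
    q_{N/2} for even N (None for odd N).
    """
    R = np.asarray(coords, dtype=float)
    N = len(R)
    if N < 4:
        raise ValueError("puckering needs at least 4 ring atoms")
    R = R - R.mean(axis=0)                 # origin at the geometric centre
    k = np.arange(N)                       # k = j - 1 (0-based atom index)
    # Vectors R' and R'' spanning the mean plane, and its unit normal n.
    R1 = (R * np.sin(2 * np.pi * k / N)[:, None]).sum(axis=0)
    R2 = (R * np.cos(2 * np.pi * k / N)[:, None]).sum(axis=0)
    n = np.cross(R1, R2)
    n /= np.linalg.norm(n)
    z = R @ n                              # eq 1: out-of-plane displacements

    amps, phases = {}, {}
    m_max = (N - 1) // 2 if N % 2 else N // 2 - 1
    for m in range(2, m_max + 1):          # eqs 2 and 3
        c = np.sqrt(2 / N) * np.sum(z * np.cos(2 * np.pi * m * k / N))
        s = -np.sqrt(2 / N) * np.sum(z * np.sin(2 * np.pi * m * k / N))
        amps[m] = float(np.hypot(c, s))
        phases[m] = float(np.arctan2(s, c))  # in [-pi, pi]
    q_half = None
    if N % 2 == 0:                         # eq 4 (the sign is meaningful)
        q_half = float(np.sqrt(1 / N) * np.sum((-1.0) ** k * z))
    return amps, phases, q_half
```

For an idealized chair with alternating displacements of ±0.25 Å, the routine returns q_2 ≈ 0 and |q_3| = √6 × 0.25 ≈ 0.61 Å, so the whole pucker is carried by the single even-N amplitude.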
Additionally, the Cremer−Pople representation is atom-order-dependent, so we standardize the atom ordering in the ring before calculation. Bond orders, connectivity, and element types are used to determine this standardized order (see Appendix 1); other canonical atom numberings may also be used. 31 In symmetric rings, such as cycloalkanes, the first atom is picked at random. For cyclic peptides, we also take the volume of each amino acid into account when ordering the backbone ring atoms: the priority increases with volume, so tryptophan, tyrosine, and phenylalanine have higher ranks, while glycine has the lowest. The rank order of amino acids can be found in Appendix 1, Table S1. This volume-based ordering is applied only to the cyclic peptides.
Extension of Cremer−Pople Puckering Parameters for Ring Substituent Positions. It is well known that the preferred orientation of ring substituents changes upon ring inversion and that neighboring substituents can also influence each other's preferred orientation. We followed the framework proposed by Cremer 32 to describe the position of substituents unambiguously. Two orientation angles, α and β, are introduced. The α angle describes the position of the substituent relative to the mean plane defined above (see Appendix 1, eq S11), and the β angle describes its position relative to the geometrical center of the ring (see Appendix 1, eqs S12 and S13; see Figure 2 for an illustration).
A substituent angle of α = 0 or π indicates that the substituent sits axially above or below the mean plane, respectively, while α = π/2 indicates an equatorial orientation. The angle β = 0 indicates a radially outward-directed substituent, while β = −π or β = π indicates an inward-directed substituent.
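To make the geometry of α and β concrete, the following NumPy sketch computes both angles for a single substituent. The exact expressions are given in Appendix 1 (eqs S11−S13) and are not reproduced in this text, so the definitions below are assumptions reconstructed from the description above and the Figure 2 caption: α is taken as the angle between the ring-atom-to-substituent bond and the mean-plane normal n, and β as the signed in-plane angle at P between the outward radial direction (from O through P) and the projected bond (from P to S), with counterclockwise rotation about n counted as positive. The function name `substituent_angles` is hypothetical.

```python
import numpy as np


def substituent_angles(ring_atom, substituent, n, center):
    """Hypothetical sketch of the (alpha, beta) substituent orientation angles.

    Assumed definitions (not copied from Appendix 1):
      alpha: angle between the bond ring_atom -> substituent and the unit
             normal n (0 = axial above, pi = axial below, pi/2 = equatorial).
      beta:  signed angle, in the mean plane, from the outward radial
             direction O -> P to the projected bond P -> S
             (0 = radially outward, +/-pi = inward).
    """
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    a = np.asarray(ring_atom, dtype=float)
    bond = np.asarray(substituent, dtype=float) - a
    cos_alpha = bond @ n / np.linalg.norm(bond)
    alpha = np.arccos(np.clip(cos_alpha, -1.0, 1.0))

    def in_plane(v):
        # Remove the component along n, i.e. project onto the mean plane.
        return v - (v @ n) * n

    radial = in_plane(a - np.asarray(center, dtype=float))   # O -> P
    proj_bond = in_plane(bond)                               # P -> S
    beta = np.arctan2(np.cross(radial, proj_bond) @ n, radial @ proj_bond)
    return float(alpha), float(beta)
```

With n along +z and a ring atom at (1.5, 0, 0), a substituent at (2.5, 0, 0) gives α = π/2 and β = 0 (equatorial, pointing outward), whereas one at (1.5, 0, 1) gives α = 0 (axial above).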
With this complete representation for the ring puckering motion and substituent orientation, we can investigate their coupled motion extensively and develop ring puckering potentials for conformer sampling, similar to their acyclic counterparts.
Connection between Ring Puckering, Substituent Orientations, and Torsion Angles. Ring inversion is the interconversion of cyclic conformers that have equivalent ring shapes, and it can be characterized by the Cremer−Pople representation. The substituent orientation also changes during inversion. In particular, we are interested in the coupled ring bond rotations and the associated change in substituent orientation during pseudo-rotation. Inspired by the functional forms studied in previous work, 33 we propose three models (see Appendix 1, eqs S18−S20). Equation S18 predicts the change in the substituent α and β orientation angles upon puckering. Equation S19 maps the puckering parameters to endocyclic torsion angles, while eq S20 describes the rotational dependence between the exocyclic torsion angle of a substituent and the neighboring endocyclic torsion angle. Note that eq S19 is a mapping for a general N-membered ring; the functional form proposed by de Leeuw et al. 33 for converting puckering coordinates to torsion angles in 5-membered rings can be recovered from it by applying trigonometric identities.
Here, we denote the endocyclic torsion angle as θ_endo, the exocyclic torsion angle as θ_exo, and the substituent orientation angles as α and β.
Ring Reconstruction from Cremer−Pople Puckering Parameters. Cremer−Pople puckering parameters not only provide quantitative descriptions of puckered N-membered rings but also allow efficient conversion from puckering parameters to Cartesian coordinates, as shown by Cremer. 34 In addition to the N − 3 puckering parameters, N − 3 bond angles and N bond lengths are required to reconstruct a puckered N-membered ring conformation. The default values of bond lengths and bond angles are specified in Tables S2 and S3 in Appendix 1. The calculation of the x-, y-, and z-coordinates from the puckering parameters and the specified bond lengths and bond angles is discussed in Appendix 1.
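The z-coordinate part of this reconstruction follows directly from inverting eqs 2−4, which gives the standard Cremer−Pople inverse relation z_j = (2/N)^{1/2} Σ_m q_m cos[ϕ_m + 2πm(j − 1)/N] + N^{−1/2} q_{N/2} (−1)^{j−1}, where the last term is present only for even N. The sketch below implements only this step; placing the atoms within the mean plane from the bond lengths and bond angles (Appendix 1) is not shown. The function name `z_from_puckering` is hypothetical.

```python
import numpy as np


def z_from_puckering(N, amps, phases, q_half=None):
    """Out-of-plane displacements z_j from Cremer-Pople parameters.

    amps, phases: dicts {m: q_m} and {m: phi_m}; q_half: q_{N/2} (even N only).
    Implements z_j = sqrt(2/N) sum_m q_m cos(phi_m + 2 pi m (j-1)/N)
                     + sqrt(1/N) q_{N/2} (-1)^(j-1).
    """
    k = np.arange(N)                      # k = j - 1
    z = np.zeros(N)
    for m, q in amps.items():
        z += np.sqrt(2 / N) * q * np.cos(phases[m] + 2 * np.pi * m * k / N)
    if N % 2 == 0 and q_half is not None:
        z += np.sqrt(1 / N) * q_half * (-1.0) ** k
    return z
```

Because Σ_j z_j² equals the sum of all squared amplitudes, including q_{N/2}², the total puckering amplitude is preserved by this inversion, which makes a convenient sanity check on reconstructed rings.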
To sample low-energy ring conformations efficiently, we used kernel density estimation (KDE) with a Gaussian kernel to learn the ring puckering preferences and then drew puckering values from the fitted model. For the KDE calculation, the Cremer−Pople parameters were mapped to Cartesian form, (q_m cos ϕ_m, q_m sin ϕ_m). The sampled values were then converted to z-coordinates to give distinct ring conformations. Using the relationship between endocyclic and exocyclic torsion angles (see Appendix 1, eq S20) with appropriate parameters (see Appendix 2, Table S8), we can then update the ring substituent positions accordingly. Note that the exocyclic bond angles are kept fixed during sampling. This approach contrasts with traditional knowledge-based sampling methods, 29,35,36 in which ring templates and heuristic rules are used to sample ring conformations and substituent positions are subsequently assigned by minimizing a clash function or force field energy. Our approach does not require force field minimization, although, as discussed below, minimization can still improve cases in which the actual bond lengths or angles differ slightly from our model.
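The sampling step can be sketched as follows. The text states that the Scikit-Learn KDE with a Gaussian kernel was used; drawing from a fitted Gaussian KDE amounts to picking a training point at random and adding isotropic Gaussian noise whose standard deviation is the bandwidth, which is also how `sklearn.neighbors.KernelDensity.sample` works for the Gaussian kernel. The NumPy-only version below makes that explicit. The function names and any bandwidth value are assumptions for illustration (the bandwidth actually used is not stated here), and the sketch covers only amplitude−phase pairs, not an unpaired q_{N/2}.

```python
import numpy as np


def to_cartesian(q, phi):
    """(n, k) amplitudes and phases -> (n, 2k) features [q cos phi | q sin phi]."""
    q, phi = np.asarray(q, dtype=float), np.asarray(phi, dtype=float)
    return np.hstack([q * np.cos(phi), q * np.sin(phi)])


def from_cartesian(X):
    """Inverse of to_cartesian: (n, 2k) features -> (q, phi), each (n, k)."""
    k = X.shape[1] // 2
    x, y = X[:, :k], X[:, k:]
    return np.hypot(x, y), np.arctan2(y, x)


def sample_gaussian_kde(X, n_samples, bandwidth, seed=None):
    """Draw n_samples points from a Gaussian KDE fitted to the rows of X."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=n_samples)       # pick kernel centres
    noise = rng.normal(scale=bandwidth, size=(n_samples, X.shape[1]))
    return X[idx] + noise
```

A typical call would be `q_new, phi_new = from_cartesian(sample_gaussian_kde(to_cartesian(q, phi), 1000, bandwidth=0.05))`, after which the sampled parameters can be converted to z-coordinates. Working in Cartesian form avoids the artificial boundary at ϕ = ±π that a KDE on raw phase angles would have.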
Data. Over 130 000 small molecules, including natural products and macrocycles, were selected from the Crystallography Open Database (COD) 37,38 (63814 molecules) and the ZINC database 39 (67009 molecules). These molecules contain hydrogen, boron, carbon, nitrogen, oxygen, fluorine, silicon, phosphorus, sulfur, chlorine, bromine, and iodine. Only rings of up to 20 atoms composed of carbon, nitrogen, oxygen, and sulfur were considered. For COD molecules, Open Babel version 2.4 40 was used to convert CIF files to SDF format and to assign bond orders. Molecules with inconsistent structures, such as rings containing hydrogen atoms or consecutive double bonds, were excluded from the analysis. In addition, we generated a set of cyclic peptides (CPs), comprising 8661 cyclic tetrapeptides (CTPs) and 2249 cyclic pentapeptides (CPPs). Both peptide data sets contain head-to-tail cyclized peptides, i.e., cyclized from the N-terminus to the C-terminus, yielding 12-membered and 15-membered rings, respectively. Their sequences are composed of 14 of the 20 naturally occurring L-amino acids (see Appendix 3, Table S9).

Figure 2. Methylcyclohexane is used as an example, with a mean plane (gray) cutting through the 6-membered ring. The methyl substituent is axial to the mean plane (α = 0.24 rad). O denotes the origin, which is also the geometrical center of the ring. The points S and P are the projections onto the mean plane of the methyl carbon and of the ring atom attached to it, respectively. The point Q lies in the mean plane such that O, P, and Q are collinear. The angle β is defined as the angle between S, P, and Q; in this example, β = −2.25 rad.
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article
For all molecules from ZINC and the cyclic peptides, experimental-torsion distance geometry with basic knowledge (ETKDG) 41 was used to generate initial geometries, followed by geometry optimization using the GFN2 method 42 and conformer sampling using the iterative metadynamics sampling and genetic crossover (iMTD-GC) method implemented in the CREST program. 43,44 Note that this data set was also used in our previous work. 45 We note that CREST may break molecules into smaller fragments in its output file; such fragmented molecules were excluded from our analysis.
To demonstrate the effectiveness of using puckering preferences to sample ring conformations, we selected 20 simple molecules, including monocyclic rings with and without endocyclic double bonds and substituents (see Appendix 3, Table S10).
Analysis. To better understand the ring geometry of cyclic peptides, we computed the backbone (ϕ, ψ) torsion angles. We also calculated the eccentricity, which measures the "roundness" of a ring. 46 Eccentricity, e, is a nonnegative real value that characterizes the shape of a conic section: e = 0 indicates a circle, 0 < e < 1 indicates an ellipse, and values approaching 1 indicate an increasingly elongated ellipse.
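The eccentricity procedure of ref 46 is not detailed in this text. One simple estimate, offered here as an assumption rather than as the method used, treats the ring atoms as points evenly spaced in parametric angle around an ellipse with semi-axes a ≥ b: the two largest principal variances of the centered coordinates are then a²/2 and b²/2, so e = √(1 − b²/a²) = √(1 − λ₂/λ₁). The plane here is the least-squares plane from a singular value decomposition, which is close to, but not identical with, the Cremer−Pople mean plane. The function name `ring_eccentricity` is hypothetical.

```python
import numpy as np


def ring_eccentricity(coords):
    """Eccentricity of the best-fit ellipse of a ring's atoms (hypothetical).

    Assumes the atoms are roughly evenly spread around an ellipse, in which
    case the two in-plane principal variances are a^2/2 and b^2/2.
    """
    X = np.asarray(coords, dtype=float)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    ratio = (s[1] / s[0]) ** 2                # = (b / a)^2 = lambda_2 / lambda_1
    return float(np.sqrt(max(0.0, 1.0 - ratio)))
```

For 12 points evenly spaced in parametric angle on an ellipse with a = 2 and b = 1, this returns √0.75 ≈ 0.866; for a circle (even one with a small alternating pucker) it returns essentially 0.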
To assess the performance of our proposed sampling method, we computed the heavy atom root-mean-square deviation (RMSD) and torsion fingerprint deviation (TFD) 47 between the generated conformations and the lowest-energy (reference) conformation sampled from CREST.
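The heavy-atom RMSD reported here was computed with RDKit. For readers who want the underlying calculation, the sketch below implements the textbook Kabsch superposition in NumPy. It assumes the two conformers list the same atoms in the same order; unlike RDKit's `rdMolAlign.GetBestRMS`, it does not search over symmetry-equivalent atom mappings, and TFD is not covered. The function name `kabsch_rmsd` is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (n, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                  # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)       # covariance matrix H = P^T Q
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                      # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

The determinant check ensures that a mirror image is never superimposed by reflection, so enantiomeric ring conformations still yield a nonzero RMSD.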
Furthermore, three metrics, namely, the squared circular correlation coefficient (R²_circ), the mean angular error (MAE), and the standard deviation of the angular error, were used to assess the predictive performance of our proposed models. The circular correlation coefficient and the angular error (circular distance between the predicted and actual angles) are defined by eqs S22 and S23 in Appendix 1, respectively.
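Eqs S22 and S23 are not reproduced in this text. The sketch below assumes they correspond to the widely used Jammalamadaka−SenGupta circular correlation coefficient and to the circular distance wrapped into [0, π]; the function names are hypothetical, and the standard deviation uses the population (ddof = 0) convention, which is also an assumption.

```python
import numpy as np


def circular_mean(a):
    """Mean direction of angles a (radians)."""
    return np.arctan2(np.mean(np.sin(a)), np.mean(np.cos(a)))


def circular_corr(a, b):
    """Jammalamadaka-SenGupta circular correlation coefficient."""
    sa = np.sin(np.asarray(a) - circular_mean(a))
    sb = np.sin(np.asarray(b) - circular_mean(b))
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))


def angular_error(pred, actual):
    """Circular distance |pred - actual| wrapped to [0, pi]."""
    diff = np.asarray(pred) - np.asarray(actual)
    return np.abs(np.angle(np.exp(1j * diff)))


def summarize(pred, actual):
    """(R_circ^2, mean angular error, standard deviation of angular error)."""
    err = angular_error(pred, actual)
    return circular_corr(pred, actual) ** 2, err.mean(), err.std()
```

The wrapping matters for torsion angles near ±π: a prediction of 3.1 rad against an actual value of −3.1 rad is only about 0.083 rad off, not 6.2 rad.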
Implementation. RDKit 23 was used to read molecules, generate conformations, and write conformers. The RDKit implementations of the RMSD and TFD calculations were used. RingDecomposerLib 48 was used to identify the URFs of the ring system, and the KDE implementation in Scikit-Learn 49 was used. The code is available on GitHub (https://github.com/lucianlschan/RING).

■ RESULTS AND DISCUSSION
Small- and Medium-Sized Rings. A relatively small number of conformational clusters, reflecting canonical conformations, were observed for 5- to 8-membered rings. For instance, Figure S4a in Appendix 3 shows two clusters for flexible 6-membered rings, corresponding to the well-known chair and boat conformations, as illustrated in Figure 3. As expected, the chair conformation is observed more frequently than the boat conformation. The phase angle ϕ_2 is uniformly distributed, suggesting free pseudo-rotation in both forms. In contrast, the presence of endocyclic double bonds or shared aromatic bonds restricts both puckering and pseudo-rotation. The puckering amplitude q_3 and the phase angle ϕ_2 exhibit a sinusoidal relationship, as can be seen in Figure S4b in Appendix 3. These relationships hold for both simple monocyclic rings and complex bi- and polycyclic rings.
For 7- and 8-membered rings, an additional phase angle, ϕ_3, is required, and phase−phase couplings are evident in some conformational clusters. For example, three conformational clusters were observed in 7-membered rings with no endocyclic double bonds, predominantly in twist-chair and chair conformations, as illustrated in Figure 4a. The puckering amplitudes (q_2, q_3) fall into a narrow range, and pseudo-rotation is restricted to this region, as shown in Figure 4c. The phase angles ϕ_2 and ϕ_3 are strongly coupled, although each is marginally uniformly distributed. This coupled motion suggests the minimum-energy pathway of the chair−twist-chair pseudo-rotation. As suggested by Bocian et al., 13 the pseudo-rotation map can be approximated by eq 5, with varying intercepts (ϕ_2*, ϕ_3*) and slopes (K_2, K_3). This model also holds for rings containing heteroatoms (see Figure 4b).
In bicyclic and polycyclic rings, adjacent rings and bulky substituents sometimes induce significant steric clashes, resulting in concomitant changes in conformational preferences. An increase in amplitude q_2 and a decrease in amplitude q_3 indicate a conformational change from chair to half-chair (0.7 < ϕ_2 < 1) and boat (ϕ_2 > 1) conformations. Pseudo-rotation is free in these clusters, i.e., the phase angles are uniformly distributed (see Appendix 3, Figure S5c).
To assess the effect of endocyclic double bonds on conformational preferences, we selected 7-membered rings with one and two endocyclic double bonds and further separated the observations by the location of the double bonds. Figure S6 in Appendix 3 shows three conformational clusters in 7-membered rings with a single endocyclic double bond, corresponding to the chair, half-chair, and boat conformations, the same as in the case without double bonds. However, the population of the chair conformation decreases, while the populations of the half-chair and boat conformations increase. Pseudo-rotation is restricted in all three clusters. In the chair and twist-chair regions, the phase angle ϕ_3 is nearly fixed and ϕ_2 varies only slightly, whereas in the boat and twist-boat regions, ϕ_2 is fixed while ϕ_3 varies. The half-chair conformation exhibits strong coupling between the phase angles.
As the number of endocyclic double bonds increases, the number of degrees of freedom of the ring system decreases. The location of the double bonds strongly influences the puckering preferences, as shown in Figure 5. The double bonds in 1,3-cycloheptadiene-like and 1,4-cycloheptadiene-like structures (Figure 5a,c) impose different steric constraints and lead to distinct puckering preferences; see also Figure S7. For larger rings, the number of conformational clusters increases, and the coupling between puckering amplitudes and phase angles becomes more complex. Note that small local structural changes may result in significant changes in conformation through transannular repulsion and intramolecular interactions. To gain further insight into long-range coupled ring bond rotations, we performed a cluster analysis on a set of cyclic peptides.
Cyclic Peptides. Peptide cyclization imposes additional constraints on the system and thus reduces the thermally accessible conformational space of cyclic peptides relative to their linear counterparts. 50 Several factors govern the backbone conformation of cyclic peptides, including the size and properties of the amino acid side chains, the presence of N-methylation, and the formation of γ- and β-turns. Analyzing the puckering preferences helps to clarify the relative influence of these factors.
The configuration of the amide bonds provides important information for determining the dominant backbone conformation adopted by a cyclic peptide. The partial double-bond character of the carbon−nitrogen bond in amides renders them planar, resulting in either cis (C) or trans (T) amides. We can thus classify the conformations by their sequence of cis- or trans-amide bonds, as described by Loiseau et al., 51 for example, all-cis ("CCCC") or all-trans ("TTTT") amides in cyclic tetrapeptides. Typically, the trans-amide bond is preferred in acyclic peptides, large cyclic peptides, and proteins. Figures S9a and S13a in Appendix 3, however, show that the cis-amide bond is preferred in both cyclic tetrapeptides and cyclic pentapeptides, with 40% all-cis and 43% CCCT in cyclic tetrapeptides. In small cyclic peptides, high ring strain reduces the energy barrier between cis and trans isomers. All-trans and single-cis (CTTT and CTTTT) configurations are less favored in both tetra- and pentapeptides owing to high transannular strain, and they exist only when stabilized by one or more intramolecular hydrogen bonds. Such stabilization leads to γ-turns in cyclic tetrapeptides and to γ- and β-turns in cyclic pentapeptides, as reflected by their Ramachandran (ϕ, ψ) dihedral angles (see, for example, Appendix 3, Figure S12a). The puckering amplitudes and phase angles are thus highly restricted in such conformational clusters. Note that these turns are favored by the in vacuo calculations and may not reflect the conformations observed in solution. The positional preferences of the amide carbonyl groups are key to understanding the formation of such intramolecular hydrogen bonds, which we discuss next.
Main chain−main chain intramolecular interactions were not observed in cyclic tetrapeptides with two or more cis-amide bonds, nor in cyclic pentapeptides with three or more cis-amide bonds. In these cases, transannular repulsion and main chain−side-chain and side-chain−side-chain intramolecular interactions appear to be the major driving forces behind the conformational preferences. Small structural modifications, such as a change in amide bond configuration and/or side-chain orientation, may induce significant steric clashes and lead to conformational switching. For example, Figure 6a,b shows the puckering amplitude preferences of two canonical conformations of all-cis-amide cyclic tetrapeptides that differ by the orientation of one amide bond. Following the nomenclature of Loiseau et al., we denote the orientation of an amide carbonyl by U when it points above the mean plane and by D when it points below the mean plane. The two canonical forms (CCCC−DDDD and CCCC−UDDD) exhibit distinct puckering amplitude preferences and phase−phase couplings (see Appendix 3, Figure S10). Similar phenomena are observed in cyclic pentapeptides (see Appendix 3, Figure S14). Furthermore, the formation of main chain−side-chain and/or side-chain−side-chain interactions gives rise to two subclusters with diverse geometries within the same configuration (CCCC−DDDD), as illustrated in Figure 7a,b. The orientation of the side-chain C_β atoms plays an important role in the formation of these interactions.
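The C/T and U/D labels can be generated automatically from a conformer. The sketch below is illustrative only: it computes each amide ω torsion (Cα−C−N−Cα) with a standard dihedral formula, calls an amide cis when |ω| < π/2 (an assumed threshold), and assigns U or D from the carbonyl-oxygen α angle using the α < π/2 and α > π/2 criteria given in the Figure 6 caption. It does not choose the starting residue, which in this work follows the standardized, volume-based atom ordering. The function names `dihedral` and `amide_label` are hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in radians, in [-pi, pi]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - (b0 @ b1) * b1          # component of b0 perpendicular to b1
    w = b2 - (b2 @ b1) * b1          # component of b2 perpendicular to b1
    return float(np.arctan2(np.cross(b1, v) @ w, v @ w))


def amide_label(omegas, carbonyl_alphas):
    """Label such as 'CCCC-UDDD' from amide omegas and carbonyl-O alpha angles.

    Assumed rules: cis (C) if |omega| < pi/2, else trans (T);
    U if alpha < pi/2, else D (as in the Figure 6 caption).
    """
    config = "".join("C" if abs(w) < np.pi / 2 else "T" for w in omegas)
    orient = "".join("U" if a < np.pi / 2 else "D" for a in carbonyl_alphas)
    return f"{config}-{orient}"
```

The same `dihedral` helper can be reused for the Ramachandran (ϕ, ψ) and side-chain χ_1 angles discussed here, given the appropriate four backbone or side-chain atoms.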
To further understand the cyclic backbone conformation, we calculated the Ramachandran (ϕ, ψ) dihedral angles and the eccentricity of the backbone. The Ramachandran plots in Appendix 3, Figures S12b and S13d, show that the (ϕ, ψ) angle preferences of cyclic tetrapeptides and pentapeptides are similar to those of standard secondary structures observed in proteins. Figures S9b and S13b show contrasting eccentricity values between clusters; for example, all-trans cyclic tetrapeptides give a mode at 0.3, while alternating CTCT cyclic tetrapeptides give a mode at 0.8, indicating diverse geometries between clusters.
We have thus shown that Cremer−Pople puckering parameters are a useful representation for understanding ring puckering in both small rings and macrocycles, including cyclic peptides, and we have analyzed the effects of endocyclic double bonds on ring puckering. We have also revealed the influence of amide configuration and orientation on ring geometries. To gain further insight, we examine below the substituent orientations and their relationship to puckering preferences.
Effects of Substituent Orientation and Functionality. The size and functionality of substituents are two key factors determining ring geometries, and their effects vary with ring size. We therefore separated the lowest-energy conformations by ring size: small (5- and 6-membered) rings, medium (7- to 11-membered) rings, and macrocycles (12-membered or larger rings).
As might be expected, ring substituents tend to be outwardly directed (relative to the ring center) in small- and medium-sized rings, i.e., β is close to zero, regardless of the nature of the substituent (see Appendix 3, Figure S15). However, in macrocycles, substituents including carbonyl and hydroxyl groups can be quasi-axial to the mean plane and inwardly directed, orientations that are sterically unfavorable in small- and medium-sized rings. Their α angle preferences, however, depend on both ring size and the nature of the substituents. For example, Figure 8a shows the orientation preferences of the carbonyl functional group. Because of the exocyclic double bond, its movement is restricted compared with other single-bonded small substituents such as hydroxyl and methyl groups. The carbonyl oxygen thus tends to be equatorial to the mean plane in small rings, i.e., α ≈ π/2, and this preference changes as the ring size increases. Besides exocyclic double bonds, endocyclic double bonds also restrict exocyclic motion. Figure 8b shows the orientation preferences of methyl groups in small rings: the α angle is bounded when the methyl is attached to a ring atom that is linked to a neighboring ring atom with a shared endocyclic double bond. The influence of endocyclic double bonds is weakened in medium-sized rings and macrocycles, and the α angle can therefore adopt a wider range of values.

Figure 6. (a) Marginal distributions of the ring puckering amplitude (q_2, q_3, q_4, q_5, q_6) preferences for two conformational clusters of the all-cis conformation in cyclic tetrapeptides. The two clusters are defined by the α orientation angle of the amide carbonyl oxygen, where U indicates α < π/2 and D indicates α > π/2; CCCC−DDDD conformations are colored blue and CCCC−UDDD conformations red. In panel (a), two modes are observed in the puckering amplitudes of both clusters, indicating the presence of multiple subclusters. (b) Pairwise joint distributions of the ring puckering amplitude (q_2, q_3, q_4, q_5, q_6) preferences for the same two clusters. The puckering preferences of CCCC−DDDD conformations are more concentrated than those of CCCC−UDDD conformations.

Figure 8. (a) Carbonyl groups tend to be equatorial to the mean plane (α ≈ π/2) in small rings. (b) The orientation of a methyl group tends to be restricted when it is attached to a ring atom that is linked to a neighboring ring atom with a shared endocyclic double bond.
To reveal the role of bulky substituents in macrocycles, we assessed their orientation angles in cyclic peptides, in particular, the positional preferences of the amide carbonyls and the side-chain C_β atoms. As mentioned above, cyclic peptides exhibit multiple conformational clusters. In particular, γ-turns are observed in the all-trans and CTTT conformations of cyclic tetrapeptides, and the formation of main chain intramolecular hydrogen bonds rigidifies the amide carbonyl positions, as illustrated in Appendix 3, Figure S17. In other clusters, by contrast, the amide carbonyl groups move in concert to avoid steric clashes and/or to accommodate main chain−side-chain intramolecular interactions (see, for example, Figure 9). Likewise, the C_β atoms of all amino acids studied except Gly (which lacks a C_β atom) show correlated motions that avoid steric clashes and accommodate side-chain−side-chain interactions. In addition to C_β orientation, we calculated the side-chain torsion angles, χ_1. Figure S18 in Appendix 3 shows multimodality in the χ_1 angles, consistent with the side-chain torsion angles observed in protein secondary structures. This suggests that side-chain conformations can be sampled easily using standard side-chain rotamer libraries. 52 The extended Cremer−Pople representation provides a means to understand correlated positional preferences of ring substituents; however, the relationship between puckering preferences and substituent orientation, especially in macrocycles, is not obvious. We have therefore developed simple models (Appendix 1, eq S18) to predict the α and β orientation angles. Figure 10a,b shows the predicted α and β orientation angles of carbonyl groups at given positions in a 6-membered ring. The predicted values agree well with the actual values, with a low mean angular error and a high squared circular correlation coefficient. The model is also valid for larger rings.
Relationship between Ring Puckering Parameters, Substituent Orientations, and Torsion Angles. As mentioned earlier, torsion angles are an alternative way to quantify ring puckering and are often used in the conformational analysis of small rings; de Leeuw et al. 33 discussed the connection between ring puckering coordinates and torsion angles for small rings. Here, we propose a general model, defined in Appendix 1, eq S19, to convert puckering parameters to endocyclic torsion angles for N-membered rings. Figure 10c shows the predictions of all endocyclic torsion angles of 6-membered rings. All position submodels agree well with the actual ring torsion angles, with high squared circular correlation coefficients (R²_circ > 0.9). The model is also valid for larger rings. This improved understanding of how rings switch conformations and pseudo-rotate enables metadynamics simulations with appropriate coordinates to sample the conformational space of macrocycles and cyclic peptides effectively. 53 Equation S20 in Appendix 1 relates the change in the substituent exocyclic torsion angle (s_i, i, i + 1, i + 2) to the neighboring endocyclic torsion angle (i − 1, i, i + 1, i + 2), where s_i is the substituent atom (say, a carbonyl oxygen) attached to ring atom i (say, a carbonyl carbon), and i + k (k = −1, 1, 2) are the ring atom positions. Figure 10d shows an excellent fit between the predicted and actual exocyclic torsion angles of carbonyl groups at different positions, regardless of ring size. Our model gives a high squared circular correlation coefficient, R²_circ = 0.997, and a small mean angular error, 0.04 rad (≈2.3°). Similar performance can be achieved for other substituents. These models allow us to assign substituent positions efficiently once the ring conformation is defined.
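Equation S19 is defined in Appendix 1 and is not reproduced here. To illustrate the coupling it captures, the following is a purely geometric sketch under stated assumptions: it builds out-of-plane displacements z_j from the standard Cremer−Pople expansion, places the atoms on an idealized regular polygon in the mean plane (so bond lengths and angles are not held fixed), and then measures the endocyclic torsion angles. All function names and the 1.53 Å bond length are hypothetical choices, not the authors' model.

```python
import math


def cp_displacements(n, q, phi, q_half=0.0):
    """Out-of-plane displacements z_j (j = 0..n-1) from Cremer-Pople parameters.

    q[k] and phi[k] are the amplitude and phase for m = k + 2, with m running
    from 2 to (n - 1) // 2; q_half is q_{n/2} and is used only for even n.
    """
    z = []
    for j in range(n):
        zj = math.sqrt(2.0 / n) * sum(
            qm * math.cos(pm + 2.0 * math.pi * (k + 2) * j / n)
            for k, (qm, pm) in enumerate(zip(q, phi))
        )
        if n % 2 == 0:
            zj += q_half * (-1) ** j / math.sqrt(n)
        z.append(zj)
    return z


def ring_coordinates(z, bond=1.53):
    """Idealized geometry: regular polygon in the mean plane, each atom lifted by z_j.

    The polygon radius reproduces the chosen bond length only for a planar ring.
    """
    n = len(z)
    radius = bond / (2.0 * math.sin(math.pi / n))
    return [
        (radius * math.cos(2 * math.pi * j / n), radius * math.sin(2 * math.pi * j / n), z[j])
        for j in range(n)
    ]


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in radians (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.atan2(math.sqrt(_dot(b2, b2)) * _dot(b1, n2), _dot(n1, n2))


def endocyclic_torsions(coords):
    """Torsion angles (i, i+1, i+2, i+3) around the ring, indices taken modulo n."""
    n = len(coords)
    return [dihedral(*(coords[(i + k) % n] for k in range(4))) for i in range(n)]


# Sweep the phase of a 6-membered ring with q3 = 0 (theta = 90 degrees): the
# pattern of torsion angles shifts around the ring as phi changes (pseudo-rotation).
for phi_deg in (0, 30, 60):
    z = cp_displacements(6, [0.6], [math.radians(phi_deg)])
    print(phi_deg, [round(math.degrees(t)) for t in endocyclic_torsions(ring_coordinates(z))])
```

Even with this crude geometry, a pure q3 pucker (a chair) gives torsion angles of equal magnitude and alternating sign, and changing φ redistributes the torsions around the ring, which is the kind of interdependence eq S19 models quantitatively.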
We should note that the exocyclic bond angles will also change upon puckering, but their relationship with ring puckering parameters is not discussed here.
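For a trigonal planar ring atom such as a carbonyl carbon, part of the exocyclic coupling is purely geometric. Because s_i, i − 1, i, and i + 1 are coplanar, and s_i sits on the opposite side of the i−(i + 1) bond from i − 1, the exocyclic torsion (s_i, i, i + 1, i + 2) equals the endocyclic torsion (i − 1, i, i + 1, i + 2) plus π (modulo 2π). The sketch below checks this numerically for hypothetical coordinates; it is an illustration of that identity, not eq S20, which may contain additional terms.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(x / length for x in a)


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in radians (IUPAC sign convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    return math.atan2(math.sqrt(_dot(b2, b2)) * _dot(b1, n2), _dot(n1, n2))


def wrap(angle):
    """Map an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def planar_substituent(prev_atom, atom, next_atom, bond=1.22):
    """Idealized sp2 substituent: in the plane of (i-1, i, i+1), bisecting the exterior angle."""
    u = _unit(_sub(prev_atom, atom))
    v = _unit(_sub(next_atom, atom))
    d = _unit(tuple(-(x + y) for x, y in zip(u, v)))
    return tuple(a + bond * x for a, x in zip(atom, d))


# Hypothetical coordinates (angstrom) for ring atoms i-1, i, i+1, i+2.
prev_atom, c_i, c_next, c_next2 = (-0.8, 1.2, 0.3), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.1, 0.9)
o_i = planar_substituent(prev_atom, c_i, c_next)
endo = dihedral(prev_atom, c_i, c_next, c_next2)
exo = dihedral(o_i, c_i, c_next, c_next2)
print(round(math.degrees(endo), 1), round(math.degrees(exo), 1))
```

The π offset does not require the substituent to bisect the exterior angle exactly; it holds whenever the substituent stays in the plane of its three neighbors on the side opposite i − 1.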
Puckering and Substituent Orientation Preferences in the Solid State. We have so far presented gas-phase puckering preferences based on in vacuo GFN2 energy evaluations. To gain insight into puckering preferences in the solid state, we compared our results with previous empirical studies of crystal structures from the Cambridge Structural Database 54−56 and with 63 814 experimentally determined X-ray crystal structures from COD. The earlier empirical studies focused on medium-sized rings, in particular, 7- and 8-membered rings, and showed puckering preferences and pseudo-rotations of the dominant canonical conformations similar to ours. However, the actual distributions differ slightly from ours, owing to the small number of crystal structures used in those studies. Similarly, Figure S19 in Appendix 3 shows that the puckering preferences of small- and medium-sized rings are similar in the gas phase and the solid state. The coupling between the substituent orientation angle and ring puckering is almost identical to that in the GFN2-computed low-energy structures, so our proposed models can be applied directly to solid-state conformations. For larger rings, we cannot yet draw any conclusions because of the limited number of observations. These results suggest that cyclic small molecules generally adopt low-strain conformations in both the solid state and the gas phase. 57 Likewise, in the solid state, intramolecular and intermolecular interactions can be aligned by pseudo-rotation or by conformational switching (from one canonical form to another).
Ring Reconstruction. We selected 20 simple ring systems, including monocyclic rings with and without substituents and endocyclic double bonds, to assess the performance of our proposed sampling method based on puckering preferences. Note that these molecules contain no acyclic rotatable bonds. The lowest-energy conformations from CREST, calculated in vacuo with the GFN2 energy function, served as reference conformations. For each molecule, we report the lowest RMSD among the generated conformations and the TFD of that same conformation. Figure 11 shows two examples, cycloheptane and 4,4-dimethylcyclohexanone. For both, the generated conformations (without energy minimization) are very similar to the corresponding reference conformations, with low RMSD values (0.12 and 0.16 Å, respectively) and low TFD values (0.06 and 0.05, respectively). Across the selected cyclic molecules, our method gives an average TFD of 0.05 and an average RMSD of 0.09 Å, demonstrating its effectiveness. The larger RMSD values are attributable to deviations in bond lengths and bond angles; local geometry optimization of the bond lengths and bond angles should yield conformations with lower RMSD values.
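RMSD between two conformers is conventionally computed after optimal rigid-body superposition. A minimal NumPy sketch of that step follows, assuming the standard Kabsch algorithm over all supplied atoms with identical atom ordering; which atoms the paper includes is not stated in this section, and the function name is hypothetical. TFD is not reimplemented here; RDKit's `rdkit.Chem.TorsionFingerprints` module provides one implementation.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two conformers after optimal superposition (Kabsch algorithm).

    P and Q are (n_atoms, 3) coordinate arrays with the same atom ordering.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)               # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc                  # rotate P onto Q and compare
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Hypothetical usage: the "lowest RMSD" is the minimum over generated conformers.
rng = np.random.default_rng(0)
reference = rng.normal(size=(7, 3))
generated = [reference + rng.normal(scale=s, size=(7, 3)) for s in (0.05, 0.2)]
print(min(kabsch_rmsd(c, reference) for c in generated))
```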

■ CONCLUSIONS
We have investigated the ring puckering motions of over 140 000 flexible cyclic small molecules and cyclic peptides (CPs) using Cremer−Pople puckering parameters. By standardizing the ordering of the ring atoms, we have been able to elucidate the coupled motions and torsional preferences of N-membered rings from GFN2-computed low-energy structures. The representation can easily be extended to describe substituent geometries unambiguously, enabling us to study the coupled motion of ring substituents upon puckering.
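For reference, the Cremer−Pople parameters can be computed from ordered ring coordinates in a few steps: center the ring, define the mean-plane normal from two Fourier-weighted sums of the atom positions, project each atom onto that normal to get z_j, and Fourier-analyze the z_j. The sketch below follows the standard Cremer−Pople definitions and assumes the atoms are already in the desired (standardized) order; the paper's ordering rules are in Appendix 1 and are not reproduced, and the function name is hypothetical.

```python
import math


def cremer_pople(coords):
    """Cremer-Pople puckering parameters for ordered ring atom coordinates.

    Returns (z, q, phi, q_half, Q): mean-plane displacements, amplitudes and
    phases for m = 2 .. (n - 1) // 2, q_{n/2} (0.0 for odd n), and total amplitude Q.
    """
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    r = [[c[k] - centroid[k] for k in range(3)] for c in coords]
    # Fourier-weighted position sums that span the mean plane.
    r1 = [sum(r[j][k] * math.sin(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    r2 = [sum(r[j][k] * math.cos(2 * math.pi * j / n) for j in range(n)) for k in range(3)]
    normal = [r1[1] * r2[2] - r1[2] * r2[1], r1[2] * r2[0] - r1[0] * r2[2], r1[0] * r2[1] - r1[1] * r2[0]]
    norm = math.sqrt(sum(x * x for x in normal))
    normal = [x / norm for x in normal]
    z = [sum(rj[k] * normal[k] for k in range(3)) for rj in r]
    q, phi = [], []
    for m in range(2, (n - 1) // 2 + 1):
        a = math.sqrt(2.0 / n) * sum(z[j] * math.cos(2 * math.pi * m * j / n) for j in range(n))
        b = -math.sqrt(2.0 / n) * sum(z[j] * math.sin(2 * math.pi * m * j / n) for j in range(n))
        q.append(math.hypot(a, b))               # q_m
        phi.append(math.atan2(b, a) % (2 * math.pi))  # phi_m in [0, 2*pi)
    q_half = sum(z[j] * (-1) ** j for j in range(n)) / math.sqrt(n) if n % 2 == 0 else 0.0
    Q = math.sqrt(sum(x * x for x in z))
    return z, q, phi, q_half, Q


# Idealized chair: regular hexagon with alternating +/- 0.25 A out-of-plane displacements.
chair = [(1.5 * math.cos(2 * math.pi * j / 6), 1.5 * math.sin(2 * math.pi * j / 6), 0.25 * (-1) ** j)
         for j in range(6)]
z, q, phi, q_half, Q = cremer_pople(chair)
theta = math.atan2(q[0], q_half)  # for 6-membered rings, theta = atan2(q2, q3)
print(round(Q, 3), round(math.degrees(theta), 1))
```

The sign of q3, and hence whether a chair appears near θ = 0° or θ = 180°, depends on the atom ordering, which is one reason a standardized ordering is needed before puckering parameters are compared across molecules.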
We have shown that endocyclic double bonds and bonds shared with aromatic rings constrain the ring system and change its puckering accordingly. In addition, pseudo-rotations are generally restricted: pseudo-rotation is "free" only in some conformational clusters, e.g., the chair conformation in flexible 6-membered rings without any double bonds. The substituent orientation angles, α and β, depend on the substituent types and ring sizes and can be predicted accurately from the puckering parameters.
More importantly, we studied the relationship between Cremer−Pople puckering parameters and torsion angles, which facilitated the analysis of changes in endocyclic torsion angles upon pseudo-rotation and other puckering motions. We also examined the relationship between endocyclic and exocyclic torsion angles. We proposed a knowledge-based ring conformer sampling method built on puckering preferences, which were estimated with kernel density estimation (KDE), and demonstrated its effectiveness in sampling low-energy conformations of small- and medium-sized rings. To extend the approach to larger ring systems, more structural data are needed for the KDE estimation. Future work should focus on increasing sampling with additional accurate quantum mechanics (QM) energy calculations 57 and on developing better density estimation techniques to capture the correlated puckering preferences in large rings. Puckering preferences derived from conformations with QM energies could then be used to sample low-energy macrocycle conformations efficiently. Furthermore, our sampling framework can be integrated into other knowledge-based conformer sampling tools, such as Confab 59 and OMEGA, 29 to enhance their sampling performance. We intend to benchmark our sampling approach against other sampling methods in the future.
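Sampling from a KDE amounts to picking an observed data point at random and drawing from the kernel centered on it. The sketch below shows this for the (Q, θ, φ) coordinates of 6-membered rings. The kernels are assumptions chosen to respect the domains of the variables (Gaussian in Q reflected at 0, Gaussian in θ folded into [0, π], von Mises in the circular phase φ), and the bandwidths, function name, and toy data are hypothetical; the paper's actual KDE settings are not given in this section.

```python
import math
import random


def sample_puckering(data, n_samples, bw_q=0.02, bw_theta=0.05, kappa=200.0, seed=None):
    """Draw (Q, theta, phi) samples from a product-kernel KDE of observed puckers.

    data: list of (Q, theta, phi) tuples (angstrom, radians) for one ring size.
    Kernel and bandwidth choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        q0, t0, p0 = rng.choice(data)                # kernel centre chosen uniformly
        q = abs(rng.gauss(q0, bw_q))                  # reflection keeps Q >= 0
        t = rng.gauss(t0, bw_theta) % (2 * math.pi)
        if t > math.pi:                               # fold theta into [0, pi]
            t = 2 * math.pi - t
        p = rng.vonmisesvariate(p0 % (2 * math.pi), kappa)  # circular kernel for phi
        samples.append((q, t, p))
    return samples


# Toy "observed" puckers: two chair-like and one boat-like conformation.
observed = [(0.55, 0.05, 1.0), (0.60, 0.10, 2.5), (0.70, 1.57, 0.5)]
for q, t, p in sample_puckering(observed, 3, seed=1):
    print(round(q, 3), round(math.degrees(t), 1), round(math.degrees(p), 1))
```

Each sampled triple maps back to Cremer−Pople amplitudes through q2 = Q sin θ and q3 = Q cos θ, from which ring coordinates can be rebuilt. Larger rings have more puckering coordinates, and capturing their correlations is exactly where the text notes that better density estimation is needed.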
We believe that our proposed models and sampling framework are general and readily extensible to larger and more complex ring systems. Further understanding of the conformational preference of cyclic molecules will help accelerate the sampling of low-energy conformers for a wide range of computational modeling applications.
Data and code are available on GitHub at https://github.com/lucianlschan/RING.
Supporting Information (PDF): Cremer−Pople puckering parameters; ring ordering; ring substituent orientation; unique ring families; reconstructing Cartesian coordinates; connection between ring puckering, substituent orientation, and torsion angles; performance metrics; circular correlation; circular distance and variation; distribution of properties; ring conformational preferences; side-chain torsion angles; and puckering preference in the solid state and gas phase.