Training Neural Network Models Using Molecular Dynamics Simulation Results to Efficiently Predict Cyclic Hexapeptide Structural Ensembles

Cite this: J. Chem. Theory Comput. 2023, 19, 14, 4757–4769
Publication Date (Web):May 26, 2023
https://doi.org/10.1021/acs.jctc.3c00154

Copyright © 2023 The Authors. Published by American Chemical Society. This publication is licensed under CC-BY-NC-ND 4.0 (Open Access).


Abstract

Cyclic peptides have emerged as a promising class of therapeutics. However, their de novo design remains challenging, and many cyclic peptide drugs are simply natural products or their derivatives. Most cyclic peptides, including the current cyclic peptide drugs, adopt multiple conformations in water. The ability to characterize cyclic peptide structural ensembles would greatly aid their rational design. In a previous pioneering study, our group demonstrated that machine learning models trained on molecular dynamics results can efficiently predict the structural ensembles of cyclic pentapeptides. Using this method, termed StrEAMM (Structural Ensembles Achieved by Molecular Dynamics and Machine Learning), linear regression models predicted the structural ensembles of cyclic pentapeptides in an independent test set with R² = 0.94 between the predicted populations for specific structures and the populations observed in molecular dynamics simulations. An underlying assumption in these StrEAMM models is that cyclic peptide structural preferences are predominantly influenced by neighboring interactions, namely, interactions between (1,2) and (1,3) residues. Here we demonstrate that for larger cyclic peptides such as cyclic hexapeptides, linear regression models including only (1,2) and (1,3) interactions fail to produce satisfactory predictions (R² = 0.47); further inclusion of (1,4) interactions leads to moderate improvements (R² = 0.75). We show that when using convolutional neural networks and graph neural networks to incorporate complex nonlinear interaction patterns, we can achieve R² = 0.97 and R² = 0.91 for cyclic pentapeptides and hexapeptides, respectively.


SPECIAL ISSUE

This article is part of the Machine Learning for Molecular Simulation special issue.

1. Introduction


Protein–protein interactions regulate many key biological processes, and the ability to target specific protein–protein interactions is of great therapeutic interest. While biologics can target the large protein surfaces (∼1500 to 3000 Å²) involved in protein–protein interactions, these molecules typically exhibit poor cell membrane permeability and thus are unable to reach intracellular targets. (1,2) Conversely, small-molecule drugs can better cross cell membranes and engage intracellular targets. However, small molecules are often too small to target such large, flat protein surfaces. Cyclic peptides are candidate molecules with unique properties that can address the limitations of both small molecules and biologics. Ranging from 500 to 1500 Da, cyclic peptides have the potential to target protein–protein interactions while still possessing good membrane permeability. (1,3,4) Thus, cyclic peptides have emerged as a promising drug modality, and within the past 20 years, 18 cyclic peptide drugs have been approved by the FDA. (5,6) Unfortunately, many of these recently approved cyclic peptide drugs are natural products or their derivatives, and the de novo design of cyclic peptide therapeutics remains difficult. The ability to efficiently characterize or predict cyclic peptide structures could greatly enhance cyclic peptide development but is currently hard to achieve. For example, characterizing cyclic peptide structures with analytical methods like solution NMR spectroscopy can be difficult because cyclic peptides often adopt multiple conformations in solution. (7−9) Computational methods like molecular dynamics (MD) simulations, density functional theory (DFT) calculations, and the NMR Analysis of Molecular Flexibility in Solution (NAMFIS) algorithm (10) can help interpret solution NMR spectroscopy results, i.e., these methods can generate an ensemble of conformers that can be compared to experimental observables from NMR spectroscopy such as chemical shifts. (7,11) Aside from serving as a means for NMR spectroscopy interpretation, MD simulation itself has proved useful in characterizing cyclic peptide structural ensembles. (12−17) Because cyclic peptides can have many solvent-exposed backbone hydrogen-bond donors and acceptors available to interact with polar solvent molecules like water, these interactions must be accounted for; thus, explicit solvent must be included to elucidate cyclic peptide structures accurately. (18−20) Unfortunately, MD simulations using explicit solvent are computationally expensive, making them inefficient and unfeasible for large-scale screening.
Recently, machine learning (ML) models such as AlphaFold2 and RoseTTAFold have shown exciting successes in predicting structures of folded proteins. (21−24) In theory, these models can be used to predict the structures of cyclic peptides, especially when the cyclic peptides have a clear fold and adopt structural motifs observed in folded proteins. (25) However, it should be noted that cyclic peptides typically contain between 5 and 15 residues, (5,26) and thus they are generally unable to adopt regular secondary structures and usually lack a highly populated "fold". Currently, it is difficult to use these ML models to predict the structural ensembles adopted by such conformationally heterogeneous peptides in aqueous solution. Furthermore, these state-of-the-art ML models for protein structure prediction are trained on the extensive structural and evolutionary data available for proteins. In contrast, only very limited structural and evolutionary data are available for cyclic peptides because of the scarcity of NOE signals, the general lack of a highly populated structure, and the fact that cyclic peptides are often synthesized through nonribosomal pathways, (27−30) making it potentially challenging to improve the models using a similar strategy. Lastly, since these models are trained using structural and evolutionary data for proteins, they are unable to make predictions for peptides composed of unnatural amino acids.
Recently, our group developed a method that trains ML models for cyclic peptide structure prediction using MD simulation results and called this method StrEAMM (Structural Ensembles Achieved by Molecular Dynamics and Machine Learning). (31) The method was applied to a simple cyclic pentapeptide system with a 15 amino acid library as a proof-of-concept study. In brief, our group performed MD simulations of 705 cyclic pentapeptides and obtained the structural ensembles, i.e., a collection of different structures and their associated populations, to form the training dataset. Then we trained linear regression models that could predict the structural ensembles for new cyclic pentapeptide sequences in an independent test dataset. While the initial simulations to generate the training dataset were time-consuming, the resulting trained model could make a structural ensemble prediction in less than 1 s per sequence, and the trained model could provide simulation-quality structural ensemble predictions for all the sequences in their sequence space (which was ∼150,000 for cyclic pentapeptides consisting of a 15 amino acid library).
In this work, we aim to assess whether the StrEAMM methodology can be extended to larger cyclic peptides, as many cyclic peptide drugs are larger than just five residues. The original StrEAMM linear regression models for cyclic pentapeptides predicted the natural logarithm of the population of sequence cyclo-(X1X2X3X4X5) adopting the structure S1S2S3S4S5 in the ensemble (Figure 1). Here the structure of a cyclic pentapeptide was described using five “structural digits”, which were assigned based on each residue’s backbone (ϕ, ψ) dihedrals (Figure 2A,B). The linear regression models used weights to represent the contributions of (1,2) and (1,3) neighboring interactions to the natural logarithm of the population of a specific structure. While including (1,2) and (1,3) neighboring interactions was found to be sufficient to describe the structural preferences of cyclic pentapeptides, longer-range interactions (such as (1,4) neighbors) and other complex, multibody interactions may be required to accurately describe the structural preferences of larger cyclic peptides.

Figure 1

Figure 1. StrEAMM linear regression models are constructed using contributions from neighboring interactions. For example, the natural logarithm of the population of cyclic pentapeptide cyclo-(X1X2X3X4X5) adopting a specific structure S1S2S3S4S5 is the sum of the interaction weights for each (1,2) and (1,3) neighbor present. $X_i$ is an amino acid; $S_i$ is a structural digit that represents a region of (ϕ, ψ) space (see Figure 2); $w^{S_iS_{i+1}}_{X_iX_{i+1}}$ is the weight for residues $X_iX_{i+1}$ adopting structure $S_iS_{i+1}$; $w^{S_iS_{i+1}S_{i+2}}_{X_i\_X_{i+2}}$ is the weight for residues $X_i$ and $X_{i+2}$ adopting substructure $S_iS_{i+1}S_{i+2}$; and $w^{Q}_{X_1X_2X_3X_4X_5}$ is related to the partition function, Q, for sequence $X_1X_2X_3X_4X_5$ and ensures that all the structures' populations sum to 1 for a given sequence. This equation was first proposed by Miao et al. (31)

Figure 2

Figure 2. Structural binning maps divide the (ϕ, ψ) space of cyclic pentapeptides into 10 regions and the (ϕ, ψ) space of cyclic hexapeptides into six regions. (A) The (ϕ, ψ) population density of cyclo-(GGGGG) and (B) the resulting binning map, obtained by assigning all the grid points to their closest centroid. (C) The (ϕ, ψ) population density of cyclo-(GGGGGG) and (D) the resulting binning map using the same binning protocol.

Here we extend the StrEAMM methodology to predict the structural ensembles of cyclic hexapeptides. We first show that the current StrEAMM linear regression models did not perform well in predicting cyclic hexapeptide structural ensembles when trained on cyclic hexapeptide simulation results and using (1,2) and (1,3) neighboring interaction weights. We show that upon adding additional two-body interactions, the (1,4) neighboring interactions, the model achieved moderate performance. We then propose using the StrEAMM methodology with other ML models such as convolutional neural networks (CNNs) and graph neural networks (GNNs). These neural network models can easily embed nonlinear relationships and have found success in biologically relevant applications such as secondary structure prediction, protein fold family predictions, and binding affinity predictions. (32−36) Lastly, we show that when using the StrEAMM neural network models, we can successfully and efficiently predict the structural ensembles for both cyclic pentapeptides and cyclic hexapeptides.

2. Methods


2.1. Datasets

We prepared the training and test datasets for our ML models by generating diverse cyclic peptide sequences, performing MD simulations of these sequences, and analyzing the MD results. For cyclic pentapeptides, to enable comparison with our previously published StrEAMM linear regression models, we used the same training dataset of 705 cyclic pentapeptides from that study (31) to train our StrEAMM neural network models. These cyclic pentapeptide sequences were generated from a library of 15 amino acids including G, A, V, F, N, S, D, R, a, v, f, n, s, d, and r, where the lowercase letters represent the d-amino acids. These 15 amino acids were selected to be a diverse and representative subset of the canonical amino acids and their d forms: G is achiral and the smallest amino acid; A is the simplest chiral amino acid, with only a methyl group side chain; V has a β branch in its side chain; F has an aromatic ring in its side chain; N has an amide group in its side chain; S has a hydroxyl group in its side chain; D has a negatively charged side chain; and R has a positively charged side chain. Their d forms are also considered, as including d-amino acids in cyclic peptide design can expand the accessible structural space beyond what is achievable using only l-amino acids. Additionally, including d-amino acids in the peptide sequence can improve the peptide's stability against enzymatic degradation. (38,39) The cyclic pentapeptide training sequences were generated using a semirandom protocol that efficiently produced diverse sequences and allowed us to observe all the possible XiXi+1Xi+2 subsequences (where Xi is one of the 15 amino acids) at least once. This protocol resulted in 705 cyclic pentapeptide training sequences. Fifty random cyclic pentapeptide sequences were used as the test set to evaluate the performance of the models. For cyclic hexapeptides, using the same 15 amino acid library and following a similar sequence generation protocol, we generated and performed MD simulations of 581 cyclic hexapeptide training sequences (List S1). For the test dataset, we followed the same procedure to obtain 50 random cyclic hexapeptide sequences (List S2).
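The exact semirandom protocol is given in ref 31; as a hypothetical illustration of the coverage requirement alone (every possible XiXi+1Xi+2 subsequence observed at least once), a simple greedy procedure suffices. The Python sketch below illustrates only that idea; it is not the protocol actually used, and the number of sequences it produces differs somewhat from the 705 reported.

```python
import itertools
import random

AA15 = list("GAVFNSDRavfnsdr")  # 15 amino acid library; lowercase = d-amino acids

def triplets(seq):
    """All cyclic XiXi+1Xi+2 subsequences of a cyclic peptide sequence."""
    n = len(seq)
    return {seq[i] + seq[(i + 1) % n] + seq[(i + 2) % n] for i in range(n)}

def generate_training_set(length=5, seed=0):
    """Greedily add sequences until every triplet has been observed once."""
    rng = random.Random(seed)
    uncovered = {"".join(t) for t in itertools.product(AA15, repeat=3)}
    train = []
    while uncovered:
        # Seed each candidate with an uncovered triplet, fill the rest randomly.
        seq = rng.choice(sorted(uncovered))
        seq += "".join(rng.choice(AA15) for _ in range(length - 3))
        train.append(seq)
        uncovered -= triplets(seq)
    return train

print(len(generate_training_set(length=5)))  # on the order of the 705 reported
```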

2.2. Molecular Dynamics Simulations

To characterize the structural ensembles of the cyclic hexapeptides, we performed two parallel bias-exchange metadynamics (BE-META) simulations starting from two different initial structures for each cyclic hexapeptide. (40−42) BE-META, and more generally metadynamics, is an enhanced sampling technique that biases collective variables that ideally describe the slow degrees of freedom. During a metadynamics simulation, Gaussian potentials are deposited along each collective variable to push the system away from previously visited regions of collective variable space. (41) When there are multiple collective variables of interest, BE-META can be used, in which multiple replicas of the simulation, each biased along one collective variable, are simulated in parallel; periodically during the simulations, structures of different replicas can exchange with each other based on the Metropolis criterion, effectively enhancing the sampling in all collective variables. (43)
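As a concrete illustration of one biased replica in such a setup, a PLUMED input could look roughly like the fragment below. This is a hypothetical sketch rather than the input used in this work: the atom indices are placeholders to be read from the topology, and the Gaussian width, height, and deposition pace are assumed values.

```
# Hypothetical plumed.dat for ONE biased replica depositing Gaussians on (phi2, psi2).
# Atom indices are placeholders; real indices come from the peptide topology.
phi2: TORSION ATOMS=15,17,19,25
psi2: TORSION ATOMS=17,19,25,27

METAD ...
  ARG=phi2,psi2
  SIGMA=0.35,0.35   # Gaussian widths in radians (assumed)
  HEIGHT=0.1        # Gaussian height in kJ/mol (assumed)
  PACE=500          # deposit every 500 steps (1 ps at a 2 fs time step)
  LABEL=metad
... METAD

# For bias exchange, each replica has its own plumed file (the neutral replicas
# simply omit METAD), and exchanges are attempted periodically, e.g. with
#   gmx_mpi mdrun -multidir rep00 ... rep16 -plumed plumed.dat -replex 2000
RANDOM_EXCHANGES
```
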
In principle, one set of BE-META simulations should be able to sample the conformational space of a cyclic peptide when the simulations have been run long enough. To monitor and verify simulation convergence, for each cyclic peptide sequence we performed two sets of BE-META simulations starting from two different initial structures. Each set of BE-META simulations explored the conformational space independently, and the conformational sampling was considered sufficient when the two sets of BE-META simulations provided similar structural ensembles for the same cyclic peptide. For each cyclic hexapeptide in our training and test datasets, we prepared two initial structures (referred to as s1 and s2) using the UCSF Chimera package. (44) To ensure that the two initial structures were sufficiently different, we required that the backbone root-mean-square deviation (RMSD) between the two structures be larger than 1.5 Å. For head-to-tail cyclized hexapeptides with all-trans peptide bonds, a backbone RMSD of 1.5 Å is fairly large, and it is likely geometrically impossible for two conformations to have a backbone RMSD greater than 2.0 Å. Therefore, we considered the conformations of two cyclic hexapeptides significantly different when they had a backbone RMSD greater than 1.5 Å.
All the MD simulations were performed with GROMACS 2018.6 (45) using the residue-specific force field 2 (RSFF2) (46) and TIP3P water. (47) We chose the RSFF2 force field because it was reported to have the best performance in recapitulating the crystal structures of 20 cyclic peptides (from length 5 to 12 amino acids) among RSFF1, (48) RSFF2, (46) Amber99sb-ildn, (49) and OPLS-AA/L. (50,51) Each initial structure was energy-minimized in vacuum using the steepest descent algorithm. Then, each structure was solvated in a box of water, with a minimum distance of 1.0 nm between the cyclic peptide and the walls of the box. If the total charge of the system was nonzero, counterions were added to neutralize the system. The solvated system was then energy-minimized using the steepest descent algorithm. Following energy minimization, two stages of equilibration were performed on the solvated system. First, the solvent molecules were allowed to equilibrate while the heavy atoms of the cyclic peptide were restrained using a harmonic potential with a force constant of 1000 kJ·mol⁻¹·nm⁻². This first stage of equilibration began with a 50 ps NVT simulation at 300 K followed by a 50 ps NPT simulation at 300 K and 1 bar. In the second equilibration stage, the restraints on the heavy atoms were removed to allow the whole system to equilibrate. This second stage of equilibration began with a 100 ps NVT simulation at 300 K followed by a 100 ps NPT simulation at 300 K and 1 bar. The production simulations were at least 200 ns NPT simulations at 300 K and 1 bar. The leapfrog algorithm with a time step of 2 fs was used to integrate the equations of motion. Bonds involving hydrogen were constrained with the LINCS algorithm during production. Nonbonded interactions (electrostatic and Lennard-Jones interactions) were truncated at 1.0 nm. For long-range electrostatic interactions, the particle mesh Ewald method (52) with a Fourier grid spacing of 0.12 nm and an interpolation order of 4 was applied. To account for the truncation of Lennard-Jones interactions, a long-range dispersion correction for energy and pressure was applied. An additional improper dihedral related to the H, N, C, and O atoms for each peptide bond was also included to maintain the trans amide configuration, and data used in our analysis contained only trans peptide bonds.
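The production settings above correspond to a GROMACS .mdp file roughly like the fragment below. This is a plausible reconstruction for illustration, not the actual input files; in particular, the thermostat and barostat choices and coupling constants are assumptions, since the text specifies only the ensemble, temperature, and pressure.

```
; Hypothetical production .mdp fragment consistent with the reported settings
integrator           = md                 ; leap-frog integrator
dt                   = 0.002              ; 2 fs time step
nsteps               = 100000000          ; 200 ns
constraints          = h-bonds            ; constrain bonds involving hydrogen
constraint-algorithm = lincs
coulombtype          = PME                ; particle mesh Ewald electrostatics
fourierspacing       = 0.12               ; Fourier grid spacing (nm)
pme-order            = 4                  ; interpolation order
rcoulomb             = 1.0                ; electrostatic cutoff (nm)
rvdw                 = 1.0                ; Lennard-Jones cutoff (nm)
DispCorr             = EnerPres           ; long-range dispersion correction
tcoupl               = v-rescale          ; thermostat (assumed)
tc-grps              = System
tau-t                = 0.1
ref-t                = 300                ; 300 K
pcoupl               = Parrinello-Rahman  ; barostat (assumed)
tau-p                = 2.0
ref-p                = 1.0                ; 1 bar
```
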
We performed two parallel BE-META simulations of the two initial structures for each cyclic hexapeptide using GROMACS 2018.6 (45) patched by the PLUMED 2.5.1 plugin. (53) For each BE-META simulation, there were six replicas biasing the 2D collective variables (ϕi, ψi) and six replicas biasing the 2D collective variables (ψi, ϕi+1), as it was previously observed that conformational changes of cyclic peptides involve coupled changes of these dihedral pairs. (42) In addition to the 12 biased replicas, five neutral replicas were also included to obtain the unbiased structural ensemble that was used for analysis. The trajectories were analyzed using dihedral principal component analysis. (54) We calculated the 3D density distributions of s1 and s2 simulations projected in the top three principal components, and we monitored simulation convergence by calculating the normalized integrated products (NIPs) between the s1 and s2 density distributions, where a NIP value of 1.0 represents perfect similarity. (55) Most BE-META simulations of our cyclic hexapeptides were performed for 200 ns, and when the last 100 ns of the neutral replicas was used for structural analysis, the density distributions between the s1 and s2 simulations had a NIP of ≥0.9. In these cases, trajectories of the last 100 ns of the neutral replicas of both s1 and s2 simulations were combined for each cyclic peptide and used for further structural analysis. However, some simulations did not reach a NIP of 0.9 when using the last 100 ns of 200 ns BE-META simulations. We extended these simulations to 300 ns and used the last 200 ns of the neutral replicas to calculate the NIP. To ensure that we only included high-quality data, sequences that still did not converge in 300 ns were discarded from the dataset. The final cyclic hexapeptide datasets included 555 (out of 581) sequences for training and 49 (out of 50) for testing (the sequences that did not reach convergence are colored gray in Lists S1 and S2). Files describing how to run BE-META simulations of cyclic peptides and scripts to perform dihedral principal component analysis can be found on GitHub (https://github.com/ysl-lab/CP_tutorial).
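For reference, a NIP between two density histograms can be computed as a cosine-like overlap. The exact definition used in this work is given in ref 55; the sketch below assumes the common normalized-overlap form, which equals 1.0 for identical distributions.

```python
import numpy as np

def nip(rho1, rho2):
    """Normalized integrated product between two density histograms.

    rho1, rho2 are same-shape arrays, e.g., 3D histograms of the s1 and s2
    trajectories projected onto the top three dihedral principal components.
    This is the common normalized-overlap form (an assumption here); see
    ref 55 for the original definition.
    """
    rho1 = rho1 / rho1.sum()
    rho2 = rho2 / rho2.sum()
    return np.sum(rho1 * rho2) / np.sqrt(np.sum(rho1**2) * np.sum(rho2**2))

# Toy example: 20x20x20 histograms standing in for the s1 and s2 densities.
h1, _ = np.histogramdd(np.random.rand(1000, 3), bins=20)
h2, _ = np.histogramdd(np.random.rand(1000, 3), bins=20)
print(nip(h1, h2))
```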

2.3. Structural Analysis

We described cyclic peptide conformations using the backbone dihedral angles (ϕ, ψ) for each residue, which are typically represented in the continuous range between −180° and 180°. In order to further simplify the structure definitions, the (ϕ, ψ) space can be separated into discrete regions, and each region can be assigned a “structural digit” (denoted by Greek letters here). (31,37,56) For example, in our previous work on cyclic pentapeptides, we used the (ϕ, ψ) distribution of cyclo-(GGGGG) to guide the discretization of the (ϕ, ψ) space for cyclic pentapeptides (Figure 2A,B). (31) First, the (ϕ, ψ) distribution of cyclo-(GGGGG) was binned into a 100 × 100 grid, and the probability density of each grid point was calculated. Then all of the grid points with a probability density larger than 0.00001 were clustered using a grid-based and density-peak-based method, resulting in 10 clusters. (57) For each cluster, the centroid was defined by the grid point with the smallest average of distances to the remaining grid points of the cluster, where the distances were also weighted by the probability density of each remaining grid point. Then, all the grid points were assigned to their closest centroid to form the final binning map (Figure 2B). Each of the 10 regions was assigned a “structural digit” (Greek letter Λ, λ, Γ, γ, B, β, Π, π, Ζ, or ζ). In the end, a cyclic pentapeptide structure could be described using a string of five structural digits based on the (ϕ, ψ) of each residue. For example, if the sequence sasFr (lowercase letters denote d-amino acids) had a structure represented using the five-digit structural code ζλβΠλ, it meant that ser1 had dihedral angles within the region assigned to the structural digit “ζ” in Figure 2B, ala2 had dihedral angles within the “λ” region, ser3 had dihedral angles within the “β” region, Phe4 had dihedral angles within the “Π” region, and arg5 had dihedral angles within the “λ” region. In this work, we performed the same protocol using cyclo-(GGGGGG) to obtain the binning map for cyclic hexapeptides (Figure 2C,D). These maps (Figures 2B and 2D) are herein called “Map 1” maps.
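To make the digit assignment concrete, the sketch below bins one frame's backbone dihedrals by nearest centroid. The centroid coordinates and Latin digit labels are hypothetical placeholders (the actual ten-region map and Greek-letter digits come from the clustering of the cyclo-(GGGGG) density described above), and a nearest-centroid lookup is a simplification of assignment via the precomputed grid map.

```python
# Hypothetical centroids (phi, psi in degrees) for a few structural digits,
# with Latin letters standing in for the Greek digits used in the text.
CENTROIDS = {"L": (60.0, 30.0), "l": (-60.0, -30.0),
             "B": (-120.0, 150.0), "b": (120.0, -150.0)}

def ang_diff(a, b):
    """Smallest signed difference between two angles in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def digit(phi, psi):
    """Assign one residue's (phi, psi) to the digit of the nearest centroid."""
    return min(CENTROIDS, key=lambda k: ang_diff(phi, CENTROIDS[k][0]) ** 2
                                      + ang_diff(psi, CENTROIDS[k][1]) ** 2)

def structure_code(dihedrals):
    """Structural code (one digit per residue) for one frame of a cyclic peptide."""
    return "".join(digit(phi, psi) for phi, psi in dihedrals)

print(structure_code([(-60, -30), (55, 40), (-118, 145), (130, -160), (-65, -20)]))
```
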
Given the significance of this structural analysis and discretization method in defining the structural ensembles used to train our models, we also explored two variations of this protocol. In the first variation, instead of performing our clustering protocol on only the grid points with a probability density larger than 0.00001, we performed the cluster analysis on all 100 × 100 grid points. Including all grid points in the clustering protocol eliminated the separate step of calculating the distances from the remaining grid points to all the cluster centroids and assigning each grid point to its nearest centroid. This modification changed the shape of the clusters for both the cyclic pentapeptide and cyclic hexapeptide binning maps (Figure S1A, bottom). The resulting binning maps from this modified protocol are herein called "Map 2" maps.
Previously, it was assumed that cyclic polyglycines could provide a representative binning map for both d- and l-amino acids because glycine is achiral and the most flexible amino acid. In the second variation to the protocol, instead of clustering the (ϕ, ψ) distributions of cyclic polyglycines (Figure 2A,C and Figure S1A, top), we clustered the (ϕ, ψ) distributions of the cyclic peptides in our training datasets (Figure S1B, top). The (ϕ, ψ) distributions of the cyclic pentapeptide training dataset and the cyclic hexapeptide training dataset were clustered using all 100 × 100 grid points. The resulting binning maps are herein called “Map 3” maps (Figure S1B, bottom). A MATLAB script to perform grid-based and density-based clustering is available on GitHub (https://github.com/ysl-lab/CP_tutorial).

2.4. Linear Regression Models

In our group's previous work on cyclic pentapeptides, the StrEAMM linear regression models were built to predict the (natural logarithm of) population for a sequence cyclo-(X1X2X3X4X5) adopting the structure S1S2S3S4S5. (31) We first hypothesized that neighboring interactions contribute to cyclic peptide structural preferences and that these interactions were additive like energy terms. This assumption, along with the exponential relationship between energies and populations in the Boltzmann distribution ($p_i = e^{-E_i/k_BT}/Q$, where $Q = \sum_i e^{-E_i/k_BT}$), motivated our proposed exponential relationship between the sum of the neighboring interactions and the predicted population. We explicitly included (1,2) and (1,3) interactions, and they were represented by different "weight" terms (w). Thus, the population of a cyclic pentapeptide sequence cyclo-(X1X2X3X4X5) adopting the structure S1S2S3S4S5 is

$$p^{S_1S_2S_3S_4S_5}_{X_1X_2X_3X_4X_5} = \exp\big(w^{S_1S_2}_{X_1X_2} + w^{S_2S_3}_{X_2X_3} + w^{S_3S_4}_{X_3X_4} + w^{S_4S_5}_{X_4X_5} + w^{S_5S_1}_{X_5X_1} + w^{S_1S_2S_3}_{X_1\_X_3} + w^{S_2S_3S_4}_{X_2\_X_4} + w^{S_3S_4S_5}_{X_3\_X_5} + w^{S_4S_5S_1}_{X_4\_X_1} + w^{S_5S_1S_2}_{X_5\_X_2}\big)/Q$$

in which the partition function, Q, is the sum of the exponential terms for all the structures in the ensemble for cyclo-(X1X2X3X4X5) and was used to calculate the exact populations for the structures.
To convert this exponential relationship to a linear relationship that could ultimately be used for our linear regression model, we applied a natural logarithm operation. This operation resulted in a linear equation between $\ln p^{S_1S_2S_3S_4S_5}_{X_1X_2X_3X_4X_5}$ and the specific (1,2) and (1,3) weights plus a weight that represented Q (Figure 1). To train the linear regression model, we first formulated the linear equation describing $\ln p^{S_1S_2S_3S_4S_5}_{X_1X_2X_3X_4X_5}$ using the 11 corresponding weights for every sequence and structure pair (X1X2X3X4X5 adopting a structure S1S2S3S4S5) in the training dataset. The set of linear equations describing all the $\ln p$ values in the training dataset was represented as the matrix product $A\mathbf{w}$, where the coefficient matrix A contains the coefficients for the weights $\mathbf{w}$ and selects the weights needed for predicting each $\ln p^{S_1S_2S_3S_4S_5}_{X_1X_2X_3X_4X_5}$. For example, the linear equation for $\ln p^{\zeta\lambda\beta\Pi\lambda}_{\mathrm{sasFr}}$ is

$$\ln p^{\zeta\lambda\beta\Pi\lambda}_{\mathrm{sasFr}} = (1 \times w^{\zeta\lambda}_{\mathrm{sa}}) + (1 \times w^{\lambda\beta}_{\mathrm{as}}) + (1 \times w^{\beta\Pi}_{\mathrm{sF}}) + (1 \times w^{\Pi\lambda}_{\mathrm{Fr}}) + (1 \times w^{\lambda\zeta}_{\mathrm{rs}}) + (1 \times w^{\zeta\lambda\beta}_{\mathrm{s\_s}}) + (1 \times w^{\lambda\beta\Pi}_{\mathrm{a\_F}}) + (1 \times w^{\beta\Pi\lambda}_{\mathrm{s\_r}}) + (1 \times w^{\Pi\lambda\zeta}_{\mathrm{F\_s}}) + (1 \times w^{\lambda\zeta\lambda}_{\mathrm{r\_a}}) + (1 \times w^{Q}_{\mathrm{sasFr}})$$

In the row of the coefficient matrix A that corresponds to this specific sequence–structure population, the coefficients for these 11 weights are 1, while all of the other coefficients are 0. The linear regression model updated the (1,2) and (1,3) weights to minimize the difference between the predicted $\ln p$ and the actual $\ln p$ observed in the MD simulations. We used least-squares fitting to minimize our weighted loss function $L(\mathbf{w})$:
$$L(\mathbf{w}) = \sum_{i=1}^{N} p_i \left| \sum_{j=1}^{M} A_{ij} w_j - \ln p_i \right|^2$$
where M is the number of weights, N is the number of sequence–structure pairs, and pi is the population observed in the MD simulation. The loss function was weighted by the population such that there was a larger contribution to the calculated loss from highly populated structures.
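The minimization above is an ordinary weighted least-squares problem, so one way to solve it is to scale each row of A and each ln p_i by √p_i and call a standard solver. The sketch below is a minimal illustration under that assumption, with a softmax-style normalization standing in for the division by the partition function Q.

```python
import numpy as np

def fit_streamm_weights(A, populations):
    """Population-weighted least squares for the StrEAMM linear model.

    A           : (N, M) coefficient matrix; row i selects the weights entering
                  ln p_i (entries are 1 for the selected weights, 0 otherwise).
    populations : (N,) structure populations observed in MD.
    Minimizes sum_i p_i * (A_i . w - ln p_i)^2 by scaling rows with sqrt(p_i).
    """
    A = np.asarray(A, dtype=float)
    p = np.asarray(populations, dtype=float)
    sqrt_p = np.sqrt(p)
    w, *_ = np.linalg.lstsq(A * sqrt_p[:, None], np.log(p) * sqrt_p, rcond=None)
    return w

def predict_ensemble(A_seq, w):
    """Populations for all candidate structures of one sequence.

    Exponentiating A_seq @ w and normalizing plays the role of dividing by
    the partition function Q in the equations above.
    """
    logits = A_seq @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()
```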

2.5. Neural Network Models

2.5.1. Amino Acid Feature Representation and Data Augmentation for Neural Networks

To represent a cyclic peptide sequence, we first encoded each amino acid in our library using circular topological molecular fingerprints (i.e., Morgan fingerprints) (58) with RDKit version 2021.03.05. (59) A molecular fingerprint breaks down molecules into a set of substructures, and then those resulting substructures can be represented as an N-bit vector, where the 1s and 0s correspond to the presence and absence of each substructure, respectively. We generated fingerprints using a radius of 3 (i.e., atoms up to three bonds away are considered in the substructure search), considering chirality, and using a fingerprint length of N = 2048 bits.
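The fingerprint generation described here maps onto a short RDKit call, sketched below. The SMILES inputs are illustrative stand-ins, since the exact molecular structures that were fingerprinted (e.g., capped versus bare residues) are an assumption here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Illustrative inputs; the exact structures used for fingerprinting are assumed.
smiles = {"A": "C[C@@H](C(=O)O)N",   # l-alanine
          "a": "C[C@H](C(=O)O)N"}    # d-alanine

def fingerprint(smi, radius=3, n_bits=2048):
    """2048-bit Morgan fingerprint with chirality, as described in the text."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits,
                                               useChirality=True)
    return list(fp)

vec = fingerprint(smiles["A"])
print(sum(vec), "bits set out of", len(vec))
```
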
We also performed cyclic permutations of our datasets to give our models examples of the cyclic nature of our peptides. For example, if the cyclic peptide sequence sasFr had an observed structure ζλβΠλ in MD with 69.35% population, then we also supplied the model with the following additional four examples whose sequences and structural digits were cyclically permuted: sequence asFrs with the structure λβΠλζ with 69.35% population, sequence sFrsa with the structure βΠλζλ with 69.35% population, sequence Frsas with the structure Πλζλβ with 69.35% population, and sequence rsasF with the structure λζλβΠ with 69.35% population.
Additionally, we leveraged the enantiomeric nature of amino acids and the centrosymmetry of the structural digit binning maps (Figures 2B and 2D) to add the enantiomers of the cyclic peptides that we simulated to the datasets. When swapping an l-amino acid for a d-amino acid or vice versa, its corresponding structural digits would be capitalized if previously lowercase and lowercase if previously capitalized. Here, when we report performance of the neural network models on the training and test datasets, we take the model’s structural ensemble prediction for a sequence to be the average of the predictions for all of the cyclic permutations and the enantiomers of the sequence.
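Both augmentations, cyclic permutation and enantiomer mirroring, reduce to simple string operations on the (sequence, structural code, population) triples. The sketch below illustrates them with Latin letters standing in for the Greek structural digits; swapcase plays the role of the l/d swap and the digit-case swap described above.

```python
def cyclic_permutations(seq, code, population):
    """All rotations of one (sequence, structural code, population) example."""
    n = len(seq)
    return [(seq[i:] + seq[:i], code[i:] + code[:i], population) for i in range(n)]

def enantiomer(seq, code):
    """Mirror image: swap d/l amino acids and the case of each structural digit."""
    return seq.swapcase(), code.swapcase()

# Latin letters stand in for the Greek structural digits used in the paper.
examples = cyclic_permutations("sasFr", "zlbPl", 0.6935)
examples += [enantiomer(s, c) + (p,) for s, c, p in examples]
for s, c, p in examples:
    print(s, c, p)
```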

2.5.2. StrEAMM Convolutional Neural Networks

For our CNNs, we aimed to design architectures that would perform convolutions on combinations of (1,2), (1,3), and (1,4) neighboring residues, analogous to the StrEAMM linear regression models. To achieve this, we represented one sequence with N residues as a matrix with N columns and 2048 rows, where 2048 is the dimension of the fingerprint encoding (Figure 3A). Then, in the case of a (1,2) architecture, we generated a similar 2048 × N matrix representation for the original sequence permuted by one position. We concatenated these two matrices to create a 4096 × N matrix, such that the fingerprint encodings representing the residues in a (1,2) neighboring pair are in the same column (Figure 3B). Other architectures were created by modifying the permutation step (i.e., for a (1,3) architecture, the 2048 × N matrix representation for the original sequence and the 2048 × N matrix representation for the original sequence permuted by two positions were concatenated).
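In code, this concatenation is a roll-and-stack of the fingerprint matrix. The following PyTorch sketch illustrates that step only, with random bits standing in for real fingerprints.

```python
import torch

def neighbor_feature_matrix(fps, offset):
    """Stack each residue's fingerprint with that of its (1, 1+offset) neighbor.

    fps    : (2048, N) tensor with one fingerprint column per residue.
    offset : 1 for (1,2) pairs, 2 for (1,3) pairs, 3 for (1,4) pairs.
    Returns a (4096, N) tensor whose column i holds residues i and i+offset.
    """
    return torch.cat([fps, torch.roll(fps, shifts=-offset, dims=1)], dim=0)

fps = torch.randint(0, 2, (2048, 5)).float()  # stand-in fingerprints, cyclo-(sasFr)
x12 = neighbor_feature_matrix(fps, 1)         # (1,2) architecture input, (4096, 5)
```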

Figure 3

Figure 3. Convolutional neural network and graph neural network architectures. (A) Cartoon example of the cyclic pentapeptide sequence cyclo-(sasFr) represented using a matrix with N = 5 columns and 2048 rows, where the 2048 bits come from the fingerprint encoding of each amino acid. (B) The representations of cyclo-(sasFr) and cyclo-(asFrs) are concatenated such that the (1,2) neighboring residues are spatially close together, enabling the 1D convolutional filter (blue box representing a (4096 × 1) vector of learnable model parameters) to encompass all the fingerprint features that define residues s and a. (C) The top graph represents a cyclic pentapeptide, where the (1,2) edge types are denoted with blue arrows and the (1,3) edge types are denoted with green arrows. The bottom graph represents a cyclic hexapeptide, where in addition to the (1,2) and (1,3) edge types we can also include (1,4) edge types, denoted with purple arrows. Note: each purple (1,4) edge appears to be double-arrowed, but it really represents two unique single-arrowed edges. For example, the forward edge in dark purple that starts at ser1 is directed to Phe4, and the forward edge in dark purple that starts at Phe4 is directed to ser1.

The input to the CNN model was this resulting feature matrix representing the cyclic peptide, and the output of the model was a structural ensemble prediction in the form of an array of structure populations. Only structures whose populations were larger than 0.1% in at least one of the cyclic peptides in the training dataset were considered, along with their permutations and enantiomers. The feature matrix of the cyclic peptide was input to the convolutional layer of the CNN model. Here the convolutions were dot product calculations performed between each column of the matrix and the 1D convolutional filter, which is a vector containing 4096 parameters that are updated during model training. After the convolutions were performed, the resulting learned representation of the cyclic peptide sequence was fully connected to the hidden layer of a multilayer perceptron (MLP), followed by the output layer (which has a number of nodes equal to the number of structures in the structural ensemble). A rectified linear unit (ReLU) activation function (60) was applied to the node representations to apply nonlinearity during the forward pass through the MLP. The SoftMax activation function was applied to the output layer to ensure that the output structural ensemble was normalized. All models were trained using the Adam optimizer (61) and a sum-of-squared-errors loss function, $L = \sum_{i=1}^{N} (p_{i,\mathrm{learned}} - p_i)^2$, where N is the number of populations in the training dataset, $p_{i,\mathrm{learned}}$ is the population learned by the network, and $p_i$ is the actual population observed in MD simulations. The number of filters, the number of nodes in the hidden layer, and the learning rate were tuned using a grid search with three-fold cross-validation for up to 2500 epochs, saving the model results every 100 epochs (Figures S2 and S3). Neural network training was performed using the PyTorch 1.9.0 (62) and PyTorch Geometric 1.7.2 (63) software packages. The models with the best-performing hyperparameters were selected by comparing the average (across the three folds) mean square error (MSE) loss on the validation datasets. The optimal number of epochs was selected using an overfitting criterion based on percentage change and generalized error calculations using the average MSE loss at every 100 epochs (Figure S4). (64)
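Putting the pieces together, a minimal PyTorch sketch of such a CNN is shown below. The filter count, hidden width, and number of output structures are placeholders rather than the tuned values; the real models were selected by the grid search and cross-validation described above.

```python
import torch
import torch.nn as nn

class StreammCNN(nn.Module):
    """Minimal sketch of the described CNN; all layer sizes are assumptions."""

    def __init__(self, n_structures, n_res=6, n_filters=32, hidden=256):
        super().__init__()
        # A kernel-size-1 Conv1d over residue positions: each filter is a
        # 4096-parameter vector dotted with one column of the feature matrix.
        self.conv = nn.Conv1d(in_channels=4096, out_channels=n_filters, kernel_size=1)
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_filters * n_res, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_structures),
            nn.Softmax(dim=-1),          # normalized structural ensemble
        )

    def forward(self, x):                # x: (batch, 4096, n_res)
        return self.mlp(self.conv(x))

model = StreammCNN(n_structures=500)     # structure count is a placeholder
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.rand(8, 4096, 6)               # a toy batch of 8 cyclic hexapeptides
target = torch.softmax(torch.rand(8, 500), dim=-1)
opt.zero_grad()
loss = ((model(x) - target) ** 2).sum()  # sum-of-squared-errors loss from the text
loss.backward()
opt.step()
```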

2.5.3. StrEAMM Graph Neural Networks (GNNs)

We represented a cyclic pentapeptide or a cyclic hexapeptide as a graph with one node for each amino acid in the sequence, and the initial node representation was set using the amino acid's 2048-bit molecular fingerprint. In a cyclic pentapeptide graph representation, the nodes can be connected by up to four types of directed edges, where two types of edges (forward and backward with respect to the peptide sequence) connect (1,2) neighbor nodes and two types of edges connect (1,3) neighbor nodes (Figure 3C). We included two directional edges to prevent a sequence and its retroisomer (the reverse-order sequence, e.g., sasFr and rFsas) from being encoded as identical graphs. Thus, a cyclic pentapeptide was represented by a graph with five nodes and up to 20 edges. We refer to the models built with the inclusion of different sets of these edges as the StrEAMM GNN (1,2), StrEAMM GNN (1,3), and StrEAMM GNN (1,2)+(1,3) models, depending on whether the models connect (1,2) neighbor nodes only, (1,3) neighbor nodes only, or both (1,2) and (1,3) neighbor nodes. The cyclic hexapeptide graph representation was constructed in a similar fashion, but we also included the option of (1,4) edge types, such that the resulting graph had six nodes and up to 36 directed edges (Figure 3C).
The aim of the GNN models is to take a cyclic peptide graph as input and output a structural ensemble prediction in the same format as the CNN models. Starting with the input graph, the network underwent one message-passing operation using the RGCNConv operator in PyTorch Geometric. (63,65) This operator updated each node's representation by summing the node's transformed initial representation and the transformed representations of any nodes connected to it by an edge. Each edge type applied its own learned transformation and therefore had its own weights. The ReLU activation function (60) was applied to the node representations, and the resulting node representations were concatenated. This concatenated vector was a learned representation of the cyclic peptide sequence that was then passed through an MLP architecture with a single hidden layer. The ReLU activation function (60) was also applied at the hidden layer, and a SoftMax activation function was applied to the output layer in a similar fashion to the CNN models. The hyperparameters, such as the number of nodes in the hidden layer and the learning rate, were tuned using a grid search with three-fold cross-validation following the same approach as for the CNNs (Figures S5 and S6). The GNN models were trained with the same sum-of-squared-errors loss function and the Adam optimizer, (61) using the PyTorch 1.9.0 (62) and PyTorch Geometric 1.7.2 (63) software packages.
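A minimal PyTorch Geometric sketch of this architecture is shown below. The layer sizes, relation count, and number of output structures are placeholders; the toy graph wires up only the (1,2) forward and backward edge types for brevity.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv

class StreammGNN(nn.Module):
    """Minimal sketch of the described GNN; all layer sizes are assumptions."""

    def __init__(self, n_structures, n_res=6, n_relations=6, out_dim=64, hidden=256):
        super().__init__()
        # One relational message-passing step; each edge type ((1,2) forward,
        # (1,2) backward, (1,3) forward, ...) learns its own transformation.
        self.conv = RGCNConv(2048, out_dim, num_relations=n_relations)
        self.mlp = nn.Sequential(
            nn.Linear(n_res * out_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_structures), nn.Softmax(dim=-1))

    def forward(self, x, edge_index, edge_type):
        h = torch.relu(self.conv(x, edge_index, edge_type))
        return self.mlp(h.flatten())      # concatenated node representations

# Toy cyclic hexapeptide graph with (1,2) forward/backward edges (types 0 and 1).
x = torch.rand(6, 2048)                                     # fingerprint per residue
fwd = torch.tensor([[i, (i + 1) % 6] for i in range(6)]).t()
edge_index = torch.cat([fwd, fwd.flip(0)], dim=1)
edge_type = torch.tensor([0] * 6 + [1] * 6)
model = StreammGNN(n_structures=500)                        # structure count assumed
print(model(x, edge_index, edge_type).sum())                # populations sum to 1
```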

3. Results and Discussion


3.1. StrEAMM Linear Regression (1,2)+(1,3) Models Cannot Predict the Structural Ensembles of Cyclic Hexapeptides

In our previously published proof-of-concept study, the StrEAMM linear regression model incorporated weights related to (1,2) and (1,3) neighboring interactions and could efficiently and accurately predict the structural ensembles of cyclic pentapeptides. (31) Specifically, the StrEAMM linear regression (1,2)+(1,3) model had R² = 0.94 and an average weighted error (WE) of 1.54 in percent population between the predicted and observed populations for cyclic pentapeptide sequences in the test dataset (Figure 4A). Here, the scatter plots compare the predicted and observed percent population in MD for each structure in a cyclic peptide's structural ensemble, and all the cyclic peptides within the training or test dataset are included in the corresponding scatter plot. We then tried to build an analogous StrEAMM linear regression (1,2)+(1,3) model to predict cyclic hexapeptide structural ensembles. Unfortunately, the resulting model showed poor performance when evaluated on the test dataset, with R² = 0.47 and WE = 5.99 (Figure 4B). We hypothesized that (1,2) and (1,3) interactions may not be sufficient to describe the structural preferences of cyclic hexapeptides. Thus, we trained a linear regression model that also explicitly incorporated (1,4) interactions, i.e., the StrEAMM linear regression (1,2)+(1,3)+(1,4) model. Although the addition of (1,4) interactions did improve the performance of the StrEAMM linear regression model, with R² = 0.75 and WE = 3.66 (Figure 4C), the StrEAMM linear regression (1,2)+(1,3)+(1,4) model for cyclic hexapeptides could not achieve the performance we previously observed for the StrEAMM linear regression (1,2)+(1,3) model on cyclic pentapeptides. While both the cyclic pentapeptide (1,2)+(1,3) model and the cyclic hexapeptide (1,2)+(1,3)+(1,4) model were able to fit their training data, the hexapeptide model, unlike the pentapeptide model, failed to generalize to the test dataset. We hypothesized that there might exist more complex, nonlinear relationships between the amino acids, beyond simple two-body interactions, that influence the structural ensembles of cyclic hexapeptides. To this end, we built StrEAMM neural network models that can easily incorporate complex interaction patterns but are less interpretable than the StrEAMM linear regression models.

Figure 4

Figure 4. Performance of StrEAMM linear regression models on the training dataset (top) and test dataset (bottom) for (A) cyclic pentapeptides and (B, C) cyclic hexapeptides. (A) Performance of the StrEAMM linear regression model on cyclic pentapeptides using (1,2) and (1,3) interactions. (B) Performance of the StrEAMM linear regression model on cyclic hexapeptides using (1,2) and (1,3) interactions. (C) Performance of the StrEAMM linear regression model on cyclic hexapeptides using (1,2), (1,3), and (1,4) interactions. The black dashed line represents y = x. R² is the coefficient of determination. WE is the weighted error, given by $\mathrm{WE} = \frac{\sum_i p_{i,\mathrm{observed}} \, |p_{i,\mathrm{observed}} - p_{i,\mathrm{predicted}}|}{\sum_i p_{i,\mathrm{observed}}}$, where $p_{i,\mathrm{observed}}$ and $p_{i,\mathrm{predicted}}$ are the populations observed in MD simulation and predicted by StrEAMM, respectively. Each point on the plot represents the predicted versus the observed percent population in MD for a structure in the structural ensemble of a cyclic peptide. All the structures in the structural ensembles for all the cyclic peptides in the training or test dataset with a predicted or observed percent population in MD of >1% are plotted.
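Both reported metrics are straightforward to compute from paired population arrays. The sketch below assumes the standard coefficient-of-determination definition of R²; since the paper's exact convention is not spelled out here, treat this as one plausible reading.

```python
import numpy as np

def weighted_error(p_obs, p_pred):
    """WE between observed and predicted percent populations, as defined above."""
    p_obs, p_pred = np.asarray(p_obs, float), np.asarray(p_pred, float)
    return np.sum(p_obs * np.abs(p_obs - p_pred)) / np.sum(p_obs)

def r_squared(p_obs, p_pred):
    """Coefficient of determination, 1 - SS_res/SS_tot (assumed convention)."""
    p_obs, p_pred = np.asarray(p_obs, float), np.asarray(p_pred, float)
    ss_res = np.sum((p_obs - p_pred) ** 2)
    ss_tot = np.sum((p_obs - p_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

obs = np.array([69.4, 12.1, 8.0, 5.2])   # toy percent populations, not paper data
pred = np.array([65.0, 14.0, 9.5, 6.0])
print(weighted_error(obs, pred), r_squared(obs, pred))
```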

3.2. StrEAMM Neural Network Models Can Predict Cyclic Pentapeptide Structural Ensembles

We aimed to train StrEAMM CNN and GNN models and evaluate whether they could learn complex interaction patterns that may be missing in the StrEAMM linear regression models. As a first step, in section 3.2, we verified that these neural network models could accurately predict cyclic pentapeptide structural ensembles as well as, or even better than, the StrEAMM linear regression model. Then, in section 3.3, we moved on to evaluate the performance of the neural network models for cyclic hexapeptides.

3.2.1. StrEAMM CNN Models for Cyclic Pentapeptides

The StrEAMM linear regression (1,2)+(1,3) model had R² = 0.94 and WE = 1.54 for cyclic pentapeptides (Figure 4A). The StrEAMM CNN (1,2)+(1,3) model had R² = 0.96 and WE = 1.25, showing a similar or slightly better performance than the StrEAMM linear regression (1,2)+(1,3) model (Figure 5A; for results on the training dataset, see Figure S7A). The similar performance of the StrEAMM linear regression and StrEAMM CNN models for cyclic pentapeptides is consistent with the assumption that cyclic pentapeptides are small enough that their structural preferences are determined by their (1,2) and (1,3) neighboring interactions. Therefore, a linear regression model using only (1,2) and (1,3) weights is sufficient to summarize the interactions that influence cyclic pentapeptide structural preferences.

Figure 5

Figure 5. Performance of (A–C) StrEAMM CNN models and (D–F) StrEAMM GNN models on the test dataset for (A, D) cyclic pentapeptides and the models incorporating (1,2) and (1,3) filters/edges, (B, E) cyclic hexapeptides and the models incorporating (1,2) and (1,3) filters/edges, and (C, F) cyclic hexapeptides and the models incorporating (1,2), (1,3), and (1,4) filters/edges. See Figure S7 for the model performances on the corresponding training datasets. All the structures in the structural ensembles for all the cyclic peptides in the training or test dataset with a predicted or observed percent population in MD of >1% are plotted.

For the StrEAMM linear regression models, including both (1,2) and (1,3) interactions was critical for the performance in predicting cyclic pentapeptide structural ensembles. When only (1,2) or only (1,3) interactions were included, the StrEAMM linear regression models showed poorer performance, especially with only (1,2) interactions, which gave R² = 0.63 and WE = 4.32 (Figure 6A, red bars; Figure S8A). In contrast, the StrEAMM CNN (1,2), StrEAMM CNN (1,3), and StrEAMM CNN (1,2)+(1,3) models all showed very similar performances (Figure 6A, green bars; Figure S8B). In other words, including multiple "neighbor-inspired" convolutions did not affect the model performance as significantly as it did for the linear regression models. We suspected that the neural networks did not necessarily need to incorporate all of our "neighbor-inspired" convolutions because information contributed by all of the residues in the sequence still reached the model when the training data were propagated through the MLP layers, which were present in both our CNN and GNN models. Therefore, while we designed convolutional filters and edge types to combine features for select neighboring residues, the explicit inclusion of all the important interactions may not be necessary for the CNN and GNN models.

Figure 6

Figure 6. Comparison of the StrEAMM linear regression and neural network models’ performances on the (A) cyclic pentapeptide and (B) cyclic hexapeptide test datasets. The coefficient of determination, R2, and weighted error, WE, are shown for each model (the linear regression in red with diagonal slash pattern, CNN in green with dotted pattern, and GNN in blue with vertical line pattern) including different neighboring interactions.

3.2.2. StrEAMM GNN Models for Cyclic Pentapeptides

The StrEAMM GNN (1,2)+(1,3) model had R² = 0.97 and WE = 1.19 (Figure 5D; for results on the training dataset, see Figure S7D), showing a similar or slightly better performance than the StrEAMM linear regression (1,2)+(1,3) model. This observation parallels the CNN models' performance, which was likewise comparable to that of the linear regression models. Also similar to the trend observed for the various CNN models, the StrEAMM GNN (1,2), StrEAMM GNN (1,3), and StrEAMM GNN (1,2)+(1,3) models showed very similar performances to each other (Figure 6A, blue bars; Figure S8C). This trend is again consistent with our hypothesis that the explicit inclusion of all "neighbor-inspired" interactions may not be necessary for the neural networks.

3.3. StrEAMM Neural Network Models Can Predict Cyclic Hexapeptide Structural Ensembles

After we observed that our StrEAMM neural network models could accurately predict the structural ensembles of cyclic pentapeptides, we turned our attention to developing neural network models to predict the structural ensembles of cyclic hexapeptides. Our StrEAMM linear regression (1,2)+(1,3) model failed at this prediction task for cyclic hexapeptides in the test dataset, with R² = 0.47 and WE = 5.99 (Figure 4B). While the addition of the (1,4) neighboring interactions in the StrEAMM linear regression (1,2)+(1,3)+(1,4) model improved the model's performance, the model was still not very accurate (R² = 0.75 and WE = 3.66; Figure 4C). Here we evaluated whether the StrEAMM neural network models could be more accurate than the current StrEAMM linear regression models at predicting cyclic hexapeptide structural ensembles.

3.3.1. StrEAMM CNN Models for Cyclic Hexapeptides

The StrEAMM CNN (1,2)+(1,3) model had R² = 0.90 and WE = 2.05 (Figure 5B), and the StrEAMM CNN (1,2)+(1,3)+(1,4) model had R² = 0.89 and WE = 2.12 (Figure 5C), both showing much better performance than the StrEAMM linear regression models.
As was observed for the models trained on cyclic pentapeptides, the performance of the linear regression models trained on cyclic hexapeptides strongly depended on which interactions were explicitly included in the models (Figure 6B, red bars; Figure S9A), while the performance of the CNN models did not (Figure 6B, green bars; Figure S9B). Furthermore, when considering the three-fold cross-validation results of the CNN models incorporating (1,2)-only, (1,3)-only, (1,4)-only, (1,2)+(1,3), and (1,2)+(1,3)+(1,4) interactions, we observed that they all performed similarly (Figure S3).

3.3.2. StrEAMM GNN Models for Cyclic Hexapeptides

The StrEAMM GNN (1,2)+(1,3) model had R² = 0.91 and WE = 1.95 (Figure 5E), and the StrEAMM GNN (1,2)+(1,3)+(1,4) model had R² = 0.91 and WE = 1.96 (Figure 5F), showing an improved performance for cyclic hexapeptides compared to the StrEAMM linear regression models. Again, while the explicit inclusion of specific neighboring interactions strongly impacted the performance of the linear regression models (Figure 6B, red bars; Figure S9A), we did not observe this dependence for the GNN models (Figure 6B, blue bars; Figure S9C). Similar to the CNN model cross-validation results, the GNN models using (1,2)-only, (1,3)-only, (1,4)-only, (1,2)+(1,3), and (1,2)+(1,3)+(1,4) interactions reported similar model performance (Figure S6). This observation suggests that, owing to the convolutions and operations in the neural network models, explicit inclusion of all types of neighboring interactions is not necessary for good model performance.

3.4. Using Alternative Binning Maps Does Not Improve the StrEAMM Model Performance Trained on Either Cyclic Pentapeptides or Cyclic Hexapeptides

All the models we had trained and presented results for thus far used datasets prepared with the "Map 1" structural binning maps (Figure 2). These maps were generated by analyzing the (ϕ, ψ) density distributions of cyclic polyglycines, clustering the high-density grid points, and using the resulting cluster centroids to assign all the grid points. To evaluate how sensitive the model performance was to the binning map, we developed two new maps. For one map, the (ϕ, ψ) density distributions of cyclic polyglycines were still used, but all 100 × 100 grid points were used in the clustering protocol ("Map 2" maps; Figure S1A). The other map used the (ϕ, ψ) density distributions of the training sequences instead of those of cyclic polyglycines; again, all 100 × 100 grid points were used in the clustering protocol ("Map 3" maps; Figure S1B). The results showed that the model performance was not sensitive to the choice of map (Figure S10). This observation was rather surprising to us, as one might assume that a map fine-tuned to the training sequences (i.e., the "Map 3" maps) would provide more accurate binning of the structures and thus construct more accurate structural ensembles. These results suggest that the differences among the maps did not alter the majority of the structural assignments, leading to only minor differences in model performance.

3.5. StrEAMM Neural Network Models Can Predict Structural Ensembles of Cyclic Peptide Sequences Containing Amino Acids That Were Absent in the Training Dataset Sequences

Thus far, we had found that the StrEAMM neural network models could outperform the StrEAMM linear regression models for both cyclic pentapeptide and hexapeptide datasets. We also report here an additional benefit of using our neural network models: the encoding scheme we used to describe the features of our cyclic peptide sequences allowed our models to predict the structural ensembles of sequences composed of amino acids that were not originally included in the training dataset (i.e., amino acids that the model had never “seen” during training). We had chosen to represent peptide sequences using molecular fingerprints with the intention that our trained models could also make predictions for cyclic peptide sequences containing amino acids not present in the training dataset as long as the new amino acids shared some bit information with the amino acids that were already in the training dataset. This ability is not a feature of all encoding schemes. For example, a typical one-hot encoding scheme requires that each amino acid is represented by a bit vector. This bit vector has the length of the number of amino acids in the library and contains a single 1 at a position that is unique for each amino acid in the library. Therefore, given the uniqueness of these bit vectors and the fact that the length of the bit vector depends on the size of the original amino acid library, it would be impossible to make predictions for new amino acids. In a similar fashion, the StrEAMM linear regression models could not make predictions for sequences containing “new” amino acids because there would be no corresponding weights containing the new amino acid.
With the aid of the molecular fingerprint encodings, we extended the StrEAMM neural network models and observed that they were able to predict structural ensembles for cyclic peptide sequences containing amino acids that were not originally included in the training dataset (Figure 7). We simulated an additional 25 cyclic pentapeptide and 25 cyclic hexapeptide sequences that were composed of a larger amino acid library (37 amino acids, including all the canonical amino acids in their d and l forms, with the exception of proline; Lists S3 and S4). Note that only 21 of the cyclic hexapeptide sequences were used to compose the new test dataset because the others failed to reach our convergence criterion of a NIP value ≥ 0.90 after 300 ns of simulation. For the CNN models, we chose the architectures that performed best during hyperparameter tuning, which were the CNN (1,2) model for the cyclic pentapeptide dataset and the CNN (1,2)+(1,3)+(1,4) model for the cyclic hexapeptide dataset (Figures S2 and S3). When trained using the cyclic peptide datasets containing only 15 amino acids in the library, the CNN models reached only a mediocre performance of R² = 0.49 for both the cyclic pentapeptide and cyclic hexapeptide test datasets containing amino acids from the 37 amino acid library (Figure 7A,B). Similarly, the GNN models reported R² = 0.46 and R² = 0.51 for the cyclic pentapeptide and cyclic hexapeptide test datasets, respectively (Figure 7C,D). To improve the model performances, we simulated an additional 50 training sequences for each of the cyclic pentapeptide and cyclic hexapeptide training datasets, which we termed "booster" sequences and which contain amino acids from the 37 amino acid library. Note that only 48 booster sequences were added to the cyclic hexapeptide training dataset because of our convergence criterion (Lists S5 and S6). When we supplemented the training datasets with these booster sequences, we saw a large increase in model performance across both the CNNs and GNNs: the CNN models reached R² = 0.90 and R² = 0.86 for the cyclic pentapeptide and cyclic hexapeptide test datasets, respectively (Figure 7E,F); similarly, the GNN models reported R² = 0.88 and R² = 0.86, respectively (Figure 7G,H). This result demonstrates that supplementing the training dataset with a modest number of additional training sequences is a viable approach to accessing predictions for larger amino acid libraries. With the addition of only ∼50 sequences built from the 37 amino acid library, we were able to achieve good performance on the test dataset with sequences generated from the 37 amino acid library. This addition is far more efficient than rebuilding a training dataset with our current protocol for generating cyclic hexapeptide sequences (which requires observing all possible XiXi+1Xi+2 subsequences), as applying this protocol to a 37 amino acid library would require ∼8500 sequences.

Figure 7

Figure 7. The StrEAMM neural network models can predict structural ensembles for cyclic pentapeptides and cyclic hexapeptides that contain amino acids that were absent in the training dataset. (A–D) Performances of the (A, B) CNN and (C, D) GNN models on cyclic pentapeptide and cyclic hexapeptide datasets containing sequences composed of the 37 amino acid library, when the models were trained using only sequences composed of the 15 amino acid library. (E–H) Performances of the (E, F) CNN and (G, H) GNN models on cyclic pentapeptide and cyclic hexapeptide datasets containing sequences composed of the 37 amino acid library, when the models were trained using sequences composed of the 15 amino acid library and “booster” sequences composed of the 37 amino acid library.

4. Conclusions and Future Directions


The StrEAMM methodology can efficiently predict cyclic peptide structural ensembles by leveraging MD results to train ML models. The newly proposed StrEAMM neural network models can easily embed the complicated interaction patterns that may be necessary to learn the structural preferences of cyclic peptides larger than the simple cyclic pentapeptide systems. Our StrEAMM neural network models can accurately predict the structural ensembles of cyclic hexapeptides, an improvement over the original StrEAMM linear regression models. The new neural network models also enable predictions for cyclic peptides with amino acids that were not originally present in the training dataset, suggesting that we can access a larger sequence space without having to spend computational resources on performing more simulations.
While we have described the advantages of using the neural network models, we acknowledge that although the models perform well for cyclic hexapeptides (in general, R² > 0.90; Figure 6B), they did not reach the same excellent performance as for cyclic pentapeptides (R² ≥ 0.97; Figure 6A). Our future work will focus on exploring different feature representations, such as physicochemical properties, in addition to the molecular fingerprints to help improve the models' performance on the cyclic hexapeptide set. We also suspect that a possible reason for the worse performance of the cyclic hexapeptide models compared to the cyclic pentapeptide models is the number of training instances. Recall that the training dataset was constructed such that we observe every XiXi+1Xi+2 subsequence. In the cyclic pentapeptide dataset, there were 705 sequences in the training dataset, but in the cyclic hexapeptide dataset, there were only 555 sequences because the increased size of the cyclic hexapeptide allows for an additional triplet subsequence per cyclic peptide sequence. Therefore, we will also evaluate how collecting more training sequences for the cyclic hexapeptide dataset, or designing the training sequences better, can further improve the model performance. We are also eager to extend the StrEAMM models to larger cyclic peptides and to cyclic peptides beyond head-to-tail cyclization, as our neural network models can efficiently predict structural ensembles without the need to explicitly include the various neighboring interactions that may influence the structural preferences of such larger, more complex systems. We envision that the implementation of these StrEAMM neural network models can provide structural information that will greatly aid the rational design and development of cyclic peptides.

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c00154.

  • Lists of sequences in the training and test datasets; structural binning maps 2 and 3; hyperparameter tuning schemes; neural network model performances on the training datasets for cyclic pentapeptides and cyclic hexapeptides; performances of linear regression and neural network models including only (1,2) or only (1,3) interactions on training and test datasets for cyclic pentapeptides; performances of linear regression and neural network models including only (1,2), only (1,3), or only (1,4) interactions on training and test datasets for cyclic hexapeptides; comparison of model performances using different binning maps (PDF)


Author Information

  • Corresponding Author
    • Yu-Shan Lin - Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
  • Authors
    • Tiffani Hui - Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
    • Marc L. Descoteaux - Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
    • Jiayuan Miao - Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States; Orcid: https://orcid.org/0000-0003-1112-1927
  • Author Contributions

    T.H., M.L.D., and J.M. contributed equally.

  • Notes
    The authors declare the following competing financial interest(s): Y.-S.L. has an equity interest in and is a founding scientist of Speed Cycles, Inc. A patent application (PCT/US2022/072941, Cyclic peptide structure prediction via structural ensembles achieved by molecular dynamics and machine learning) was filed on June 14, 2022.

Acknowledgments

This work was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award R01GM124160 (PI: Y.-S.L.). We are grateful for the support from the Tufts Technology Services and for the computing resources at the Tufts Research Cluster. Initial structures for the simulations were built using UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH Grant P41-GM103311.

References


  1. Smith, M. C.; Gestwicki, J. E. Features of protein-protein interactions that translate into potent inhibitors: topology, surface area and affinity. Expert Rev. Mol. Med. 2012, 14, e16. DOI: 10.1017/erm.2012.10
  2. Morelli, X.; Bourgeas, R.; Roche, P. Chemical and structural lessons from recent successes in protein-protein interaction inhibition (2P2I). Curr. Opin. Chem. Biol. 2011, 15, 475–481. DOI: 10.1016/j.cbpa.2011.05.024
  3. Rezai, T.; Bock, J. E.; Zhou, M. V.; Kalyanaraman, C.; Lokey, R. S.; Jacobson, M. P. Conformational Flexibility, Internal Hydrogen Bonding, and Passive Membrane Permeability: Successful in Silico Prediction of the Relative Permeabilities of Cyclic Peptides. J. Am. Chem. Soc. 2006, 128, 14073–14080. DOI: 10.1021/ja063076p
  4. Dougherty, P. G.; Sahni, A.; Pei, D. Understanding Cell Penetration of Cyclic Peptides. Chem. Rev. 2019, 119, 10241–10287. DOI: 10.1021/acs.chemrev.9b00008
  5. Zhang, H.; Chen, S. Cyclic peptide drugs approved in the last two decades (2001–2021). RSC Chem. Biol. 2022, 3, 18–31. DOI: 10.1039/D1CB00154J
  6. Zorzi, A.; Deyle, K.; Heinis, C. Cyclic peptide therapeutics: past, present and future. Curr. Opin. Chem. Biol. 2017, 38, 24–29. DOI: 10.1016/j.cbpa.2017.02.006
  7. Nguyen, Q. N. N.; Schwochert, J.; Tantillo, D. J.; Lokey, R. S. Using 1H and 13C NMR chemical shifts to determine cyclic peptide conformations: a combined molecular dynamics and quantum mechanics approach. Phys. Chem. Chem. Phys. 2018, 20, 14003–14012. DOI: 10.1039/C8CP01616J
  8. Ball, K. A.; Wemmer, D. E.; Head-Gordon, T. Comparison of Structure Determination Methods for Intrinsically Disordered Amyloid-β Peptides. J. Phys. Chem. B 2014, 118, 6405–6416. DOI: 10.1021/jp410275y
  9. Fisher, C. K.; Stultz, C. M. Constructing ensembles for intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2011, 21, 426–431. DOI: 10.1016/j.sbi.2011.04.001
  10. Cicero, D. O.; Barbato, G.; Bazzo, R. NMR Analysis of Molecular Flexibility in Solution: A New Method for the Study of Complex Distributions of Rapidly Exchanging Conformations. Application to a 13-Residue Peptide with an 8-Residue Loop. J. Am. Chem. Soc. 1995, 117, 1027–1033. DOI: 10.1021/ja00108a019
  11. Ge, Y.; Zhang, S.; Erdelyi, M.; Voelz, V. A. Solution-State Preorganization of Cyclic β-Hairpin Ligands Determines Binding Mechanism and Affinities for MDM2. J. Chem. Inf. Model. 2021, 61, 2353–2367. DOI: 10.1021/acs.jcim.1c00029
  12. Slough, D. P.; McHugh, S. M.; Cummings, A. E.; Dai, P.; Pentelute, B. L.; Kritzer, J. A.; Lin, Y.-S. Designing Well-Structured Cyclic Pentapeptides Based on Sequence-Structure Relationships. J. Phys. Chem. B 2018, 122, 3908–3919. DOI: 10.1021/acs.jpcb.8b01747
  13. Cummings, A. E.; Miao, J.; Slough, D. P.; McHugh, S. M.; Kritzer, J. A.; Lin, Y.-S. β-Branched Amino Acids Stabilize Specific Conformations of Cyclic Hexapeptides. Biophys. J. 2019, 116, 433–444. DOI: 10.1016/j.bpj.2018.12.015
  14. Wakefield, A. E.; Wuest, W. M.; Voelz, V. A. Molecular Simulation of Conformational Pre-Organization in Cyclic RGD Peptides. J. Chem. Inf. Model. 2015, 55, 806–813. DOI: 10.1021/ci500768u
  15. Damjanovic, J.; Miao, J.; Huang, H.; Lin, Y.-S. Elucidating Solution Structures of Cyclic Peptides Using Molecular Dynamics Simulations. Chem. Rev. 2021, 121, 2292–2324. DOI: 10.1021/acs.chemrev.0c01087
  16. Ono, S.; Naylor, M. R.; Townsend, C. E.; Okumura, C.; Okada, O.; Lokey, R. S. Conformation and Permeability: Cyclic Hexapeptide Diastereomers. J. Chem. Inf. Model. 2019, 59, 2952–2963. DOI: 10.1021/acs.jcim.9b00217
  17. Wang, S.; König, G.; Roth, H.-J.; Fouché, M.; Rodde, S.; Riniker, S. Effect of Flexibility, Lipophilicity, and the Location of Polar Residues on the Passive Membrane Permeability of a Series of Cyclic Decapeptides. J. Med. Chem. 2021, 64, 12761–12773. DOI: 10.1021/acs.jmedchem.1c00775
  18. El Tayar, N.; Mark, A. E.; Vallat, P.; Brunne, R. M.; Testa, B.; van Gunsteren, W. F. Solvent-dependent conformation and hydrogen-bonding capacity of cyclosporin A: evidence from partition coefficients and molecular dynamics simulations. J. Med. Chem. 1993, 36, 3757–3764. DOI: 10.1021/jm00076a002
  19. Merten, C.; Li, F.; Bravo-Rodriguez, K.; Sanchez-Garcia, E.; Xu, Y.; Sander, W. Solvent-induced conformational changes in cyclic peptides: a vibrational circular dichroism study. Phys. Chem. Chem. Phys. 2014, 16, 5627–5633. DOI: 10.1039/C3CP55018D
  20. Quartararo, J. S.; Eshelman, M. R.; Peraro, L.; Yu, H.; Baleja, J. D.; Lin, Y.-S.; Kritzer, J. A. A bicyclic peptide scaffold promotes phosphotyrosine mimicry and cellular uptake. Bioorg. Med. Chem. 2014, 22, 6387–6391. DOI: 10.1016/j.bmc.2014.09.050
  21. Baek, M.; DiMaio, F.; Anishchenko, I.; Dauparas, J.; Ovchinnikov, S.; Lee, G. R.; Wang, J.; Cong, Q.; Kinch, L. N.; Schaeffer, R. D.; Millán, C.; Park, H.; Adams, C.; Glassman, C. R.; DeGiovanni, A.; Pereira, J. H.; Rodrigues, A. V.; van Dijk, A. A.; Ebrecht, A. C.; Opperman, D. J.; Sagmeister, T.; Buhlheller, C.; Pavkov-Keller, T.; Rathinaswamy, M. K.; Dalwadi, U.; Yip, C. K.; Burke, J. E.; Garcia, K. C.; Grishin, N. V.; Adams, P. D.; Read, R. J.; Baker, D. Accurate prediction of protein structures and interactions using a three-track neural network. Science 2021, 373, 871–876. DOI: 10.1126/science.abj8754
  22. Bryant, P.; Pozzati, G.; Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 2022, 13, 1265. DOI: 10.1038/s41467-022-28865-w
  23. Bryant, P.; Pozzati, G.; Zhu, W.; Shenoy, A.; Kundrotas, P.; Elofsson, A. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nat. Commun. 2022, 13, 6028. DOI: 10.1038/s41467-022-33729-4
  24. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; Bridgland, A.; Meyer, C.; Kohl, S. A. A.; Ballard, A. J.; Cowie, A.; Romera-Paredes, B.; Nikolov, S.; Jain, R.; Adler, J.; Back, T.; Petersen, S.; Reiman, D.; Clancy, E.; Zielinski, M.; Steinegger, M.; Pacholska, M.; Berghammer, T.; Bodenstein, S.; Silver, D.; Vinyals, O.; Senior, A. W.; Kavukcuoglu, K.; Kohli, P.; Hassabis, D. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. DOI: 10.1038/s41586-021-03819-2
  25. Rettie, S. A.; Campbell, K. V.; Bera, A. K.; Kang, A.; Kozlov, S.; De La Cruz, J.; Adebomi, V.; Zhou, G.; DiMaio, F.; Ovchinnikov, S.; Bhardwaj, G. Cyclic peptide structure prediction and design using AlphaFold. bioRxiv 2023. DOI: 10.1101/2023.02.25.529956
  26. Gang, D.; Kim, D. W.; Park, H. S. Cyclic Peptides: Promising Scaffolds for Biopharmaceuticals. Genes 2018, 9, 557. DOI: 10.3390/genes9110557
  27. Iacovelli, R.; Bovenberg, R. A. L.; Driessen, A. J. M. Nonribosomal peptide synthetases and their biotechnological potential in Penicillium rubens. J. Ind. Microbiol. Biotechnol. 2021, 48, kuab045. DOI: 10.1093/jimb/kuab045
  28. Marahiel, M. A. Working outside the protein-synthesis rules: insights into non-ribosomal peptide synthesis. J. Pept. Sci. 2009, 15, 799–807. DOI: 10.1002/psc.1183
  29. Martínez-Núñez, M. A.; López y López, V. E. Nonribosomal peptides synthetases and their applications in industry. Sustainable Chem. Processes 2016, 4, 13. DOI: 10.1186/s40508-016-0057-6
  30. Sieber, S. A.; Marahiel, M. A. Learning from Nature's Drug Factories: Nonribosomal Synthesis of Macrocyclic Peptides. J. Bacteriol. 2003, 185, 7036–7043. DOI: 10.1128/JB.185.24.7036-7043.2003
  31. Miao, J.; Descoteaux, M. L.; Lin, Y.-S. Structure prediction of cyclic peptides by molecular dynamics + machine learning. Chem. Sci. 2021, 12, 14927–14936. DOI: 10.1039/D1SC05562C
  32. Jurtz, V. I.; Johansen, A. R.; Nielsen, M.; Almagro Armenteros, J. J.; Nielsen, H.; Sønderby, C. K.; Winther, O.; Sønderby, S. K. An introduction to deep learning on biological sequence data: examples and solutions. Bioinformatics 2017, 33, 3685–3690. DOI: 10.1093/bioinformatics/btx531
  33. Hou, J.; Adhikari, B.; Cheng, J. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 2018, 34, 1295–1303. DOI: 10.1093/bioinformatics/btx780
  34. Cheng, J.; Liu, Y.; Ma, Y. Protein secondary structure prediction based on integration of CNN and LSTM model. J. Vis. Commun. Image Represent. 2020, 71, 102844. DOI: 10.1016/j.jvcir.2020.102844
  35. Chen, Z.; Min, M. R.; Ning, X. Ranking-Based Convolutional Neural Network Models for Peptide-MHC Class I Binding Prediction. Front. Mol. Biosci. 2021, 8, 634836. DOI: 10.3389/fmolb.2021.634836
  36. Gelman, S.; Fahlberg, S. A.; Heinzelman, P.; Romero, P. A.; Gitter, A. Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl. Acad. Sci. U.S.A. 2021, 118, e2104878118. DOI: 10.1073/pnas.2104878118
  37. Hosseinzadeh, P.; Bhardwaj, G.; Mulligan, V. K.; Shortridge, M. D.; Craven, T. W.; Pardo-Avila, F.; Rettie, S. A.; Kim, D. E.; Silva, D.-A.; Ibrahim, Y. M.; Webb, I. K.; Cort, J. R.; Adkins, J. N.; Varani, G.; Baker, D. Comprehensive computational design of ordered peptide macrocycles. Science 2017, 358, 1461–1466. DOI: 10.1126/science.aap7577
  38. Li, X.; Du, X.; Li, J.; Gao, Y.; Pan, Y.; Shi, J.; Zhou, N.; Xu, B. Introducing d-Amino Acid or Simple Glycoside into Small Peptides to Enable Supramolecular Hydrogelators to Resist Proteolysis. Langmuir 2012, 28, 13512–13517. DOI: 10.1021/la302583a
  39. Liu, J.; Liu, J.; Chu, L.; Zhang, Y.; Xu, H.; Kong, D.; Yang, Z.; Yang, C.; Ding, D. Self-Assembling Peptide of d-Amino Acids Boosts Selectivity and Antitumor Efficacy of 10-Hydroxycamptothecin. ACS Appl. Mater. Interfaces 2014, 6, 5558–5565. DOI: 10.1021/am406007g
  40. Piana, S.; Laio, A. A bias-exchange approach to protein folding. J. Phys. Chem. B 2007, 111, 4553–4559. DOI: 10.1021/jp067873l
  41. Laio, A.; Parrinello, M. Escaping free-energy minima. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562–12566. DOI: 10.1073/pnas.202427399
  42. McHugh, S. M.; Rogers, J. R.; Yu, H.; Lin, Y.-S. Insights into How Cyclic Peptides Switch Conformations. J. Chem. Theory Comput. 2016, 12, 2480–2488. DOI: 10.1021/acs.jctc.6b00193
  43. Sugita, Y.; Kitao, A.; Okamoto, Y. Multidimensional replica-exchange method for free-energy calculations. J. Chem. Phys. 2000, 113, 6042–6051. DOI: 10.1063/1.1308516
  44. Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera-a visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605–1612. DOI: 10.1002/jcc.20084
  45. Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1–2, 19–25. DOI: 10.1016/j.softx.2015.06.001
  46. Zhou, C.-Y.; Jiang, F.; Wu, Y.-D. Residue-Specific Force Field Based on Protein Coil Library. RSFF2: Modification of AMBER ff99SB. J. Phys. Chem. B 2015, 119, 1035–1047. DOI: 10.1021/jp5064676
  47. Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926–935. DOI: 10.1063/1.445869
  48. Jiang, F.; Zhou, C.-Y.; Wu, Y.-D. Residue-Specific Force Field Based on the Protein Coil Library. RSFF1: Modification of OPLS-AA/L. J. Phys. Chem. B 2014, 118, 6983–6998. DOI: 10.1021/jp5017449
  49. Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins: Struct., Funct., Bioinf. 2010, 78, 1950–1958. DOI: 10.1002/prot.22711
  50. Geng, H.; Jiang, F.; Wu, Y.-D. Accurate Structure Prediction and Conformational Analysis of Cyclic Peptides with Residue-Specific Force Fields. J. Phys. Chem. Lett. 2016, 7, 1805–1810. DOI: 10.1021/acs.jpclett.6b00452
  51. Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 2001, 105, 6474–6487. DOI: 10.1021/jp003919d
  52. Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A smooth particle mesh Ewald method. J. Chem. Phys. 1995, 103, 8577–8593. DOI: 10.1063/1.470117
  53. Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New feathers for an old bird. Comput. Phys. Commun. 2014, 185, 604–613. DOI: 10.1016/j.cpc.2013.09.018
  54. Mu, Y.; Nguyen, P. H.; Stock, G. Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Proteins 2005, 58, 45–52. DOI: 10.1002/prot.20310
  55. Damas, J. M.; Filipe, L. C.; Campos, S. R.; Lousa, D.; Victor, B. L.; Baptista, A. M.; Soares, C. M. Predicting the Thermodynamics and Kinetics of Helix Formation in a Cyclic Peptide Model. J. Chem. Theory Comput. 2013, 9, 5148–5157. DOI: 10.1021/ct400529k
  56. Hsueh, S. C. C.; Aina, A.; Plotkin, S. S. Ensemble Generation for Linear and Cyclic Peptides Using a Reservoir Replica Exchange Molecular Dynamics Implementation in GROMACS. J. Phys. Chem. B 2022, 126, 10384–10399. DOI: 10.1021/acs.jpcb.2c05470
  57. Rodriguez, A.; Laio, A. Clustering by fast search and find of density peaks. Science 2014, 344, 1492–1496. DOI: 10.1126/science.1242072
  58. Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107–113. DOI: 10.1021/c160017a018
  59. Landrum, G. RDKit: Open-Source Cheminformatics Software, 2021. https://www.rdkit.org/.
  60. Nair, V.; Hinton, G. E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML'10), Haifa, Israel, June 21–24, 2010; Fürnkranz, J., Joachims, T., Eds.; Omnipress: Madison, WI, 2010; pp 807–814.
  61. Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv (Computer Science. Machine Learning), January 30, 2017, 1412.6980, ver. 9. https://arxiv.org/abs/1412.6980 (accessed 2023-03-31).
  62. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; Desmaison, A.; Kopf, A.; Yang, E.; DeVito, Z.; Raison, M.; Tejani, A.; Chilamkurthy, S.; Steiner, B.; Fang, L.; Bai, J.; Chintala, S. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, Canada, December 8–14, 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, 2019; pp 8024–8035.
  63. Fey, M.; Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. arXiv (Computer Science. Machine Learning), April 25, 2019, 1903.02428, ver. 3. https://arxiv.org/abs/1903.02428 (accessed 2023-03-31).
  64. Prechelt, L. Automatic early stopping using cross validation: quantifying the criteria. Neural Netw. 1998, 11, 761–767. DOI: 10.1016/S0893-6080(98)00010-0
  65. Schlichtkrull, M.; Kipf, T. N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. arXiv (Statistics. Machine Learning), October 26, 2017, 1703.06103, ver. 4. https://arxiv.org/abs/1703.06103 (accessed 2023-03-31).
