Assessment of Two Restraint Potentials for Coarse-Grained Chemical-Cross-Link-Assisted Modeling of Protein Structures

The influence of distance restraints from chemical cross-link mass spectrometry (XL-MS) on the quality of protein structures modeled with the coarse-grained UNRES force field was assessed for 23 small proteins by using a protocol based on multiplexed replica exchange molecular dynamics, in which both simulated and experimental cross-link restraints were employed. Six cross-links with upper distance boundaries from 4 Å to 12 Å (azido benzoic acid succinimide (ABAS), triazidotriazine (TATA), succinimidyldiazirine (SDA), disuccinimidyl adipate (DSA), disuccinimidyl glutarate (DSG), and disuccinimidyl suberate (BS3)) and two types of restraining potentials were considered: (i) simple flat-bottom Lorentz-like potentials dependent on side-chain distance (all cross-links) and (ii) distance- and orientation-dependent potentials determined from molecular dynamics (MD) simulations of model systems (DSA, DSG, BS3, and SDA). Relative to unrestrained simulations, the Lorentz-like potentials with properly set parameters were found to produce a greater number of higher-quality models than the MD-based potentials, because the latter can force excessively long distances between side chains. Therefore, the flat-bottom Lorentz-like potentials are recommended to represent cross-link restraints. It was also found that a significant improvement of model quality upon the introduction of cross-link restraints is obtained when the sum of differences of indices of cross-linked residues exceeds 150.

virtual-bond angles for the Z-SDA-Lys cross-link systems, where Z denotes the amino-acid residue that binds to the photoactive site of SDA, X(O) denotes the photoactive site, and X(N) denotes the lysine-binding site of SDA.
Table S1: Parameters of the expressions for the cross-link virtual-bond potentials (eq 4 of the main text).
angles, where X denotes the amino-acid residue binding to the photoactive site of SDA and Y denotes the lysine-binding site of SDA. For DSA, which is a symmetric homobifunctional cross-linker, both binding sites are equivalent and, consequently, there is only one set of parameters.

a Number of residues in the PDB structure.
b The number of the first residue in the PDB structure.
c Number of synthetic cross-links between lysine residues, with the number of long-range cross-links (bridging residues with indices differing by at least 5) in parentheses.
d Residue numbering follows that of the PDB structure.
b The number of the first residue in the PDB structure.
c Number of cross-links, with the number of long-range cross-links (bridging residues with indices differing by at least 5) in parentheses.
d The cross-links corresponding to pairs of residues with Cα distances greater by more than 10 Å than the maximum cross-link span are in red font and those with Cα distances greater by 5-10 Å than the maximum cross-link span are in orange font. All other cross-links are in regular font.
b Upper line for 1AO6-1, 1AO6-2, 1AO6-3, and 1AO6-6: dihedral-angle restraints only; lower lines: dihedral-angle and disulfide-bridge restraints.
c MD-derived potentials are not available for most of the cross-links reported for myoglobin (2V1H); therefore the respective calculations were not performed.
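
The color coding of cross-links described in the footnotes above can be sketched as a small classifier. This is an illustrative sketch only: the function name `classify_cross_link`, the treatment of the exact 5 Å and 10 Å boundaries, and the example distances are assumptions and are not taken from the original tables.

```python
def classify_cross_link(ca_distance: float, max_span: float) -> str:
    """Return the font color used for a cross-link in the table.

    ca_distance: C-alpha distance (in angstroms) between the linked residues
                 in the reference structure.
    max_span:    maximum span (in angstroms) of the cross-linker.

    Assumption: an excess of exactly 5 A or exactly 10 A is counted as "orange".
    """
    excess = ca_distance - max_span
    if excess > 10.0:
        return "red"      # exceeds the maximum span by more than 10 A
    if excess >= 5.0:
        return "orange"   # exceeds the maximum span by 5-10 A
    return "regular"      # all other cross-links


# Hypothetical distances for a cross-linker with a 12 A maximum span.
for d in (25.0, 18.0, 10.0):
    print(d, classify_cross_link(d, 12.0))
```

Keeping the thresholds as differences from the maximum span, rather than as absolute distances, lets the same function serve cross-linkers of different lengths.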

Figure S1: Charges on the atoms (electron charge units) of the model compounds selected for the determination of MD-based X-SDA-Lys potentials of mean force by all-atom MD simulations. The charges were determined by fitting to the molecular electrostatic potential calculated by the RHF 6-31G* ab initio method.

Figure S2: Potentials of mean force (filled circles) and fitted analytical expressions (lines; eq 4 of the main text) of the Cα···X(N) (A), X(O)···Cα (B), and X(N)···X(O) (C) virtual-bond lengths for the Z-SDA-Lys cross-link systems, where Z denotes the amino-acid residue that binds to the photoactive site of SDA, X(O) denotes the photoactive site, and X(N) denotes the lysine-binding site of SDA.

Figure S4: Potentials of mean force (filled circles) and fitted analytical expressions (lines; eq 6 of the main text) of the Cα···X(N)···X(O)···Cα virtual-bond-dihedral angles averaged over the adjacent virtual-bond angles θ for the Z-SDA-Lys cross-link systems, where Z denotes the amino-acid residue that binds to the photoactive site of SDA, X(O) denotes the photoactive site, and X(N) denotes the lysine-binding site of SDA.

Figure S5: Potentials of mean force (filled circles) and fitted analytical expressions (lines; eq 4 of the main text) of the Cα···X (A) and X···X (B) virtual-bond lengths for the Lys-DSA-Lys cross-link system, where X denotes a lysine-binding site of DSA.

Figure S7: Potential of mean force (filled circles) and the fitted analytical expression (line; eq 6 of the main text) of the Cα···X···X···Cα virtual-bond-dihedral angles averaged over the adjacent virtual-bond angles θ for the Lys-DSA-Lys cross-link system.

Figure S8: Relationship between the number of cross-links (N_Xl, left panels), the topological length of the longest cross-link (L_max, defined as the number of residues in the loop bridged by a cross-link, middle panels), and the sum of cross-link topological lengths (ΣL, right panels) and the difference between the GDT_TS of models obtained with Lorentz-type cross-link restraints (eq 8 of the main text) for different restraint types (indicated in the respective panels) and the first models obtained from simulations. LR(σ,A) denotes Lorentz-like potentials, with σ and A being the wall thickness and well depth, respectively (eq 8 of the main text), and MD denotes MD-based potentials (eqs 3-6 of the main text).

Figure S9: Relationship between the number of cross-links (N_Xl, left panels), the topological length of the longest cross-link (L_max, defined as the number of residues in the loop bridged by a cross-link, middle panels), and the sum of cross-link topological lengths (ΣL, right panels) and the difference between the GDT_TS of models obtained with Lorentz-type cross-link restraints (eq 8 of the main text) for different restraint types (indicated in the respective panels) and the best models obtained from simulations. LR(σ,A) denotes Lorentz-like potentials, with σ and A being the wall thickness and well depth, respectively (eq 8 of the main text), and MD denotes MD-based potentials (eqs 3-6 of the main text).
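
The three cross-link descriptors used in the two captions above (the number of cross-links, L_max, and ΣL) can be computed from a list of cross-linked residue pairs, as in the following minimal sketch. Several things here are assumptions made for illustration: the topological length is taken as the absolute difference of residue indices (the captions define it as the number of residues in the bridged loop, which may differ by one), the names `CrossLinkSummary` and `summarize_cross_links` are hypothetical, and the residue pairs are invented. The long-range cutoff of 5 and the threshold of 150 for ΣL are the values quoted in the table footnotes and the abstract, respectively.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class CrossLinkSummary:
    n_xl: int          # number of cross-links
    n_long_range: int  # cross-links whose residue indices differ by >= 5
    l_max: int         # topological length of the longest cross-link
    sum_l: int         # sum of topological lengths


def summarize_cross_links(
    pairs: Iterable[Tuple[int, int]], long_range_min: int = 5
) -> CrossLinkSummary:
    # Assumption: topological length = absolute difference of residue indices.
    lengths = [abs(j - i) for i, j in pairs]
    return CrossLinkSummary(
        n_xl=len(lengths),
        n_long_range=sum(1 for length in lengths if length >= long_range_min),
        l_max=max(lengths, default=0),
        sum_l=sum(lengths),
    )


# Hypothetical residue-index pairs, for illustration only.
example_pairs = [(3, 40), (12, 15), (25, 90), (50, 110)]
summary = summarize_cross_links(example_pairs)
print(summary)

# Threshold on the sum of index differences quoted in the abstract.
print("sum of index differences exceeds 150:", summary.sum_l > 150)
```

Because the descriptors depend only on residue indices, they can be evaluated before any simulation is run, for example to judge whether a given set of cross-links is likely to improve model quality.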

a Y denotes the lysine-binding site of DSA or SDA and X the photoactive site of SDA. For SDA, the upper rows contain the parameters of the Cα···Y and the lower rows those of the Y···Cα virtual bonds. There is only one row for DSA because this is a symmetric homobifunctional cross-linker.

Table S2: Parameters of the expressions for the cross-link virtual-bond-angle potentials (eq 5 of the main text).a
a For the Z-SDA-Lys systems, the upper rows are for the Cα···X···Y and the lower rows for the Y•

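Table S7: Cα-RMSD, TM-score, and GDT_TS values of the first and best models of the 12 short-link benchmark proteins obtained in the free and restrained simulations.a
a The number of significant digits of GDT_TS and Cα-RMSD values follows the convention of reporting these values in the CASP experiments (https://www.predictioncenter.org). Following this convention, TM-score values, which range from 0 to 1, are reported with 4 digits after the decimal separator.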

Table S8: Cα-RMSD, TM-score, and GDT_TS values of the first and best models of the 7 benchmark proteins from ref 1 obtained in free and cross-link-restrained simulations.a
a The number of significant digits of GDT_TS and Cα-RMSD values follows the convention of reporting these values in the CASP experiments (https://www.predictioncenter.org). Following this convention, TM-score values, which range from 0 to 1, are reported with 4 digits after the decimal separator.