Docking for Molecules That Bind in a Symmetric Stack with SymDOCK

Discovering ligands for amyloid fibrils, such as those formed by the tau protein, is an area of great current interest. In recent structures, ligands bind in stacks in the tau fibrils to reflect the rotational and translational symmetry of the fibril itself; in these structures, the ligands make few interactions with the protein but interact extensively with each other. To exploit this symmetry and stacking, we developed SymDOCK, a method to dock molecules that follow the protein’s symmetry. For each prospective ligand pose, we apply the symmetry operation of the fibril to generate a self-interacting and fibril-interacting stack, checking that doing so will not cause a clash between the original molecule and its image. Absent a clash, we retain that pose and add the ligand–ligand van der Waals energy to the ligand’s docking score (here using DOCK3.8). We can check these geometries and energies using an implementation of ANI, a neural-network-based quantum-mechanical evaluation of the ligand stacking energies. In retrospective calculations, symmetry docking can reproduce the poses of three tau PET tracers whose structures have been determined. More convincingly, in a prospective study, SymDOCK predicted the structure of the PET tracer MK-6240 bound in a symmetrical stack to AD PHF tau before that structure was determined; the docked pose was used to determine how MK-6240 fit the cryo-EM density. In proof-of-concept studies, SymDOCK enriched known ligands over property-matched decoys in retrospective screens without sacrificing docking speed and can address large library screens that seek new symmetrical stackers. Future applications of this approach will be considered.


■ INTRODUCTION
An open problem in ligand discovery is understanding and exploiting the ability of high-affinity ligands to bind to protein amyloids.−17 Recently, cryo-EM structures of AD tau and CTE tau with bound ligands have been determined.A striking feature of these structures is that the ligands bind in stacks to the tau filaments to reflect the rotational and translational symmetry of the fibril.Epigallocatechin gallate (EGCG), a tau disaggregator found in green tea, binds at the inter-protofilament cleft of AD PHF tau with a stoichiometry of one ligand molecule per protein monomer. 18−22 In each of these examples, the ligands within the stacks make substantial van der Waals and π−π interactions.Intriguingly, for the PET tracers, these ligand−ligand interactions seem more dominant than their direct contacts with the fibrils.For example, GTP-1 forms only a single hydrogen bond with lysine 353 on AD PHF tau. 19Flortaucipir's pyridine nitrogen in its benzo-pyrrolo-pyridine moiety makes a hydrogen bond to aspartate 358 in CTE type I tau; however, apart from this, the structure primarily extends into the solvent. 21EGCG establishes more hydrophilic contacts with the inter-protofilament cleft of AD PHF tau, including hydrogen bonds to asparagine 327, glutamate 338, and lysine 340, and polar contact with histidine 329.Yet, its interactions mainly involve other ligands in the stack and the surrounding solvent. 18−30 To use this new mode of binding to find other ligands that might bind to different tau polymorphs and to other protein fibrils, we developed the symmetric extension to DOCK3.8, 31−33 "SymDOCK".This method uses the same pose sampling and scoring as in DOCK3.8 but retains only those poses that can form a symmetrical stack in the protein.
We then evaluate the ligand−ligand van der Waals energy as a crude measure of favorable self-interaction and add that to the DOCK score.Given the imposed symmetry, we assume that the interactions with the protein are copied from the ligand monomer to monomer.We found that in both retrospective and, more convincingly, prospective predictions, SymDOCK succeeded in recapitulating and in predicting the poses of different tau ligands.In a test-of-concept screen of 22 million molecules, SymDOCK was not much slower than base DOCK3.8.This suggests that the extra cost of the new symmetry operations and of calculating inter-ligand van der Waals energies is compensated by the lower sampling imposed by the symmetry constraints.To further test SymDOCK's generated geometries, we computed their energies using ANI, a neural network force field trained on density functional theory (DFT) energy data. 34,35We found that their ANI energies and exact geometries differed only slightly from those of ANIoptimized stacks.We consider further generalizations of this approach.
■ METHODS Symmetric Pose Sampling.The SymDOCK extension uses the same ligand building and initial pose generation as base DOCK3.8,allowing a user to dock the built section of the ZINC22 database without any modifications. 31,36To keep SymDOCK from decreasing the speed of docking each molecule, we do not generate or attempt to dock a stack of molecules.Rather, we generate poses (conformations and orientations) for each ligand as always in DOCK but only keep those that can self-stack. 31,37,38Because of the symmetry of the system, we need to evaluate only the ligand−protein interaction (checks for clashes and energetic favorability) for a single site; the ligand−protein interactions at all other sites will be identical.
Once we generate a potential pose, we apply the symmetry operation of the fibril and check the ligand−ligand interactions (Figure 1a,b).The symmetry operation of the fibril is the fourdimensional affine transformation matrix that acts on a vector in three dimensions, applying a rotation followed by a translation.For the tau fibril, this transformation would be a 4.7 Å translation down the long axis of the fibril along with a ∼−1°rotation in the plane perpendicular to that axis. 11,18We describe a procedure for generating this matrix in the discussion of Figure S1.Once we have a rotated/translated pose of the molecule, we check for a van der Waals clash with the initial pose, defined as any atom from the initial pose being within 2 Å of any atom in the transformed pose.We choose this conservative definition of what constitutes a clash based on the experimental structure of GTP-1 bound to AD PHF tau (Figure S2). 19For each pose that passes the symmetry check,  (a, b) SymDOCK generates poses for molecules as DOCK 3.8 normally does and then applies a symmetry check: if the molecule (blue) and its translated/rotated copy (purple) have no atoms within 2 Å of each other, then the pose passes the filter (a: with the dummy example of the experimental pose).If the molecule and its copy have atoms within this distance, then the pose fails the filter (b: showing the top pose of docking without the symmetry requirement).SymDOCK only evaluates ligand−ligand van der Waals energies of poses that pass the filter, and only does so for the first pair's energies because ligand−protein interactions will be the same at every site by symmetry (ligand−protein interaction terms are evaluated as in DOCK3.8 for the first ligand pose in the stack) .(c) To avoid edge effects from modeling a small fraction of a fibril that should be nanometers in length, we artificially extend the input protein structure using the symmetry operation.This cryo-EM structure only contains five monomers of tau in each protofilament, but by applying the affine transformation and its inverse on the monomer atom positions, we can generate new atom positions for a fibril of 15 monomers (bronze for the original structure, blue for applying symmetry operation, and magenta for applying the inverse symmetry operation).
−41 Changes to the DOCK Score Grids.In addition to the explicit ligand−ligand interactions, we slightly changed how we calculate the grids for the ligand−protein interactions.Because the protein fibril is hundreds of nanometers long, its interactions with any small molecule ligand should be periodic. 11We simulate this by artificially extending the fibril structure to more monomers than what are in the deposited structure (Figure 1c).For example, the cryo-EM structure of GTP-1 bound to AD PHF tau (PDB ID: 8FUG; we used an earlier version of the structure) only contains five monomers of tau in each protofilament.By applying the affine transformation and its inverse on the monomer atom positions, we can generate new atom positions for a fibril of 15 monomers.We then feed the longer fibril into the Pydock3 procedure 42 for generating the van der Waals, electrostatic, and desolvation grids and see that these grids do not have edge effects in the region of interest to docking (Figure S3).
We also change the electrostatic and desolvation grids to account for the changed dielectric environment of a stack of organic ligand molecules (Figure S4).Part of the normal DOCK3.8optimization procedure includes adding a low dielectric layer that extends the boundary of protein atoms into the bulk solvent. 31,39,42This increases the magnitude of electrostatic interactions, attempting to balance the nonpolar terms in docking, which often dominate.In SymDOCK, we have two sources of large-magnitude van der Waals energies: ligand−protein interactions and ligand−ligand interactions.Extending the low dielectric boundary for both electrostatic and desolvation grids to the positions of the heavy atoms in a stack of ligands in the experimental structure helps to overcome this bias toward nonpolar terms and mimics the ligand binding's effect on the dielectric boundary (Figure S5, made using the R Programming Language and the ggplot2 package). 43,44Such a ligand perturbation of the dielectric boundary becomes even more important with a stack of ligands bound to the fibril.
ANI-Based Stack Filter.The energetic favorability of the molecular poses returned by SymDOCK may be screened using the ANI-2x molecular force field, as implemented in TorchANI. 34,35ANI-2x is a neural network trained to predict the ground state energies of small organic molecules as calculated by density functional theory at the ωB97X/6-31G* level.We reasoned that the ANI-2x force field provided the best compromise between throughput and accuracy for the goal of determining the per-monomer ground state energies of the stacks of small molecules generated by SymDOCK.
We first estimate the SymDOCK-predicted geometry's permonomer energy, and then use that as a starting geometry for a Monte Carlo (MC) optimization via the Metropolis−Hastings algorithm. 45We do not perform any optimization prior to the ANI calculations: the MC procedure works only using the ANI energies.We sample each candidate move by taking translations from a multivariate normal distribution and rotations from an isotopic normal distribution on SO(3) (the group or rotations in 3D). 46For each candidate move, we generate the two nearest symmetry mates of the molecule, evaluate the ANI energy, and determine the per-monomer energy (Supplemental Methods).After 1000 Monte Carlo steps, we compare the minimum energy configuration achieved to SymDOCK's output in terms of both ANI-generated energy difference and RMSD to the original docking pose.At this point in the development of the method, this is mostly done to test whether the docking-predicted ligand stacking is sensible; we do not alter the conformation of the molecule from what SymDOCK produces in the docking predictions and do not currently use the MC protocol to refine best-scoring poses, although doing so may be explored in future use of this approach.

■ RESULTS
Retrospective Pose Reproduction.Reproducing experimental poses of known binders was a first metric we used to evaluate SymDOCK.Typically, reproducing the ligand pose by docking it back into its native complex is considered necessary in the field, but here, there is the complication of regenerating the full ligand stack.We began with the PET tracer GTP-1, docking it back into its complex with the AD PHF tau fibril.As is typical, we used pseudo atoms ("spheres") nucleated around the known ligand coordinates to define the sampling and calculated van der Waals, Poisson−Boltzmann electrostatic, and desolvation energy potential grids to evaluate ligand protein complementarity. 31,39,47SymDOCK sampled GTP-1 in 1507 orientations in the fibril site and multiple conformations within each orientation. 48Each pose was checked for the ability to generate a symmetry partner that does not clash with itself using the fibril symmetry operator (Figure 1a).If a pose passed this check, then SymDOCK calculated the van der Waals, electrostatic, desolvation and, new to SymDOCK, ligand−ligand van der Waals energies.We then ranked allowed poses by the total DOCK score.When we compared the highest-scoring docked pose of GTP-1 to its experimental geometry, the root-mean-squared difference (RMSD) was 1.165 Å, with the three-ring system of the ligand almost exactly superposed and the differences coming from the flexible tail (Figure 2a,b, Figure S6).Similarly, docking EGCG back into its complex in the inter-protofilament cleft of AD PHF tau led to an RMSD of 2.37 Å to the experimental structure for the top-scoring pose (Figure 2c,d, Figure S7).Even with this RMSD, SymDOCK recapitulated most of the key polar interactions observed in the experimental structure.Just as importantly, all of the predicted poses captured the ligand− ligand aromatic stacking.Finally, docking flortaucipir back into its complex with CTE type I tau led to an RMSD of 0.477 Å for the best docking pose, 7.78 Å for the top scoring one, and RMSD values between of 7.49 and 7.74 Å for the poses scoring between those two (Figure 2e,f, Figure S8).When we compared GTP-1's, EGCG's, and flortaucipir's SymDOCK poses to their ANI-optimized geometries, the ligand−ligand energies improved, but the structures changed only between 0.6 and 1.1 Å in RMSD (Table S1).Crucially, in all three of the docked complexes, the ligand−ligand packing and quadrupole stacking closely resembled those in the experimental structures.
Retrospective Enrichment of Ligands over Decoys.We were curious if symmetry docking could identify plausible fibril ligands among a large molecular library.We first asked whether SymDOCK could prioritize known ligands versus a much larger set of property-matched decoys.Docking against AD PHF tau, we docked the PET ligands GTP-1 and MK-6240 and a set of 14 MK-6240 analogs drawn from a larger group of 94 to maximize diversity (Table S2). 20,49We generated 50 property matched decoys for each ligand, for 800 total. 50,51Docking the 16 ligands and 800 decoys against the GTP-1-bound structure of AD PHF tau resulted in an adjusted logAUC enrichment of 32.7 and an enrichment factor at 1% (EF 1% ) of 6.4 (Figure S9). 39,52The better ranking of the known ligands versus the decoys reflects their better ability to form stacks that recapitulate fibril symmetry and interact with it.For instance, ZINC000000035542 is a 1.2.2-bicyclo that can dock to the receptor grids using the nonsymmetric default DOCK3.8 procedure, but its three-dimensional ring structure prevents it from docking symmetrically (structure in Figure S10).Even though this decoy has properties matched with the known ligands, the symmetry check prevents it from scoring and improves enrichment.
Docking 22 Million Molecules from ZINC22.Some of the decoys from the enrichment calculation scored well and looked plausible, stacking well with themselves and packing in the fibril to mimic its symmetry.We wondered whether there might be many molecules in a general library that might be suitable to form symmetrical stacks in a fibril.To test how symmetry docking would work in a large library screen, we docked 22 million molecules from the ZINC22 database 36 against the GTP-1-bound form of AD PHF tau.We selected molecules that were similar in gross properties to GTP-1: neutral, with heavy atom count between 21 and 23 and clogP values between 3.30 and 3.50. 53,54The 22 million molecules docked in 17,800 core hours (less than a day on a typical 1000core cluster), 2.90 s per molecule per core, two to three times slower than typical for standard docking with DOCK3.8. 32hen passed through the ANI-based filter, the configurations of the stacked ligands typically changed only modestly (Figure S11, made using the R Programming Language): 74% of the 5000 molecules changed by less than 1.5 Å RMSD, and 82% of them changed by less than 2.0 Å RMSD. 43Practically, these observations suggest that most of the stacking interactions in the docking poses are energetically sensible when they are evaluated at a higher level of theory.
Visually, many of the top 5000 molecules from the large library docking stacked in plausible ways, typically making one or two polar interactions with the fibril, akin to what is observed in the experimental ligand−fibril structures.For instance, ZINCnD000001AHba appears to make a single hydrogen bond to Lys353 from its amide carbonyl (Figure 3a).The molecule stacks with symmetry reflecting that of the overall fibril, including 4.7 Å translation.The orientation of the benzothiophenes and quinoline rings, however, creates interplanar distances of 3.4 and 3.5 Å, respectively.−57 Meanwhile, ZINCmD000004I6kF and ZINCnF000004BXys are not posed to form explicit hydrogen bonds with the fibril, instead making van der Waals contacts with it and perhaps more importantly making extensive ligand−ligand stacking interactions (Figure 3b,c).Self-stacking also seems to dominate in the pose of the amide-linked phenylbenzimidazole ZINClD000002Tbny, which again makes no formal hydrogen bond to the fibril (Figure 3d).When docked to the same site, the known PET ligands GTP-1 and MK-6240 would have placed among the top 5 to 7% best-scoring molecules in this screen of 22 million (the ranking of the other 14 known ligands used in the retrospective study is in Table S3).The poses of the molecules taken from ZINC22 lack experimental testing should not carry too much weight, but they do support the plausibility of large library screens using a symmetry-based ligand stacking approach.
Prospective Pose Prediction.Encouraged by the retrospective results and the ability to generate plausible geometries from large library screens, we tried the method for a genuine prospective pose prediction.Aware that the structure of the PET ligand MK-6240 was being determined against AD PHF tau fibrils, we docked that molecule against the structure of AD PHF tau in its GTP-1 complex using SymDOCK.When we compared the predicted docked poses to the subsequently determined cryo-EM electron density, we found that the ninthbest scoring pose came close to fitting the electron density (Figure 4a,b) and captured the symmetry and the overall placement of the ligand stack.The one discrepancy was that the ligand stack extended out of the cryo-EM density in its pyrrolo-pyridine ring and not completely fill it on the other side of the molecule, indicating that a simple rigid-body translation of the SymDOCK pose could better fit the density.Comparing the final refined position for the ligand stack to that predicted by the docking, the RMSD of 2.13 Å can be mostly attributed to a rotation about the axis of the fibril (Figure 4c,d).Indeed, the structure was close enough that the docked pose was used as the input for refinement of the final experimental structure of the MK-6240/AD PHF tau complex. 58We note that although this was the ninth-best scoring structure, the best pose by DOCK score had an RMSD of 3.01 Å from the experimental structure, and what is effectively the second-best scoring pose had an RMSD of 2.27 Å to that structure.From a virtual screening standpoint, either of these top-ranking poses would have been sufficient to identify MK-6240 as a plausible candidate even though it was the ninth best pose that we used to prospectively predict the experimental structure and indeed help in its experimental determination.

■ DISCUSSION
An arresting feature of the recent tau fibril structures is the adoption of symmetrical stacks by their ligands (Figures 2 and  4).Ligand−receptor interactions are few in these structures, with the most extensive contacts coming between the ligands themselves.For a receptor with multiple symmetrical sites, the adoption of such ligand stacks may be the favorable mode.Considerations of entropy coming from the number of sites and cooperativity from the ligand−ligand interactions may explain how molecules with so few fibril interactions may nevertheless bind in the nanomolar range, as many of the PET ligands do.A challenge for ligand discovery is exploiting the symmetrical fibril sites and ligand−ligand interactions, as most docking methods anticipate a 1:1 ligand-to-receptor stoichiometry and are designed to explore multiple ligand poses unencumbered by symmetry or stacking.
The SymDOCK approach adapts docking by imposing the symmetry of the fibril on any ligand geometry, insisting that there are no ligand−ligand van der Waals clashes.The method succeeds retrospectively in finding the experimental poses of four topologically unrelated PET ligands to tau (Figures 2 and  4).Although the symmetry calculation and the internal clash checks add operations to docking, these are compensated by the constraint imposed on pose selection by insisting on symmetry, and so the method loses little speed.A user can thus perform large library screens seeking new ligands.Although these predictions remain to be tested experimentally, known fibril ligands are enriched both against property matched decoys (Figure S9) and in screens of 22 million diverse molecules (Table S3).The new ligands' poses seem plausible, forming stacks around planar aromatic and heteroaromatic systems while preserving the symmetry of the fibril (Figure 3).In a genuine prospective prediction, the method correctly predicts the binding pose of MK-6240 to AD PHF tau (Figure 4), and that docking pose was in fact used in the solution of the ligand-complex structure as determined by cryo-EM. 58everal limitations of SymDOCK merit airing.For speed of calculation, we have used a 2 Å cutoff to define ligand−ligand clashes, which misses those owing to large radii atoms like bromine or iodine.Only evaluating the favorability of a stacking geometry using the AMBER 4.0 united-atom force field's van der Waals parameters ignores higher-order effects from π−π stacking and electrostatics. 40,41We have explored compensating for this with more detailed evaluation using the DFT-derived ANI, 34,35 although this is slow enough to limit application to rescoring post-docking.We certainly have not optimized to balance the self-stacking versus the ligand− protein interactions other than relying on the same scoring function for both�if, for some ligands, the balance is off, this could lead to false negatives and false positives.We currently only use the symmetry operation (affine transformation) from the protein, but it is possible that other symmetries of a ligand stack could fit the pocket.For example, one could imagine ligands that bind in a 2:1 stoichiometry by forming stacks of molecules alternating in orientation with respect to the protein by 180°rotations, and more complex repeat patterns among the ligands are also conceivable.Finally, we have shown that prospective large library screens are mechanically possible using SymDOCK and produce plausible symmetrical stacks, but prospective hits and binding modes will require experimental testing.For now, these predictions merely suggest that it should be possible to test this method for ligand discovery.
Notwithstanding these caveats, the key observations from this study should be clear.Imposing symmetry and excluding ligand−ligand clashes are effective approaches for docking ligands that form extensive, symmetric stacks against fibril proteins and potentially other proteins with symmetrical sites.Because we used a relatively simple approach to this problem, the method is fast enough for use in large screens.Encouragingly, by insisting on symmetry and excluding clashes, the method seems to capture the key features of the symmetrical ligand stacks despite ignoring higher-order energetic terms.We have implemented the symmetry docking approach in the program SymDOCK (freely available for academic research at https://dock.compbio.ucsf.edu/);−75 ■ ASSOCIATED CONTENT * sı Supporting Information The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01749.
Additional methods describing the calculation of ANI energies; Tables S1 to S3 show energies and RMSD values before and after ANI refinement, known ligands of PHF tau used for enrichment calculations, and their ranks, respectively; and Figures S1 to S11 display further details on the symmetry calculations, calculation of Journal of Chemical Information and Modeling internal clashes among symmetry-docked molecules, ligand−protein docked complexes, effects of dielectric boundaries on docking scores, enrichment calculations, and how ANI optimization affects docked poses (PDF)

Figure 1 .
Figure 1.A demonstration of SymDOCK by docking GTP-1 into the structure of it bound to AD PHF tau (PDB ID: 8FUG).(a, b) SymDOCK generates poses for molecules as DOCK 3.8 normally does and then applies a symmetry check: if the molecule (blue) and its translated/rotated copy (purple) have no atoms within 2 Å of each other, then the pose passes the filter (a: with the dummy example of the experimental pose).If the molecule and its copy have atoms within this distance, then the pose fails the filter (b: showing the top pose of docking without the symmetry requirement).SymDOCK only evaluates ligand−ligand van der Waals energies of poses that pass the filter, and only does so for the first pair's energies because ligand−protein interactions will be the same at every site by symmetry (ligand−protein interaction terms are evaluated as in DOCK3.8 for the first ligand pose in the stack) .(c) To avoid edge effects from modeling a small fraction of a fibril that should be nanometers in length, we artificially extend the input protein structure using the symmetry operation.This cryo-EM structure only contains five monomers of tau in each protofilament, but by applying the affine transformation and its inverse on the monomer atom positions, we can generate new atom positions for a fibril of 15 monomers (bronze for the original structure, blue for applying symmetry operation, and magenta for applying the inverse symmetry operation).

Figure 2 .
Figure 2. Retrospective pose reproduction, with the protein in beige, the cryo-EM pose in blue, and the SymDOCK-predicted pose in purple, with (a, c, e) and without (b, d, f; emphasizing the docked ligand stacking) the experimental structure shown.(a, b) GTP-1 docked into in the PETtracer-site of AD PHF tau; RMSD to the experimental structure 1.2 Å. (c, d) EGCG docked into the inter-protofilament cleft of AD PHF tau; RMSD to the experimental structure 2.4 Å. (e, f) Flortaucipir docked into CTE type I tau; RMSD to the experimental structure 0.48 Å.

Figure 3 .
Figure 3. Stacking and polar interactions to the AD PHF tau (beige) among docked poses of four representative high-ranking molecules from a 22million-molecule library screen (blue).(a) The benzothiophene-quinoline ZINCnD000001Ahba.(b and c) The triaryls ZINCmD000004I6kF and ZINCnF000004Bxys.(d) The amide-linked phenyl-benzimidazole ZINClD000002Tbny.

Figure 4 .
Figure 4. Prospective pose prediction for MK-6240 in AD PHF tau.(a, b) The electron density of the MK-6240-tau structure (black mesh showing the 0.0117 level) with the modeled protein in beige and SymDOCK-predicted pose in purple.(a) Looking down the axis of the fibril.(b) Looking into the open trough of the fibril.(c, d) The final refinement of the MK-6240 pose into the electron density (blue) using the SymDOCK-predicted pose as input.