PlayMolecule Glimpse: Understanding Protein–Ligand Property Predictions with Interpretable Neural Networks

Deep learning has been successfully applied to structure-based protein–ligand affinity prediction, yet the black-box nature of these models raises questions about what they actually learn. In a previous study, we presented KDEEP, a convolutional neural network that predicts the binding affinity of a given protein–ligand complex with state-of-the-art performance. However, it was unclear what this model was learning. In this work, we present a new application that visualizes the contribution of each input atom to the prediction made by the convolutional neural network, aiding the interpretability of such predictions. The results suggest that KDEEP is able to learn meaningful chemical signals from the data, but they also expose inaccuracies in the current model, serving as a guideline for further optimization of our prediction tools.

For all three models, during training and validation, the protein–ligand complex was rotated around the geometric center of the ligand before generating the final grid, in order to augment the existing training set and compensate for the fact that CNNs are not rotationally invariant.2
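The rotation augmentation described above can be sketched as follows. This is a minimal illustration using NumPy, not the actual KDEEP pipeline; the function names and the QR-based sampling of a uniform random rotation are our own choices for the example:

```python
import numpy as np

def random_rotation_matrix(rng):
    """Sample a random 3-D rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))       # fix column signs for a uniform distribution
    if np.linalg.det(q) < 0:       # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

def rotate_complex(protein_xyz, ligand_xyz, rng):
    """Rotate the whole complex about the ligand's geometric center."""
    center = ligand_xyz.mean(axis=0)
    R = random_rotation_matrix(rng)
    rotate = lambda xyz: (xyz - center) @ R.T + center
    return rotate(protein_xyz), rotate(ligand_xyz)
```

Because the rotation is centered on the ligand, the ligand stays centered in the grid while the relative pose of the whole complex is randomized, which is what makes the augmentation useful for a non-rotationally-invariant CNN.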

Clash detector
To train the clash detector model, we used the complexes available in the refined set of the 2019 version of PDBbind. The clashed poses were generated artificially by randomly rotating the ligand around its own geometric center while ensuring that at least one ligand atom was within 1.5 Å of the protein. We used the same architecture as for KDEEP, with the binary cross-entropy loss function.3 We trained the model for 50 epochs with a batch size of 32 and a starting learning rate of 10−3. This model achieved 0.97 classification accuracy and 0.98 precision on a held-out validation set consisting of a randomly selected 10% of the protein–ligand complexes, for which clashed poses were also generated.
Both the training and validation sets were constructed in a balanced way, so that half the examples were crystal poses and the other half were clashed poses.
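The clashed-pose generation procedure can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the 1.5 Å cutoff comes from the text, while the retry loop and function names are assumptions:

```python
import numpy as np

def has_clash(ligand_xyz, protein_xyz, cutoff=1.5):
    """True if any ligand atom lies within `cutoff` Å of any protein atom."""
    d = np.linalg.norm(ligand_xyz[:, None, :] - protein_xyz[None, :, :], axis=-1)
    return bool((d < cutoff).any())

def make_clashed_pose(ligand_xyz, protein_xyz, rng, max_tries=100):
    """Randomly rotate the ligand about its own geometric center until it clashes."""
    center = ligand_xyz.mean(axis=0)
    for _ in range(max_tries):
        q, r = np.linalg.qr(rng.normal(size=(3, 3)))
        q *= np.sign(np.diag(r))   # random orthogonal matrix
        pose = (ligand_xyz - center) @ q.T + center
        if has_clash(pose, protein_xyz):
            return pose
    return None  # no clashing rotation found for this complex
```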

Pose classifier
The pose classifier model was trained on the BindingMoad database,4 which contains 38,702 protein–ligand complexes. Ten docked poses were generated for each complex using the rDock docking software,5 which led to more than 310,110 examples after removing failed jobs. This set was split into two classes: "good" poses (RMSD below 1 Å) and "bad" poses (RMSD greater than 3 Å). Poses with RMSD between 1 and 3 Å were discarded, similar to previous work,2 to create a greater separation between the two distributions and ease the classification task. The final number of examples was 270,225, constituting a much larger training set than for the other two models. This model was trained with the same hyperparameters, loss function and architecture as the previous one.
A validation set was created, composed of all the good and bad poses generated for a random selection of 10% of the protein–ligand complexes in the BindingMoad database, so that poses for the same protein–ligand complex cannot appear in both the training and validation sets. Because most of the poses belonged to the "bad" category, a sampling correction was introduced during training, and bad poses were removed from the validation set to reach a 1:1 ratio, for a total of 17,478 validation examples.
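The RMSD-based labeling scheme above can be sketched as a small helper; the thresholds (1 Å, 3 Å) are from the text, while the function names and the pose representation are illustrative:

```python
def pose_label(rmsd, good_max=1.0, bad_min=3.0):
    """Label a docked pose by its RMSD.

    Returns 1 for "good" (below 1 Å), 0 for "bad" (above 3 Å), and None for
    intermediate poses, which are discarded from the dataset.
    """
    if rmsd < good_max:
        return 1
    if rmsd > bad_min:
        return 0
    return None

def build_dataset(poses):
    """Keep only poses with a definite label; `poses` is [(pose_id, rmsd), ...]."""
    return [(pid, pose_label(r)) for pid, r in poses if pose_label(r) is not None]
```

Discarding the intermediate band widens the margin between the two classes, which is what makes the binary classification task easier.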

Strict split: KDEEP
For KDEEP, we designed a stricter split, in which the PDBbind refined set was clustered by sequence similarity using a 70% threshold. The three largest clusters were selected for testing. A final filter was applied to these three clusters to discard complexes whose ligands had a fingerprint similarity greater than 0.6 with any ligand in any other cluster, ensuring that these test sets differed from every other cluster both in protein sequence and in ligand composition. Finally, three different KDEEP models were trained, each using one of the three clusters as the test set (leaving one cluster out and training on all the others). Pearson's correlation coefficient on these three test sets was 0.70 (N=29), 0.28 (N=152) and 0.09 (N=81). Hence, predictive performance is lower than on the less strict split, and it may be family-dependent.
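The ligand-similarity filter in this strict split can be sketched as follows. The 0.6 threshold is from the text; the Tanimoto-on-bit-sets representation and the function names are illustrative (in practice one would compute fingerprints with a cheminformatics toolkit such as RDKit):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def filter_test_cluster(test_complexes, other_complexes, threshold=0.6):
    """Drop test complexes whose ligand fingerprint is more similar than
    `threshold` to any ligand outside the test cluster."""
    kept = []
    for name, fp in test_complexes:
        if all(tanimoto(fp, other_fp) <= threshold
               for _, other_fp in other_complexes):
            kept.append((name, fp))
    return kept
```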

Quantitative analysis
Correlation between far-away residues and accuracy
We measured whether any correlation existed between the highlighting of protein residues far from the ligand and prediction accuracy. We summed the attributions of all protein channels for the voxels further than 8 Å from any ligand atom (bad attributions) and divided this sum by the sum of all protein attributions, obtaining the percentage of attributions falling far from the ligand. Pearson's correlation with the prediction error was only 0.05, meaning that the highlighting of far-away residues does not correlate well with prediction accuracy.
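This per-complex metric can be sketched as follows. The 8 Å cutoff is from the text; the array layout (flattened voxel centers and per-voxel summed protein attributions) and the function name are assumptions for the example:

```python
import numpy as np

def far_attribution_fraction(voxel_xyz, attributions, ligand_xyz, cutoff=8.0):
    """Fraction of total protein attribution falling in voxels further than
    `cutoff` Å from every ligand atom.

    voxel_xyz:    (V, 3) voxel center coordinates
    attributions: (V,)   protein attributions summed over channels, per voxel
    ligand_xyz:   (L, 3) ligand atom coordinates
    """
    d = np.linalg.norm(voxel_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    far = d.min(axis=1) > cutoff       # voxels far from *every* ligand atom
    total = attributions.sum()
    return attributions[far].sum() / total if total else 0.0
```

This fraction is the quantity that was correlated against the prediction error for each complex.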

Attribution consistency across rotations and pose variations
In order to check how sensitive the attributions were to changes in the orientation of the protein–ligand complex, attributions were computed for 10 different orientations of each system. The ligand and protein atoms closest to the voxel with the highest attribution in the occupancy channels were then identified in each rotation. This allowed us to evaluate how consistent this selection was in comparison to a random baseline, in which the ligand and protein atoms were selected randomly among those inside the 24 Å grid.
Figure S10: Consistency in protein atom attributions across 10 different orientations. The protein atom closest to the best voxel in the protein occupancy channel is identified in each of the 10 orientations. For each complex, we plot how many times the same atom was selected.
As can be seen, for the clash detector and pose classifier models the exact same protein atom is picked in all 10 rotations for some complexes. In all three models, the distribution is clearly shifted toward higher consistency compared to the random baseline.
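The per-complex consistency measure can be sketched as follows. This is an illustrative version; the flattened-grid representation and function names are assumptions, and the random baseline would simply replace the atom selection with a uniform draw over the grid atoms:

```python
import numpy as np
from collections import Counter

def closest_atom_to_best_voxel(attr_grid, voxel_xyz, atom_xyz):
    """Index of the atom nearest to the voxel with the highest attribution."""
    best = voxel_xyz[np.argmax(attr_grid)]
    return int(np.argmin(np.linalg.norm(atom_xyz - best, axis=1)))

def consistency_count(selections):
    """How many rotations picked the most frequently selected atom."""
    return Counter(selections).most_common(1)[0][1]
```

Running `closest_atom_to_best_voxel` once per rotation and feeding the 10 selected indices to `consistency_count` yields the per-complex value plotted in Figure S10.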