Teaching and Learning Computational Drug Design: Student Investigations of 3D Quantitative Structure–Activity Relationships through Web Applications

The increasing use of information technology in the discovery of new molecular entities encourages the use of modern molecular-modeling tools to help teach important concepts of drug design to chemistry and pharmacy undergraduate students. In particular, statistical models such as quantitative structure–activity relationships (QSAR)—often as its 3D QSAR variant—are commonly used in the development and optimization of a leading compound. We describe how these drug discovery methods can be taught and learned by means of free and open-source web applications, specifically the online platform www.3d-qsar.com. This new suite of web applications has been integrated into a drug design teaching course, one that provides both theoretical and practical perspectives. We include the teaching protocol by which pharmaceutical biotechnology master students at Pharmacy Faculty of Sapienza Rome University are introduced to drug design. Starting with a choice among recent articles describing the potencies of a series of molecules tested against a biological target, each student is expected to build a 3D QSAR ligand-based model from their chosen publication, proceeding as follows: creating the initial data set (Py-MolEdit); generating the global minimum conformations (Py-ConfSearch); proposing a promising mutual alignment (Py-Align); and finally, building, and optimizing a robust 3D QSAR models (Py-CoMFA). These student activities also help validate these new molecular modeling tools, especially for their usability by inexperienced hands. To more fully demonstrate the effectiveness of this protocol and its tools, we include the work performed by four of these students (four of the coauthors), detailing the satisfactory 3D QSAR models they obtained. Such scientifically complete experiences by undergraduates, made possible by the efficiency of the 3D QSAR methodology, provide exposure to computational tools in the same spirit as traditional laboratory exercises. With the obsolescence of the classic Comparative Molecular Field Analysis Sybyl host, the 3dqsar web portal offers one of the few available means of performing this well-established 3D QSAR method.


Students works [V.E.]: Indoleamine 2,3-dioxygenase 1 (IDO1)
In this section is described the development of FB LB 3-D QSAR model based on the molecules described by Tianwei Weng et al 50 and Ute F. Röhrig et al 51 . The molecules were evaluated as inhibitors of IDO1. Overview of the biological target and connected pathology. IDO1 ( Figure SI1) is a monomeric oxidase (45 kDa) containing an active heme with a Fe +2 ion. Structurally, the active site of IDO1 comprises two lipophilic regions: a pocket immediately above the heme group, site of the catalytic tryptophan (pocket A, in orange), and a lipophilic binding area (pocket B, in green). Pocket B is composed by several key residues, such as Arg231, Phe226 and Phe227, which were recognized as fundamental for IDO1 activity. Cancer immunotherapy is currently entering an exciting new era. Among attractive immunotherapy approaches, immune checkpoint therapy which targets regulatory pathways to enhance the antitumor immune responses of T cells has led to important clinical advances in cancer. A central role in immune escape has been attributed to the kynurenine pathway of tryptophan (TRP) metabolism, which leads to the depletion of tryptophan and to the production of kynurenine metabolites, both responsible for local immunosuppression. 51 IDO1, which catalyzes the initial and rate-limiting step of the kynurenine pathway, is expressed by tumour cells to escape a potentially effective immune response, and high IDO1 expression is associated with poor prognosis in a variety of cancer types. In vitro and in vivo studies demonstrated that IDO1 inhibitor administration improved the efficacy of therapeutic vaccination, chemotherapy, or radiation therapy. Physiologically, IDO1 role is to prevent autoimmunity phenomena and is therefore involved in peripheral immune tolerance. The classical concept proposes that tumour cells express high levels of IDO1, whose enzymatic activity determines the depletion of TRP in the local microenvironment and the consequent inhibition of T cell responses. T cells, in fact, detect low levels of TRP through unloaded tRNAs and subsequently initiated a lack of amino acid response resulting in cell cycle arrest and cell death. It has also been shown that IDO1 expression is induced by tumour necrosis factor alpha (TNF-α) and other inflammatory mediators. 50 Therefore, IDO1 could be induced as a consequence of the initial inflammatory response to the tumour. Furthermore, several studies have proposed that the immunosuppression by the degradation of TRP is not only a consequence of the lowering of local TRP levels but also the accumulation of high levels of TRP metabolites. This idea was supported by studies demonstrating that T cell responses are inhibited by TRP metabolites due to the binding of the kynurenine metabolite (KYN) with the aryl hydrocarbon receptor (AHR) which involves modulation of the immune response and the IDO1 activity of. 59 50 3-D QSAR model building. Through the Balloon in the Py-ConfSearch module, 70 conformations for each training set molecules were calculated and aligned by means of Shaep (Py-Align) ( Figure SI2). Multiple alignments were carried out by means of the different templates as implemented in the web application (i.e. the most/least polar molecule, the most/least rigid molecule, and so on). All the 16 available alignment rules were applied leading to 16 3-D QSAR models built with default settings. Further parameters were also changed to optimize the best default model by varying the probe type, the grid spacing, the min-max cutoff energy and the minimum sigma. (Tables SI3 -SI 4 The best 3-D QSAR model was obtained aligning the training set molecules using the "most polar" (highest LogP value) molecule as a template and using as probe atom a sp2 carbon atom (C.2). The model's building settings were grid spacing of 1.312 Å, obtained through optimizations (Tables SI3 and SI4), minimum sigma 0.05, grid extension 5 Å, and cutoff energy of 15 kcal/mol. On the best model, both LOO and LSO validation were performed to evaluate the model robustness. Finally, the Y-S procedure was carried out, to verify any lack of chance correlation. The − 2 and − 2 values were always lower than those obtained on this best model, suggesting that this model was not a chance correlation (Table SI5 and Figure SI3).   Final model graphical inspection 3-D QSAR contour maps were generated using the Coeff x Mean to observe steric and electrostatic CoMFA like maps. In Figure SI4 the least (compound IDO1_09 in red) and most (compound IDO1_09 in blue) active molecules have been superimposed. Steric contour maps are related to increase (positive, in green) or decrease (negative, in orange) in activity, based on presence or absence of certain steric hindrance, such as Van der Waals forces ( Figure SI4A). As it shown in Figure SI4A, contour maps are green where most active molecule (blue) has a steric hindrance such as benzene ring; As for electrostatic maps, the blue colored polyhedra indicate that in this zone hydrogen bonds acceptors or generally negatively charged groups are favored, while hydrogen bonds or generally positive charge groups are unfavorable. The red polyhedral are the areas around the molecules where hydrogen bond donor groups are favored, or in general with a positive charge, while acceptor groups or generally with a negative charge could lead to lower inhibitory potency ( Figure SI4B). Predictive Performance and Interpretation of the Final Model. As described above, during Py-CoMFA model generation, the activities of a separate test set can also be predicted. Such test set predictions for the IDO1 inhibitors are shown in Table SI6 50-51 While the overall SDEPpred of 1.72 was much higher than the LOO crossvalidation SDEP of 1.05, it might be noted that this gap is almost entirely caused by the predicted activity for a single structure, IDO1_61, being much lower than the experimentally observed activity. In the context of an actual drug discovery project, such a promising "magic methyl" type of result might prompt the examination of IDO1_61's structure for any distinguishing features.  Overview of the biological target and connected pathology. SHP-2 ( Figure SI5) is a non-receptor tyrosine phosphatase protein, presents two N-terminal tandem domains Src homology 2 called SH2, a PTP domain and a C-terminal domain 60 . Thanks to the analysis through X-ray, it was found that the conformation is composed of two N-SH2 and C-SH2 domains that bind with the PTP domain through a "tunnel" portion. 61 Aminoacid residue Glut76 plays a linking role between N-SH2 and the PTP domain, including a hydrogen bond with the residue Ser502 and a salt bridge with the residue R265. 62 A B C Figure SI5. Cartoon representation of SHP-2. It is composed of a C-terminal lobe (blue and light blue chains) linked by a hinge region with the N-terminal lobe (green, yellow, orange and red chains) (A). Surface representation of hydrophobic and hydrophilic sites of the protein. The black circle indicates the core (B). Cartoon, stick and surface representation of the interaction between a competitive inhibitor and binding site (C). For figure generation, the data available in the PDB entry code 5XZR was used.
SHP-2 phosphatase encoded the PTPNII gene is a non-receptor PTP containing two N-terminal Src homology 2 domains, a PTP domain and a C-terminal tail. 60 X-ray structures have demonstrated that SHP-2 adopts an selfinhibited conformation in its basal state: N-SH2 domain of SHP-2 protein binds to the PTP domain and blocks its substrate access, therefore, resulting in suppressed PTP activity. When the SH2 domains bind to specific phosphotyrosine motifs, the auto inhibitory interactions are abolished. The phosphatase is then in an open conformation that allows SHP-2 activation. 63 Protein tyrosine phosphorylation is a key modification controlling all aspects of crucial cellular processes including proliferation, differentiation, growth, and apoptosis. Dysregulation of tyrosine phosphorylation has been associated with the developmental pathologies of various human diseases such as cancer, diabetes, and autoimmune disorders. 64-65 More recently, SHP-2 was reported to bind and dephosphorylate RAS and increase its association with effector protein RAF to activate downstream proliferative RAS/ERK/MAPK signaling. 66 Also, germline or somatic mutations in PTPN11 that cause hyperactivation of SHP-2 have been identified in Noonan syndrome (50%), 67 juvenile myelomonocytic leukemia (JMML, 35%), myelodysplastic syndrome (10%), B-cell acute lymphoblastic leukemia (7%), acute myeloid leukemia (AML, 4%) 68 and solid tumors including lung adenocarcinoma, colon cancer, neuroblastoma, melanoma, and hepatocellular carcinoma. 69 SHP-2 may also participate in the programmed cell death pathway (PD-1/PD-L1) and contribute to immune evasion 70 reversing immunosuppression in the tumour microenvironment, the reason why investigating the inhibition of SHP-2 for cancer immunotherapy is also of great interest. 71 Describing the binding site, the mutated residues are located in the interface between N-SH2 and PTP domains. 72 Glutamate 76 plays a key role in bridging N-SH2 and PTP domains including a hydrogen bond with S502 hydroxyl and a salt bridge interacting with R265. Substituting glutamate 76 leads to weakened interactions between these two domains and thus arouses the destabilization of the auto inhibited conformation. 61 Mutations of E76 residue have been frequently identified in Noonan syndrome and leukaemia. Mutant E76A fulllength SHP-2 (SHP-2E76A) exhibited much higher phosphatase activity than wild-type SHP-2 (SHP-2WT) in an

C-terminal
Hinge region N-terminal in vitro biochemical assay using DIFMUP as a surrogate substrate. In contrast, C459S SHP-2 (SHP-2C459S), which replaces the catalytic centre cysteine to serine, totally abolished its phosphatase activity. 62 The PTP catalytic pocket presents positive charge and it's highly conserved, that represents a unique drug discovery challenges, thus most catalytic site inhibitors require multiple ionizable functional groups in order to inhibit the enzyme. Training set composition. The training set consisted of 40 molecules taken from two articles 52-53 (Table SI7), including two classes of compounds: the first one characterized by an aminotiazole scaffold, where adding a bromine atom increases activity; the second one characterized by pyrazine and pyrimidine scaffold, among which pyrazine showed more potency against SHP2. In Table SI8 are shown the experimental activity of each molecule and the predicted activity of the model for each molecule, with the relating error.

3-D QSAR model building.
A B Figure SI6. SHP-2 alignment rules. (A) Example of conformations obtained for the compound SHP2-29. (B) Training set aligned on longest alignment rule. The best model obtained was optimized in terms of probes atom, grid spacing, grid extension and so on to find the best 3-D QSAR model optimized (Table SI9). The best model was obtained with C.3 probe; it had an electrostatic q 2 of -0.049, a r 2 of 0.847, SDEC of 0.312 and SDEP of 0.275 considering 4 ONPCs. The best result setting were obtained with grid spacing 2.6 Å, minimum sigma 0.10, grid extension 5 Å, and cutoff energy of 20 kcal/mol. LSO cross-validation was also performed in addition to LOO, in order to better evaluate the model stability. Finally, a Y-scrambling (Ys) was carried out to verify any lack of fortuitous correlation. The model was not affected by chance correlations as the q 2 results resulted al negative. (Table SI10-  For each alignment, a construction model was tested with the same option value for Grid spacing, Grid extension, CutOff, Min. Sigma and CV Type, changing only the probe atom to find the best result indicating in which type of alignment start building the model. The best model is highlighted in yellow.

Grid Extension 10
Minimum Sigma 1.5 Final model graphical inspection. By best model 3-D QSAR contour maps are generated using the Coeff x Mean, to observe steric and electrostatic CoMFA like maps. First of all, in Figure SI8 the least (compound SHP-2_28 in red) and most (compound SHP-2_26 in blue) active molecules have been overlying. Given the presence of a fluorine atom (in light green) on most active molecule's terminal portion, can be hypothesized that presence of a large atom can be important for the activity. Steric contour maps are related to increase (positive, in green) or decrease (negative, in orange) in activity, based on presence or absence of certain steric hindrance, such as Van der Waals forces ( Figure SI8A). Electrostatic contour maps give information on molecules' charge to observe whether attractive or repulsive forces are favored: in positive electrostatic contour maps (green polyhedral) attractive forces are favored, instead in negative electrostatic contour maps (orange polyhedral) repulsive forces are favored ( Figure SI8B). Test set composition and predictive ability evaluation. Regarding the external validation about the test set, the following table summarized the difference between experimental and predicted activities. The prediction errors are low, except for the molecules SHP-2_41 and SHP-2_43 (Table  SI12).

Students works [S.M.]: Interleukin-1 Receptor Associated Kinase-4 (IRAK4)
In this section is described the development of FB LB 3-D QSAR model based on the molecules described Scott J.S. et al 54 and Lee K.L. et al. [54][55] The molecules were evaluated as an inhibitor of IRAK4.
Overview of the biological target and connected pathology. IRAK4 ( Figure SI9) is a serine-threonine kinase that is known to have biologically important kinase activity. 73 IRAK4 and all members of the family (including IRAK1, IRAK2 and IRAKM) facilitate key protein scaffolding and regulatory functions in various signaling pathways 74-75 . IRAK4, in particular, participates as a key mediator in the Interleukin-1/Toll-like Receptor (IL-1/TLR) signaling cascades. [76][77] This kinase made up of 460 amino acids, which contains both a kinase domain and a death domain. 76 Its kinase domain exhibits the typical bilobal structure of kinases, with the N-terminal lobe consisting of five-stranded antiparallel β-sheet and one α-helix, while the C-terminal lobe is composed mainly of a few α-helices. Situated where the two lobes meet is an ATP binding site, which is covered by a tyrosine gatekeeper. Tyrosine as a gatekeeper is believed to be unique to the IRAK family of kinases 78 . The protein also contains three auto-phosphorylation sites, each of which when mutated results in a decrease in the kinase activity of IRAK4  Training set composition. The training set consists of 58 molecules arising from two articles 54-55 (Table SI13). Most of these compounds were characterized by a tricyclic scaffold that showed potency against IRAK4 and a good level of selectivity against a panel of kinases. 85 In Table SI14 are shown the experimental activity of each molecule and the predicted activity of the model for each molecule, with the relating error.  3-D QSAR model building. For conformation analysis, 70 conformations through Balloon method were performed and a molecular alignment with Shaep method was executed ( Figure SI10). Being a Ligand-Based approach, the alignment rules can be multiple, such as the most/least active molecule, the most/least polar molecule, etc. (Table  SI15) So, all rules were applied and for each one a 3-D QSAR model with default settings has built. The best alignment found on the lowest LogP molecule (ID molecule: IRAK4_55).
A B Figure SI10.  The best model obtained was optimized in terms of probes atom, grid spacing, grid extension and so on to find the best 3-D QSAR model optimized (Table SI16). The best probe with best energy interaction is Oxygen (O.3) and it shows an electrostatic and steric q 2 value of 0.444, a fitting represented by r 2 of 0.979, SDEC of 0.162 and SDEP of 0.837 considering 6 optimal numbers of principal components (ONPC) . The best outcome setting is obtained with grid spacing 1.0 Å, grid extension 5 Å, minimum sigma 2.0 and cutoff energy of 25 kcal/mol. Later, on the best model has been performed an LSO validation in addition to LOO, in order to assess the model stability. In the end, a Y-scrambling (Ys) validation has been carried out to verify if the model was affected by the fortuitous correlation and the feedback was positive (Tables SI16-SI17 and Figure SI11). The best models (LOO and LSO) are highlighted in yellow.  Final model graphical inspection. 3-D QSAR contour maps generated using the "Coeff x Mean" (average activity contribution -AAC) option and overlapped to the most (IRAK4_54) and least (IRAK4_07) potent IRAK4 inhibitor were used to interpret the best CoMFA like model ( Figure SI12). Positive and negative steric associated AAC are depicted by green and yellow polyhedrons as suggested in the original CoMFA article 29 (panels A and B of Figure SI12). In particular, green polyhedron was observable around the IRAK_07 fluorine atom indicating that a slight increase of steric hindrance in that position could lead to more potent derivatives ( Figure SI12A). Further green polyhedron are visible around the main central fused bicycle (pyrrolopyrimidine) revealing that some substitution could provide more potent derivatives. Regarding the negatively associate steric contribution (yellow polyhedrons in Figure  SI12B) the most important features to be avoided are the substitutions corresponding to the IRAK4_07 methyl and IRAK_54 carboxyamide at position C12 and C11 of the byciclic central moiety (pyrrolopyrimidine), respectively (Figure SI12 B). Regarding positive and negative electrostatic associated AAC to avoid any misunderstanding have been represented in blue and red meshed contour maps, respectively (panels C and D of Figure SI12). These maps clearly indicated that an electron-rich substituent is favorable for the activity ( Figure  SI12C), instead 2-pyrrolidone's ethyl group and -NH group are negatively influencing the inhibiting potency ( Figure SI12D). Test set composition and predictive ability evaluation. The predictive ability of the best model was then evaluated by mean of an external test set by randomly selecting 12 compounds from the ChEMBL database (https://www.ebi.ac.uk/chembl/) and subjecting them to all the procedure and getting the predicted activities (Table SI18). It must be noted however that, inasmuch as ChEMBL is a record of all published structures having any reported biological potency, the very poor accuracy of these predictions for these random structural choices is only to be expected. In this section is described the development of FB LB 3-D QSAR model based on the molecules described by Keith F. McDaniel et al. 86 The molecules were evaluated as ligand for BRD4.
Overview of the biological target and connected pathology. BRDs are structure evolutionarily conserved in different nuclear proteins and there are 46 human proteins known to contain bromodomains 86 BRD4 ( Figure  SI13) (Table  SI19). These compounds, characterized by a pyridone core linked to a diaryl ether scaffold showing binding  Table SI20 are shown the experimental activity of each molecule and the predicted activity of the model for each molecule, with the relating error.

3-D QSAR model building.
The conformation analysis is made by Balloon on the platform www.3d-qsar.com. In order to find the conformation of each molecule which has the lowest energy, it was performed a conformation analysis with at least 70 conformation for each molecule ( Figure SI14). Once the conformational analysis is performed, it needs to perform a molecular alignment. In the Ligand-Based approach, the alignment rules can be multiple and, in this work, it was used "lowest_LogP" rule. Therefore, the reference molecule is BRD4_27 (Table SI21). The best probe atom with best energy interaction results to be Hydrogen (H). It is found to be with an Electrostatic and Steric q 2 0.548, a Fitting r 2 of 0.886, a SDEC of 0.246 and a SDEP of 0.489, considering 4 optimal numbers of principal component (ONPC). These results were obtained by optimizing parameters by imposing a Grid spacing of 2.2 Å, a Grid extension of 5 Å, a Minimum sigma of 0.05 and cutoff energy of 25 kcal/mol (Table  SI22). LSO and LOO validation was performed in order to assess the model stability. In addition, a Y-scrambling (Ys) validation was made to verify if the model was affected by fortuitous correlation and the feedback was positive (Table SI23 and Figure SI15).  Table SI22. BRD_4 3-D QSAR models optimization by varying the pretreatment parameters on the best alignment rule. Final model graphical inspection. 3-D QSAR contour maps were generated using the Coeff x Mean option, using both the most active and the least active model, superimposing them. In this way, steric and electrostatic CoMFA like maps were obtained and studied. The CAA associated with positive and negative steric is represented by green and yellow polyhedra, as suggested in the original article CoMFA [30] (panels A and B of Figure SI16). In yellow ( Figure SI16A) we see the negative steric fields, where a further steric hindrance is not recommended. In fact, they could lead to less activity. In green ( Figure SI16B), therefore, we see areas with positive steric fields, where it is therefore advisable to add substituents to increase activity. In particular, it is observable green polyhedron around the fluorine atoms (in green) suggests that a slight increase in steric hindrance in that position could lead to more active compounds. But even a replacement with a larger group instead of the methyl group of the 6-methyl-1H-pyrrolo [2,3-c] pyridine-7-one group or in the sulfur position of the ethanesulfonamide group could lead to better structures. ( Figure SI16A). While, instead, as regards the negative steric component (in yellow), it is necessary to avoid substitutions on the carbonyl group of 6-methyl-1H-pyrrole [2,3 -c] pyridine-7-one ( Figure SI16B). In positive electrostatic contour maps (blue polyhedra) the attractive forces are favored around the 6-methyl-1Hpyrrole [2,3-c] pyridine-7-one group ( Figure SI16C), while the repulsive forces (red polyhedra) are favored close to the second aromatic diphenyl ether ring ( Figure SI16D).

Test set composition
Model's external validation was performed with the molecules of the test set and the following table summarized the difference between the experimental activities and those foreseen. The molecules were obtained by randomly selecting 12 compounds from the ChEMBL database (https://www.ebi.ac.uk/chembl/) and subjecting them to all the procedure and getting the predicted activities (Table SI24). It must be noted however that, inasmuch as ChEMBL is a record of all published structures having any reported biological potency, the very poor accuracy of these predictions for these random structural choices is only to be expected.