A Machine Learning Strategy for Drug Discovery Identifies Anti-Schistosomal Small Molecules

Schistosomiasis is a chronic and painful disease of poverty caused by the flatworm parasite Schistosoma. Drug discovery for antischistosomal compounds predominantly employs in vitro whole organism (phenotypic) screens against two developmental stages of Schistosoma mansoni, post-infective larvae (somules) and adults. We generated two rule books and associated scoring systems to normalize 3898 phenotypic data points to enable machine learning. The data were used to generate eight Bayesian machine learning models with the Assay Central software according to the parasite’s developmental stage and experimental time point (≤24, 48, 72, and >72 h). The models helped predict 56 active and nonactive compounds from commercial compound libraries for testing. When these were screened against S. mansoni in vitro, the prediction accuracy for actives and inactives was 61% and 56% for somules and adults, respectively; hit rates were 48% and 34%, respectively, far exceeding the typical 1–2% hit rate for traditional high throughput screens.

Schistosomiasis is one of a number of parasitic infectious diseases associated with poverty that principally impact low- and middle-income countries. 1,2 The disease is caused by various species of the flatworm parasite, Schistosoma, which lives in the blood vasculature and produces eggs that are responsible for a variety of pathologies. With more than 200 million people infected worldwide, the painful and often lifelong consequences of this disease can negatively impact the economic performance of the afflicted communities. 3,4 Treatment relies solely on praziquantel (PZQ), 5−9 which is safe, affordable, and reasonably effective in decreasing disease-associated morbidity. However, as the only drug available, there is concern regarding decreased efficacy or resistance, particularly as its use continues to expand. 9−11 Moreover, there is a lack of pharmaceutical investment in new chemotherapies for schistosomiasis.
Academia remains key to the identification, characterization, and preclinical evaluation of antischistosomal small molecules. 5,11 This process has involved small molecule screens of either validated targets or, more often, phenotypic (whole organism) screens of the schistosome parasite in culture. 11,12 Although the amount of data accumulated is small relative to major areas of research such as cancer, it is still a valuable resource for the application of machine learning methods to the drug discovery process. Computational techniques are an attractive drug discovery and development modality, especially given the financially constrained environment for diseases like schistosomiasis. 11,13 To date, however, there have been just a few efforts using these types of tools (e.g., docking and quantitative structure−activity relationship models) for schistosomiasis, in contrast to the more typical strategy of screening small molecule collections. 11,14,15 Bayesian machine learning methods have convincingly demonstrated their applicability to predicting active compounds for other infectious diseases of poverty such as Chagas disease, 16 Ebola, 17,18 and tuberculosis. 19−21 With respect to schistosomiasis, the phenotypic screening data in the literature have been generated using a plethora of quantitative or partially quantitative metrics for bioactivity and involved more than one developmental stage, most often postinfective larvae (schistosomula or somules) and adults of S. mansoni, the species best adapted to the laboratory environment. 12,22−25 To render the disparate data potentially useful for machine learning methods, we developed two "rule books" with which the data identified in a literature search could be normalized. These data were used to generate Assay Central 18,26−30 Bayesian machine learning models of antischistosomal activity for somules and adults over four different time points (eight models total). 
These Bayesian models were subsequently used to identify potential antischistosomal molecules in various chemical libraries. Using both manual and automated molecule selection techniques, a set of compounds was purchased and screened for bioactivity against somules and adults of S. mansoni. The eight Assay Central training data sets produced a high-quality, binary data set that can be utilized for additional machine learning methods. Also, each of the eight training data sets was applied to six other algorithms, and these model performances were compared to that of Assay Central.

■ RESULTS
For the machine learning application of Assay Central, two rule books were developed. The first normalized phenotypic screening data for S. mansoni that reported single metric outputs (e.g., ED50 and % mortality; 19 articles between 1980 and 2019; Tables 1 and S1), and the second normalized data from screens mainly performed by the University of California, San Diego (UCSD) team (two published articles and 13 published and unpublished data sets) using an observational approach that describes and then enumerates the many phenotypic changes of which the schistosome is capable (Tables 2 and S2). In total, 3377 somule and 521 adult worm data points were curated for machine learning methods.
Data sets resulting from the two rule books were combined to develop eight Bayesian machine learning models with Assay Central over four time points (≤24, 48, 72, and >72 h) for both somules and adults (Figure 1). Active (hit) compounds were defined as those receiving rule book scores of 3 or 4. Five-fold cross-validation performance metrics for these machine learning models are presented in Table 3 and Figure S1. Of the approximately 3100 and 500 compounds screened in the literature against somules and adults, respectively, both sets had a similar 5−10% recovery of active molecules and covered a similar chemical space as measured by our domain score of 0.305−0.384 (which is calculated by reference to the ChEMBL database 36). Predictive performance was assessed via receiver operating characteristic (ROC) scores, which fell within a tight range from 0.796 to 0.845 across all time points and developmental stages, thus suggesting that they are likely performing similarly. In general, all of the internal performance metrics were higher for adult models than somule models, particularly the F1-Score, Cohen's kappa, and Matthews correlation coefficient.
Table footnotes: a Compound activities have rule book scores that are scaled from 0 to 4 where 4 represents the most active compound. b Terms: D, dead; deg, degenerating; teg bleb, damage (blebbing) to surface tegument of adult parasites; any one of these particular changes observed is awarded a rule book score of 4. See Table S2 for full details.
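As an illustration of how the threshold-based cross-validation metrics named above (recall, precision, F1-Score, Cohen's kappa, and Matthews correlation coefficient) relate to a binary confusion matrix, the following minimal sketch uses their standard textbook definitions. The function name and the example counts are hypothetical and are not taken from Table 3; this is not the Assay Central implementation.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"recall": recall, "precision": precision, "f1": f1,
            "kappa": kappa, "mcc": mcc}


# Hypothetical, imbalanced example (about 15% actives), for illustration only.
example = classification_metrics(tp=20, fp=10, tn=160, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because kappa and the Matthews correlation coefficient account for class imbalance, they tend to be more informative than raw accuracy for data sets in which actives are a small minority, as is the case here.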
In addition to the Bayesian algorithm of Assay Central, six other machine learning methods (random forest, k-Nearest Neighbors, support vector classification, naive Bayesian, AdaBoosted decision trees, and deep learning) were applied to the eight training data sets arising from the implementation of the rule books. The same 5-fold cross-validation performance metrics output by Assay Central were generated to allow for an evaluation of machine learning algorithms on the same data sets. These metrics were compared as radar plots in Figure S2. Metrics were comparable between the algorithms, although recall and ROC (also referred to as area-under-the-curve) were generally greater for the Assay Central models (difference of ≥0.2), especially in the adult data sets. Independent and pairwise comparisons of these alternative algorithms are shown in Table S3, and Figure S3 depicts the rank normalized and "difference from the top" rank normalized score (ΔRNS) metrics. These comparisons suggest that there are no significant differences between most machine learning algorithms for these data sets even using more sophisticated and computationally intensive machine learning methods. The exception was the AdaBoosted decision trees algorithm, which was significantly poorer in performance than the other algorithms.
A total of 56 compounds were selected and purchased for phenotypic screening of S. mansoni somules and adults (Figure S4). For each developmental stage, 10 predicted actives were chosen using each of the automated and manual methods (Figures 2 and 3), and eight predicted nonactives were chosen using the manual method. Although compounds were selected on a developmental stage-specific basis, all compounds were tested against both stages. The identities of the purchased compounds were blinded to the UCSD team performing the phenotypic screens until after the screen data were assembled. Results are summarized in Tables 4 and 5 for somules and adults, respectively, and the combined data for all compounds against both developmental stages are presented in Table S4. Table S5 presents the same 5-fold cross-validation metrics discussed above for the training data after integrating the phenotypic screening data for the 56 tested compounds. This inclusion of tested compounds had little, if any, consistent impact on the metrics compared to the original models.
ACS Infectious Diseases pubs.acs.org/journal/aidcbc Article
Automated and Manual Predictions for Somules. For somules, regardless of whether manual or automated predictions were made, the predicted actives possessed structural moieties in common with the active compounds in the training set. This is to be expected. These features include fused aromatic ring systems such as phenothiazines, indoles, and piperazines, nitrogen heterocycles such as 4-anilinoquinazoline, and peripheral substituents such as chlorine and fluorine (examples in Figure 4). Overall, 61% of those compounds predicted to be either active (yielding a severity score of ≥2) or inactive against somules were indeed confirmed as such in the phenotypic screening assay (Table 4). Further, 27 of the 56 (48%) compounds tested vs somules were active.
Seven of the ten automatically predicted active compounds against somules were experimentally confirmed, i.e., severity scores of ≥2 at 10 μM after 72 h (Table 4). Three were strong hits with severity scores of 3 or 4, namely, Z304863612, Z56174662, and Z56175896, whereas the other four, Z56978084, Z133946058, Z204004384, and Z385159220, yielded scores of 2. Notably, the top hit, Z304862612, was also a strong hit at 1 μM with a severity score of 4 after 72 h (Table S4). Furthermore, two compounds, Z56174662 and Z56978084, were active against the adults with scores of 4 and 2, respectively, after 48 h (Table S4).
Five of the ten predicted actives chosen manually for somules were confirmed experimentally (Table 4). Four of these were strongly active with severity scores of 4 after 72 h (i.e., the antidepressant (S)-duloxetine hydrochloride; the proton pump inhibitor revaprazan hydrochloride; the antineoplastic amsacrine hydrochloride; and Z56872965). In contrast, Z425126666 yielded a score of 2. The two top hits, (S)-duloxetine and revaprazan hydrochloride, were also active at 1 μM, generating severity scores of 4 and 3, respectively, after 72 h (Table S4).
Three of the eight compounds manually selected as nonactive compounds against somules were, in fact, strongly active at 10 μM: the phosphoinositide-specific phospholipase C inhibitor U-73122, the natural product piperlongumine, and the antibiotic tiamulin fumarate. The other five predicted nonactives, sivelestat sodium, PNU-282987, R(+)-IAA-94, ecabet sodium, and I-OMe-Tyrphostin AG 538, were confirmed as inactive (Table 4). Two of the active compounds, U-73122 and piperlongumine, were also active against adults with severity scores of 4 after 48 h (Table S4).
Automated and Manual Predictions for Adults. Similar to somules and regardless of whether manual or automated predictions were employed, the predicted adult active compounds possessed many structural moieties observed in the active training data compounds. Also, the prediction of hits for adults included those substituents seen in somules such as piperazine rings and halogens (namely, trifluorine and bromine) as well as nitrile and carbonyl moieties (examples in Figure 5). Other chemistries not seen in the somule outputs included dihydropyridine analogs and steroids as well as compounds with multiple methoxy substituents. Overall, 56% of those compounds predicted to be either active (yielding a severity score of ≥2) or inactive against adults were indeed confirmed as such in the phenotypic screening assay (Table 5). Further, 19 of the 56 (34%) compounds tested vs adults were active.
Of the ten automated active predictions for adult worms, three were confirmed as active, i.e., severity scores of ≥2 at 10 μM after 48 h with Z827016000, Z2241105867, and Z288901226 generating severity scores of 2, 3, and 4, respectively (Table 5). Two of these, Z827016000 and Z2241105867, and a third adult-inactive compound, Z827015296, were active against somules at 10 μM after 72 h with scores of 3 or 4 (Table S4).
Five of the ten manually predicted adult actives were confirmed as active (Table 5). Specifically, nemadipine-A, an L-type calcium channel blocker, generated the maximum severity score of 4 at all time points measured. Moxidectin, an antinematode macrocyclic lactone, was also strongly active with a score of 4 after 48 h. Both bioactivities are consistent with previously published data for these compounds. 37,38 The nonnucleoside reverse transcriptase inhibitor etravirine 39,40 was active with a score of 3, whereas two other active compounds, Z53005631 and the p38 MAPK inhibitor SB202190 hydrochloride, each yielded scores of 2. The same five compounds were also active against somules at 10 μM after 72 h with scores between 2 and 4 (Table S4), as was one additional nonadult active compound, Z826994844, with a score of 2 after 72 h.
Finally, of the eight manually predicted nonactive compounds, only the antifungal itraconazole was active against adult worms with a score of 3 after 48 h (Table 5). The same compound plus two others, dabigatran etexilate and BIX 01294 trihydrochloride hydrate, were also active against somules at 10 μM after 72 h with scores of 2 and 4, respectively (Table S4).
Compound Prioritization Process for Future Antischistosomal Studies. Because adult worms are ultimately responsible for disease in humans via the eggs they produce, 4 nine bioactive compounds were initially prioritized for further investigation based on the generation of severity scores of 3 or 4 against adults after 24 h. These were itraconazole, moxidectin, piperlongumine, nemadipine-A, the benzimidazole Z425126666, revaprazan hydrochloride, the indole Z56174662, the pyridine-containing Z288901226, and U-73122. Prioritization was also influenced by activity against somules and, with the exception of Z288901226, the nine chosen compounds generated severity scores of ≥2 after 72 h (Tables 4 and S4).
During the screening assays, precipitation was noted for itraconazole, and the compound was not considered further. Three other compounds are known to have antischistosomal effects, including in some cases in vivo activity, namely, moxidectin, 22,37 piperlongumine, 41−43 and nemadipine-A. 38 Due to our desire to identify novel starting points for treatments, these were also removed from further consideration.
Table footnote: Compound activities have severity scores that are scaled from 0 to 4 where 4 represents the most active compound. Active compounds are those generating a severity score of ≥2. Compounds were tested in two experiments, each in duplicate, and representative data are shown. Structures and the descriptors associated with the severity scores are shown in Table S4 as are the phenotypic data arising from the use of 1 μM compound.
The remaining five compounds (Figure 6) were evaluated with other Assay Central models for stability, 44 permeability, 45 and cytotoxicity 46 (Table S6 and Figure S5). From these predictions, Z425126666 scored the best, i.e., was active for stability and permeability but inactive for cytotoxicity. The other four compounds were scored as inactive for stability and active for permeability and cytotoxicity.

■ DISCUSSION
For machine learning, we developed two rule books to normalize the disparate literature data arising from small molecule, in vitro phenotypic screens of S. mansoni (Tables 1 and 2; Tables S1 and S2). The parsing and normalization of the data for the rule books were manually intensive and time-consuming yet necessary to develop the machine learning models. The rule books are also a first step toward developing a unified database of antischistosomal compounds. The scores derived from both rule books were pooled and used to build eight Bayesian machine learning models with Assay Central. These models produced favorable 5-fold cross-validation metrics with ROC scores of 0.796−0.845 (Table 3, Figure S1). Although less literature data were available for adult worms (the largest model totaled 509 versus 3151 compounds for somules), there were generally higher 5-fold cross-validation performance metrics for these models over the somule counterparts. The more diverse and larger somule sets have a lower ratio of actives to total compounds compared with the adult data sets (approximately 1−5% less), which likely impacted the internal performance of the models. Machine learning models are only as good as the data on which they are built, so with fewer active compounds from which to learn bioactivity features, predictions are less likely to be accurate.
Comparisons between the machine learning methods (Figures S2 and S3, Table S3) suggest that the more advanced methods like deep learning and support vector classification do not significantly improve the internal predictive performance of the resulting models. This is a similar outcome to previous comparisons of the same algorithms using data sets for tuberculosis and HIV infection. 28,30 This could be related to the data set size, balance of the data set, or other factors such as model hyperparameter optimization. In the absence of an algorithm with a clear and significant performance advantage, the Bayesian method utilized by Assay Central has the benefit of generating models faster than algorithms like deep learning, and it can be implemented quickly on an average desktop computer, a major advantage in the constrained drug discovery research environment for diseases of poverty. 11
Several libraries of compounds from commercial vendors were virtually screened with the Assay Central Bayesian machine learning models to select both predicted-active and -inactive compounds for in vitro phenotypic assays of S. mansoni somules and adults. These predictions were performed either manually or in an automated manner (Figures 1−3), and 56 compounds were selected and purchased. Bioactivity against the parasite as a function of time and/or concentration was presented as severity scores, in accordance with previous studies (Tables 4 and 5). 47−49 Nine active compounds were initially prioritized on the basis of severity scores of 3 or 4 against adult worms after 24 h; of these, eight were also active against somules. After triaging for in-assay precipitation problems and prior evidence of antischistosomal activity, we settled on five compounds for future follow-up studies: revaprazan hydrochloride, U-73122, Z425126666, Z56174662, and Z288901226 (Figure 6).
All possess common antischistosomal chemical moieties seen in adult and somule predictions, including indole (Z56174662) and pyrimidine rings (revaprazan hydrochloride) as well as fluorine substituents (Z288901226 and revaprazan hydrochloride). U-73122 does not possess these somule-specific moieties but instead has a steroid core that is common in adult active predictions. This may explain why it was selected manually as a potential developmental stage-specific compound. Both revaprazan hydrochloride and U-73122 have the advantage of known mechanisms of action (i.e., acid pump antagonist and phospholipase C inhibitor, respectively). 50−53 Only one compound, Z425126666, was predicted favorably by the stability, permeability, and cytotoxicity models (Table S6). This may indicate the need for further optimization of the molecular properties in future studies.
Both the automated and manual compound prediction methods demonstrated advantages and disadvantages in this study. The automated approach was efficient in selecting compounds with established antischistosomal chemical features such as halogen substituents, piperazine, and quinazolines, which is valuable for finding hit compounds based on known chemistries. However, this method did not produce novel chemistries for testing. In contrast, the manual compound selection method for active compounds, although more time-consuming, allowed us to pick "underdog" compounds that diverge from the more established chemistries. For somules, both the automated and manual prediction methods were reasonably accurate in selecting active and inactive compounds as evidenced by the 70%, 50%, and 63% correct prediction return for the automated actives, manual actives, and manual inactives, respectively. For adults, prediction accuracy was somewhat lower for actives predicted automatically (30%) or manually (50%), whereas the prediction of manual inactives was 87.5% accurate. Together, the methods employed are less time-consuming and more likely to yield active compounds than the screening of large libraries, as indicated by our 48% and 34% hit rates for somules and adults, respectively, from just 56 molecules. Future developments of an automated compound selection method may include molecular property and toxicity predictions as well as expanding the number of libraries utilized. The manual selection process could be improved with a more defined selection of chemical diversity rather than tediously judging structures.
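The per-method percentages and overall hit rates quoted above follow directly from the counts reported in the Results (Tables 4 and 5). A short sketch of that arithmetic is given here; the dictionary layout and the helper name `percent` are illustrative choices, whereas the counts themselves are those stated in the text.

```python
def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100 * numerator / denominator


# (correct predictions, compounds selected) per stage and selection method,
# as reported in the Results section.
correct_predictions = {
    "somules": {
        "automated actives": (7, 10),
        "manual actives": (5, 10),
        "manual nonactives": (5, 8),
    },
    "adults": {
        "automated actives": (3, 10),
        "manual actives": (5, 10),
        "manual nonactives": (7, 8),
    },
}

# (compounds active in the assay, compounds tested) per stage.
hits = {"somules": (27, 56), "adults": (19, 56)}

for stage, methods in correct_predictions.items():
    for method, (correct, selected) in methods.items():
        print(f"{stage}, {method}: {percent(correct, selected):.1f}% correct")
    active, tested = hits[stage]
    print(f"{stage}, hit rate: {percent(active, tested):.1f}%")
```

The hit rate counts every compound that was active against a given stage, including compounds originally selected for the other stage or predicted to be nonactive, which is why its denominator is 56 rather than the number of predicted actives.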
Repurposing approved drugs is a means of fast-tracking a drug to the clinic 54 and has been applied in the context of infectious diseases of poverty, including schistosomiasis. 22,47 One compound to emerge as strongly bioactive against both somules and adults was revaprazan hydrochloride. Revaprazan is a reversible proton pump inhibitor that reduces gastric acid secretions 50 but is also known to activate the serotonin receptor 4b. 51 The compound is approved in South Korea and India (under the trade name Revanex) to treat excess gastric acid secretion and gastritis and is used at a dose of 200 mg/day. Although the drug has poor water solubility and a relatively low oral bioavailability, 53 it is well tolerated in rats after oral administration (50−100 mg/kg). 55 In vitro studies in Caco-2 cells suggest that the uptake is mediated by a nucleobase transport system, which may contribute to the dose-dependent bioavailability when saturated. 56 This compound is a good example of repurposing an already-approved drug.
A number of the compounds that were identified as bioactive vs adults and/or somules are already known for their antischistosomal activity, e.g., moxidectin 22,37,57 and piperlongumine. 41−43 These studies were not found in our initial literature search and, thus, were not included in our training data but offer an opportunity for further validation of the prediction and experimental approaches herein. In a previous in vitro study, moxidectin was considered to be active (at 10 μM for 72 h) against somules and moderately active against adults (at 33.3 μM for 24 h). 22 The drug has also shown some efficacy in patients infected with S. mansoni, particularly in decreasing egg burdens. 37,57 In our own screens, 10 μM moxidectin produced degenerative changes in both developmental stages by 48 h.
We also tested piperlongumine as part of the predicted nonactive compounds for somules. Contrary to the prediction, piperlongumine was in fact strongly active against somules and adults (dead or dying parasites by 48 h; Tables 4, 5, and S4). Our experimental data are consistent with other in vitro studies whereby adult worms were dead by 24 h at 15 μM and 7-day old somules were killed within 48 h at the same concentration. 41 The rediscovery (confirmation) of active compounds is a familiar issue in machine learning, as predictions are limited by the training data available. Refinement of the rule books and the development of a comprehensive database will limit the future rediscovery of active compounds.
Lago et al. screened 73 nonsteroidal anti-inflammatory drugs, including a compound screened by us here, niflumic acid, against adult S. mansoni in vitro for 72 h at 50 μM. 58 Niflumic acid was not active in the initial in vitro screen, but other analogs had modest activity (LC50 values ranged from 20.6 to 37.4 μM), the best of which, mefenamic acid, generated an LC50 of 11.1 μM. An inspection of the rule book based on these published single metric data (Table 1) shows that these compounds would be considered inactive (rule book scores of zero). We had selected niflumic acid manually as an adult active because it possesses the attractive feature of a known mechanism of action (a cyclooxygenase-2 inhibitor). Experimentally, however, it was essentially inactive (a severity score of 1 after 48 h at 10 μM), i.e., consistent with the data from Lago et al. Interestingly, however, the analog Z288901226 (Figure S4B), which was predicted to be active using the automated selection method, was lethal to adults by 48 h at 10 μM. Thus, even though our prediction of activity for niflumic acid was incorrect, our identification of the active analog Z288901226 provides a novel starting point for further exploration of this 3-(trifluoromethyl)anilino-3-pyridine chemotype.

■ CONCLUSION
We have described a process to curate and normalize the disparate phenotypic screening data for S. mansoni using two rule books. Once standardized, these data sets were interrogated by the proprietary software Assay Central to generate a total of eight Bayesian machine learning models. From these models, 56 predicted active and nonactive small molecules were selected for in vitro phenotypic screening against S. mansoni somules and adults; we identified five actives for future optimization studies. The prediction accuracy was 61% and 56% for somules and adults, respectively, with hit rates of 48% and 34%, respectively. Thus, the return on the time and effort invested exceeds the typical 1−2% hit rate from high throughput screens, 59,60 which is especially attractive when working with schistosomes given the need for small animal hosts to propagate the parasite and the finite numbers of parasites that can be recovered per host. Finally, the rule books represent a first step toward a unified database of antischistosomal activity. We will continue the iterative feedback process of generating and assembling new data to improve our machine learning models.
Development of Two Rule Books to Normalize Phenotypic Screening Data in the Literature. Reports of in vitro antischistosomal activity in the literature have typically employed two phenotypic screen approaches: (i) those that reported single metric outputs (ED50, LD50, % mortality, % survival, etc.) either derived from an observationally based adjudication system or the measurement of a biochemical marker (e.g., ATP or NADPH) at fixed time points and (ii) those that involve the observational assessment and enumeration of the phenotypic changes that the schistosome parasite is capable of (changes relating to motility, size, and density) as a function of time and/or concentration. For each approach, we developed a "rule book" that employs a sliding scale of scores 0 (no activity) to 4 (most activity) whereby potent compounds that act quickly and at low concentrations receive higher scores than those that take more time to act and/or act at higher concentrations.
In the first rule book (Table 1; full details in Table S1), for example, ED50 values measured at 24 h in the range of 10−25 and <5 μM would generate rule book scores of 2 and 4, respectively. However, to achieve the same scores at the longer time point of 72 h, the ED50 values would be necessarily more stringent, i.e., 2.5−5 and <1 μM, respectively.
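The time-dependent thresholds in this example can be expressed as a small lookup. The sketch below encodes only the four ED50 bins quoted in the paragraph above; the full rule book is in Tables 1 and S1 and is not reproduced here, so values outside the quoted bins return `None`. The names `QUOTED_BINS` and `rule_book_score`, and the treatment of bin edges (lower bound inclusive, upper bound exclusive), are assumptions made for illustration.

```python
from typing import Optional

# time point (h) -> list of (lower bound, upper bound, rule book score), in μM.
# Only the bins quoted in the text are included; edge handling is an assumption.
QUOTED_BINS = {
    24: [(0.0, 5.0, 4), (10.0, 25.0, 2)],
    72: [(0.0, 1.0, 4), (2.5, 5.0, 2)],
}


def rule_book_score(ed50_um: float, time_h: int) -> Optional[int]:
    """Return the rule book score for an ED50 (μM) measured at time_h hours.

    Returns None when the value falls outside the bins quoted in the text.
    """
    for low, high, score in QUOTED_BINS.get(time_h, []):
        if low <= ed50_um < high:
            return score
    return None


# The same ED50 of 3 μM scores 4 at 24 h but only 2 at 72 h.
print(rule_book_score(3.0, 24))
print(rule_book_score(3.0, 72))
```

The example makes the intent of the sliding scale concrete: a compound must be more potent to earn the same score when it is given longer to act.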
For the second rule book, phenotypic changes (principally shape, motility, and density) were counted up to a maximum of three to provide a partially quantitative assessment of overall severity (Table 2; full details in Table S2). Severe changes that involved degenerating parasites, damage to the outer tegument (specific to adult worms), and worm death were given the same weighting as three phenotypic changes. In addition to the number of phenotypic changes, the time to appearance of these changes was considered such that those that occurred in shorter time frames received a higher rule book score. Finally, the concentration at which the changes were observed (between 0.1 and 10 μM) also contributed to the final rule book score. For example, degenerate or dead parasites observed at <72 h in the presence of 0.5 μM compound would result in a rule book score of 4, whereas in the presence of 10 μM compound, the score would be 2. Data incorporated as part of the second phenotypic screening approach are derived from peer-reviewed resources, 24,47 the ChEMBL database, and unpublished screens performed by the UCSD authors.
Data Set Organization for Machine Learning. Upon application of the two rule book scoring systems, the resulting data sets were pooled for generating machine learning models with Assay Central (Figure 1). Models were generated according to the screened developmental stage (somule or adult) and experimental time point for modeling (≤24, 48, 72, and >72 h). For building somule models, data for both newly transformed somules (NTS) and somules that had been allowed to acclimate overnight prior to screening were consolidated. Likewise, for building adult models, data for adults that had been harvested at 37 days post-infection or at later time points were consolidated.
The same activity thresholds were applied to all individual models: compounds generating a rule book score of 3 or 4 were considered active whereas those that yielded a score of 0−2 were considered inactive. This threshold was chosen with a view to finding strongly active compounds. For any given time-point model, inactive compounds at longer time points were included to maximize chemical diversity; e.g., inactive compounds at 72 h were included in the 48 h model. When duplicate compounds between articles were observed, the binary activities reflecting the rule book score, i.e., a value of 1 for rule book scores 3−4 and a value of 0 for rule book scores 0−2, were averaged and rounded to classify the compound as active or inactive. A >72 h model was also generated to consider a compound's activity over all recorded time points by applying a binary classification. Thus, if a compound was inactive between 24 and 72 h but active at 168 h, the compound was considered active in the >72 h model but inactive in the other models. Four models were built from standardized time points (≤24, 48, 72, and >72 h postexposure) for each developmental stage (eight total).
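The labeling rules described above can be sketched as follows. This is a minimal illustration, not the curation code used in the study: the function names are hypothetical, the handling of an exact 0.5 average (treated here as active) is an assumption because the text does not state how ties were rounded, and the >72 h label is read as "active at any recorded time point", which matches the 168 h example but is likewise an interpretation.

```python
from collections import defaultdict

ACTIVE_THRESHOLD = 3  # rule book scores of 3 or 4 are active; 0-2 are inactive


def binarize(rule_book_score):
    """Map a 0-4 rule book score to a binary activity label."""
    return 1 if rule_book_score >= ACTIVE_THRESHOLD else 0


def consolidate_duplicates(records):
    """Average and round binary labels for compounds reported more than once.

    records: iterable of (compound_id, rule_book_score) pairs.
    A mean of exactly 0.5 is labeled active here (assumption).
    """
    labels = defaultdict(list)
    for compound_id, score in records:
        labels[compound_id].append(binarize(score))
    return {cid: int(sum(v) / len(v) >= 0.5) for cid, v in labels.items()}


def label_over_all_time_points(scores_by_time_h):
    """Label for the >72 h model: active if active at any recorded time point
    (one reading of the text)."""
    return int(any(binarize(s) for s in scores_by_time_h.values()))


# Hypothetical records: compound "A" was reported three times.
records = [("A", 4), ("A", 1), ("A", 3), ("B", 2), ("B", 0), ("C", 3)]
print(consolidate_duplicates(records))
print(label_over_all_time_points({24: 1, 72: 2, 168: 4}))
```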
Assay Central. The associated rule book scores were used with the Assay Central technology to predict compounds for in vitro screening against S. mansoni. The Assay Central software has been described in detail elsewhere. 18,26−30 Briefly, all screening data were collated within Molecular Notebook (Molecular Materials Informatics, Inc. in Montreal, Canada). The underlying framework applies a series of molecular standardization scripts for thorough curation, including removing salts and flagging abnormal valences and mixtures, to generate high-quality (i.e., machine learning-ready) data sets and Bayesian models that are capable of bioactivity predictions. 45,76 These models employ extended-connectivity fingerprints of a maximum diameter of 6 (ECFP6) that are generated from the Chemistry Development Kit library 77 by applying the Morgan algorithm. The ECFP6 descriptors are well-known for their ability to map structure−activity relationships. 45 All Assay Central models include several metrics 45 to evaluate and compare predictive performance, including ROC, recall, precision, F1-Score, Cohen's kappa, 78,79 and Matthews correlation coefficient 80 scores. A Domain metric was also generated for each model to provide a measure of chemical coverage of the training data in relation to the chemical space of the entire ChEMBL database (comprising nearly two million compounds) ranging from 0 (no overlap) to 1 (total overlap). 30 The generation of probability-like prediction scores from Bayesian models within the Assay Central software has also been previously described. 45,76 Briefly, this score sums the "contributions" of molecular fingerprints to an active classification, determined by the ratio of its presence in active and inactive training data. Bayesian predictions were evaluated using the standard probability cutoff 45 so that a chemical receiving a score of ≥0.5 is classified as active, i.e., a hit compound, at the modeled target. 
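To make the idea of summed fingerprint "contributions" concrete, the sketch below uses a Laplacian-corrected naive Bayes estimator, a formulation commonly applied to fingerprint-based activity models. It is offered as an assumption-laden illustration only: the exact calculation and the calibration to a probability-like score used by Assay Central are described in the cited references and may differ. Fingerprints are represented here as plain sets of hypothetical integer feature identifiers rather than real ECFP6 hashes, and the function names are invented for this example.

```python
import math
from collections import Counter


def train_contributions(fingerprints, labels):
    """Compute a log-ratio contribution for every fingerprint feature.

    fingerprints: list of sets of integer feature IDs (stand-ins for ECFP6).
    labels: list of 0/1 activity labels, same length as fingerprints.
    Uses the Laplacian-corrected estimator log((A + 1) / (T * P + 1)), where
    A = actives containing the feature, T = all compounds containing it, and
    P = overall fraction of actives (assumed formulation).
    """
    total = Counter()
    active = Counter()
    for bits, label in zip(fingerprints, labels):
        for bit in bits:
            total[bit] += 1
            if label:
                active[bit] += 1
    base_rate = sum(labels) / len(labels)
    return {bit: math.log((active[bit] + 1) / (total[bit] * base_rate + 1))
            for bit in total}


def raw_score(contributions, bits):
    """Sum contributions of the features present; unseen features add 0."""
    return sum(contributions.get(bit, 0.0) for bit in bits)


# Toy training set: feature 1 occurs only in actives, feature 4 only in inactives.
training_fps = [{1, 2}, {1, 3}, {2, 4}, {3, 4}]
training_labels = [1, 1, 0, 0]
model = train_contributions(training_fps, training_labels)

print(raw_score(model, {1, 2}))  # enriched in actives: positive score
print(raw_score(model, {3, 4}))  # enriched in inactives: negative score
```

The raw sum is unbounded, so it is not directly comparable with the 0.5 probability cutoff mentioned above; a separate calibration step, not shown here, would be needed to map it onto a 0 to 1 scale.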
Predictions also included an applicability score whereby a higher score indicates that more of the predicted molecule's fingerprints are present in the training data. There is no standard cutoff for the applicability score; rather, it serves to increase confidence in the prediction score. Three Assay Central prediction methods were used either to select active and nonactive compounds manually or to select active compounds through an automated workflow (discussed in more depth in the following section; Figure 1). A "raw" prediction outputs a user-defined number of top-scoring molecules with no consideration of the applicability score or diversity. Compounds identified as present in the training data are also excluded from raw prediction outputs to avoid testing compounds with known (according to our literature search) bioactivity. In contrast, a "ranked" prediction is initially identical to the raw prediction but outputs a user-defined subset of diverse compounds (according to Tanimoto similarity and molecular fingerprints) from a user-defined number of top-scoring compounds, for example, the most diverse 10 compounds from the top-scoring 100. Finally, a "consensus" prediction considers multiple models to output a consolidated score that is calculated from the average of the component prediction scores (constrained between 0 and 1) multiplied by the component applicability scores for a given molecule. A raw prediction was applied for the manual selection of active and nonactive compounds (Figure 2), whereas both ranked and consensus prediction methods were applied in the automated workflow to select active compounds (Figure 3).
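The idea behind a ranked prediction can be sketched as follows. The text does not state which diversity-picking algorithm Assay Central uses, so this sketch assumes a greedy MaxMin selection on Tanimoto similarity; fingerprints are represented as plain Python sets of bit indices, and all names and values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ranked_prediction(predictions, fingerprints, top_n, n_diverse):
    """Pick n_diverse dissimilar compounds from the top_n by prediction score.

    predictions:  dict of compound id -> prediction score
    fingerprints: dict of compound id -> set of fingerprint bits
    Assumption: greedy MaxMin picking, seeded with the top-scoring compound.
    """
    top = sorted(predictions, key=predictions.get, reverse=True)[:top_n]
    if not top:
        return []
    picked = [top[0]]
    while len(picked) < min(n_diverse, len(top)):
        remaining = [c for c in top if c not in picked]
        # Choose the candidate whose closest already-picked neighbor
        # is the least similar to it.
        best = min(
            remaining,
            key=lambda c: max(
                tanimoto(fingerprints[c], fingerprints[p]) for p in picked
            ),
        )
        picked.append(best)
    return picked


# Hypothetical scores and fingerprints.
predictions = {"c1": 0.95, "c2": 0.90, "c3": 0.85, "c4": 0.80, "c5": 0.30}
fingerprints = {
    "c1": {1, 2, 3, 4},
    "c2": {1, 2, 3, 5},
    "c3": {7, 8, 9},
    "c4": {1, 2, 7, 8},
    "c5": {20, 21},
}
selected = ranked_prediction(predictions, fingerprints, top_n=4, n_diverse=2)
```

Here c5 is excluded by the score cut, and c3 is chosen over c2 and c4 because it shares no fingerprint bits with the top-scoring c1.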
The honeycomb visualization feature of Assay Central (Figure 7) 81 allows users to investigate, in a visually intuitive manner, the similarity of predicted compounds to those found in the training data sets. This feature depicts an external compound as the central point of the plot and builds the training compounds around it, so that the increasing distance is proportional to the decreasing similarity with the central compound.
Figure 7. An example image of the Assay Central honeycomb visualization feature. 81 This was employed in the manual selection of compounds for in vitro testing. Training molecules (white background) are organized in relation to a central predicted molecule (black background) such that increasing distance is proportional to decreasing structural similarity (evaluated by the Tanimoto coefficient and ECFP6) with the central molecule (revaprazan hydrochloride).

Comparison of Assay Central with Other Machine Learning Algorithms. The eight Bayesian models generated within Assay Central were compared to six other machine learning algorithms, namely, random forest, k-Nearest Neighbors, support vector classification, naive Bayesian, AdaBoosted decision trees, and deep learning. 28,30,82 Briefly, deep learning was implemented using Keras (https://keras.io/) with a TensorFlow (www.tensorflow.org) backend, and hyperparameter optimization was performed with three layers and the Scikit-learn grid search method. Other algorithms were built using the open source Scikit-learn (http://scikit-learn.org/stable/) Python library. All alternative algorithms employed the ECFP6 molecular descriptor as used in Assay Central for a straightforward comparison of algorithms using the same data sets and descriptors. The 5-fold cross-validation performance metrics were compared using a rank normalized score as performed previously. 30,83,84 Rank normalized scores were evaluated using a pairwise comparison to compare per training set, and an independent comparison was used to give a more general comparison. A "difference from the top" (ΔRNS) metric 30,83,84 gave a rank normalized score for each algorithm subtracted from the highest rank normalized score for a specific training set. The ΔRNS metric retains the pairwise results from each training set cross-validation score by algorithm, allowing a direct performance comparison of two algorithms (using all of the available model quality metrics) without losing information from the other algorithms.
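The "difference from the top" calculation can be illustrated with a short sketch. It takes rank normalized scores as given inputs (how those scores are derived is described in the cited references and is not reproduced here) and subtracts each algorithm's score from the best score within the same training set. The training set names, algorithm names, and numbers are placeholders for illustration, not results from this study.

```python
def delta_rns(rns_by_training_set):
    """Difference from the top rank normalized score, per training set.

    rns_by_training_set: {training_set: {algorithm: rank normalized score}}
    Returns the same nested shape with top_score - score for each algorithm,
    so the best algorithm in each training set gets 0.
    """
    result = {}
    for training_set, scores in rns_by_training_set.items():
        top = max(scores.values())
        result[training_set] = {alg: top - s for alg, s in scores.items()}
    return result


def mean_delta_by_algorithm(delta):
    """Average each algorithm's difference from the top across training sets."""
    totals = {}
    for scores in delta.values():
        for alg, value in scores.items():
            totals.setdefault(alg, []).append(value)
    return {alg: sum(values) / len(values) for alg, values in totals.items()}


# Placeholder scores only; these are not values from the study.
rns = {
    "training_set_1": {"algorithm_a": 0.80, "algorithm_b": 0.75, "algorithm_c": 0.70},
    "training_set_2": {"algorithm_a": 0.60, "algorithm_b": 0.70, "algorithm_c": 0.65},
}
delta = delta_rns(rns)
summary = mean_delta_by_algorithm(delta)
```

Because the subtraction is done within each training set, any two algorithms can be compared directly on the same set while the per-set values for the remaining algorithms stay available.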
Compound Selection for Phenotypic Screens of S. mansoni. The described machine learning models were applied to vendor libraries (see below) to select compounds for in vitro phenotypic screening against S. mansoni somules and adults. For each developmental stage, predicted active compounds were chosen manually and automatically (Figure 1). Predicted nonactives for each developmental stage were chosen using the manual method only; the automated method was considered a proof-of-concept, so only active predictions were deemed of consequence.
The manual approach to predicting active and inactive compounds for each developmental stage was as follows (Figure 2). First, a raw prediction was generated against all models within several small molecule collections: (i) an internally curated collection of 1355 FDA-approved drugs from 2016 to 2018, (ii) a lead-like and chemically diverse collection from Enamine containing over 50 000 compounds, 85 (iii) the Library of Pharmacologically Active Compounds or LOPAC 1280 from Sigma-Aldrich, 86 and (iv) a screening library from Selleck Chemicals of over 1600 natural products. 87 When prioritizing compounds, more consideration was given to predictions from the ≤24 h and >72 h models so as to capture fast-action and chemical diversity, respectively. Compounds were filtered on the basis of multiple criteria including known compound targets, cost, and liabilities; for example, compounds known to elicit serious side effects were deprioritized. As the goal of this study was to discover novel antischistosomal chemistries, the predictions were further filtered so that compounds that were dissimilar to the training data were considered more desirable candidates for testing, as determined by Assay Central applicability scores and honeycomb plots (Figure 7). 81 Ten active and eight inactive compounds were manually predicted for in vitro phenotypic screens of adults and somules (36 compounds total).
For the automated approach to predicting active compounds (Figure 3), only the diversity collection from Enamine 85 was interrogated for both simplicity of purchase and the drug-like nature of this collection. First, a ranked prediction was applied to each of the four time point models in a given developmental stage to output the ten most diverse compounds from the top-scoring 50 (totaling 40 compounds per life stage). Then, a consensus prediction was applied to each deduplicated subset across all time point models to output the ten predicted active compounds for each developmental stage. These were then purchased.
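The second half of this workflow (pooling, deduplication, and consensus scoring) might look like the sketch below. It is an interpretation rather than the Assay Central implementation: the consensus score is read here as the mean, over models, of each prediction score (clipped to the range 0 to 1) multiplied by the matching applicability score, and all function names, compound identifiers, and numbers are hypothetical.

```python
def consensus_score(prediction_scores, applicability_scores):
    """Consensus score for one molecule across several models.

    Assumption: mean over models of (prediction score clipped to [0, 1])
    multiplied by that model's applicability score.
    """
    clipped = [min(max(s, 0.0), 1.0) for s in prediction_scores]
    products = [s * a for s, a in zip(clipped, applicability_scores)]
    return sum(products) / len(products)


def automated_selection(ranked_subsets, model_outputs, n_final=10):
    """Pool the ranked subsets, deduplicate, and keep the top consensus scores.

    ranked_subsets: list of compound id lists, one per time-point model
    model_outputs:  {compound id: [(prediction, applicability) per model]}
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    pool = list(dict.fromkeys(c for subset in ranked_subsets for c in subset))
    scored = {c: consensus_score(*zip(*model_outputs[c])) for c in pool}
    return sorted(pool, key=scored.get, reverse=True)[:n_final]


# Hypothetical ranked outputs from four time-point models.
ranked_subsets = [["e1", "e2"], ["e2", "e3"], ["e1", "e4"], ["e3", "e4"]]
model_outputs = {
    "e1": [(0.9, 0.5), (0.8, 0.5), (0.7, 0.5), (0.6, 0.5)],
    "e2": [(0.6, 1.0), (0.6, 1.0), (0.6, 1.0), (0.6, 1.0)],
    "e3": [(1.2, 0.5), (0.8, 0.5), (1.0, 0.5), (0.4, 0.5)],
    "e4": [(0.2, 0.5), (0.2, 0.5), (0.2, 0.5), (0.2, 0.5)],
}
selected = automated_selection(ranked_subsets, model_outputs, n_final=2)
```

In the study, each of the four ranked subsets held ten compounds and the final selection kept ten per developmental stage; the small numbers here only keep the example readable.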
Life Cycle of S. mansoni and Screening of Compounds Predicted by Assay Central. S. mansoni (NMRI isolate) was maintained by passage through Biomphalaria glabrata snails (NMRI line) and 3−5 week-old, male Golden Syrian hamsters as intermediate and definitive hosts, respectively. Somules were generated from infectious larvae (cercariae) that were harvested from infected snails, and adult parasites were harvested from hamsters, as described. 47,88 Somules were used for screening within 2 h of their preparation from cercariae (otherwise known as NTS).
For phenotypic screens of somules, 47 parasites (40 animals/well in clear, U-bottomed 96-well plates) were incubated in 100 μL of Basch medium 89 supplemented with 4% heat-inactivated FBS, 100 U/mL penicillin, and 100 μg/mL streptomycin. Compounds predicted by Assay Central were then added at 2× the final concentrations of 1 and 10 μM. The same medium (100 μL) was immediately added to mix the compound, giving a final concentration of 0.5% DMSO. Compounds were tested in two experiments, each in duplicate. Incubations were maintained at 37 °C in a 5% CO2 environment, and phenotypic changes were noted at 24, 48, and 72 h. A compound was considered active when it generated a severity score of ≥2 after 72 h (see below).
Adult parasites (five males and approximately two pairs per well in 24-well plates) were maintained in 2 mL of the same medium under the same conditions in the presence of 10 μM compound and a final concentration of 0.1% DMSO. 47 Phenotypic changes were noted at 1, 5, 24, and 48 h. Compounds were tested in two experiments, each in duplicate. A compound was considered active when it generated a severity score of ≥2 after 48 h (see below).
Phenotypic changes in both developmental stages were observed using a Zeiss Axiovert A1 inverted microscope. The parasite's phenotypic changes in shape, density, and motility were recorded using a constrained nomenclature of simple and, where possible, self-explanatory descriptors. 47−49 To allow for the partially quantitative comparisons of compound effects, each descriptor was typically given a value of 1 and these were summed to generate a "severity score" with a maximum value of 4. Descriptors recording severe phenotypes, i.e., death, degeneracy or, for adult parasites specifically, damage to the surface tegument, were given the maximum value of 4.
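The severity scoring described above can be expressed as a small function. This is a sketch under stated assumptions: the descriptor strings are hypothetical stand-ins for the constrained nomenclature, every non-severe descriptor is assumed to contribute exactly 1 (the text says "typically"), and the activity call uses the ≥2 threshold given for the 72 h (somule) and 48 h (adult) readouts.

```python
MAX_SEVERITY = 4
ACTIVE_THRESHOLD = 2

# Hypothetical descriptor names for the severe phenotypes.
SEVERE_ANY_STAGE = {"dead", "degenerate"}
SEVERE_ADULT_ONLY = {"tegument damage"}


def severity_score(descriptors, stage):
    """Sum descriptor values into a severity score capped at 4.

    descriptors: iterable of descriptor strings recorded for a well
    stage:       "somule" or "adult"
    Assumption: each non-severe descriptor is worth exactly 1.
    """
    observed = set(descriptors)
    severe = set(SEVERE_ANY_STAGE)
    if stage == "adult":
        severe |= SEVERE_ADULT_ONLY
    if observed & severe:
        return MAX_SEVERITY
    return min(len(observed), MAX_SEVERITY)


def is_active(score):
    """A compound is called active at a severity score of 2 or more."""
    return score >= ACTIVE_THRESHOLD


somule_score = severity_score(["rounded", "slow"], "somule")
adult_score = severity_score(["tegument damage"], "adult")
```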
Ethics Statement. The use of hamsters in support of the S. mansoni life cycle was in accordance with a protocol approved by UC San Diego's Institutional Animal Care and Use Committee. The committee derives its authority for its activities from the United States Public Health Service (PHS) Policy on Humane Care and Use of Laboratory Animals and the Animal Welfare Act and Regulations (AWAR).
ACS Infectious Diseases pubs.acs.org/journal/aidcbc Article Further details on the models, structures of public molecules, and computational models (PDF)