Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets

Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
§ G2 Research, Inc., P.O. Box 1242, Tahoe City, California 96145, United States
Center for Emerging & Re-emerging Pathogens, Division of Infectious Diseases, Department of Medicine, Rutgers University—New Jersey Medical School, Newark, New Jersey 07103, United States
Department of Pharmacology & Physiology, Rutgers University—New Jersey Medical School, Newark, New Jersey 07103, United States
# Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
*E-mail (A.M.C.): [email protected]
*E-mail (S.E.): [email protected]. Phone: (215) 687-1320.
Cite this: J. Chem. Inf. Model. 2015, 55, 6, 1231–1245
Publication Date (Web):May 21, 2015
https://doi.org/10.1021/acs.jcim.5b00143

Copyright © 2015 American Chemical Society. This publication is licensed under these Terms of Use.

  • Open Access

Abstract

On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery.

Introduction

For well over a decade, the cost of in vitro and in vivo screening of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties of molecules has motivated efforts to develop various in silico methods to efficiently pre-filter candidates for actual physical testing. (1-29) By relying on very large, internally consistent datasets, large pharmaceutical companies have succeeded in developing highly predictive but ultimately proprietary models. (29-33) At one pharmaceutical company, for example, many of these models (e.g., volume of distribution, aqueous kinetic solubility, acid dissociation constant, distribution coefficient, microsomal clearance, CYP3A4 time-dependent inhibition) (30-36) as well as other endpoints (15, 22) have achieved such high accuracy that they have essentially put the experimental assays out of business. It is likely that most large pharmaceutical companies can now perform experimental assays for a small fraction of compounds pre-filtered through the proprietary ADME/Tox and physicochemical property computational models, thus improving cost efficiency while minimizing in vitro and animal experimentation. Extra-pharma computational efforts have not been so successful, largely because they have, by necessity, drawn upon considerably smaller datasets, in many cases trying to combine information from the literature. (37-43) This situation, however, has improved with larger datasets publicly available in PubChem, (44, 45) ChEMBL, (46-48) CDD, (49) and others, and some drug companies depositing their data (e.g., the recently deposited AstraZeneca data in ChEMBL), which can be useful for model building. (50-53)
ADME/Tox properties have been modeled by us (1, 54-81) and many other groups (29, 82) using an array of machine learning algorithms such as support vector machines, (59) Bayesian modeling, (19) Gaussian processes, (83) and many others. (84) A more exhaustive review of the different machine learning approaches is outside the scope of this work. These combined efforts at ADME/Tox model building have likely resulted in hundreds of published models which are, unfortunately, inaccessible to anyone but their authors in most cases. This limited access problem for published models is also likely the case with computational models for bioactivity or other physicochemical properties of interest. The ability to share such models freely still remains a major challenge when dealing with issues of proprietary samples or data, as repercussions for such for-profit pharmaceutical companies could be severe. The current development of technologies for open models and descriptors builds on established methodologies. (85-88) Datasets for quantitative structure–activity relationships (QSAR) have previously been represented in a reproducible way via QSAR-ML. (85) These methods also come with a reference implementation for the Bioclipse workbench, (86, 87) which provides a graphical interface. There have been several early efforts at cheminformatics Web services; e.g., Indiana University provides access to cheminformatics methods (fingerprints, 2D depiction, and various molecular descriptors) and statistical techniques. These have been used to develop models for the NCI60 cancer cell lines. (89, 90) In addition, there are Web tools for the prediction of bioactivities and physicochemical properties, like the Chemistry Activity Predictor (GUSAR). (91) Also, the Open Notebook Science (ONS) project (92) has developed models for solubility and melting point using web services based on open descriptors and algorithms. 
These tools all enable parties to collaborate publicly but do not facilitate private or selective collaboration.
We have previously demonstrated a proof of concept using open descriptors and modeling tools to model very large ADME datasets at Pfizer. (22) Models were constructed with open descriptors and keys (CDK + SMARTS) using open software (C5.0) and performed essentially identically to expensive proprietary descriptors and models (MOE2D + SMARTS + Rulequest’s Cubist) across all metrics of performance when evaluated on human liver microsomal stability (HLM), RRCK passive permeability, P-gp efflux, and aqueous solubility datasets. (22) Pfizer’s HLM dataset, used in this study, contained more than 230,000 compounds and covered a diverse range of chemistry space as well as addressing many therapeutic areas. The HLM dataset was split into a training set (80%) and a test set (20%) using the venetian blind splitting method. In addition, a newly screened set of 2310 compounds was evaluated as a blind dataset. All the key metrics of model performance, e.g., R2, RMSE, kappa, sensitivity, specificity, and positive predictive value (PPV), were nearly identical for the open source approach vs proprietary software (e.g., PPV = 0.80 vs 0.82).
Our goal is to enable extra-pharma drug discovery projects to exploit in silico machine learning methods that have, until now, been confined in practice to pharma and to a few knowledgeable academics. These methods better exploit limited screening resources and will enable such projects to cover more unexplored chemical space and to address ADME/Tox earlier in the discovery process. Extra-pharma projects represent a growing trend for commercial drug discovery (93-95) and are the principal efforts to find cures for many neglected diseases (e.g., tuberculosis, malaria, Chagas disease, visceral leishmaniasis, etc.). These projects, along with work on thousands of orphan indications, (96, 97) will require more collaborations and data, and therefore model sharing. This approach has the potential to accelerate the discovery of promising drug-like lead compounds with acceptable properties in vivo and ultimately yield a significant impact on global health.
We now describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project (98, 99) and incorporated using the FCFP6 descriptors in the CDD Vault, which was also recently made open source. (100) We make use of the CDD Public database, (49) which has over 100 public datasets that can be used to generate community-based models, including extensive neglected infectious disease SAR datasets (malaria, tuberculosis, Chagas disease, etc.), and ADMEdata.com datasets that are broadly applicable to many projects. An accompanying paper uses this software to develop models on a much larger scale. (101)

Experimental Section

Data and Materials Availability

All computational models are available from the authors upon request. All molecules for malaria, tuberculosis, and cholera datasets from Table 1 are available in CDD Public (https://app.collaborativedrug.com/register), and the models from Table 2 are available from http://molsync.com/bayesian1.

Laplacian-Modified Naïve Bayesian Definition and Pseudocode

Bayesian models have been a useful part of computer-aided drug discovery for many years and were popularized in Pipeline Pilot. (19, 102, 103) The statistical method is particularly useful for correlated structure-derived fingerprint bit strings with an activity measurement that has been classified as active or inactive on the basis of a selected threshold. Variants on the original Bayes theorem can be used to produce an estimate of the likelihood of activity for proposed compounds. For reproducing binary classifications, Bayesian methods have several appealing features. The model creation process is typically very fast and can be implemented in O(N) time, which means that an ordinary desktop computer can build and evaluate models with hundreds of thousands of compounds with minimal delay. When general purpose structure-derived fingerprints are used, the methods tend to be quite robust, which is in contrast to methods such as QSAR, which require some expertise to select appropriate descriptors, avoid overtraining, and ensure domain applicability.
Unlike many other applications of Bayesian methods, the use of chemical structure fingerprints as inputs means that the number of priors is often in the thousands, which produces scale distortions. Even if the prior probabilities are approximately 0.5 in each case, multiplying thousands of such values together tends to warp the distribution of the posterior probabilities to being asymptotically close to 0 or 1, which introduces numerical precision issues. By summing the logarithms of the ratios rather than multiplying the fractions, and adding 1 to both the numerator and the denominator, the precision issues are eliminated, and the resulting predictions tend to follow a linear distribution.
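The precision issue can be illustrated with a small numerical sketch. The ratio value (0.4) and the feature count (2000) are arbitrary illustrative choices, not values from the study:

```python
import math

# Hypothetical per-fingerprint ratios: 2000 features, each mildly unfavorable.
ratios = [0.4] * 2000

# Naive approach: multiply the ratios together.
product = 1.0
for r in ratios:
    product *= r

# Log-space approach: sum the logarithms of the ratios instead.
log_sum = sum(math.log(r) for r in ratios)

print(product)  # underflows to 0.0 in double precision
print(log_sum)  # remains finite (roughly -1832.6)
```

The product is too small to represent as a double, whereas the sum of logarithms keeps the same information on a scale that is easy to compare between molecules.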
The main drawback with this particular Bayesian variant is that the resulting prediction is not a probability, but rather an arbitrary number that has no particular upper or lower bound. The results can be converted into a two-state classification by selecting a threshold, or into a probability-like value by picking a linear transformation function, but these are post-Bayesian calibrations that must be made by applying judgment criteria that are not an intrinsic part of the method.
We have used our work (100) on the reference extended connectivity ECFP and FCFP fingerprints to create a software class that allows Bayesian models to be created from a collection of molecules and activity data, used to predict probabilities for new molecules, and to serialize/deserialize the model as a structure text string that can be saved to a file or shared with any other package that implements the same functionality.
The Laplacian-modified naïve Bayesian (which we call Bayesian models for simplicity) formula uses a simple definition, which presupposes that each molecule has been described by enumerating a list of fingerprints that applies to it, and has a determination of whether it is active or inactive. For each fingerprint code in the entire dataset:

$$C_i = \log\frac{A_i + 1}{T_i \cdot R + 1}$$

where Ci is the contribution associated with the presence of a fingerprint hash code i, which is in turn derived from Ai and Ti, which are respectively the number of active molecules with the fingerprint and the total number of molecules with the fingerprint, while R is the overall fraction of actives.
Building the Bayesian model is a simple matter of determining the total set of fingerprints in the dataset and, for each of them, calculating the value of Ci. Any fingerprint that is theoretically possible but not encountered in the training set has an implied value of 0. Any fingerprint hash code that is observed equally often in active and inactive molecules (or not at all in either) has a ratio of 1, for which the log value is zero.
When making a prediction for an incoming molecule, the value is determined by adding up the contributions for each fingerprint hash code for the molecule:

$$P_m = \sum_{i \in F_m} C_i$$

where Fm is the list of fingerprint hash codes for molecule m. The resulting prediction, Pm, is an uncalibrated value: unlike for the conventional Bayes theorem, the result is not a probability and is generally not directly interpretable, meaning that there is no significance to either the scale or offset. Methods for interpreting these values will be discussed subsequently.
Creating a Bayesian model using this method is very fast and has favorable scalability properties, because it requires just two passes through the input collection: the total number of actives and inactives needs to be summed, and after that, each compound needs to be considered only individually. The total memory required to build the Bayesian model is bounded by the theoretical number of fingerprints. For each possible unique fingerprint hash code, it is necessary to store two integers (Ai, Ti) and derive one floating point value per fingerprint (Ci). For small, relatively dense fingerprint schemes, these can be stored in a flat array (e.g., when folding fingerprints into 1024 possible values), but for larger schemes with sparse occupancy it is better to use a dictionary object (e.g., when the full 32-bit range of ECFP6 or FCFP6 fingerprints is allowed).
The pseudocode for the model building is as follows:
  • let T = empty dictionary (key: hash code i, value: total t)
  • let A = empty dictionary (key: hash code i, value: actives a)
  • for m in all molecules in training set:
    • determine list of fingerprints F for molecule m
    • for fingerprint hash code i in F:
      • increment Ti
      • if molecule m is active: increment Ai
  • let R = total actives/total molecules
  • let C = empty dictionary (key: hash code i, value: contribution v)
  • let L = unique list of keys for T
  • for i in L:
    • put Ci = log([Ai + 1]/[Ti·R + 1])

For making a prediction for an incoming molecule:
  • let m = molecular structure
  • determine list of fingerprints F for molecule m
  • let Pm = 0
  • for i in L:
    • if i is one of the fingerprints in F:
      • let Pm = Pm + Ci

Implementing these algorithms using a flat array rather than a dictionary object is analogous and differs only in the way indices are looked up.
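The pseudocode above can be sketched in Python as follows. This is a minimal illustration and not the CDK implementation: fingerprint generation is assumed to have happened elsewhere, so each molecule is represented only by a collection of integer hash codes, and the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def build_model(training_set):
    """Build contributions C from (fingerprints, is_active) pairs.

    fingerprints is a collection of integer hash codes for one molecule.
    """
    totals = defaultdict(int)   # T: hash code -> molecules containing it
    actives = defaultdict(int)  # A: hash code -> active molecules containing it
    n_molecules = 0
    n_actives = 0
    for fingerprints, is_active in training_set:
        n_molecules += 1
        if is_active:
            n_actives += 1
        for code in set(fingerprints):  # count each code once per molecule
            totals[code] += 1
            if is_active:
                actives[code] += 1
    ratio = n_actives / n_molecules  # R: overall fraction of actives
    return {
        code: math.log((actives.get(code, 0) + 1) / (total * ratio + 1))
        for code, total in totals.items()
    }


def predict(contributions, fingerprints):
    """Sum the contributions of the codes present; unseen codes add 0."""
    return sum(contributions.get(code, 0.0) for code in set(fingerprints))


# Toy training set with made-up hash codes (two actives, two inactives).
training = [({1, 2}, True), ({1, 3}, True), ({2, 4}, False), ({3, 4}, False)]
model = build_model(training)
raw = predict(model, {1, 4})  # uncalibrated score, not a probability
```

In the toy data, code 1 occurs only in actives and receives a positive contribution, code 4 occurs only in inactives and receives a negative one, and codes 2 and 3 sit exactly at the expected ratio and contribute zero.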

Chemistry Development Kit

The method described in this article is implemented in the Chemistry Development Kit (CDK) project and made available under the terms of the GNU Lesser General Public License (LGPL). The latest version of the project can be obtained from its SourceForge host and underlying Git repository (http://sourceforge.net/p/cdk/code/ci/master/tree). The Bayesian modeling capabilities are available within the tools section, the main class for which is org.openscience.cdk.fingerprint.model.Bayesian.
Using the CDK library to create a new Bayesian model from a collection of molecule objects and boolean activity values is straightforward. For example, given the filename for an MDL SDfile with a field called “pIC50”, for which any molecule with a value of 6 or greater is considered active, the following Java code snippet can be used to create a serialized model:
The resulting serialized form can be stored for future use. If it is stored in a file, it can be easily retrieved and used to apply to a different SDfile, e.g.:
The first step is to read the model from the pre-existing file (“activity.bayesian”). The second step iterates over the input SDfile, while writing to another SDfile with two extra fields appended: “RawPrediction”, which contains the uncalibrated outcome from the modified Bayesian method, and “ScaledPrediction”, which contains the prediction that has been scaled using metrics originally derived from the internal cross-validation.
These two examples demonstrate the ‘create and consume’ use cases and can be easily adapted to scenarios besides reading and writing from files. Serialized Bayesian models can be embedded in any kind of text-friendly data structure, e.g., XML documents, JSON messages, SQL tables, etc. Use of models to provide predictions can be applied to a variety of invocations, such as command line tools, incorporation into modeling packages with a graphical interface, Web services accessible via API, etc.

File Format

For saving models for subsequent reuse, the information necessary to apply the model to make predictions for new molecules can be stored in a text-based file format. The molecules that were used to build the model are not included in the serialized form, nor are the fingerprints that were generated from them. This means that sharing a serialized model allows the recipient to make inferences on the basis of the original data without explicitly having access to it. For confidentiality purposes, sharing models without the underlying data is useful in a number of situations, but it should be noted that this cannot be considered as entirely foolproof: a determined hacker with some context would likely be able to make a well educated guess as to the actives contained in the training set.
Figure 1 shows an example of a serialized file. The default file extension is .bayesian, and the MIME type is chemical/x-bayesian. The text should be encoded as UTF-8 Unicode, for which all of the content is limited to the ASCII subset, except for the freeform text notes. End of line should be encoded Unix-style, and floating point numbers can be encoded with a decimal point (e.g., 1.23, with a period symbol for the separator, invariant of localization) or scientific notation (e.g., 1.23 × 10^−9). The format is case- and whitespace-sensitive. The body of the format consists of individual lines, each of which encodes a discrete property, and is of arbitrary length.

Figure 1. Example of a serialized file containing a very small Bayesian model. The default file extension is .bayesian, and the MIME type is chemical/x-bayesian.

The first line contains the header, which consists of the recognition sequence and essential information about the model. The first nine characters are always set to the ASCII characters for the string “Bayesian!” (hex: 42 61 79 65 73 69 61 6E 21), which can be used as a recognition sequence. This is useful for situations such as embedding in streams, within whitespace-padded subfields such as XML elements, or when the file extension or MIME type is unavailable or unreliable.
The recognition sequence is followed by four comma-separated fields: fingerprint type, folding length, calibration minimum, and maximum. Only the first two fields are mandatory, and parsers should ignore additional fields, in case the format is subsequently extended.
The fingerprint type must be one of ECFPn or FCFPn, where n is 0, 2, 4, or 6. These correspond to the eight different permutations of circular fingerprints that are implemented in the CDK library. The most commonly used values are ECFP6 and FCFP6. The variety of fingerprints may be extended at a later date. When a parser encounters a fingerprint type that it does not recognize, it should invoke an error pathway if there is any intention of applying the model to new molecules, since the ability to produce the exact same fingerprints is a pre-requisite. The folding length should either be 0 (no folding, i.e., full range of 32-bit integers) or a power of 2 (e.g., 512, 1024, 2048, etc.). The parser should fail for invalid folding lengths. The calibration minimum and maximum values are used to transform raw predictions into a probability-like range. Since this is calculated by analyzing the cross-validation metrics, which is an optional step, the information may not be available. If not included, then the model can only be used to generate raw prediction values. Note that sometimes the minimum and maximum values are equal, which can occur for datasets that are small or trivial. In this case, the degenerate value should be treated like a simple threshold, giving results of 0 or 1, rather than a probability-like transform.
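The calibration step described above can be sketched as a linear rescaling with a special case for the degenerate range. The function name is hypothetical, and two details are assumptions because the text does not specify them: the comparison used at the threshold (here greater than or equal to) and the absence of clamping to the range 0 to 1.

```python
def scale_prediction(raw, calib_min, calib_max):
    """Map a raw Bayesian score onto a probability-like scale."""
    if calib_min == calib_max:
        # Degenerate calibration: treat the single value as a threshold.
        # Assumption: scores equal to the threshold count as active.
        return 1.0 if raw >= calib_min else 0.0
    # Linear transform; values outside [calib_min, calib_max] are not clamped.
    return (raw - calib_min) / (calib_max - calib_min)


print(scale_prediction(0.0, -2.0, 2.0))
print(scale_prediction(3.0, 1.0, 1.0))
```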
The model specification ends with a line beginning with the string “!End”. This should be considered as the terminator sequence regardless of trailing characters or whitespace. All lines in between the header and footer can be examined out of order.
Lines that match the template {bit index}={contribution}, where bit index is an integer and contribution is a floating point number, make up the payload of the model. The contributions-per-bit are typically stored in a dictionary object, since the bit coverage is sparse, i.e. usually not all of the possible bits are represented. If the fingerprints are folded, then the bit indices range from 0 to folding-1. If not folded, the indices are represented as signed integers: approximately half of the values will be negative.
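The two index ranges can be illustrated with a short sketch. It assumes that folding keeps the low-order bits of the 32-bit hash code; the text only states that folded indices range from 0 to folding-1, so the reference implementation may fold differently. Both function names are hypothetical.

```python
def to_signed32(value):
    """Interpret the low 32 bits of an integer as a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def fold(code, folding):
    """Return the bit index for a hash code under a given folding length."""
    if folding == 0:
        return to_signed32(code)  # unfolded: full signed 32-bit range
    if folding < 0 or folding & (folding - 1):
        raise ValueError("folding length must be 0 or a power of 2")
    # Assumption: folding masks off everything but the low-order bits.
    return code & (folding - 1)


print(fold(-1, 1024))
print(fold(0x80000000, 0))
```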
For generating raw Bayesian predictions, all that is needed is fingerprint type, folding, and contribution list. All remaining lines are optional and must follow the general format of {category}:{key}={value}, whereby category and key must be plain ASCII without whitespace, category must begin with an alphanumeric character, and value may contain any unicode characters, except for end-of-line. Duplicates are allowed. When software needs to parse, modify then write a model file, any optional lines that are not understood should be preserved as-is.
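A minimal reader for the format described above might look like the following sketch. It is written from the prose description only, since Figure 1 is not reproduced here, and it is not the CDK parser. In particular, it assumes the header fields may be wrapped in parentheses and tolerates both forms. The function name, the returned dictionary layout, and the sample model are hypothetical.

```python
import re

HEADER_PREFIX = "Bayesian!"
CONTRIBUTION = re.compile(r"^(-?\d+)=(\S+)$")
OPTIONAL = re.compile(r"^([A-Za-z0-9][^\s:]*):([^\s=]+)=(.*)$")


def parse_bayesian(text):
    lines = text.split("\n")
    header = lines[0]
    if not header.startswith(HEADER_PREFIX):
        raise ValueError("missing 'Bayesian!' recognition sequence")
    # Assumption: fields may be enclosed in parentheses; strip them if present.
    fields = header[len(HEADER_PREFIX):].strip("()").split(",")
    if len(fields) < 2:
        raise ValueError("header needs fingerprint type and folding length")
    fp_type, folding = fields[0], int(fields[1])
    if not re.fullmatch(r"[EF]CFP[0246]", fp_type):
        raise ValueError(f"unrecognized fingerprint type: {fp_type}")
    if folding < 0 or folding & (folding - 1):
        raise ValueError(f"invalid folding length: {folding}")
    # Calibration is optional; any fields beyond the fourth are ignored.
    calibration = None
    if len(fields) >= 4:
        calibration = (float(fields[2]), float(fields[3]))

    contributions, optional, terminated = {}, [], False
    for line in lines[1:]:
        if line.startswith("!End"):
            terminated = True
            break
        match = CONTRIBUTION.match(line)
        if match:
            contributions[int(match.group(1))] = float(match.group(2))
            continue
        match = OPTIONAL.match(line)
        if match:
            optional.append(match.groups())  # list, because duplicates are allowed
    if not terminated:
        raise ValueError("missing '!End' terminator")
    return {"fingerprint": fp_type, "folding": folding,
            "calibration": calibration, "contributions": contributions,
            "optional": optional}


# Made-up model text for illustration only.
sample = "\n".join([
    "Bayesian!(ECFP6,1024,-1.5,2.5)",
    "training:size=4",
    "note:title=Toy model",
    "7=0.405",
    "12=-0.693",
    "!End",
])
model = parse_bayesian(sample)
```

Optional lines are kept in a list of (category, key, value) tuples so that duplicates survive. A tool that rewrites model files would also need to retain the original text of any line it does not understand, which this sketch does not do.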
The optional data that are currently used by the CDK implementation include the following:

training:size and training:actives: the total number of compounds in the training set, and the number of actives, respectively

roc:auc: integral of the receiver operator characteristic, which is a number between 0 and 1

roc:type: the method used to partition the data for internal cross-validation, which is one of leave-one-out, three-fold, or five-fold.

roc:x and roc:y: two comma-separated lists of numbers from 0 to 1 which can be used to recreate the ROC curve visually. Note that for large datasets, the total number of points may be reduced in order to limit the impact on file size. This means that while the resolution is indistinguishable for graph plotting purposes, recalculating the integral from these points is less precise than using the stored roc:auc value.

note:title: ideally a short free-text description of the model that communicates to a scientist what data were being modeled. It should be expected to be displayed in a single line.

note:origin: a short free-text description that provides information about the provider of the model, be it the software algorithm or the source of the data, or both.

note:comment: a completely freeform field, which may be of any length. Since newlines are disallowed, multiple paragraphs of comments should be encoded by having multiple comment lines.
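As an illustration of the note on roc:x and roc:y, the area under the curve can be approximated from the stored points with the trapezoidal rule. The sketch assumes the two lists have equal length and are ordered by increasing x, which the text does not state explicitly. The function name is hypothetical.

```python
def auc_from_points(roc_x, roc_y):
    """Trapezoidal estimate of the ROC integral from comma-separated lists."""
    xs = [float(v) for v in roc_x.split(",")]
    ys = [float(v) for v in roc_y.split(",")]
    if len(xs) != len(ys):
        raise ValueError("roc:x and roc:y must have the same number of points")
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
    return area


print(auc_from_points("0,0.5,1", "0,1,1"))  # 0.75
```

Because the stored points may have been thinned for large datasets, an estimate of this kind should be treated as approximate, and the stored roc:auc value preferred when it is available.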

The file size for a serialized model depends on the size and diversity of the molecules. One of the main reasons for opting to fold the fingerprints is that it places a reasonable maximum limit on the file size. For example, a collection of 7000 molecules with experimentally determined activity against Mycobacterium tuberculosis using ECFP6 with no folding produced a file size of 1.4 MB. Folding the fingerprints into 32,768 bits reduced the file size to 646 kB, and folding into 2048 bits reduced it to 67 kB. (101)

CDD Models

The CDD Vault (104) product makes use of the fingerprinting functionality in the CDK to provide Bayesian model-building capabilities, which we have termed CDD Models. While the Bayesian implementation is proprietary, the underlying algorithm for model generation is equivalent to the method described in this article. Models can be created and used within the CDD Vault environment, and at any time they can be exported using the format described above, which means that they can be utilized by any software that either implements the requisite algorithms described in this article or makes use of the CDK library.
The CDD Models extension is part of CDD Vision in CDD Vault. A model is created by separating a set of molecules into two collections: those that could be considered ‘actives’ and those that can be considered ‘inactives’. These classes are then used to train the model, after a series of steps are taken to ensure logical consistency. These include ensuring that duplicates do not appear in either collection, that there is no overlap between the collections, and that each collection contains at least two molecules. The Standard InChIKey (105) of each molecule is used as the criterion for detecting and removing duplicates. These precautions are addressed in a series of pre-processing steps in the CDD Ruby on Rails application, where the modeling process and molecule management system are hosted, wherein the training sets are algorithmically curated via optimized raw SQL code.
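The consistency checks can be sketched as follows. This is an illustration only and not the CDD code, which is proprietary and written in Ruby and SQL. The function name and placeholder keys are hypothetical, and the handling of overlap (dropping a molecule found in both classes from both) is an assumption, since the text does not say how overlaps are resolved.

```python
def curate(actives, inactives):
    """Deduplicate two collections of (inchikey, structure) pairs."""
    def dedupe(records):
        unique = {}
        for key, structure in records:
            unique.setdefault(key, structure)  # keep the first occurrence
        return unique

    active_map, inactive_map = dedupe(actives), dedupe(inactives)
    # Assumption: a molecule present in both classes is removed from both.
    for key in active_map.keys() & inactive_map.keys():
        del active_map[key]
        del inactive_map[key]
    if len(active_map) < 2 or len(inactive_map) < 2:
        raise ValueError("each class needs at least two unique molecules")
    return active_map, inactive_map


# Placeholder keys stand in for real Standard InChIKeys.
acts, inacts = curate(
    [("KEY-A", "mol1"), ("KEY-A", "mol1 again"), ("KEY-B", "mol2"), ("KEY-X", "mol3")],
    [("KEY-C", "mol4"), ("KEY-D", "mol5"), ("KEY-X", "mol3")],
)
```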
Once the training set has passed these checks and pre-processing, the model is generated. CDD Vault uses the FCFP6 (100, 102, 106) structural fingerprints to build the Bayesian statistical model. (107) This machine learning model is stored as a special type of protocol (category = Machine-Learning model), which provides an ROC plot generated by stratified three-fold cross-validation. This ROC plot is interactive, allowing the user to explore the sensitivity, specificity, and corresponding score cutoff at each point along the curve (Figure 2A). After the model has been created, each molecule in the user’s selected ‘project’ receives a relative score, applicability number (fraction of structural features shared with the training set), and maximum similarity number (maximum Tanimoto/Jaccard similarity to any of the “good” molecules). The model can be subsequently shared with both other users and the user’s other ‘projects’ to score any molecule of interest.
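The applicability and maximum similarity numbers can be sketched from their parenthetical definitions above. The exact CDD calculations are not given in the text, so the following treats each molecule as a set of fingerprint hash codes and should be read as one plausible interpretation. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto/Jaccard similarity between two sets of hash codes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def applicability(fingerprints, training_codes):
    """Fraction of a molecule's features that also occur in the training set."""
    features = set(fingerprints)
    if not features:
        return 0.0
    return len(features & set(training_codes)) / len(features)


def max_similarity(fingerprints, active_fingerprints):
    """Highest Tanimoto similarity to any of the active ('good') molecules."""
    return max((tanimoto(fingerprints, ref) for ref in active_fingerprints),
               default=0.0)


print(tanimoto({1, 2, 3}, {2, 3, 4}))            # 0.5
print(applicability({1, 2, 3, 9}, {1, 2, 3, 4}))  # 0.75
```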
The model can also be exported from CDD Vault by making use of the aforementioned .bayesian file format (Figure 2B). To render a serialized version of a model, CDD Vault feeds the training set structures into the serialization implementation described in the previous section. The connection between the Ruby code in CDD Vault and the Java-based serialization code is accomplished using RJB (Ruby-Java Bridge). Further details on using CDD Models are described in the Supporting Information.

Figure 2. Example of the model output in CDD Models. (A) Model derived from whole-cell datasets from antimalarial screening across four CDD Public datasets (MMV, St. Jude, Novartis, and TCAMS), ∼20,000 EC50 values, cutoff < 10 nM. (B) Options for exporting a model from CDD.

Mobile Apps

Once the Bayesian model building was formalized as part of the CDK project, with a rigorously defined file format, it became a straightforward matter to implement the algorithm on other platforms. We have previously described the implementation of ECFP and FCFP fingerprints in a way that is agnostic to the specifics of any particular cheminformatics toolkit and so can be easily ported to other platforms, such that it is literally compatible with the original reference. We have taken the same approach with the Bayesian modeling and ported enough of the functionality to the iOS mobile platform such that models created with the CDK or CDD Vault using ECFP6 fingerprints can be parsed from within mobile apps and used to make predictions. Currently Bayesian model prediction capabilities have been incorporated into the Mobile Molecular DataSheet (MMDS) app (Figure 3), Approved Drugs, and MolPrime+ (all apps produced by Molecular Materials Informatics). Several useful Bayesian models have been packaged with the apps as default functionality, and it is also possible to import user-created models in order to make structure-based predictions within the mobile app.

Figure 3. Example of the Bayesian model implemented in the MMDS mobile app. (a) hERG model, based on literature data. (b) A molecule from a hERG paper. (151) (c) Results scored with this model (hERG measured IC50 = 24 nM) showing a visually intuitive atom coloring for this and other Bayesian models. This compound would appear to be an inhibitor of hERG and possibly KCNQ1 potassium channels.

Application to Datasets

To illustrate the utility of CDD Models, we evaluated several datasets available in CDD Public as well as in our own CDD Vaults (Table 1). These include screening datasets for malaria, tuberculosis, and cholera from whole-cell screens, in vivo data from mice treated with potential antituberculars, as well as several ADME/Tox properties such as Ames mutagenicity, mouse and human intrinsic clearance, Caco-2, 5-HT2B, solubility, PXR activation, maximum recommended therapeutic dose, and blood brain barrier permeability data. In all cases, three-fold ROC data were collated.
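For reference, the ROC integral reported for each model can be computed from held-out scores and activity labels. The sketch below uses the pairwise (rank-based) definition: the fraction of active/inactive pairs in which the active receives the higher score, with ties counted as half. It is a generic illustration rather than the routine used in CDD Models, and the simple double loop is quadratic in the number of molecules, so it is unsuitable for the largest datasets in Table 1.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from raw scores and boolean activity labels."""
    positives = [s for s, active in zip(scores, labels) if active]
    negatives = [s for s, active in zip(scores, labels) if not active]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))  # 0.75
```

In a three-fold cross-validation, the scores passed to such a function would be the predictions made for each molecule by the model trained on the other two folds.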
Several datasets were also selected for integration into mobile apps (including MMDS and Approved Drugs). These datasets included solubility, probe-like, (108) hERG, and KCNQ1 data, as well as screening data from whole-cell phenotypic screens against bubonic plague and Chagas disease. In all cases, five-fold ROC data were collated, summarized, and compared to published results.
Table 1. Datasets Used for Bayesian Models Created with CDD Models Using FCFP6 Fingerprints

| model | datasets used and refs | cutoff for active | no. of molecules | three-fold ROC (a) |
|---|---|---|---|---|
| malaria (Plasmodium falciparum) | CDD Public datasets (MMV, St. Jude, Novartis, and TCAMS) (127-129) | 3D7 EC50 <10 nM | 184 actives, 19,824 inactives | 0.97 |
| TB (Mycobacterium tuberculosis) | CDD Public datasets from NIAID/SRI (MLSMR, CB2, kinase) (138-140) | Mtb inhibition >90% | 6891 actives, 210,190 inactives | 0.88 |
| TB (Mycobacterium tuberculosis) | CDD Public datasets from NIAID/SRI (MLSMR, CB2, kinase, and ARRA) (138-141) | Mtb IC50 or IC90 <10 μM | 3712 actives, 1145 inactives | 0.89 |
| TB (Mycobacterium tuberculosis) | CDD Public MLSMR single-point data | Mtb inhibition >90% | 3986 actives, 210,447 inactives | 0.87 |
| TB (Mycobacterium tuberculosis) | CDD Public MLSMR dose–response | Mtb IC50 <10 μM and classed as active | 624 actives, 1649 inactives | 0.75 |
| TB (Mycobacterium tuberculosis) efficacy in vivo mouse | CDD Public (112) | described in ref 112 | 371 actives, 407 inactives | 0.73 |
| cholera | CDD Public in the TB ARRA dataset (141) | IC50 <5 μM | 50 actives, 1874 inactives | 0.93 |
| Ames mutagenicity | ref 142 | Ames positive, active = 1 | 3501 actives, 3007 inactives | 0.83 |
| mouse intrinsic clearance | data from ChEMBL | <10 μL/(min·g) | 52 actives, 312 inactives | 0.82 |
| human intrinsic clearance | data from ChEMBL | ≤10 μL/(min·g) | 105 actives, 638 inactives | 0.92 |
| human intrinsic clearance | AZ data from ChEMBL (143) | ≤10 μL/(min·mg) | 496 actives, 604 inactives | 0.80 |
| Caco-2 | proprietary data from ADMEdata.com | pH 6.5, cutoff >1 × 10^−5 | 181 actives, 325 inactives | 0.79 |
| Caco-2 | data from ChEMBL | cutoff >1 × 10^−5 | 60 actives, 399 inactives | 0.89 |
| 5-HT2B | ref 144 | active = 1, described in ref 144 | 146 actives, 607 inactives | 0.89 |
| solubility | ref 145 | Log solubility = −5 | 1136 actives, 154 inactives | 0.87 |
| PXR activation | ref 146 | described in ref 146 | 174 actives, 143 inactives | 0.80 |
| maximum recommended therapeutic dose | ref 147 | >10 mg/(kg·day) | 350 actives, 813 inactives | 0.85 |
| blood brain barrier permeability | ref 28 | BBB positive, described in ref 28 | 1472 actives, 432 inactives | 0.92 |

(a) ROC = receiver operator characteristic integral.

Results

Chemistry Development Kit

As mentioned previously, the reference implementation for the Bayesian algorithm and the underlying ECFP/FCFP fingerprints is available in the CDK library, which can be freely downloaded from GitHub. The open source implementation is also accompanied by a testing library, which runs a battery of tests to ensure that the basic functionality operates as described.

CDD Bayesian Models

CDD Models using FCFP6 fingerprints have been demonstrated on diverse datasets, including public phenotypic screening data and published ADME/Tox datasets (Table 1). In most cases, the three-fold cross-validation ROC values are >0.75. We have also illustrated with the tuberculosis (M. tuberculosis) and malaria (P. falciparum) screening datasets that very large dose–response or single-point datasets can be constructed by combining datasets in CDD Public. Other datasets were collected for this study by manual mining of ChEMBL (mouse intrinsic clearance, human intrinsic clearance, Caco-2). Although each of these was assembled from many different published sources, the ROC values are a good starting point (≥0.80) and comparable to those obtained from proprietary datasets. Several of the datasets (blood brain barrier permeability and PXR) were used recently for a comparison across SVM and Bayesian methods, (109) and the five-fold cross-validation ROC values reported there were similar to those obtained with three-fold cross-validation in this study. The use of the ROC value in this way is a reasonable method to evaluate the utility of the computational models, although an additional external test set would ideally provide further confidence. The ROC values for the M. tuberculosis models are comparable to those published recently using a commercial tool, as is the value for the Ames mutagenicity model. For example, in this study the MLSMR single-point model had a three-fold ROC = 0.87 (Figure S1; previously five-fold ROC = 0.87 (110) and leave out 50% × 100 cross-validation ROC = 0.86 (111)), the MLSMR dose–response model had a three-fold cross-validation ROC = 0.75 (leave out 50% × 100 cross-validation ROC = 0.73 (111)), the M. tuberculosis efficacy in mouse model had a three-fold ROC = 0.73 (five-fold ROC = 0.73 (112)), and the Ames mutagenicity model had a three-fold ROC = 0.83 (five-fold ROC = 0.84 (109)).
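
The cross-validated ROC values discussed above can be illustrated with a small, self-contained sketch. It assumes a Laplacian-corrected naive Bayes estimator over hashed fingerprint features and a rank-based ROC integral computed on pooled out-of-fold scores. The function names (`train_bayesian`, `score`, `roc_auc`, `cross_validated_auc`) are hypothetical and are not taken from the CDK or CDD Models code, whose exact procedure may differ.

```python
import math
import random


def train_bayesian(fingerprints, labels):
    """Return {feature: log contribution} using a Laplacian-corrected estimate."""
    total = len(labels)
    base_rate = sum(labels) / total  # overall fraction of actives
    with_feature = {}
    active_with_feature = {}
    for fp, active in zip(fingerprints, labels):
        for feature in fp:
            with_feature[feature] = with_feature.get(feature, 0) + 1
            if active:
                active_with_feature[feature] = active_with_feature.get(feature, 0) + 1
    # Assumed form: log((actives with feature + 1) / (molecules with feature * base rate + 1))
    return {
        f: math.log((active_with_feature.get(f, 0) + 1.0) / (n * base_rate + 1.0))
        for f, n in with_feature.items()
    }


def score(model, fp):
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in fp)


def roc_auc(scores, labels):
    """Fraction of active/inactive pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(fingerprints, labels, folds=3, seed=0):
    """Score each molecule with a model that never saw it, then compute one AUC."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    pooled = [0.0] * len(labels)
    for k in range(folds):
        held_out = set(order[k::folds])
        train = [i for i in order if i not in held_out]
        model = train_bayesian([fingerprints[i] for i in train],
                               [labels[i] for i in train])
        for i in held_out:
            pooled[i] = score(model, fingerprints[i])
    return roc_auc(pooled, labels)
```

For example, `roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` evaluates to 0.75, because three of the four active/inactive pairs are ranked correctly. Changing `folds` from 3 to 5 changes only how the data are partitioned, which is one reason three-fold and five-fold values for the same dataset tend to be close.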

Mobile App Bayesian Models

The models developed for the mobile apps use the same underlying code with ECFP6 descriptors. All eight models had five-fold ROC values ≥0.73, and seven of the eight exceeded 0.75 (Table 2). Several of these datasets have previously been used to generate SVM and Bayesian models with FCFP6 descriptors (109) using other software. For example, the five-fold ROC for the probe-like dataset in this study was 0.76 (previously five-fold ROC = 0.73 (108)), the five-fold ROC for the hERG dataset was 0.85 (previously five-fold ROC = 0.84 (109)), and the five-fold ROC for the KCNQ1 dataset was 0.84 (previously five-fold ROC = 0.86 (109)). The models derived with FCFP6 (Table 1) and ECFP6 descriptors (Table 2) can also be compared; e.g., the three-fold ROC for the malaria dataset using FCFP6 was 0.97 (Table 1), while the five-fold ROC using ECFP6 for the same datasets was 0.98 (Table 2).
Table 2. Datasets Used for Bayesian Models Created for Use by MMDS, with ECFP6 Fingerprints (a)

| model | datasets used and refs | cutoff for active | no. of molecules | five-fold ROC |
|---|---|---|---|---|
| solubility | ref 145 | Log solubility = −5 | 1144 actives, 155 inactives | 0.86 |
| probe-like | ref 148 | described in ref 148 | 253 actives, 69 inactives | 0.76 |
| hERG | ref 149 | described in ref 149 | 373 actives, 433 inactives | 0.85 |
| KCNQ1 | PubChem BioAssay: AID 2642 (150) | using actives assigned in PubChem | 301,737 actives, 3878 inactives | 0.84 |
| bubonic plague (Yersinia pestis) | PubChem single-point screen BioAssay: AID 898 | active when inhibition ≥50% | 223 actives, 139,710 inactives | 0.81 |
| Chagas disease (Trypanosoma cruzi) | PubChem BioAssay: AID 2044 | with EC50 <1 μM, >10-fold difference in cytotoxicity as active | 1692 actives, 2363 inactives | 0.80 |
| TB (Mycobacterium tuberculosis) | in vitro bioactivity and cytotoxicity data from MLSMR, CB2, kinase, and ARRA datasets (110) | Mtb activity and acceptable Vero cell cytotoxicity selectivity index = (MIC or IC90)/CC50 ≥10 | 1434 actives, 5789 inactives | 0.73 |
| malaria (Plasmodium falciparum) | CDD Public datasets (MMV, St. Jude, Novartis, and TCAMS) (127-129) | 3D7 EC50 <10 nM | 175 actives, 19,604 inactives | 0.98 |

(a) All eight models are ECFP6, with folding into 32,768 slots.
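
The folding into 32,768 slots noted for Table 2 can be sketched as follows. This is an illustration only: it assumes that folding is done with a bitwise mask, which requires the slot count to be a power of two (32,768 = 2^15), and the function name `fold_fingerprint` is hypothetical rather than part of any published API.

```python
def fold_fingerprint(hashes, slots=32768):
    """Map arbitrary integer feature hashes onto `slots` positions.

    Assumption: folding keeps only the low-order bits of each hash, so
    `slots` must be a positive power of two.
    """
    if slots <= 0 or slots & (slots - 1):
        raise ValueError("slots must be a positive power of two")
    mask = slots - 1
    # In Python, & on a negative int behaves like two's complement with an
    # unbounded sign extension, so the result always lies in 0..slots-1.
    return sorted({h & mask for h in hashes})
```

Distinct hashes that share the same low-order bits collapse into one slot, which is the trade-off folding makes: a smaller, fixed-size feature space in exchange for occasional collisions.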

These comparisons between previously generated models and models built with open source descriptors and algorithms suggest that the two are likely comparable (based on ROC values); the open source models will be evaluated prospectively in future studies. We have also made the models in the mobile app freely accessible at http://molsync.com/bayesian1, as summarized in Figure 4.

Figure 4. Screenshots summarizing the ROC plots and active and inactive compounds for eight models implemented in MMDS.

Discussion

We have recently suggested that providing computational models tightly integrated into software used for storing and sharing chemistry and biology data will be useful for decision making. (113) Some resources, such as qsardb.org and ochem.eu, exist for public model sharing and development, (114, 115) while another, Chembench, provides a resource for creating and using models and other cheminformatics tools privately. (116) Our work, proposing that open source descriptors and algorithms are comparable to commercial software in performance, (22) will ideally lead to more sharing of computational models. At approximately the same time, QSAR-ML was developed to enable standards for interoperability of QSAR models. (85, 117) We now build on this prior work: the current study sets the stage for generating a model in proprietary software such as CDD Vault and exporting it in a format that can be run in open source software using CDK components. This is a significant advance, because it means that a shared Bayesian model can in principle be used by anyone, regardless of which commercial software packages they have licenses for, since the model capabilities are implemented by an open source toolkit that runs on essentially every desktop platform (CDK is written in cross-platform Java). The creation of additional products that implement the identical reference algorithm, e.g., mobile apps, (100, 118-123) makes the use of shared models increasingly convenient. None of the existing Web sites for creating or storing QSAR models appears to offer this capability.
In terms of willingness to share models, sharing with collaborators is one thing, while sharing models openly with the community at large is another, but we have at least removed the main technology hurdle for fingerprint-based Bayesian models in this study. As previously noted, the shared models do not contain chemical structures or the fingerprints corresponding directly to them. However, the direct correlation between structural features and fingerprint does provide clues as to what active molecules an organization may have been using to build their models, and so this caveat must be taken into account when trustworthiness cannot be assumed. While additional security measures are appropriate for the world of proprietary high-value disease targets, this is much less of an issue for rare or neglected diseases, which is where we believe that open model sharing will have the greatest impact. (113) There has been considerable research and discussion on efforts to securely share chemistry data, (124-126) and some of these approaches could be implemented to encrypt models in future.
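
To make concrete why a shared model need not contain chemical structures, the sketch below treats a model as nothing more than a table of hashed feature codes and numeric contributions. The JSON layout and the names `export_model`, `import_model`, and `predict` are assumptions made for illustration; they do not reproduce the CDK serialisation format.

```python
import json


def export_model(contributions, folding=32768, note=""):
    """Serialise {feature hash: contribution} to text; no structures are stored."""
    payload = {
        "type": "bayesian-sketch",  # hypothetical format tag
        "folding": folding,
        "note": note,
        "contributions": {str(k): v for k, v in sorted(contributions.items())},
    }
    return json.dumps(payload)


def import_model(text):
    """Rebuild the {feature hash: contribution} table from exported text."""
    payload = json.loads(text)
    return {int(k): float(v) for k, v in payload["contributions"].items()}


def predict(contributions, hashes):
    """Raw score for a molecule: sum of contributions of its distinct features."""
    return sum(contributions.get(h, 0.0) for h in set(hashes))
```

As the paragraph above cautions, the feature codes themselves still correlate with substructures, so a table like this can hint at the chemistry used for training even though no molecule is listed.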
We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid creation of machine learning models from public datasets or the user’s own proprietary data. We also enable the resultant models to be selectively shared (or not) within CDD without having to disclose the underlying data. This represents a practical middle ground, where a trusted broker (CDD) allows a research group to share some of the benefits of their results, but not necessarily full access to the raw data, nor sufficient detail to reverse engineer it. Since sharing is not mandatory, and the option exists to export a model in an open format that can be used by anyone, the full spectrum of model sharing options is available. Providing researchers with greater flexibility to designate and share models with specified collaborators, over particular time intervals, and with clear rights encourages data exchange by allowing researchers to share on terms they control. More fine-grained access control will expand the boundary of what models can be shared to fit the comfort levels of scientists (and their management and lawyers). From our involvement in a number of collaborations large and small, we have observed considerable variation in the need for security and the desired degree of openness.
The possibility of using such models to further drug discovery for neglected diseases is of considerable interest, since the available software has traditionally catered to the proprietary market that provides most of the funding. There is now a significant amount of SAR data for rare and neglected diseases that is publicly available, and following up on open data with open source modeling algorithms is an important step. For example, pharmaceutical companies and other research groups have performed high-throughput screens on likely millions of compounds in the search for antimalarials, but they have generally only offered up the active compounds, (127-130) some of which are available in CDD Public. By selecting a cut-off for activity for the antimalarial data that is very stringent (e.g., <10 nM) in CDD Models one can construct a Bayesian model with a three-fold ROC = 0.97 upon combining four public datasets (Table 1). This model may be useful for virtual screening of future compound libraries and complements our other efforts at machine learning models for antimalarial research. (131) Ideally having access to the millions of other inactive compounds would also be useful, although one could imagine a company could just make a model available by selecting a cut-off for inactives as we have demonstrated herein.
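
The effect of choosing a stringent activity cutoff, such as the <10 nM threshold mentioned above, can be sketched in a few lines. The function names and the toy EC50 values are hypothetical and serve only to show how the cutoff determines the active/inactive split that the Bayesian model is trained on.

```python
def label_actives(ec50_nm_by_compound, cutoff_nm=10.0):
    """Return {compound id: bool}; True when EC50 (nM) is strictly below the cutoff."""
    return {cid: value < cutoff_nm for cid, value in ec50_nm_by_compound.items()}


def class_balance(labels):
    """Return (number of actives, number of inactives) for a label table."""
    actives = sum(1 for is_active in labels.values() if is_active)
    return actives, len(labels) - actives


# Toy measurements in nM (illustrative values, not taken from any dataset).
toy = {"cpd-1": 3.0, "cpd-2": 10.0, "cpd-3": 250.0, "cpd-4": 8.5}
strict = class_balance(label_actives(toy, cutoff_nm=10.0))
lenient = class_balance(label_actives(toy, cutoff_nm=1000.0))
```

With these toy values the strict cutoff yields two actives and two inactives, whereas the lenient cutoff labels every compound active and leaves no inactives to learn from, which is why a stringent threshold is useful when only active compounds have been published.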
Any efficiencies that can be gained in drug discovery would be highly desirable, as the process is widely known to be both time-consuming and very costly. (93, 94) Therefore, tools such as computational models that can point out drug candidate liabilities earlier will have considerable value. (1, 132, 133) With a considerable percentage of drug failures attributed to ADME/Tox issues, (1, 43) it remains important to assess these qualities early in the drug development process. Running experimental ADME/Tox assays on each compound for initial screening of chemical libraries is cost- and labor-intensive, (1, 43) while computational approaches that rapidly and reliably predict these qualities are gaining more acceptance in the drug discovery community. It is therefore possible to exclude sooner the compounds that are most likely to exhibit undesirable ADME or toxicity problems. We present an approach to drug discovery using computational methods for predicting whole-cell activity as well as ADME/Tox and physicochemical properties that can be broadly applied and need not be restricted to large companies with sophisticated software and big budgets. For example, modeling of microsomal metabolism has been used with large datasets, (22, 29, 35) and such models are now more accessible through the availability of public data. The results summarized in Tables 1 and 2 suggest that reliable Bayesian models for various bioactivity and ADME/Tox end points can be generated with simple fingerprint descriptors (FCFP6 and ECFP6) and the same Bayesian algorithm. This is enabled in such a way that experience in building computational models, while valuable, may not be essential for model generation, compound scoring, and interpretation.
The models described in Table 2, which are available in MMDS, are now also freely accessible (http://molsync.com/bayesian1). Our main motivation for creating and disseminating this work is to enable the sharing of Bayesian models between a diverse set of toolkits and computing platforms. We have previously described our open source implementation of ECFP6 and FCFP6 fingerprints, (100) inspired by the original commercial implementation that was partially reported in the literature without the disclosure of key details, which remain a trade secret. (102, 106) While there are several other examples of the general approach, our intent was to create a reference implementation and document it so that identical results could be reproduced. The algorithm herein is explicitly documented in a stepwise fashion, and the reference method is publicly available as source code; hence it can be used as a basis for comparison when re-implementing in another environment. We believe that taking such care to ensure that the algorithms can be implemented in a way that is 100% compatible with the formal reference removes a major barrier to scientific progress, since building and using models is no longer an isolated activity. We have deliberately taken a two-pronged approach, releasing a fully functional implementation as part of a popular open source toolkit and also documenting the algorithm in fine-grained detail, to encourage creators of commercial software to consider the advantages of interoperability within their own proprietary products.
Because the source code is a part of the CDK, the modeling functionality that we describe can be used in a variety of scenarios as-is. Any software environment that is capable of linking to a Java Virtual Machine (either directly or through a pipe) can make use of this functionality. Since the CDK is made available under the LGPL license, it can be incorporated into proprietary products as long as it is linked as a separate library, but for internal projects, back-end services for which the software is not distributed, or open source projects with a compatible license, it can be used essentially without restrictions. For wholly closed-source products, and platforms that are not compatible with the Java Virtual Machine, the methodology can be re-implemented without difficulty. The exact implementation of Bayesian model building and subsequent calibration is straightforward, and we have represented it in pseudocode form (see the accompanying paper for details of algorithms for additional analysis (101)). The CDK version is readily available to verify literal compatibility and can be used as a limitless source of validation data for direct comparison. Thus far, the method has been ported to Objective-C to enable the use of Bayesian models within several different mobile apps (Figure 3), and it has been implemented in CDD Vault as CDD Models. The use of CDD Models online in the CDD Vault data sharing platform to create Bayesian models, the use of mobile apps to apply them to small collections of proposed compounds, and integration into other products and scripts via the CDK library present a number of opportunities for making computational modeling potentially more useful and widespread.
Currently, structure–activity models can generally be created and used only on one specific platform, or if they have some portability, they often suffer from serious compatibility issues due to differences in the underlying technology (e.g., aromaticity models, ylide representations, SMARTS implementations, partial charge models, etc.). By releasing a well-documented reference implementation as open source and building powerful and useful functionality on top of it, we hope to encourage computational chemists and software creators to make use of this increased interoperability.
Future work related to this project will include the implementation of further measures to assess model quality and the applicability (115, 134-137) of a model to a test compound. In the accompanying paper, (101) we describe several additional algorithms, including calibration of raw Bayesian results to a probability-like scale, the effects of folding fingerprints into a smaller range, methods for extracting suitable validation test sets from large public datasets, automated determination of thresholds for active/inactive, and the impact of training set selection on internal cross-validation metrics. As others begin to use the new CDK functionality, CDD Models, and the Bayesian functionality implemented in various mobile apps, we expect to see further prospective and retrospective testing of the underlying technology and descriptions of its utility and limitations.

Supporting Information

Description of how to use CDD Models, and one supporting figure. The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.5b00143.

Author Information

  • Corresponding Authors
    • Alex M. Clark - Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada Email: [email protected]
    • Sean Ekins - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States Email: [email protected]
  • Authors
    • Krishna Dole - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
    • Anna Coulon-Spektor - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
    • Andrew McNutt - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
    • George Grass - G2 Research, Inc., P.O. Box 1242, Tahoe City, California 96145, United States
    • Joel S. Freundlich - Center for Emerging & Re-emerging Pathogens, Division of Infectious Diseases, Department of Medicine, Rutgers University–New Jersey Medical School, Newark, New Jersey 07103, United States; Department of Pharmacology & Physiology, Rutgers University–New Jersey Medical School, Newark, New Jersey 07103, United States
    • Robert C. Reynolds - Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
  • Notes
    The authors declare the following competing financial interest(s): S.E. is a consultant for Collaborative Drug Discovery Inc. A.M.C. is the founder of Molecular Materials Informatics, Inc.

Acknowledgment

We gratefully acknowledge Dr. Barry Bunin and all our colleagues at CDD for their support and development of CDD Models. S.E. and A.M.C. kindly acknowledge many valuable discussions with Antony J. Williams on sharing models and mobile apps for chemistry. This project was supported by Award No. 9R44TR000942-02, “Biocomputation across distributed private datasets to enhance drug discovery”, from the NIH National Center for Advancing Translational Sciences. The Chagas disease datasets were collected with funding from the National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID), Grant R41-AI108003-01, “Identification and validation of targets of phenotypic high throughput screening”. The CDD TB database was made possible with funding from the Bill and Melinda Gates Foundation (Grant No. 49852, “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”). R.C.R. acknowledges the American Recovery and Reinvestment Act Grant 1RC1AI086677-01 that provided support for the presented study (NIH, NIAID, “Targeting MDR-Tuberculosis”). J.S.F. also thanks Rutgers University for financial support.

Abbreviations

ADME/Tox: absorption, distribution, metabolism, excretion, and toxicity
CDD: Collaborative Drug Discovery
CDK: Chemistry Development Kit
ECFP6: extended connectivity fingerprints of maximum diameter 6
FCFP6: molecular function class fingerprints of maximum diameter 6
hERG: human ether-a-go-go related gene
HLM: human liver microsomal stability
HTS: high-throughput screening
LGPL: GNU Lesser General Public License
MMDS: Mobile Molecular DataSheet
Mtb: Mycobacterium tuberculosis
ONS: Open Notebook Science
PPV: positive predictive value
PXR: pregnane X receptor
QSAR: quantitative structure–activity relationship
ROC: receiver operator characteristic
SAR: structure–activity relationship
SVM: support vector machine

References

This article references 151 other publications.

  1. Ekins, S.; Waller, C. L.; Swaan, P. W.; Cruciani, G.; Wrighton, S. A.; Wikel, J. H. Progress in predicting human ADME parameters in silico. J. Pharmacol. Toxicol. Methods 2000, 44, 251–272.
  2. Wessel, M. D.; Mente, S. ADME by computer. Annu. Rep. Med. Chem. 2001, 36, 257–266.
  3. Boobis, A.; Gundert-Remy, U.; Kremers, P.; Macheras, P.; Pelkonen, O. In silico prediction of ADME and pharmacokinetics. Report of an expert meeting organised by COST B15. Eur. J. Pharm. Sci. 2002, 17, 183–193.
  4. Butina, D.; Segall, M. D.; Frankcombe, K. Predicting ADME properties in silico: methods and models. Drug Discov. Today 2002, 7, S83–S88.
  5. Ekins, S.; Boulanger, B.; Swaan, P. W.; Hupcey, M. A. Towards a new age of virtual ADME/TOX and multidimensional drug discovery. Mol. Divers. 2002, 5, 255–275.
  6. Ekins, S.; Rose, J. P. In Silico ADME/TOX: The state of the art. J. Mol. Graphics 2002, 20, 305–309.
  7. Klein, C.; Kaiser, D.; Kopp, S.; Chiba, P.; Ecker, G. F. Similarity based SAR (SIBAR) as tool for early ADME profiling. J. Comput. Aided Mol. Des. 2002, 16, 785–793.
  8. Krejsa, C. M.; Horvath, D.; Rogalski, S. L.; Penzotti, J. E.; Mao, B.; Barbosa, F.; Migeon, J. C. Predicting ADME properties and side effects: the BioPrint approach. Curr. Opin. Drug Discov. Devel. 2003, 6, 470–480.
  9. van de Waterbeemd, H.; Gifford, E. ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192–204.
  10. Ekins, S.; Swaan, P. W. Computational models for enzymes, transporters, channels and receptors relevant to ADME/TOX. Rev. Comput. Chem. 2004, 20, 333–415.
  11. Smith, P. A.; Sorich, M. J.; Low, L. S.; McKinnon, R. A.; Miners, J. O. Towards integrated ADME prediction: past, present and future directions for modelling metabolism by UDP-glucuronosyltransferases. J. Mol. Graph. Model. 2004, 22, 507–517.
  12. Stoner, C. L.; Gifford, E.; Stankovic, C.; Lepsy, C. S.; Brodfuehrer, J.; Prasad, J. V.; Surendran, N. Implementation of an ADME enabling selection and visualization tool for drug discovery. J. Pharm. Sci. 2004, 93, 1131–1141.
  13. Yamashita, F.; Hashida, M. In silico approaches for predicting ADME properties of drugs. Drug Metab. Pharmacokinet. 2004, 19, 327–338.
  14. Balakin, K. V.; Ivanenkov, Y. A.; Savchuk, N. P.; Ivaschenko, A. A.; Ekins, S. Comprehensive computational assessment of ADME properties using mapping techniques. Curr. Drug Discov. Technol. 2005, 2, 99–113.
  15. O’Brien, S. E.; de Groot, M. J. Greater than the sum of its parts: combining models for useful ADMET prediction. J. Med. Chem. 2005, 48, 1287–1291.
  16. Chang, C.; Ekins, S. Pharmacophores for human ADME/Tox-related proteins. In Pharmacophores and pharmacophore searches; Langer, T.; Hoffman, R. D., Eds.; Wiley-VCH: Weinheim, 2006; Chapter 14, pp 299–324.
  17. Ekins, S. Systems-ADME/Tox: Resources and network approaches. J. Pharmacol. Toxicol. Methods 2006, 53, 38–66.
  18. Ekins, S.; Bugrim, A.; Brovold, L.; Kirillov, E.; Nikolsky, Y.; Rakhmatulin, E.; Sorokina, S.; Ryabov, A.; Serebryiskaya, T.; Melnikov, A.; Metz, J.; Nikolskaya, T. Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica 2006, 36, 877–901.
  19. Klon, A. E.; Lowrie, J. F.; Diller, D. J. Improved naive Bayesian modeling of numerical data for absorption, distribution, metabolism and excretion (ADME) property prediction. J. Chem. Inf. Model. 2006, 46, 1945–1956.
  20. Ekins, S.; Honeycutt, J. D.; Metz, J. T. Evolving molecules using multi-objective optimization: applying to ADME. Drug Discov. Today 2010, 15, 451–460.
  21. Ekins, S.; Williams, A. J. Precompetitive Preclinical ADME/Tox Data: Set It Free on The Web to Facilitate Computational Model Building to Assist Drug Development. Lab Chip 2010, 10, 13–22.
  22. Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Bunin, B.; Ekins, S. Using open source computational tools for predicting human metabolic stability and additional ADME/TOX properties. Drug Metab. Dispos. 2010, 38, 2083–2090.
  23. Cheng, F.; Li, W.; Zhou, Y.; Shen, J.; Wu, Z.; Liu, G.; Lee, P. W.; Tang, Y. admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. J. Chem. Inf. Model. 2012, 52, 3099–3105.
  24. Ekins, S.; Wrighton, S. A. Application of in silico approaches to predicting drug–drug interactions. J. Pharmacol. Toxicol. Methods 2001, 45, 65–69.
  25. Ekins, S. In silico approaches to predicting metabolism, toxicology and beyond. Biochem. Soc. Trans. 2003, 31, 611–614.
  26. Kemp, C. A.; Flanagan, J. U.; van Eldik, A. J.; Marechal, J. D.; Wolf, C. R.; Roberts, G. C.; Paine, M. J.; Sutcliffe, M. J. Validation of model of cytochrome P450 2D6: an in silico tool for predicting metabolism and inhibition. J. Med. Chem. 2004, 47, 5340–5346.
  27. de Graaf, C.; Vermeulen, N. P.; Feenstra, K. A. Cytochrome P450 in silico: an integrative modeling approach. J. Med. Chem. 2005, 48, 2725–2755.
  28. Martins, I. F.; Teixeira, A. L.; Pinheiro, L.; Falcao, A. O. A Bayesian approach to in silico blood-brain barrier penetration modeling. J. Chem. Inf. Model. 2012, 52, 1686–1697.
  29. Hu, Y.; Unwalla, R.; Denny, R. A.; Bikker, J.; Di, L.; Humblet, C. Development of QSAR models for microsomal stability: identification of good and bad structural features for rat, human and mouse microsomal stability. J. Comput. Aided Mol. Des. 2010, 24, 23–35.
  30. Lombardo, F.; Obach, R. S.; Dicapua, F. M.; Bakken, G. A.; Lu, J.; Potter, D. M.; Gao, F.; Miller, M. D.; Zhang, Y. A hybrid mixture discriminant analysis-random forest computational model for the prediction of volume of distribution of drugs in human. J. Med. Chem. 2006, 49, 2262–2267.
  31. Lombardo, F.; Obach, R. S.; Shalaeva, M. Y.; Gao, F. Prediction of human volume of distribution values for neutral and basic drugs. 2. Extended data set and leave-class-out statistics. J. Med. Chem. 2004, 47, 1242–1250.
  32. Lombardo, F.; Obach, R. S.; Shalaeva, M. Y.; Gao, F. Prediction of volume of distribution values in humans for neutral and basic drugs using physicochemical measurements and plasma protein binding. J. Med. Chem. 2002, 45, 2867–2876.
  33. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F. ElogDoct: A tool for lipophilicity determination in drug discovery. 2. Basic and neutral compounds. J. Med. Chem. 2001, 44, 2490–2497.
  34. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F.; Abraham, M. H. ElogPoct: A tool for lipophilicity determination in drug discovery. J. Med. Chem. 2000, 43, 2922–2928.
  35. Chang, C.; Duignan, D. B.; Johnson, K. D.; Lee, P. H.; Cowan, G. S.; Gifford, E. M.; Stankovic, C. J.; Lepsy, C. S.; Stoner, C. L. The development and validation of a computational model to predict rat liver microsomal clearance. J. Pharm. Sci. 2009, 98, 2857–2867.
  36. Zientek, M.; Stoner, C.; Ayscue, R.; Klug-McLeod, J.; Jiang, Y.; West, M.; Collins, C.; Ekins, S. Integrated in silico-in vitro strategy for addressing cytochrome P450 3A4 time-dependent inhibition. Chem. Res. Toxicol. 2010, 23, 664–676.
  37. Lagorce, D.; Sperandio, O.; Galons, H.; Miteva, M. A.; Villoutreix, B. O. FAF-Drugs2: free ADME/tox filtering tool to assist drug discovery and chemical biology projects. BMC Bioinformatics 2008, 9, 396.
  38. Villoutreix, B. O.; Renault, N.; Lagorce, D.; Sperandio, O.; Montes, M.; Miteva, M. A. Free resources to assist structure-based virtual ligand screening experiments. Curr. Protein Pept. Sci. 2007, 8, 381–411.
  39. Ekins, S. Computational Toxicology: risk assessment for pharmaceutical and environmental chemicals; John Wiley and Sons: Hoboken, NJ, 2007.
  40. Balani, S. K.; Miwa, G. T.; Gan, L. S.; Wu, J. T.; Lee, F. W. Strategy of utilizing in vitro and in vivo ADME tools for lead optimization and drug candidate selection. Curr. Top. Med. Chem. 2005, 5, 1033–1038.
  41. 41
    van De Waterbeemd, H.; Smith, D. A.; Beaumont, K.; Walker, D. K. Property-based design: optimization of drug absorption and pharmacokinetics J. Med. Chem. 2001, 44, 1313 1333
  42. 42
    Walters, W. P.; Murcko, M. A. Prediction of ‘drug-likeness’ Adv. Drug Deliv. Rev. 2002, 54, 255 271
  43. 43
    Ekins, S.; Ring, B. J.; Grace, J.; McRobie-Belle, D. J.; Wrighton, S. A. Present and future in vitro approaches for drug metabolism J. Pharm. Toxicol. Methods 2000, 44, 313 324
  44. 44
    Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules Nucleic Acids Res. 2009, 37, W623 W633
  45. 45
    Wang, Y.; Bolton, E.; Dracheva, S.; Karapetyan, K.; Shoemaker, B. A.; Suzek, T. O.; Wang, J.; Xiao, J.; Zhang, J.; Bryant, S. H. An overview of the PubChem BioAssay resource Nucleic Acids Res. 2010, 38, D255 D266
  46. 46
    Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery Nucleic Acids Res. 2012, 40, D1100 D1107
  47. 47
    Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Kruger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL bioactivity database: an update Nucleic Acids Res. 2014, 42, D1083 D1090
  48. 48
    Papadatos, G.; Overington, J. P. The ChEMBL database: a taster for medicinal chemists Future Med. Chem. 2014, 6, 361 364
  49. 49
    Ekins, S.; Bunin, B. A. The Collaborative Drug Discovery (CDD) database Methods Mol. Biol. 2013, 993, 139 154
  50. 50
    Sun, H.; Veith, H.; Xia, M.; Austin, C. P.; Tice, R. R.; Huang, R. Prediction of Cytochrome P450 Profiles of Environmental Chemicals with QSAR Models Built from Drug-like Molecules Mol. Inform. 2012, 31, 783 792
  51. 51
    Sun, H.; Veith, H.; Xia, M.; Austin, C. P.; Huang, R. Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data J. Chem. Inf. Model. 2011, 51, 2474 2481
  52. 52
    Veith, H.; Southall, N.; Huang, R.; James, T.; Fayne, D.; Artemenko, N.; Shen, M.; Inglese, J.; Austin, C. P.; Lloyd, D. G.; Auld, D. S. Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries Nat. Biotechnol. 2009, 27, 1050 1055
  53. 53
    MacArthur, R.; Leister, W.; Veith, H.; Shinn, P.; Southall, N.; Austin, C. P.; Inglese, J.; Auld, D. S. Monitoring compound integrity with cytochrome P450 assays and qHTS J. Biomol. Screen. 2009, 14, 538 546
  54. 54
    Ekins, S.; Diao, L.; Polli, J. E. A Substrate Pharmacophore for the Human Organic Cation/Carnitine Transporter Identifies Compounds Associated with Rhabdomyolysis Mol. Pharmaceutics 2012, 9, 905 913
  55. 55
    Pan, Y.; Li, L.; Kim, G.; Ekins, S.; Wang, H.; Swaan, P. W. Identification and Validation of Novel hPXR Activators Amongst Prescribed Drugs via Ligand-Based Virtual Screening Drug Metab. Dispos. 2011, 39, 337 344
  56. 56
    Ekins, S.; Williams, A. J.; Xu, J. J. A Predictive Ligand-Based Bayesian Model for Human Drug Induced Liver Injury Drug Metab. Dispos. 2010, 38, 2302 2308
  57. 57
    Ivanenkov, Y. A.; Savchuk, N. P.; Ekins, S.; Balakin, K. V. Computational mapping tools for drug discovery Drug Discov. Today 2009, 14, 767 775
  58. 58
    Ekins, S.; Kortagere, S.; Iyer, M.; Reschly, E. J.; Lill, M. A.; Redinbo, M.; Krasowski, M. D. Challenges Predicting Ligand-Receptor Interactions of Promiscuous Proteins: The Nuclear Receptor PXR PLoS Comput. Biol. 2009, 5, e1000594
  59. 59
    Kortagere, S.; Chekmarev, D. S.; Welsh, W. J.; Ekins, S. New predictive models for blood brain barrier permeability of drug-like molecules Pharm. Res. 2008, 25, 1836 1845
  60. 60
    Khandelwal, A.; Krasowski, M. D.; Reschly, E. J.; Sinz, M. W.; Swaan, P. W.; Ekins, S. Machine learning methods and docking for predicting human pregnane X receptor activation Chem. Res. Toxicol. 2008, 21, 1457 1467
  61. 61
    Ekins, S.; Kholodovych, V.; Ai, N.; Sinz, M.; Gal, J.; Gera, L.; Welsh, W. J.; Bachmann, K.; Mani, S. Computational discovery of novel low micromolar human pregnane X receptor antagonists Mol. Pharmacol. 2008, 74, 662 672
  62. 62
    Chekmarev, D. S.; Kholodovych, V.; Balakin, K. V.; Ivanenkov, Y.; Ekins, S.; Welsh, W. J. Shape signatures: new descriptors for predicting cardiotoxicity in silico Chem. Res. Toxicol. 2008, 21, 1304 1314
  63. 63
    Khandelwal, A.; Bahadduri, P.; Chang, C.; Polli, J. E.; Swaan, P.; Ekins, S. Computational Models to Assign Biopharmaceutics Drug Disposition Classification from Molecular Structure Pharm. Res. 2007, 24, 2249 2262
  64. 64
    Jones, D. R.; Ekins, S.; Li, L.; Hall, S. D. Computational approaches that predict metabolic intermediate complex formation with CYP3A4 (+b5) Drug Metab. Dispos. 2007, 35, 1466 1475
  65. 65
    Embrechts, M. J.; Ekins, S. Classification of metabolites with kernel-partial least squares (K-PLS) Drug Metab. Dispos. 2007, 35, 325 327
  66. 66
    Ekins, S.; Embrechts, M. J.; Breneman, C. M.; Jim, K.; Wery, J.-P. Novel applications of Kernel-partial least squares to modeling a comprehensive array of properties for drug discovery. In Computational Toxicology: Risk assessment for pharmaceutical and environmental chemicals; Ekins, S., Ed.; Wiley-Interscience: Hoboken, NJ, 2007; pp 403 432.
  67. 67
    Ekins, S.; Chang, C.; Mani, S.; Krasowski, M. D.; Reschly, E. J.; Iyer, M.; Kholodovych, V.; Ai, N.; Welsh, W. J.; Sinz, M.; Swaan, P. W.; Patel, R.; Bachmann, K. Human pregnane X receptor antagonists and agonists define molecular requirements for different binding sites Mol. Pharmacol. 2007, 72, 592 603
  68. 68
    Ekins, S.; Balakin, K. V.; Savchuk, N.; Ivanenkov, Y. Insights for human Ether-a-Go-Go-Related Gene Potassium Channel inhibition using recursive partitioning, Kohonen and Sammon mapping Techniques J. Med. Chem. 2006, 49, 5059 5071
  69. 69
    Ekins, S.; Nikolsky, Y.; Nikolskaya, T. Techniques: Application of Systems Biology to Absorption, Distribution, Metabolism, Excretion, and Toxicity Trends Pharmacol. Sci. 2005, 26, 202 209
  70. 70
    Ekins, S. Predicting undesirable drug interactions with promiscuous proteins in silico Drug Discov. Today 2004, 9, 276 285
  71. 71
    Balakin, K. V.; Ekins, S.; Bugrim, A.; Ivanenkov, Y. A.; Korolev, D.; Nikolsky, Y.; Skorenko, S. A.; Ivashchenko, A. A.; Savchuk, N. P.; Nikolskaya, T. Kohonen maps for prediction of binding to human cytochrome P450 3A4 Drug Metab. Dispos. 2004, 32, 1183 1189
  72. 72
    Balakin, K. V.; Ekins, S.; Bugrim, A.; Ivanenkov, Y. A.; Korolev, D.; Nikolsky, Y.; Ivashchenko, A. A.; Savchuk, N. P.; Nikolskaya, T. Quantitative structure-metabolism relationship modeling of the metabolic N-dealkylation rates Drug Metab. Dispos. 2004, 32, 1111 1120
  73. 73
    Ekins, S.; Berbaum, J.; Harrison, R. K. Generation and validation of rapid computational filters for CYP2D6 and CYP3A4 Drug Metab. Dispos. 2003, 31, 1077 1080
  74. 74
    Ethell, B. T.; Ekins, S.; Wang, J.; Burchell, B. Quantitative structure activity relationships for the glucuronidation of simple phenols by expressed human UGT1A6 and UGT1A9 Drug Metab. Dispos. 2002, 30, 734 738
  75. 75
    Ekins, S.; Mirny, L.; Schuetz, E. G. A ligand-based approach to understanding selectivity of nuclear hormone receptors PXR, CAR, FXR, LXRα and LXRβ Pharm. Res. 2002, 19, 1788 1800
  76. 76
    Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E.; Lan, L. B.; Yasuda, K.; Shepard, R. L.; Winter, M. A.; Schuetz, J. D.; Wikel, J. H.; Wrighton, S. A. Three dimensional quantitative structure-activity relationships of inhibitors of P-glycoprotein Mol. Pharmacol. 2002, 61, 964 973
  77. 77
    Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E.; Lan, L. B.; Yasuda, K.; Shepard, R. L.; Winter, M. A.; Schuetz, J. D.; Wikel, J. H.; Wrighton, S. A. Application of three dimensional quantitative structure-activity relationships of P-glycoprotein inhibitors and substrates Mol. Pharmacol. 2002, 61, 974 981
  78. 78
    Ekins, S.; Crumb, W. J.; Sarazan, R. D.; Wikel, J. H.; Wrighton, S. A. Three dimensional quantitative structure activity relationship for the inhibition of the hERG (human ether-a-gogo related gene) potassium channel J. Pharmacol. Exp. Ther. 2002, 301, 427 434
  79. 79
    Ekins, S.; Boulanger, B.; Swaan, P. W.; Hupcey, M. A. Z. Towards a new age of virtual ADME/TOX and multidimensional drug discovery J. Comput. Aided Mol. Des. 2002, 16, 381 401
  80. 80
    Ekins, S.; de Groot, M.; Jones, J. P. Pharmacophore and three dimensional quantitative structure activity relationship methods for modeling cytochrome P450 active sites Drug Metab. Dispos. 2001, 29, 936 944
  81. 81
    Ekins, S.; Ring, B. J.; Bravi, G.; Wikel, J. H.; Wrighton, S. A. Predicting drug-drug interactions in silico using pharmacophores: a paradigm for the next millennium. In Pharmacophore perception, development, and use in drug design; Guner, O. F., Ed.; IUL: San Diego, 2000; pp 269 299.
  82. 82
    Paranjpe, P. V.; Grass, G. M.; Sinko, P. J. In Silico Tools for Drug Absorption Prediction: Experience to Date Am. J. Drug Deliv. 2003, 1, 133 148
  83. 83
    Obrezanova, O.; Csanyi, G.; Gola, J. M.; Segall, M. D. Gaussian processes: a method for automatic QSAR modeling of ADME properties J. Chem. Inf. Model. 2007, 47, 1847 1857
  84. 84
    Zhang, L.; Zhu, H.; Oprea, T. I.; Golbraikh, A.; Tropsha, A. QSAR modeling of the blood-brain barrier permeability for diverse organic compounds Pharm. Res. 2008, 25, 1902 1914
  85. 85
    Spjuth, O.; Willighagen, E. L.; Guha, R.; Eklund, M.; Wikberg, J. E. Towards interoperable and reproducible QSAR analyses: Exchange of datasets J. Cheminform. 2010, 2, 5
  86. 86
    Spjuth, O.; Alvarsson, J.; Berg, A.; Eklund, M.; Kuhn, S.; Masak, C.; Torrance, G.; Wagener, J.; Willighagen, E. L.; Steinbeck, C.; Wikberg, J. E. Bioclipse 2: a scriptable integration platform for the life sciences BMC Bioinformatics 2009, 10, 397
  87. 87
    Spjuth, O.; Helmus, T.; Willighagen, E. L.; Kuhn, S.; Eklund, M.; Wagener, J.; Murray-Rust, P.; Steinbeck, C.; Wikberg, J. E. Bioclipse: an open source workbench for chemo- and bioinformatics BMC Bioinformatics 2007, 8, 59
  88. 88
    Steinbeck, C.; Hoppe, C.; Kuhn, S.; Floris, M.; Guha, R.; Willighagen, E. L. Recent developments of the chemistry development kit (CDK)—an open-source java library for chemo- and bioinformatics Curr. Pharm. Des. 2006, 12, 2111 2120
  89. 89
    Dong, X.; Gilbert, K. E.; Guha, R.; Heiland, R.; Kim, J.; Pierce, M. E.; Fox, G. C.; Wild, D. J. Web service infrastructure for chemoinformatics J. Chem. Inf. Model. 2007, 47, 1303 1307
  90. 90
    Guha, R.; Schurer, S. C. Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays J. Comput. Aided Mol. Des. 2008, 22, 367 384
  91. 91
    Zakharov, A. V.; Peach, M. L.; Sitzmann, M.; Nicklaus, M. C. QSAR modeling of imbalanced high-throughput screening data in PubChem J. Chem. Inf. Model. 2014, 54, 705 712
  92. 92
    Bradley, J.-C. http://usefulchem.blogspot.com/2011/06/open-melting-points-on-iphone-via-mmds.html, June 10, 2011.
  93. 93
    Paul, S. M.; Mytelka, D. S.; Dunwiddie, C. T.; Persinger, C. C.; Munos, B. H.; Lindborg, S. R.; Schacht, A. L. How to improve R&D productivity: the pharmaceutical industry’s grand challenge Nat. Rev. Drug Discov. 2010, 9, 203 214
  94. 94
    Munos, B. Lessons from 60 years of pharmaceutical innovation Nat. Rev. Drug Discov. 2009, 8, 959 968
  95. 95
    Munos, B. Can open-source R&D reinvigorate drug research? Nat. Rev. Drug Discov. 2006, 5, 723 729
  96. 96
    Ekins, S.; Williams, A. J.; Krasowski, M. D.; Freundlich, J. S. In silico repositioning of approved drugs for rare and neglected diseases Drug Discov. Today 2011, 16, 298 310
  97. 97
    Ekins, S.; Williams, A. J. Finding promiscuous old drugs for new uses Pharm. Res. 2011, 28, 1786 1791
  98. 98
    May, J. W.; Steinbeck, C. Efficient ring perception for the Chemistry Development Kit J. Cheminform. 2014, 6, 3
  99. 99
    Beisken, S.; Meinl, T.; Wiswedel, B.; de Figueiredo, L. F.; Berthold, M.; Steinbeck, C. KNIME-CDK: Workflow-driven cheminformatics BMC Bioinformatics 2013, 14, 257
  100. 100
    Clark, A. M.; Sarker, M.; Ekins, S. New target predictions and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0 J. Cheminform. 2014, 6, 38
  101. 101
    Clark, A. M.; Ekins, S. Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL J. Chem. Inf. Model. 2015 (following paper in this issue). DOI: 10.1021/acs.jcim.5b00144
  102. 102
    Rogers, D.; Brown, R. D.; Hahn, M. Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up J. Biomol. Screen. 2005, 10, 682 686
  103. 103
    Chen, B.; Sheridan, R. P.; Hornak, V.; Voigt, J. H. Comparison of random forest and Pipeline Pilot Naive Bayes in prospective QSAR predictions J. Chem. Inf. Model. 2012, 52, 792 803
  104. 104
    Hohman, M.; Gregory, K.; Chibale, K.; Smith, P. J.; Ekins, S.; Bunin, B. Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery Drug Discov. Today 2009, 14, 261 270
  105. 105
    Klinger, R.; Kolarik, C.; Fluck, J.; Hofmann-Apitius, M.; Friedrich, C. M. Detection of IUPAC and IUPAC-like chemical names Bioinformatics 2008, 24, i268 i276
  106. 106
    Rogers, D.; Hahn, M. Extended-connectivity fingerprints J. Chem. Inf. Model. 2010, 50, 742 754
  107. 107
    Xia, X.; Maliski, E. G.; Gallant, P.; Rogers, D. Classification of kinase inhibitors using a Bayesian model J. Med. Chem. 2004, 47, 4463 4470
  108. 108
    Litterman, N.; Lipinski, C. A.; Bunin, B. A.; Ekins, S. Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes J. Chem. Inf. Model. 2014, 54, 2996 3004
  109. 109
    Ekins, S. Progress in computational toxicology J. Pharmacol. Toxicol. Methods 2014, 69, 115 140
  110. 110
    Ekins, S.; Freundlich, J. S.; Reynolds, R. C. Are Bigger Data Sets Better for Machine Learning? Fusing Single-Point and Dual-Event Dose Response Data for Mycobacterium tuberculosis J. Chem. Inf. Model. 2014, 54, 2157 2165
  111. 111
    Ekins, S.; Bradford, J.; Dole, K.; Spektor, A.; Gregory, K.; Blondeau, D.; Hohman, M.; Bunin, B. A Collaborative Database And Computational Models For Tuberculosis Drug Discovery Mol. BioSyst. 2010, 6, 840 851
  112. 112
    Ekins, S.; Pottorf, R.; Reynolds, R. C.; Williams, A. J.; Clark, A. M.; Freundlich, J. S. Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis J. Chem. Inf. Model. 2014, 54, 1070 1082
  113. 113
    Ekins, S.; Clark, A. M.; Swamidass, S. J.; Litterman, N.; Williams, A. J. Bigger data, collaborative tools and the future of predictive drug discovery J. Comput. Aided Mol. Des. 2014, 28, 997 1008
  114. 114
    Aruoja, V.; Moosus, M.; Kahru, A.; Sihtmae, M.; Maran, U. Measurement of baseline toxicity and QSAR analysis of 50 non-polar and 58 polar narcotic chemicals for the alga Pseudokirchneriella subcapitata Chemosphere 2014, 96, 23 32
  115. 115
    Sushko, I.; Novotarskyi, S.; Korner, R.; Pandey, A. K.; Rupp, M.; Teetz, W.; Brandmaier, S.; Abdelaziz, A.; Prokopenko, V. V.; Tanchuk, V. Y.; Todeschini, R.; Varnek, A.; Marcou, G.; Ertl, P.; Potemkin, V.; Grishina, M.; Gasteiger, J.; Schwab, C.; Baskin, I. I.; Palyulin, V. A.; Radchenko, E. V.; Welsh, W. J.; Kholodovych, V.; Chekmarev, D.; Cherkasov, A.; Aires-de-Sousa, J.; Zhang, Q. Y.; Bender, A.; Nigsch, F.; Patiny, L.; Williams, A.; Tkachenko, V.; Tetko, I. V. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information J. Comput. Aided Mol. Des. 2011, 25, 533 554
  116. 116
    Walker, T.; Grulke, C. M.; Pozefsky, D.; Tropsha, A. Chembench: a cheminformatics workbench Bioinformatics 2010, 26, 3000 3001
  117. 117
    Williams, A. J.; Ekins, S.; Spjuth, O.; Willighagen, E. L. Accessing, using, and creating chemical property databases for computational toxicology modeling Methods Mol. Biol. 2012, 929, 221 241
  118. 118
    Williams, A. J.; Ekins, S.; Clark, A. M.; Jack, J. J.; Apodaca, R. L. Mobile apps for chemistry in the world of drug discovery Drug Discov. Today 2011, 16, 928 939
  119. 119
    Clark, A. M.; Ekins, S.; Williams, A. J. Redefining cheminformatics with intuitive collaborative mobile apps Mol. Informatics 2012, 31, 569 584
  120. 120
    Ekins, S.; Clark, A. M.; Williams, A. J. Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration Mol. Informatics 2012, 31, 585 597
  121. 121
    Clark, A. M.; Williams, A. J.; Ekins, S. Cheminformatics workflows using mobile apps Chem-Bio Informatics J. 2013, 13, 1 18
  122. 122
    Ekins, S.; Clark, A. M.; Sarker, M. TB Mobile: A Mobile App for Anti-tuberculosis Molecules with Known Targets J. Cheminform. 2013, 5, 13
  123. 123
    Ekins, S.; Clark, A. M.; Williams, A. J. Incorporating Green Chemistry Concepts into Mobile Chemistry Applications and Their Potential Uses ACS Sustainable Chem. Eng. 2013, 1, 8 13
  124. 124
    Swamidass, S. J.; Matlock, M.; Rozenblit, L. Securely Measuring the Overlap between Private Datasets with Cryptosets PLoS One 2015, 10, e0117898
  125. 125
    Swamidass, S. J.; Schillebeeckx, C. N.; Matlock, M.; Hurle, M. R.; Agarwal, P. Combined Analysis of Phenotypic and Target-Based Screening in Assay Networks J. Biomol. Screen. 2014, 19, 782 790
  126. 126
    Matlock, M.; Swamidass, S. J. Sharing chemical relationships does not reveal structures J. Chem. Inf. Model. 2014, 54, 37 48
  127. 127
    Guiguemde, W. A.; Shelat, A. A.; Bouck, D.; Duffy, S.; Crowther, G. J.; Davis, P. H.; Smithson, D. C.; Connelly, M.; Clark, J.; Zhu, F.; Jimenez-Diaz, M. B.; Martinez, M. S.; Wilson, E. B.; Tripathi, A. K.; Gut, J.; Sharlow, E. R.; Bathurst, I.; El Mazouni, F.; Fowble, J. W.; Forquer, I.; McGinley, P. L.; Castro, S.; Angulo-Barturen, I.; Ferrer, S.; Rosenthal, P. J.; Derisi, J. L.; Sullivan, D. J.; Lazo, J. S.; Roos, D. S.; Riscoe, M. K.; Phillips, M. A.; Rathod, P. K.; Van Voorhis, W. C.; Avery, V. M.; Guy, R. K. Chemical genetics of Plasmodium falciparum Nature 2010, 465, 311 315
  128. 128
    Gamo, F.-J.; Sanz, L. M.; Vidal, J.; de Cozar, C.; Alvarez, E.; Lavandera, J.-L.; Vanderwall, D. E.; Green, D. V. S.; Kumar, V.; Hasan, S.; Brown, J. R.; Peishoff, C. E.; Cardon, L. R.; Garcia-Bustos, J. F. Thousands of chemical starting points for antimalarial lead identification Nature 2010, 465, 305 310
  129. 129
    Gagaring, K.; Borboa, R.; Francek, C.; Chen, Z.; Buenviaje, J.; Plouffe, D.; Winzeler, E.; Brinker, A.; Diagena, T.; Taylor, J.; Glynne, R.; Chatterjee, A.; Kuhen, K. Novartis-GNF Malaria Box. ChEMBL-NTD (www.ebi.ac.uk/chemblntd).
  130. 130
    Ekins, S.; Williams, A. J. Meta-analysis of molecular property patterns and filtering of public datasets of antimalarial “hits” and drugs MedChemComm 2010, 1, 325 330
  131. 131
    Zhang, L.; Fourches, D.; Sedykh, A.; Zhu, H.; Golbraikh, A.; Ekins, S.; Clark, J.; Connelly, M. C.; Sigal, M.; Hodges, D.; Guiguemde, A.; Guy, R. K.; Tropsha, A. Discovery of Novel Antimalarial Compounds Enabled by QSAR-Based Virtual Screening J. Chem. Inf. Model. 2013, 53, 475 492
  132. 132
    Ekins, S.; Mestres, J.; Testa, B. In silico pharmacology for drug discovery: applications to targets and beyond Br. J. Pharmacol. 2007, 152, 21 37
  133. 133
    Ekins, S.; Mestres, J.; Testa, B. In silico pharmacology for drug discovery: methods for virtual ligand screening and profiling Br. J. Pharmacol. 2007, 152, 9 20
  134. 134
    Dimitrov, S.; Dimitrova, G.; Pavlov, T.; Dimitrova, N.; Patlewicz, G.; Niemela, J.; Mekenyan, O. A stepwise approach for defining the applicability domain of SAR and QSAR models J. Chem. Inf. Model. 2005, 45, 839 849
  135. 135
    Tetko, I. V.; Bruneau, P.; Mewes, H. W.; Rohrer, D. C.; Poda, G. I. Can we estimate the accuracy of ADME-Tox predictions? Drug Discov. Today 2006, 11, 700 707
  136. 136
    Tropsha, A.; Golbraikh, A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening Curr. Pharm. Des. 2007, 13, 3494 3504
  137. 137
    Tetko, I. V.; Sushko, I.; Pandey, A. K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection J. Chem. Inf. Model. 2008, 48, 1733 1746
  138. 138
    Ananthan, S.; Faaleolea, E. R.; Goldman, R. C.; Hobrath, J. V.; Kwong, C. D.; Laughon, B. E.; Maddry, J. A.; Mehta, A.; Rasmussen, L.; Reynolds, R. C.; Secrist, J. A., III; Shindo, N.; Showe, D. N.; Sosa, M. I.; Suling, W. J.; White, E. L. High-throughput screening for inhibitors of Mycobacterium tuberculosis H37Rv Tuberculosis (Edinb.) 2009, 89, 334 353
  139. 139
    Maddry, J. A.; Ananthan, S.; Goldman, R. C.; Hobrath, J. V.; Kwong, C. D.; Maddox, C.; Rasmussen, L.; Reynolds, R. C.; Secrist, J. A., III; Sosa, M. I.; White, E. L.; Zhang, W. Antituberculosis activity of the molecular libraries screening center network library Tuberculosis (Edinb.) 2009, 89, 354 363
  140. 140
    Reynolds, R. C.; Ananthan, S.; Faaleolea, E.; Hobrath, J. V.; Kwong, C. D.; Maddox, C.; Rasmussen, L.; Sosa, M. I.; Thammasuvimol, E.; White, E. L.; Zhang, W.; Secrist, J. A., III. High throughput screening of a library based on kinase inhibitor scaffolds against Mycobacterium tuberculosis H37Rv Tuberculosis (Edinb.) 2012, 92, 72 83
  141. 141
    Ekins, S.; Freundlich, J. S.; Hobrath, J. V.; Lucile White, E.; Reynolds, R. C. Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery Pharm. Res. 2014, 31, 414 435
  142. 142
    Hansen, K.; Mika, S.; Schroeter, T.; Sutter, A.; ter Laak, A.; Steger-Hartmann, T.; Heinrich, N.; Muller, K. R. Benchmark data set for in silico prediction of Ames mutagenicity J. Chem. Inf. Model. 2009, 49, 2077 2081
  143. 143
    Temesi, D. G.; Martin, S.; Smith, R.; Jones, C.; Middleton, B. High-throughput metabolic stability studies in drug discovery by orthogonal acceleration time-of-flight (OATOF) with analogue-to-digital signal capture (ADC) Rapid Commun. Mass Spectrom. 2010, 24, 1730 1736
  144. 144
    Hajjo, R.; Grulke, C. M.; Golbraikh, A.; Setola, V.; Huang, X. P.; Roth, B. L.; Tropsha, A. Development, validation, and use of quantitative structure-activity relationship models of 5-hydroxytryptamine (2B) receptor ligands to identify novel receptor binders and putative valvulopathic compounds among common drugs J. Med. Chem. 2010, 53, 7573 7586
  145. 145
    Huuskonen, J. Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology J. Chem. Inf. Comput. Sci. 2000, 40, 773 777
  146. 146
    Kortagere, S.; Chekmarev, D.; Welsh, W. J.; Ekins, S. Hybrid scoring and classification approaches to predict human pregnane X receptor activators Pharm. Res. 2009, 26, 1001 1011
  147. 147
    Matthews, E. J.; Kruhlak, N. L.; Benz, R. D.; Contrera, J. F. Assessment of the health effects of chemicals in humans: I. QSAR estimation of the maximum recommended therapeutic dose (MRTD) and no effect level (NOEL) of organic chemicals based on clinical trial data Curr. Drug Discov. Technol. 2004, 1, 61 76
  148. 148
    Litterman, N. K.; Lipinski, C. A.; Bunin, B. A.; Ekins, S. Computational Prediction and Validation of an Expert’s Evaluation of Chemical Probes J. Chem. Inf. Model. 2014, 54, 2996 3004
  149. 149
    Wang, S.; Li, Y.; Wang, J.; Chen, L.; Zhang, L.; Yu, H.; Hou, T. ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage Mol. Pharmaceutics 2012, 9, 996 1010
  150. 150
    Du, F.; Yu, H.; Zou, B.; Babcock, J.; Long, S.; Li, M. hERGCentral: a large database to store, retrieve, and analyze compound-human Ether-a-go-go related gene channel interactions to facilitate cardiotoxicity assessment in drug development Assay Drug Dev. Technol. 2011, 9, 580 588
  151. 151
    Suzuki, T.; Kameda, M.; Ando, M.; Miyazoe, H.; Sekino, E.; Ito, S.; Masutani, K.; Kamijo, K.; Takezawa, A.; Moriya, M.; Ito, M.; Ito, J.; Nakase, K.; Matsushita, H.; Ishihara, A.; Takenaga, N.; Tokita, S.; Kanatani, A.; Sato, N.; Fukami, T. Discovery of novel diarylketoxime derivatives as selective and orally active melanin-concentrating hormone 1 receptor antagonists Bioorg. Med. Chem. Lett. 2009, 19, 5339 5345

Cited By

This article is cited by 91 publications.

  1. Ana C. Puhl, Renuka Raman, Tammy M. Havener, Eni Minerali, Anthony J. Hickey, Sean Ekins. Identification of New Modulators and Inhibitors of Palmitoyl-Protein Thioesterase 1 for CLN1 Batten Disease and Cancer. ACS Omega 2024, 9 (10) , 11870-11882. https://doi.org/10.1021/acsomega.3c09607
  2. Pasquale Linciano, Antonio Quotadamo, Rosaria Luciani, Matteo Santucci, Kimberley M. Zorn, Daniel H. Foil, Thomas R. Lane, Anabela Cordeiro da Silva, Nuno Santarem, Carolina B Moraes, Lucio Freitas-Junior, Ulrike Wittig, Wolfgang Mueller, Michele Tonelli, Stefania Ferrari, Alberto Venturelli, Sheraz Gul, Maria Kuzikov, Bernhard Ellinger, Jeanette Reinshagen, Sean Ekins, Maria Paola Costi. High-Throughput Phenotypic Screening and Machine Learning Methods Enabled the Selection of Broad-Spectrum Low-Toxicity Antitrypanosomatidic Agents. Journal of Medicinal Chemistry 2023, 66 (22) , 15230-15255. https://doi.org/10.1021/acs.jmedchem.3c01322
  3. Mohamed Diwan M. AbdulHameed, Ruifeng Liu, Anders Wallqvist. Using a Graph Convolutional Neural Network Model to Identify Bile Salt Export Pump Inhibitors. ACS Omega 2023, 8 (24) , 21853-21861. https://doi.org/10.1021/acsomega.3c01583
  4. Ajay Vikram Singh, Girija Bansod, Mihir Mahajan, Paul Dietrich, Shivam Pratap Singh, Kranti Rav, Andreas Thissen, Aadya Mandar Bharde, Dirk Rothenstein, Shilpa Kulkarni, Joachim Bill. Digital Transformation in Toxicology: Improving Communication and Efficiency in Risk Assessment. ACS Omega 2023, 8 (24) , 21377-21390. https://doi.org/10.1021/acsomega.3c00596
  5. Thomas R. Lane, Fabio Urbina, Xiaohong Zhang, Margret Fye, Jacob Gerlach, Stephen H. Wright, Sean Ekins. Machine Learning Models Identify New Inhibitors for Human OATP1B1. Molecular Pharmaceutics 2022, 19 (11) , 4320-4332. https://doi.org/10.1021/acs.molpharmaceut.2c00662
  6. Fabio Urbina, Christopher T. Lowden, J. Christopher Culberson, Sean Ekins. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022, 7 (22) , 18699-18713. https://doi.org/10.1021/acsomega.2c01404
  7. Gonzalo Cerruela-García, José Manuel Cuevas-Muñoz, Nicolás García-Pedrajas. Graph-Based Feature Selection Approach for Molecular Activity Prediction. Journal of Chemical Information and Modeling 2022, 62 (7) , 1618-1632. https://doi.org/10.1021/acs.jcim.1c01578
  8. Victor O. Gawriljuk, Daniel H. Foil, Ana C. Puhl, Kimberley M. Zorn, Thomas R. Lane, Olga Riabova, Vadim Makarov, Andre S. Godoy, Glaucius Oliva, Sean Ekins. Development of Machine Learning Models and the Discovery of a New Antiviral Compound against Yellow Fever Virus. Journal of Chemical Information and Modeling 2021, 61 (8) , 3804-3813. https://doi.org/10.1021/acs.jcim.1c00460
  9. Jimmy S. Patel, Javiera Norambuena, Hassan Al-Tameemi, Yong-Mo Ahn, Alexander L. Perryman, Xin Wang, Samer S. Daher, James Occi, Riccardo Russo, Steven Park, Matthew Zimmerman, Hsin-Pin Ho, David S. Perlin, Véronique Dartois, Sean Ekins, Pradeep Kumar, Nancy Connell, Jeffrey M. Boyd, Joel S. Freundlich. Bayesian Modeling and Intrabacterial Drug Metabolism Applied to Drug-Resistant Staphylococcus aureus. ACS Infectious Diseases 2021, 7 (8) , 2508-2521. https://doi.org/10.1021/acsinfecdis.1c00265
  10. Kushal Batra, Kimberley M. Zorn, Daniel H. Foil, Eni Minerali, Victor O. Gawriljuk, Thomas R. Lane, Sean Ekins. Quantum Machine Learning Algorithms for Drug Discovery Applications. Journal of Chemical Information and Modeling 2021, 61 (6) , 2641-2647. https://doi.org/10.1021/acs.jcim.1c00166
  11. Fabio Urbina, Kimberley M. Zorn, Daniela Brunner, Sean Ekins. Comparing the Pfizer Central Nervous System Multiparameter Optimization Calculator and a BBB Machine Learning Model. ACS Chemical Neuroscience 2021, 12 (12) , 2247-2253. https://doi.org/10.1021/acschemneuro.1c00265
  12. Patricia A. Vignaux, Eni Minerali, Thomas R. Lane, Daniel H. Foil, Peter B. Madrid, Ana C. Puhl, Sean Ekins. The Antiviral Drug Tilorone Is a Potent and Selective Inhibitor of Acetylcholinesterase. Chemical Research in Toxicology 2021, 34 (5) , 1296-1307. https://doi.org/10.1021/acs.chemrestox.0c00466
  13. Maureen J. Donlin, Thomas R. Lane, Olga Riabova, Alexander Lepioshkin, Evan Xu, Jeffrey Lin, Vadim Makarov, Sean Ekins. Discovery of 5-Nitro-6-thiocyanatopyrimidines as Inhibitors of Cryptococcus neoformans and Cryptococcus gattii. ACS Medicinal Chemistry Letters 2021, 12 (5) , 774-781. https://doi.org/10.1021/acsmedchemlett.1c00038
  14. Kimberley M. Zorn, Shengxi Sun, Cecelia L. McConnon, Kelley Ma, Eric K. Chen, Daniel H. Foil, Thomas R. Lane, Lawrence J. Liu, Nelly El-Sakkary, Danielle E. Skinner, Sean Ekins, Conor R. Caffrey. A Machine Learning Strategy for Drug Discovery Identifies Anti-Schistosomal Small Molecules. ACS Infectious Diseases 2021, 7 (2) , 406-420. https://doi.org/10.1021/acsinfecdis.0c00754
  15. Jennifer J. Klein, Nancy C. Baker, Daniel H. Foil, Kimberley M. Zorn, Fabio Urbina, Ana C. Puhl, Sean Ekins. Using Bibliometric Analysis and Machine Learning to Identify Compounds Binding to Sialidase-1. ACS Omega 2021, 6 (4) , 3186-3193. https://doi.org/10.1021/acsomega.0c05591
  16. Thomas R. Lane, Daniel H. Foil, Eni Minerali, Fabio Urbina, Kimberley M. Zorn, Sean Ekins. Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery. Molecular Pharmaceutics 2021, 18 (1) , 403-415. https://doi.org/10.1021/acs.molpharmaceut.0c01013
  17. Kimberley M. Zorn, Daniel H. Foil, Thomas R. Lane, Wendy Hillwalker, David J. Feifarek, Frank Jones, William D. Klaren, Ashley M. Brinkman, Sean Ekins. Comparing Machine Learning Models for Aromatase (P450 19A1). Environmental Science & Technology 2020, 54 (23) , 15546-15555. https://doi.org/10.1021/acs.est.0c05771
  18. Kimberley M. Zorn, Daniel H. Foil, Thomas R. Lane, Wendy Hillwalker, David J. Feifarek, Frank Jones, William D. Klaren, Ashley M. Brinkman, Sean Ekins. Comparison of Machine Learning Models for the Androgen Receptor. Environmental Science & Technology 2020, 54 (21) , 13690-13700. https://doi.org/10.1021/acs.est.0c03984
  19. Eni Minerali, Daniel H. Foil, Kimberley M. Zorn, Sean Ekins. Evaluation of Assay Central Machine Learning Models for Rat Acute Oral Toxicity Prediction. ACS Sustainable Chemistry & Engineering 2020, 8 (42) , 16020-16027. https://doi.org/10.1021/acssuschemeng.0c06348
  20. Patricia A. Vignaux, Eni Minerali, Daniel H. Foil, Ana C. Puhl, Sean Ekins. Machine Learning for Discovery of GSK3β Inhibitors. ACS Omega 2020, 5 (41) , 26551-26561. https://doi.org/10.1021/acsomega.0c03302
  21. Kimberley M. Zorn, Daniel H. Foil, Thomas R. Lane, Daniel P. Russo, Wendy Hillwalker, David J. Feifarek, Frank Jones, William D. Klaren, Ashley M. Brinkman, Sean Ekins. Machine Learning Models for Estrogen Receptor Bioactivity and Endocrine Disruption Prediction. Environmental Science & Technology 2020, 54 (19) , 12202-12213. https://doi.org/10.1021/acs.est.0c03982
  22. Eni Minerali, Daniel H. Foil, Kimberley M. Zorn, Thomas R. Lane, Sean Ekins. Comparing Machine Learning Algorithms for Predicting Drug-Induced Liver Injury (DILI). Molecular Pharmaceutics 2020, 17 (7) , 2628-2637. https://doi.org/10.1021/acs.molpharmaceut.0c00326
  23. Xin Yang, Yifei Wang, Ryan Byrne, Gisbert Schneider, Shengyong Yang. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chemical Reviews 2019, 119 (18) , 10520-10594. https://doi.org/10.1021/acs.chemrev.8b00728
  24. Kimberley M. Zorn, Thomas R. Lane, Daniel P. Russo, Alex M. Clark, Vadim Makarov, Sean Ekins. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Molecular Pharmaceutics 2019, 16 (4) , 1620-1632. https://doi.org/10.1021/acs.molpharmaceut.8b01297
  25. Manu Anantpadma, Thomas Lane, Kimberley M. Zorn, Mary A. Lingerfelt, Alex M. Clark, Joel S. Freundlich, Robert A. Davey, Peter B. Madrid, Sean Ekins. Ebola Virus Bayesian Machine Learning Models Enable New in Vitro Leads. ACS Omega 2019, 4 (1) , 2353-2361. https://doi.org/10.1021/acsomega.8b02948
  26. Thomas Lane, Daniel P. Russo, Kimberley M. Zorn, Alex M. Clark, Alexandru Korotcov, Valery Tkachenko, Robert C. Reynolds, Alexander L. Perryman, Joel S. Freundlich, Sean Ekins. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Molecular Pharmaceutics 2018, 15 (10) , 4346-4360. https://doi.org/10.1021/acs.molpharmaceut.8b00083
  27. Daniel P. Russo, Kimberley M. Zorn, Alex M. Clark, Hao Zhu, Sean Ekins. Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Molecular Pharmaceutics 2018, 15 (10) , 4361-4370. https://doi.org/10.1021/acs.molpharmaceut.8b00546
  28. Alexandru Korotcov, Valery Tkachenko, Daniel P. Russo, and Sean Ekins. Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets. Molecular Pharmaceutics 2017, 14 (12) , 4462-4475. https://doi.org/10.1021/acs.molpharmaceut.7b00578
  29. John P. Santa Maria, Jr., Yumi Park, Lihu Yang, Nicholas Murgolo, Michael D. Altman, Paul Zuck, Greg Adam, Chad Chamberlin, Peter Saradjian, Peter Dandliker, Helena I. M. Boshoff, Clifton E. Barry, III, Charles Garlisi, David B. Olsen, Katherine Young, Meir Glick, Elliott Nickbarg, and Peter S. Kutchukian. Linking High-Throughput Screens to Identify MoAs and Novel Inhibitors of Mycobacterium tuberculosis Dihydrofolate Reductase. ACS Chemical Biology 2017, 12 (9) , 2448-2456. https://doi.org/10.1021/acschembio.7b00468
  30. Peter Gedeck, Suzanne Skolnik, and Stephane Rodde. Developing Collaborative QSAR Models Without Sharing Structures. Journal of Chemical Information and Modeling 2017, 57 (8) , 1847-1858. https://doi.org/10.1021/acs.jcim.7b00315
  31. Andreas Verras, Chris L. Waller, Peter Gedeck, Darren V. S. Green, Thierry Kogej, Anandkumar Raichurkar, Manoranjan Panda, Anang A. Shelat, Julie Clark, R. Kiplin Guy, George Papadatos, and Jeremy Burrows. Shared Consensus Machine Learning Models for Predicting Blood Stage Malaria Inhibition. Journal of Chemical Information and Modeling 2017, 57 (3) , 445-453. https://doi.org/10.1021/acs.jcim.6b00572
  32. Mohamed Diwan M. AbdulHameed, Danielle L. Ippolito, and Anders Wallqvist. Predicting Rat and Human Pregnane X Receptor Activators Using Bayesian Classification Models. Chemical Research in Toxicology 2016, 29 (10) , 1729-1740. https://doi.org/10.1021/acs.chemrestox.6b00227
  33. Sean Ekins, Alexander L. Perryman, Alex M. Clark, Robert C. Reynolds, and Joel S. Freundlich. Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014–2015). Journal of Chemical Information and Modeling 2016, 56 (7) , 1332-1343. https://doi.org/10.1021/acs.jcim.6b00004
  34. Alex M. Clark, Krishna Dole, and Sean Ekins. Open Source Bayesian Models. 3. Composite Models for Prediction of Binned Responses. Journal of Chemical Information and Modeling 2016, 56 (2) , 275-285. https://doi.org/10.1021/acs.jcim.5b00555
  35. Alex M. Clark and Sean Ekins. Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL. Journal of Chemical Information and Modeling 2015, 55 (6) , 1246-1260. https://doi.org/10.1021/acs.jcim.5b00144
  36. Fabio Urbina, Sean Ekins. Reliability and Applicability Assessment for Machine Learning Models. 2024, 299-314. https://doi.org/10.1002/9783527840748.ch13
  37. Özden TARI, Nuray ARPACI. İLAÇ TASARIMINDA YAPAY ZEKÂ UYGULAMALARI [Artificial Intelligence Applications in Drug Design]. Ankara Universitesi Eczacilik Fakultesi Dergisi 2024, 48 (1) , 7-7. https://doi.org/10.33483/jfpau.1327078
  38. Monica Chauhan, Chintu Prajapati, Sadaf Mirza, Rahul Barot, Rasana Yadav, Mahesh Barmade, Dhruvi Kakadiya, Ravi Vijayvargia, Bijaya Haobam, Anurag TK Baidya, Rajnish Kumar, M. R. Yadav, Prashant Murumkar. Design, synthesis, biological evaluation and molecular dynamics of some novel 3-phenylpyrazolo[1,5-a]pyrimidine-2,7(1H,4H)-dione based compounds as anti-tubercular agents. Journal of Biomolecular Structure and Dynamics 2023, 355 , 1-19. https://doi.org/10.1080/07391102.2023.2249109
  39. Monalisa Kesh, Sachin Goel. Target-Based Screening for Lead Discovery. 2023, 141-173. https://doi.org/10.1007/978-981-99-1316-9_7
  40. Ana C. Puhl, Zhan-Guo Gao, Kenneth A. Jacobson, Sean Ekins. Machine Learning for Discovery of New ADORA Modulators. Frontiers in Pharmacology 2022, 13 https://doi.org/10.3389/fphar.2022.920643
  41. Loganathan S Dhivya, Balappaudayar R Pradeepa, Sabarathinam Sarvesh. Synthesis and in vitro studies for structure-based design of novel chalcones as antitubercular agents targeting InhA. Future Medicinal Chemistry 2022, 14 (12) , 851-866. https://doi.org/10.4155/fmc-2022-0052
  42. Melina Mottin, Lindsay K. Caesar, David Brodsky, Nathalya C.M.R. Mesquita, Ketllyn Zagato de Oliveira, Gabriela Dias Noske, Bruna K.P. Sousa, Paulo R.P.S. Ramos, Hannah Jarmer, Bonnie Loh, Kimberley M. Zorn, Daniel H. Foil, Pedro M. Torres, Rafael V.C. Guido, Glaucius Oliva, Frank Scholle, Sean Ekins, Nadja B. Cech, Carolina H. Andrade, Scott M. Laster. Chalcones from Angelica keiskei (ashitaba) inhibit key Zika virus replication proteins. Bioorganic Chemistry 2022, 120 , 105649. https://doi.org/10.1016/j.bioorg.2022.105649
  43. Alan A. Schmalstig, Kimberley M. Zorn, Sebastian Murcia, Andrew Robinson, Svetlana Savina, Elena Komarova, Vadim Makarov, Miriam Braunstein, Sean Ekins. Mycobacterium abscessus drug discovery using machine learning. Tuberculosis 2022, 132 , 102168. https://doi.org/10.1016/j.tube.2022.102168
  44. Vishwesh Venkatraman. FP-ADMET: a compendium of fingerprint-based ADMET prediction models. Journal of Cheminformatics 2021, 13 (1) https://doi.org/10.1186/s13321-021-00557-5
  45. Siennah R. Miller, Meghan E. McGrath, Kimberley M. Zorn, Sean Ekins, Stephen H. Wright, Nathan J. Cherrington. Remdesivir and EIDD-1931 Interact with Human Equilibrative Nucleoside Transporters 1 and 2: Implications for Reaching SARS-CoV-2 Viral Sanctuary Sites. Molecular Pharmacology 2021, 100 (6) , 548-557. https://doi.org/10.1124/molpharm.121.000333
  46. Efrén Pérez Santín, Raquel Rodríguez Solana, Mariano González García, María Del Mar García Suárez, Gerardo David Blanco Díaz, María Dolores Cima Cabal, José Manuel Moreno Rojas, José Ignacio López Sánchez. Toxicity prediction based on artificial intelligence: A multidisciplinary overview. WIREs Computational Molecular Science 2021, 11 (5) https://doi.org/10.1002/wcms.1516
  47. Neetu Tripathi, Manoj Kumar Goshisht, Sanat Kumar Sahu, Charu Arora. Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review. Molecular Diversity 2021, 25 (3) , 1643-1664. https://doi.org/10.1007/s11030-021-10237-z
  48. Samit Ganguly, David Finkelstein, Timothy I. Shaw, Ryan D. Michalek, Kimberly M. Zorn, Sean Ekins, Kazuto Yasuda, Yu Fukuda, John D. Schuetz, Kamalika Mukherjee, Erin G. Schuetz. Metabolomic and transcriptomic analysis reveals endogenous substrates and metabolic adaptation in rats lacking Abcg2 and Abcb1a transporters. PLOS ONE 2021, 16 (7) , e0253852. https://doi.org/10.1371/journal.pone.0253852
  49. Akalesh Kumar Verma, Vikas Kumar, Sweta Singh, Bhabesh Ch. Goswami, Ihosvany Camps, Aishwarya Sekar, Sanghwa Yoon, Keun Woo Lee. Repurposing potential of Ayurvedic medicinal plants derived active principles against SARS-CoV-2 associated target proteins revealed by molecular docking, molecular dynamics and MM-PBSA studies. Biomedicine & Pharmacotherapy 2021, 137 , 111356. https://doi.org/10.1016/j.biopha.2021.111356
  50. Conor R. Caffrey, Dietmar Steverding, Rafaela S. Ferreira, Renata B. de Oliveira, Anthony J. O'Donoghue, Ludovica Monti, Carlo Ballatore, Kelly A. Bachovchin, Lori Ferrins, Michael P. Pollastri, Kimberley M. Zorn, Daniel H. Foil, Alex M. Clark, Melina Mottin, Carolina H. Andrade, Jair L. de Siqueira‐Neto, Sean Ekins. Drug Discovery and Development for Kinetoplastid Diseases. 2021, 1-79. https://doi.org/10.1002/0471266949.bmc235.pub2
  51. Kamel Mansouri, Agnes L. Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E.H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce V. Bastos, Stephen Boyd, J.B. Brown, Stephen J. Capuzzi, Yaroslav Chushak, Heather Ciallella, Alex M. Clark, Viviana Consonni, Pankaj R. Daga, Sean Ekins, Sherif Farag, Maxim Fedorov, Denis Fourches, Domenico Gadaleta, Feng Gao, Jeffery M. Gearhart, Garett Goh, Jonathan M. Goodman, Francesca Grisoni, Christopher M. Grulke, Thomas Hartung, Matthew Hirn, Pavel Karpov, Alexandru Korotcov, Giovanna J. Lavado, Michael Lawless, Xinhao Li, Thomas Luechtefeld, Filippo Lunghini, Giuseppe F. Mangiatordi, Gilles Marcou, Dan Marsh, Todd Martin, Andrea Mauri, Eugene N. Muratov, Glenn J. Myatt, Dac-Trung Nguyen, Orazio Nicolotti, Reine Note, Paritosh Pande, Amanda K. Parks, Tyler Peryea, Ahsan H. Polash, Robert Rallo, Alessandra Roncaglioni, Craig Rowlands, Patricia Ruiz, Daniel P. Russo, Ahmed Sayed, Risa Sayre, Timothy Sheils, Charles Siegel, Arthur C. Silva, Anton Simeonov, Sergey Sosnin, Noel Southall, Judy Strickland, Yun Tang, Brian Teppen, Igor V. Tetko, Dennis Thomas, Valery Tkachenko, Roberto Todeschini, Cosimo Toma, Ignacio Tripodi, Daniela Trisciuzzi, Alexander Tropsha, Alexandre Varnek, Kristijan Vukovic, Zhongyu Wang, Liguo Wang, Katrina M. Waters, Andrew J. Wedlake, Sanjeeva J. Wijeyesakere, Dan Wilson, Zijun Xiao, Hongbin Yang, Gergely Zahoranszky-Kohalmi, Alexey V. Zakharov, Fagen F. Zhang, Zhen Zhang, Tongan Zhao, Hao Zhu, Kimberley M. Zorn, Warren Casey, Nicole C. Kleinstreuer. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environmental Health Perspectives 2021, 129 (4) https://doi.org/10.1289/EHP8495
  52. David A. Winkler. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Frontiers in Chemistry 2021, 9 https://doi.org/10.3389/fchem.2021.614073
  53. Siennah R. Miller, Xiaohong Zhang, Raymond K. Hau, Joseph L. Jilek, Erin Q. Jennings, James J. Galligan, Daniel H. Foil, Kimberley M. Zorn, Sean Ekins, Stephen H. Wright, Nathan J. Cherrington. Predicting Drug Interactions with Human Equilibrative Nucleoside Transporters 1 and 2 Using Functional Knockout Cell Lines and Bayesian Modeling. Molecular Pharmacology 2021, 99 (2) , 147-162. https://doi.org/10.1124/molpharm.120.000169
  54. Anna Egorova, Elke Bogner, Elena Novoselova, Kimberley M. Zorn, Sean Ekins, Vadim Makarov. Dispirotripiperazine-core compounds, their biological activity with a focus on broad antiviral property, and perspectives in drug design (mini-review). European Journal of Medicinal Chemistry 2021, 211 , 113014. https://doi.org/10.1016/j.ejmech.2020.113014
  55. Milan Voršilák, Michal Kolář, Ivan Čmelo, Daniel Svozil. SYBA: Bayesian estimation of synthetic accessibility of organic compounds. Journal of Cheminformatics 2020, 12 (1) https://doi.org/10.1186/s13321-020-00439-2
  56. Edward Anderson, Tammy M. Havener, Kimberley M. Zorn, Daniel H. Foil, Thomas R. Lane, Stephen J. Capuzzi, Dave Morris, Anthony J. Hickey, David H. Drewry, Sean Ekins. Synergistic drug combinations and machine learning for drug repurposing in chordoma. Scientific Reports 2020, 10 (1) https://doi.org/10.1038/s41598-020-70026-w
  57. Thomas R. Lane, Julie Dyall, Luke Mercer, Caleb Goodin, Daniel H. Foil, Huanying Zhou, Elena Postnikova, Janie Y. Liang, Michael R. Holbrook, Peter B. Madrid, Sean Ekins. Repurposing Pyramax®, quinacrine and tilorone as treatments for Ebola virus disease. Antiviral Research 2020, 182 , 104908. https://doi.org/10.1016/j.antiviral.2020.104908
  58. Janaina Cruz Pereira, Samer S. Daher, Kimberley M. Zorn, Matthew Sherwood, Riccardo Russo, Alexander L. Perryman, Xin Wang, Madeleine J. Freundlich, Sean Ekins, Joel S. Freundlich. Machine Learning Platform to Discover Novel Growth Inhibitors of Neisseria gonorrhoeae. Pharmaceutical Research 2020, 37 (7) https://doi.org/10.1007/s11095-020-02876-y
  59. Luz Adriana Borrero, Lilibeth Sanchez Guette, Enrique Lopez, Omar Bonerge Pineda, Edgardo Buelvas Castro. Predicting Toxicity Properties through Machine Learning. Procedia Computer Science 2020, 170 , 1011-1016. https://doi.org/10.1016/j.procs.2020.03.093
  60. Thomas R. Lane, Christopher Massey, Jason E. Comer, Manu Anantpadma, Joel S. Freundlich, Robert A. Davey, Peter B. Madrid, Sean Ekins. Repurposing the antimalarial pyronaridine tetraphosphate to protect against Ebola virus infection. PLOS Neglected Tropical Diseases 2019, 13 (11) , e0007890. https://doi.org/10.1371/journal.pntd.0007890
  61. Liangliang Wang, Junjie Ding, Li Pan, Dongsheng Cao, Hui Jiang, Xiaoqin Ding. Artificial intelligence facilitates drug design in the big data era. Chemometrics and Intelligent Laboratory Systems 2019, 194 , 103850. https://doi.org/10.1016/j.chemolab.2019.103850
  62. Sean Ekins, Jacob Gerlach, Kimberley M. Zorn, Brett M. Antonio, Zhixin Lin, Aaron Gerlach. Repurposing Approved Drugs as Inhibitors of Kv7.1 and Nav1.8 to Treat Pitt Hopkins Syndrome. Pharmaceutical Research 2019, 36 (9) https://doi.org/10.1007/s11095-019-2671-y
  63. Sean Ekins, Ana C. Puhl, Kimberley M. Zorn, Thomas R. Lane, Daniel P. Russo, Jennifer J. Klein, Anthony J. Hickey, Alex M. Clark. Exploiting machine learning for end-to-end drug discovery and development. Nature Materials 2019, 18 (5) , 435-441. https://doi.org/10.1038/s41563-019-0338-z
  64. Alex G. Dalecki, Kimberley M. Zorn, Alex M. Clark, Sean Ekins, Whitney T. Narmore, Nichole Tower, Lynn Rasmussen, Robert Bostwick, Olaf Kutsch, Frank Wolschendorf. High-throughput screening and Bayesian machine learning for copper-dependent inhibitors of Staphylococcus aureus. Metallomics 2019, 11 (3) , 696-706. https://doi.org/10.1039/C8MT00342D
  66. Iqbal Azad, Asif Jafri, Tahmeena Khan, Yusuf Akhter, Md Arshad, Firoj Hassan, Naseem Ahmad, Abdul Rahman Khan, Malik Nasibullah. Evaluation of pyrrole-2,3-dicarboxylate derivatives: Synthesis, DFT analysis, molecular docking, virtual screening and in vitro anti-hepatic cancer study. Journal of Molecular Structure 2019, 1176 , 314-334. https://doi.org/10.1016/j.molstruc.2018.08.049
  67. Philip J. Sandoval, Kimberley M. Zorn, Alex M. Clark, Sean Ekins, Stephen H. Wright. Assessment of Substrate-Dependent Ligand Interactions at the Organic Cation Transporter OCT2 Using Six Model Substrates. Molecular Pharmacology 2018, 94 (3) , 1057-1068. https://doi.org/10.1124/mol.117.111443
  68. Sean Ekins, Alex M. Clark, Alexander L. Perryman, Joel S. Freundlich, Alexandru Korotcov, Valery Tkachenko. Accessible Machine Learning Approaches for Toxicology. 2018, 1-29. https://doi.org/10.1002/9781119282594.ch1
  69. Denis Fourches, Antony J. Williams, Grace Patlewicz, Imran Shah, Chris Grulke, John Wambaugh, Ann Richard, Alexander Tropsha. Computational Tools for ADMET Profiling. 2018, 211-244. https://doi.org/10.1002/9781119282594.ch8
  70. Alex M. Clark, Kimberley M. Zorn, Mary A. Lingerfelt, Sean Ekins. Developing Next Generation Tools for Computational Toxicology. 2018, 363-387. https://doi.org/10.1002/9781119282594.ch14
  71. Sean Ekins, Mary A. Lingerfelt, Jason E. Comer, Alexander N. Freiberg, Jon C. Mirsalis, Kathleen O'Loughlin, Anush Harutyunyan, Claire McFarlane, Carol E. Green, Peter B. Madrid. Efficacy of Tilorone Dihydrochloride against Ebola Virus Infection. Antimicrobial Agents and Chemotherapy 2018, 62 (2) https://doi.org/10.1128/AAC.01711-17
  72. Sean Ekins, Alex M. Clark, Krishna Dole, Kellan Gregory, Andrew M. Mcnutt, Anna Coulon Spektor, Charlie Weatherall, Nadia K. Litterman, Barry A. Bunin. Data Mining and Computational Modeling of High-Throughput Screening Datasets. 2018, 197-221. https://doi.org/10.1007/978-1-4939-7724-6_14
  73. Akshata Gad, Andrew Titus Manuel, Jinuraj K. R., Lijo John, Sajeev R., Shanmuga Priya V. G., Abdul Jaleel U.C.. Virtual screening and repositioning of inconclusive molecules of beta-lactamase Bioassays—A data mining approach. Computational Biology and Chemistry 2017, 70 , 65-88. https://doi.org/10.1016/j.compbiolchem.2017.07.005
  74. In-Wha Kim, Jung Mi Oh. Deep learning: from chemoinformatics to precision medicine. Journal of Pharmaceutical Investigation 2017, 47 (4) , 317-323. https://doi.org/10.1007/s40005-017-0332-x
  75. Antoine Daina, Olivier Michielin, Vincent Zoete. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Scientific Reports 2017, 7 (1) https://doi.org/10.1038/srep42717
  76. Sean Ekins, Adwait Anand Godbole, György Kéri, Lászlo Orfi, János Pato, Rajeshwari Subray Bhat, Rinkee Verma, Erin K. Bradley, Valakunja Nagaraja. Machine learning and docking models for Mycobacterium tuberculosis topoisomerase I. Tuberculosis 2017, 103 , 52-60. https://doi.org/10.1016/j.tube.2017.01.005
  77. Sean Ekins, Anna Coulon Spektor, Alex M. Clark, Krishna Dole, Barry A. Bunin. Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discovery Today 2017, 22 (3) , 555-565. https://doi.org/10.1016/j.drudis.2016.10.009
  78. Priyanka Banerjee, Vishal B. Siramshetty, Malgorzata N. Drwal, Robert Preissner. Computational methods for prediction of in vitro effects of new chemical structures. Journal of Cheminformatics 2016, 8 (1) https://doi.org/10.1186/s13321-016-0162-2
  79. Sean Ekins. The Next Era: Deep Learning in Pharmaceutical Research. Pharmaceutical Research 2016, 33 (11) , 2594-2603. https://doi.org/10.1007/s11095-016-2029-7
  80. David Hoksza, Petr Skoda. Using Bayesian modeling on molecular fragments features for virtual screening. 2016, 1-6. https://doi.org/10.1109/CIBCB.2016.7758111
  81. Alex M. Clark, Antony J. Williams, Sean Ekins. Mobile Apps for Green Chemistry. 2016, 1-9. https://doi.org/10.1002/9781119951438.eibc2413
  82. Lucy J. Martínez-Guerrero, Mark Morales, Sean Ekins, Stephen H. Wright. Lack of Influence of Substrate on Ligand Interaction with the Human Multidrug and Toxin Extruder, MATE1. Molecular Pharmacology 2016, 90 (3) , 254-264. https://doi.org/10.1124/mol.116.105056
  83. Kamel Djaout, Vinayak Singh, Yap Boum, Victoria Katawera, Hubert F. Becker, Natassja G. Bush, Stephen J. Hearnshaw, Jennifer E. Pritchard, Pauline Bourbon, Peter B. Madrid, Anthony Maxwell, Valerie Mizrahi, Hannu Myllykallio, Sean Ekins. Predictive modeling targets thymidylate synthase ThyX in Mycobacterium tuberculosis. Scientific Reports 2016, 6 (1) https://doi.org/10.1038/srep27792
  84. Alexander L. Perryman, Thomas P. Stratton, Sean Ekins, Joel S. Freundlich. Predicting Mouse Liver Microsomal Stability with “Pruned” Machine Learning Models and Public Data. Pharmaceutical Research 2016, 33 (2) , 433-449. https://doi.org/10.1007/s11095-015-1800-5
  85. Ingo Muegge, Prasenjit Mukherjee. An overview of molecular fingerprint similarity search in virtual screening. Expert Opinion on Drug Discovery 2016, 11 (2) , 137-148. https://doi.org/10.1517/17460441.2016.1117070
  86. Sean Ekins, Nadia K. Litterman, Christopher A. Lipinski, Barry A. Bunin. Thermodynamic Proxies to Compensate for Biases in Drug Discovery Methods. Pharmaceutical Research 2016, 33 (1) , 194-205. https://doi.org/10.1007/s11095-015-1779-y
  87. Sean Ekins, Daniel Mietchen, Megan Coffee, Thomas P Stratton, Joel S Freundlich, Lucio Freitas-Junior, Eugene Muratov, Jair Siqueira-Neto, Antony J Williams, Carolina Andrade. Open drug discovery for the Zika virus. F1000Research 2016, 5 , 150. https://doi.org/10.12688/f1000research.8013.1
  88. Sean Ekins, Alex M. Clark, Stephen H. Wright. Making Transporter Models for Drug–Drug Interaction Prediction Mobile. Drug Metabolism and Disposition 2015, 43 (10) , 1642-1645. https://doi.org/10.1124/dmd.115.064956
  89. Sean Ekins, Joel S. Freundlich, Alex M. Clark, Manu Anantpadma, Robert A. Davey, Peter Madrid. . F1000Research 2015, 1091. https://doi.org/10.12688/f1000research.7217.1
  90. Sean Ekins, Joel S. Freundlich, Alex M. Clark, Manu Anantpadma, Robert A. Davey, Peter Madrid. Machine learning models identify molecules active against the Ebola virus in vitro. F1000Research 2015, 4 , 1091. https://doi.org/10.12688/f1000research.7217.2
  91. Sean Ekins, Joel S. Freundlich, Alex M. Clark, Manu Anantpadma, Robert A. Davey, Peter Madrid. Machine learning models identify molecules active against the Ebola virus in vitro. F1000Research 2015, 4 , 1091. https://doi.org/10.12688/f1000research.7217.3
  • Abstract

    Figure 1. Example of a serialized file containing a very small Bayesian model. The default file extension is .bayesian, and the MIME type is chemical/x-bayesian.

    Figure 2. Example of the model output in CDD Models. (A) Model derived from whole-cell datasets from antimalarial screening across four CDD Public datasets (MMV, St. Jude, Novartis, and TCAMS), ∼20,000 EC50 values, cutoff < 10 nM. (B) Options for exporting a model from CDD.

    Figure 3. Example of the Bayesian model implemented in the MMDS mobile app. (a) hERG model, based on literature data. (b) A molecule from a hERG paper. (151) (c) Results scored with this model (hERG measured IC50 = 24 nM) showing a visually intuitive atom coloring for this and other Bayesian models. This compound would appear to be an inhibitor of hERG and possibly KCNQ1 potassium channels.

    Figure 4. Screenshots summarizing the ROC plots and active and inactive compounds for eight models implemented in MMDS.

  • References

    This article references 151 other publications.

    1. Ekins, S.; Waller, C. L.; Swaan, P. W.; Cruciani, G.; Wrighton, S. A.; Wikel, J. H. Progress in predicting human ADME parameters in silico. J. Pharmacol. Toxicol. Methods 2000, 44, 251–272.
    2. Wessel, M. D.; Mente, S. ADME by computer. Annu. Rep. Med. Chem. 2001, 36, 257–266.
    3. Boobis, A.; Gundert-Remy, U.; Kremers, P.; Macheras, P.; Pelkonen, O. In silico prediction of ADME and pharmacokinetics. Report of an expert meeting organised by COST B15. Eur. J. Pharm. Sci. 2002, 17, 183–193.
    4. Butina, D.; Segall, M. D.; Frankcombe, K. Predicting ADME properties in silico: methods and models. Drug Discov. Today 2002, 7, S83–S88.
    5. Ekins, S.; Boulanger, B.; Swaan, P. W.; Hupcey, M. A. Towards a new age of virtual ADME/TOX and multidimensional drug discovery. Mol. Divers. 2002, 5, 255–275.
    6. Ekins, S.; Rose, J. P. In Silico ADME/TOX: The state of the art. J. Mol. Graphics 2002, 20, 305–309.
    7. Klein, C.; Kaiser, D.; Kopp, S.; Chiba, P.; Ecker, G. F. Similarity based SAR (SIBAR) as tool for early ADME profiling. J. Comput. Aided Mol. Des. 2002, 16, 785–793.
    8. Krejsa, C. M.; Horvath, D.; Rogalski, S. L.; Penzotti, J. E.; Mao, B.; Barbosa, F.; Migeon, J. C. Predicting ADME properties and side effects: the BioPrint approach. Curr. Opin. Drug Discov. Devel. 2003, 6, 470–480.
    9. van de Waterbeemd, H.; Gifford, E. ADMET in silico modelling: towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192–204.
    10. Ekins, S.; Swaan, P. W. Computational models for enzymes, transporters, channels and receptors relevant to ADME/TOX. Rev. Comput. Chem. 2004, 20, 333–415.
    11. Smith, P. A.; Sorich, M. J.; Low, L. S.; McKinnon, R. A.; Miners, J. O. Towards integrated ADME prediction: past, present and future directions for modelling metabolism by UDP-glucuronosyltransferases. J. Mol. Graph. Model. 2004, 22, 507–517.
    12. Stoner, C. L.; Gifford, E.; Stankovic, C.; Lepsy, C. S.; Brodfuehrer, J.; Prasad, J. V.; Surendran, N. Implementation of an ADME enabling selection and visualization tool for drug discovery. J. Pharm. Sci. 2004, 93, 1131–1141.
    13. Yamashita, F.; Hashida, M. In silico approaches for predicting ADME properties of drugs. Drug Metab. Pharmacokinet. 2004, 19, 327–338.
    14. Balakin, K. V.; Ivanenkov, Y. A.; Savchuk, N. P.; Ivaschenko, A. A.; Ekins, S. Comprehensive computational assessment of ADME properties using mapping techniques. Curr. Drug Discov. Technol. 2005, 2, 99–113.
    15. O’Brien, S. E.; de Groot, M. J. Greater than the sum of its parts: combining models for useful ADMET prediction. J. Med. Chem. 2005, 48, 1287–1291.
    16. Chang, C.; Ekins, S. Pharmacophores for human ADME/Tox-related proteins. In Pharmacophores and pharmacophore searches; Langer, T.; Hoffman, R. D., Eds.; Wiley-VCH: Weinheim, 2006; Chapter 14, pp 299–324.
    17. Ekins, S. Systems-ADME/Tox: Resources and network approaches. J. Pharmacol. Toxicol. Methods 2006, 53, 38–66.
    18. Ekins, S.; Bugrim, A.; Brovold, L.; Kirillov, E.; Nikolsky, Y.; Rakhmatulin, E.; Sorokina, S.; Ryabov, A.; Serebryiskaya, T.; Melnikov, A.; Metz, J.; Nikolskaya, T. Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica 2006, 36, 877–901.
    19. Klon, A. E.; Lowrie, J. F.; Diller, D. J. Improved naive Bayesian modeling of numerical data for absorption, distribution, metabolism and excretion (ADME) property prediction. J. Chem. Inf. Model. 2006, 46, 1945–1956.
    20. Ekins, S.; Honeycutt, J. D.; Metz, J. T. Evolving molecules using multi-objective optimization: applying to ADME. Drug Discov. Today 2010, 15, 451–460.
    21. Ekins, S.; Williams, A. J. Precompetitive Preclinical ADME/Tox Data: Set It Free on The Web to Facilitate Computational Model Building to Assist Drug Development. Lab Chip 2010, 10, 13–22.
    22. Gupta, R. R.; Gifford, E. M.; Liston, T.; Waller, C. L.; Bunin, B.; Ekins, S. Using open source computational tools for predicting human metabolic stability and additional ADME/TOX properties. Drug Metab. Dispos. 2010, 38, 2083–2090.
    23. Cheng, F.; Li, W.; Zhou, Y.; Shen, J.; Wu, Z.; Liu, G.; Lee, P. W.; Tang, Y. admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. J. Chem. Inf. Model. 2012, 52, 3099–3105.
    24. Ekins, S.; Wrighton, S. A. Application of in silico approaches to predicting drug–drug interactions. J. Pharmacol. Toxicol. Methods 2001, 45, 65–69.
    25. Ekins, S. In silico approaches to predicting metabolism, toxicology and beyond. Biochem. Soc. Trans. 2003, 31, 611–614.
    26. Kemp, C. A.; Flanagan, J. U.; van Eldik, A. J.; Marechal, J. D.; Wolf, C. R.; Roberts, G. C.; Paine, M. J.; Sutcliffe, M. J. Validation of model of cytochrome P450 2D6: an in silico tool for predicting metabolism and inhibition. J. Med. Chem. 2004, 47, 5340–5346.
    27. de Graaf, C.; Vermeulen, N. P.; Feenstra, K. A. Cytochrome P450 in silico: an integrative modeling approach. J. Med. Chem. 2005, 48, 2725–2755.
    28. Martins, I. F.; Teixeira, A. L.; Pinheiro, L.; Falcao, A. O. A Bayesian approach to in silico blood-brain barrier penetration modeling. J. Chem. Inf. Model. 2012, 52, 1686–1697.
    29. Hu, Y.; Unwalla, R.; Denny, R. A.; Bikker, J.; Di, L.; Humblet, C. Development of QSAR models for microsomal stability: identification of good and bad structural features for rat, human and mouse microsomal stability. J. Comput. Aided Mol. Des. 2010, 24, 23–35.
    30. Lombardo, F.; Obach, R. S.; Dicapua, F. M.; Bakken, G. A.; Lu, J.; Potter, D. M.; Gao, F.; Miller, M. D.; Zhang, Y. A hybrid mixture discriminant analysis-random forest computational model for the prediction of volume of distribution of drugs in human. J. Med. Chem. 2006, 49, 2262–2267.
    31. Lombardo, F.; Obach, R. S.; Shalaeva, M. Y.; Gao, F. Prediction of human volume of distribution values for neutral and basic drugs. 2. Extended data set and leave-class-out statistics. J. Med. Chem. 2004, 47, 1242–1250.
    32. Lombardo, F.; Obach, R. S.; Shalaeva, M. Y.; Gao, F. Prediction of volume of distribution values in humans for neutral and basic drugs using physicochemical measurements and plasma protein binding. J. Med. Chem. 2002, 45, 2867–2876.
    33. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F. ElogDoct: A tool for lipophilicity determination in drug discovery. 2. Basic and neutral compounds. J. Med. Chem. 2001, 44, 2490–2497.
    34. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F.; Abraham, M. H. ElogPoct: A tool for lipophilicity determination in drug discovery. J. Med. Chem. 2000, 43, 2922–2928.
    35. Chang, C.; Duignan, D. B.; Johnson, K. D.; Lee, P. H.; Cowan, G. S.; Gifford, E. M.; Stankovic, C. J.; Lepsy, C. S.; Stoner, C. L. The development and validation of a computational model to predict rat liver microsomal clearance. J. Pharm. Sci. 2009, 98, 2857–2867.
    36. Zientek, M.; Stoner, C.; Ayscue, R.; Klug-McLeod, J.; Jiang, Y.; West, M.; Collins, C.; Ekins, S. Integrated in silico-in vitro strategy for addressing cytochrome P450 3A4 time-dependent inhibition. Chem. Res. Toxicol. 2010, 23, 664–676.
    37. Lagorce, D.; Sperandio, O.; Galons, H.; Miteva, M. A.; Villoutreix, B. O. FAF-Drugs2: free ADME/tox filtering tool to assist drug discovery and chemical biology projects. BMC Bioinformatics 2008, 9, 396.
    38. Villoutreix, B. O.; Renault, N.; Lagorce, D.; Sperandio, O.; Montes, M.; Miteva, M. A. Free resources to assist structure-based virtual ligand screening experiments. Curr. Protein Pept. Sci. 2007, 8, 381–411.
    39. Ekins, S. Computational Toxicology: risk assessment for pharmaceutical and environmental chemicals; John Wiley and Sons: Hoboken, NJ, 2007.
    40. Balani, S. K.; Miwa, G. T.; Gan, L. S.; Wu, J. T.; Lee, F. W. Strategy of utilizing in vitro and in vivo ADME tools for lead optimization and drug candidate selection. Curr. Top. Med. Chem. 2005, 5, 1033–1038.
    41. van De Waterbeemd, H.; Smith, D. A.; Beaumont, K.; Walker, D. K. Property-based design: optimization of drug absorption and pharmacokinetics. J. Med. Chem. 2001, 44, 1313–1333.
    42. Walters, W. P.; Murcko, M. A. Prediction of ‘drug-likeness’. Adv. Drug Deliv. Rev. 2002, 54, 255–271.
    43. Ekins, S.; Ring, B. J.; Grace, J.; McRobie-Belle, D. J.; Wrighton, S. A. Present and future in vitro approaches for drug metabolism. J. Pharm. Toxicol. Methods 2000, 44, 313–324.
    44. Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Bryant, S. H. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009, 37, W623–W633.
    45. Wang, Y.; Bolton, E.; Dracheva, S.; Karapetyan, K.; Shoemaker, B. A.; Suzek, T. O.; Wang, J.; Xiao, J.; Zhang, J.; Bryant, S. H. An overview of the PubChem BioAssay resource. Nucleic Acids Res. 2010, 38, D255–D266.
    46. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
    47. Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Kruger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 2014, 42, D1083–D1090.
    48. Papadatos, G.; Overington, J. P. The ChEMBL database: a taster for medicinal chemists. Future Med. Chem. 2014, 6, 361–364.
    49. Ekins, S.; Bunin, B. A. The Collaborative Drug Discovery (CDD) database. Methods Mol. Biol. 2013, 993, 139–154.
    50. Sun, H.; Veith, H.; Xia, M.; Austin, C. P.; Tice, R. R.; Huang, R. Prediction of Cytochrome P450 Profiles of Environmental Chemicals with QSAR Models Built from Drug-like Molecules. Mol. Inform. 2012, 31, 783–792.
    51. 51
      Sun, H.; Veith, H.; Xia, M.; Austin, C. P.; Huang, R. Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data J. Chem. Inf. Model. 2011, 51, 2474 2481
    52. 52
      Veith, H.; Southall, N.; Huang, R.; James, T.; Fayne, D.; Artemenko, N.; Shen, M.; Inglese, J.; Austin, C. P.; Lloyd, D. G.; Auld, D. S. Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries Nat. Biotechnol. 2009, 27, 1050 1055
    53. 53
      MacArthur, R.; Leister, W.; Veith, H.; Shinn, P.; Southall, N.; Austin, C. P.; Inglese, J.; Auld, D. S. Monitoring compound integrity with cytochrome P450 assays and qHTS J. Biomol. Screen. 2009, 14, 538 546
    54. 54
      Ekins, S.; Diao, L.; Polli, J. E. A Substrate Pharmacophore for the Human Organic Cation/Carnitine Transporter Identifies Compounds Associated with Rhabdomyolysis Mol. Pharmaceutics 2012, 9, 905 913
    55. 55
      Pan, Y.; Li, L.; Kim, G.; Ekins, S.; Wang, H.; Swaan, P. W. Identification and Validation of Novel hPXR Activators Amongst Prescribed Drugs via Ligand-Based Virtual Screening Drug Metab. Dispos. 2011, 39, 337 344
    56. 56
      Ekins, S.; Williams, A. J.; Xu, J. J. A Predictive Ligand-Based Bayesian Model for Human Drug Induced Liver Injury Drug Metab. Dispos. 2010, 38, 2302 2308
    57.
      Ivanenkov, Y. A.; Savchuk, N. P.; Ekins, S.; Balakin, K. V. Computational mapping tools for drug discovery. Drug Discov. Today 2009, 14, 767–775
    58.
      Ekins, S.; Kortagere, S.; Iyer, M.; Reschly, E. J.; Lill, M. A.; Redinbo, M.; Krasowski, M. D. Challenges Predicting Ligand-Receptor Interactions of Promiscuous Proteins: The Nuclear Receptor PXR. PLoS Comput. Biol. 2009, 5, e1000594
    59.
      Kortagere, S.; Chekmarev, D. S.; Welsh, W. J.; Ekins, S. New predictive models for blood brain barrier permeability of drug-like molecules. Pharm. Res. 2008, 25, 1836–1845
    60.
      Khandelwal, A.; Krasowski, M. D.; Reschly, E. J.; Sinz, M. W.; Swaan, P. W.; Ekins, S. Machine learning methods and docking for predicting human pregnane X receptor activation. Chem. Res. Toxicol. 2008, 21, 1457–1467
    61.
      Ekins, S.; Kholodovych, V.; Ai, N.; Sinz, M.; Gal, J.; Gera, L.; Welsh, W. J.; Bachmann, K.; Mani, S. Computational discovery of novel low micromolar human pregnane X receptor antagonists. Mol. Pharmacol. 2008, 74, 662–672
    62.
      Chekmarev, D. S.; Kholodovych, V.; Balakin, K. V.; Ivanenkov, Y.; Ekins, S.; Welsh, W. J. Shape signatures: new descriptors for predicting cardiotoxicity in silico. Chem. Res. Toxicol. 2008, 21, 1304–1314
    63.
      Khandelwal, A.; Bahadduri, P.; Chang, C.; Polli, J. E.; Swaan, P.; Ekins, S. Computational Models to Assign Biopharmaceutics Drug Disposition Classification from Molecular Structure. Pharm. Res. 2007, 24, 2249–2262
    64.
      Jones, D. R.; Ekins, S.; Li, L.; Hall, S. D. Computational approaches that predict metabolic intermediate complex formation with CYP3A4 (+b5). Drug Metab. Dispos. 2007, 35, 1466–1475
    65.
      Embrechts, M. J.; Ekins, S. Classification of metabolites with kernel-partial least squares (K-PLS). Drug Metab. Dispos. 2007, 35, 325–327
    66.
      Ekins, S.; Embrechts, M. J.; Breneman, C. M.; Jim, K.; Wery, J.-P. Novel applications of Kernel-partial least squares to modeling a comprehensive array of properties for drug discovery. In Computational Toxicology: Risk assessment for pharmaceutical and environmental chemicals; Ekins, S., Ed.; Wiley-Interscience: Hoboken, NJ, 2007; pp 403–432.
    67.
      Ekins, S.; Chang, C.; Mani, S.; Krasowski, M. D.; Reschly, E. J.; Iyer, M.; Kholodovych, V.; Ai, N.; Welsh, W. J.; Sinz, M.; Swaan, P. W.; Patel, R.; Bachmann, K. Human pregnane X receptor antagonists and agonists define molecular requirements for different binding sites. Mol. Pharmacol. 2007, 72, 592–603
    68.
      Ekins, S.; Balakin, K. V.; Savchuk, N.; Ivanenkov, Y. Insights for human Ether-a-Go-Go-Related Gene Potassium Channel inhibition using recursive partitioning, Kohonen and Sammon mapping Techniques. J. Med. Chem. 2006, 49, 5059–5071
    69.
      Ekins, S.; Nikolsky, Y.; Nikolskaya, T. Techniques: Application of Systems Biology to Absorption, Distribution, Metabolism, Excretion, and Toxicity. Trends Pharmacol. Sci. 2005, 26, 202–209
    70.
      Ekins, S. Predicting undesirable drug interactions with promiscuous proteins in silico. Drug Discov. Today 2004, 9, 276–285
    71.
      Balakin, K. V.; Ekins, S.; Bugrim, A.; Ivanenkov, Y. A.; Korolev, D.; Nikolsky, Y.; Skorenko, S. A.; Ivashchenko, A. A.; Savchuk, N. P.; Nikolskaya, T. Kohonen maps for prediction of binding to human cytochrome P450 3A4. Drug Metab. Dispos. 2004, 32, 1183–1189
    72.
      Balakin, K. V.; Ekins, S.; Bugrim, A.; Ivanenkov, Y. A.; Korolev, D.; Nikolsky, Y.; Ivashchenko, A. A.; Savchuk, N. P.; Nikolskaya, T. Quantitative structure-metabolism relationship modeling of the metabolic N-dealkylation rates. Drug Metab. Dispos. 2004, 32, 1111–1120
    73.
      Ekins, S.; Berbaum, J.; Harrison, R. K. Generation and validation of rapid computational filters for CYP2D6 and CYP3A4. Drug Metab. Dispos. 2003, 31, 1077–1080
    74.
      Ethell, B. T.; Ekins, S.; Wang, J.; Burchell, B. Quantitative structure activity relationships for the glucuronidation of simple phenols by expressed human UGT1A6 and UGT1A9. Drug Metab. Dispos. 2002, 30, 734–738
    75.
      Ekins, S.; Mirny, L.; Schuetz, E. G. A ligand-based approach to understanding selectivity of nuclear hormone receptors PXR, CAR, FXR, LXRα and LXRβ. Pharm. Res. 2002, 19, 1788–1800
    76.
      Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E.; Lan, L. B.; Yasuda, K.; Shepard, R. L.; Winter, M. A.; Schuetz, J. D.; Wikel, J. H.; Wrighton, S. A. Three dimensional quantitative structure-activity relationships of inhibitors of P-glycoprotein. Mol. Pharmacol. 2002, 61, 964–973
    77.
      Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E.; Lan, L. B.; Yasuda, K.; Shepard, R. L.; Winter, M. A.; Schuetz, J. D.; Wikel, J. H.; Wrighton, S. A. Application of three dimensional quantitative structure-activity relationships of P-glycoprotein inhibitors and substrates. Mol. Pharmacol. 2002, 61, 974–981
    78.
      Ekins, S.; Crumb, W. J.; Sarazan, R. D.; Wikel, J. H.; Wrighton, S. A. Three dimensional quantitative structure activity relationship for the inhibition of the hERG (human ether-a-gogo related gene) potassium channel. J. Pharmacol. Exp. Ther. 2002, 301, 427–434
    79.
      Ekins, S.; Boulanger, B.; Swaan, P. W.; Hupcey, M. A. Z. Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J. Comput. Aided Mol. Des. 2002, 16, 381–401
    80.
      Ekins, S.; de Groot, M.; Jones, J. P. Pharmacophore and three dimensional quantitative structure activity relationship methods for modeling cytochrome P450 active sites. Drug Metab. Dispos. 2001, 29, 936–944
    81.
      Ekins, S.; Ring, B. J.; Bravi, G.; Wikel, J. H.; Wrighton, S. A. Predicting drug-drug interactions in silico using pharmacophores: a paradigm for the next millennium. In Pharmacophore perception, development, and use in drug design; Guner, O. F., Ed.; IUL: San Diego, 2000; pp 269–299.
    82.
      Paranjpe, P. V.; Grass, G. M.; Sinko, P. J. In Silico Tools for Drug Absorption Prediction: Experience to Date. Am. J. Drug Deliv. 2003, 1, 133–148
    83.
      Obrezanova, O.; Csanyi, G.; Gola, J. M.; Segall, M. D. Gaussian processes: a method for automatic QSAR modeling of ADME properties. J. Chem. Inf. Model. 2007, 47, 1847–1857