Enrichment of High-Throughput Screening Data with Increasing Levels of Noise Using Support Vector Machines, Recursive Partitioning, and Laplacian-Modified Naive Bayesian Classifiers

Meir Glick,* Jeremy L. Jenkins, James H. Nettles, Hamilton Hitchings, and John W. Davies
Lead Discovery Center, Novartis Institutes for Biomedical Research Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, and Equbits LLC, 2625 Middlefield Road, #102, Palo Alto, California 94306
J. Chem. Inf. Model., 2006, 46 (1), pp 193–200
DOI: 10.1021/ci050374h
Publication Date (Web): December 3, 2005
Copyright © 2006 American Chemical Society
* Corresponding author e-mail: meir.glick@novartis.com; phone: 617-871-7130.


Abstract

High-throughput screening (HTS) plays a pivotal role in lead discovery for the pharmaceutical industry. In tandem, cheminformatics approaches are employed to mine the HTS data and so increase the probability of identifying novel biologically active compounds. HTS data is notoriously noisy, and therefore, the selection of the optimal data mining method is important for the success of such an analysis. Here, we describe a retrospective analysis of four HTS data sets with increasing stochastic noise in the form of false positives and false negatives, using three mining approaches: Laplacian-modified naive Bayes, recursive partitioning, and support vector machine (SVM) classifiers. All three data mining methods tolerated increasing levels of false positives, even when the ratio of misclassified compounds to true active compounds was 5:1 in the training set. False negatives in the ratio of 1:1 were tolerated as well. SVM outperformed the other two methods in capturing active compounds and scaffolds in the top 1%. A Murcko scaffold analysis could explain the differences in enrichments among the four data sets. This study demonstrates that data mining methods can add real value to the screen even when the data is contaminated with a high level of stochastic noise.
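The false-positive noise and the top-1% evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the function names `inject_false_positives` and `top_fraction_enrichment` are hypothetical, labels are assumed to be 1 (active) or 0 (inactive), and the enrichment factor used here (hit rate in the top fraction divided by the hit rate of the whole set) is one common convention that the paper may or may not follow. The false-negative case is not covered.

```python
import random


def inject_false_positives(labels, ratio, seed=0):
    """Relabel randomly chosen inactives (0) as actives (1).

    `ratio` is the number of mislabeled compounds per true active,
    so ratio=5 corresponds to the 5:1 case in the abstract (assumed reading).
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_true_actives = sum(labels)
    inactives = [i for i, y in enumerate(labels) if y == 0]
    # Never try to flip more compounds than there are inactives.
    n_flip = min(len(inactives), round(ratio * n_true_actives))
    for i in rng.sample(inactives, n_flip):
        noisy[i] = 1
    return noisy


def top_fraction_enrichment(scores, true_labels, fraction=0.01):
    """Enrichment factor among the top-scoring `fraction` of compounds.

    Computed as (hit rate in the top fraction) / (hit rate overall),
    always against the true, noise-free labels.
    """
    n = len(scores)
    n_top = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top = sum(true_labels[i] for i in ranked[:n_top])
    hit_rate_top = hits_in_top / n_top
    hit_rate_all = sum(true_labels) / n
    return hit_rate_top / hit_rate_all


# Toy data: 10 true actives among 1000 compounds.
true_labels = [1] * 10 + [0] * 990
noisy_labels = inject_false_positives(true_labels, ratio=5)
print("labeled active after noise:", sum(noisy_labels))

# Stand-in scores from a hypothetical perfect ranker, for illustration only.
scores = [float(y) for y in true_labels]
print("top-1% enrichment:", top_fraction_enrichment(scores, true_labels))
```

In a study of this kind, a classifier would be trained on the noisy labels and its ranking of a held-out set scored against the true labels, which is why the enrichment function takes `true_labels` rather than the noisy ones.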


History

  • Published In Issue January 23, 2006
  • Received September 4, 2005
