Supervised Self-Organizing Maps in Drug Discovery. 1. Robust Behavior with Overdetermined Data Sets

Yun-De Xiao, Aaron Clauset, Rebecca Harris, Ersin Bayram,§ Peter Santago, II,§ and Jeffrey D. Schmitt*
Molecular Design Group, Targacept Inc., 200 East First Street, Suite 300, Winston-Salem, North Carolina 27101-4165, Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, and Virginia Tech - Wake Forest University School of Biomedical Engineering and Sciences, Medical Center Boulevard, Winston-Salem, North Carolina 27157-1022
J. Chem. Inf. Model., 2005, 45 (6), pp 1749–1758
DOI: 10.1021/ci0500839
Publication Date (Web): October 4, 2005
Copyright © 2005 American Chemical Society

Targacept Inc.
University of New Mexico.
§ Virginia Tech - Wake Forest University School of Biomedical Engineering and Sciences.
* Corresponding author e-mail: jeff.schmitt@targacept.com.

Abstract

The utility of the supervised Kohonen self-organizing map was assessed and compared to several statistical methods used in QSAR analysis. The self-organizing map (SOM) describes a family of nonlinear, topology-preserving mapping methods with attributes of both vector quantization and clustering that provide visualization options unavailable with other nonlinear methods. In contrast to most chemometric methods, the supervised SOM (sSOM) is shown to be relatively insensitive to noise and feature redundancy. Additionally, sSOMs can make use of descriptors having only nominal linear correlation with the target property. Results herein are contrasted with partial least squares, stepwise multiple linear regression, the genetic functional algorithm, and genetic partial least squares, collectively referred to throughout as the “standard methods”. The k-nearest neighbor (kNN) classification method was also applied to provide a direct comparison with a different classification method. The widely studied dihydrofolate reductase (DHFR) inhibition data set of Hansch and Silipo is used to evaluate the ability of sSOMs to classify unknowns as a function of increasing class resolution. The contribution of the sSOM neighborhood kernel to its predictive ability is assessed in two experiments: (1) training with the k-means clustering limit, where the neighborhood radius is zero throughout the training regimen, and (2) training the sSOM until the neighborhood radius is reduced to zero. Results demonstrate that sSOMs provide more accurate predictions than standard linear QSAR methods.
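The two neighborhood-kernel experiments described in the abstract can be illustrated with a minimal sketch. It assumes one common formulation of the supervised SOM, in which a one-hot class block is appended to each descriptor vector during training and the best-matching unit is found on the descriptors alone at prediction time; the abstract does not specify the authors' implementation, so this may differ from it. The function names (`train_ssom`, `predict`, `make_blobs`), the Gaussian kernel, the linear decay schedules, the grid size, and the synthetic two-class data are all hypothetical choices made for illustration. Setting `radius0=0.0` updates only the winning unit, which corresponds to an online k-means-like limit.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_ssom(X, y, n_classes, grid=(3, 3), epochs=40,
               radius0=2.0, lr0=0.5, seed=0):
    """Train a toy supervised SOM; radius0=0.0 gives the k-means-like limit."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    # Append a one-hot class block to every descriptor vector (assumed scheme).
    data = [list(x) + [1.0 if k == label else 0.0 for k in range(n_classes)]
            for x, label in zip(X, y)]
    # Initialise each codebook vector from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in coords]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = 1.0 - step / total      # decays linearly from 1 toward 0
            lr, radius = lr0 * frac, radius0 * frac
            v = data[i]
            # Best-matching unit on the full (descriptor + class) vector.
            bmu = min(range(len(weights)), key=lambda j: sqdist(weights[j], v))
            for j, (r, c) in enumerate(coords):
                d2 = (r - coords[bmu][0]) ** 2 + (c - coords[bmu][1]) ** 2
                if radius > 0.0:
                    h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian kernel
                else:
                    h = 1.0 if j == bmu else 0.0             # winner only
                w = weights[j]
                for k in range(len(w)):
                    w[k] += lr * h * (v[k] - w[k])
            step += 1
    return weights


def predict(weights, x):
    """Match on descriptors only, then read the winner's class block."""
    n = len(x)
    bmu = min(weights, key=lambda w: sqdist(w[:n], x))
    block = bmu[n:]
    return max(range(len(block)), key=lambda k: block[k])


def make_blobs(n_per_class=40, seed=1):
    """Two synthetic, well-separated 2-D classes (illustrative data only)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre[0], 0.3), rng.gauss(centre[1], 0.3)])
            y.append(label)
    return X, y


X, y = make_blobs()
som = train_ssom(X, y, n_classes=2, radius0=2.0)          # shrinking radius
kmeans_like = train_ssom(X, y, n_classes=2, radius0=0.0)  # radius always zero
print(predict(som, [0.1, -0.2]), predict(kmeans_like, [4.8, 5.1]))
```

On data this cleanly separated both settings should classify alike; any difference attributable to the neighborhood kernel would only be expected to show on noisier, higher-dimensional descriptor sets such as those the abstract discusses.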

History

  • Published In Issue November 28, 2005
  • Received March 11, 2005
