A Fractal Approach for Selecting an Appropriate Bin Size for Cell-Based Diversity Estimation

Dimitris K. Agrafiotis* and Dmitrii N. Rassokhin
3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341
J. Chem. Inf. Comput. Sci., 2002, 42 (1), pp 117–122
DOI: 10.1021/ci010314l
Publication Date (Web): January 5, 2002
Copyright © 2002 American Chemical Society
* Corresponding author phone: (610) 458-6045; fax: (610) 458-8249; e-mail: dimitris@3dp.com.

Abstract

A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
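The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes descriptors are scaled to the unit hypercube, and it assumes that "relative variance" means the variance of the diversity score over random subsets divided by the squared mean score; the abstract does not give the exact definition. The names `diversity_score`, `relative_variance`, and `clustered_points` are hypothetical, as is the synthetic clustered data set.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def diversity_score(points, bins):
    """Sum of squared cell occupancies on a grid with `bins` cells per axis.

    Assumes every coordinate lies in [0, 1]; a value of exactly 1.0 is
    placed in the last cell.
    """
    cells = Counter(
        tuple(min(int(x * bins), bins - 1) for x in p) for p in points
    )
    return sum(n * n for n in cells.values())


def relative_variance(data, sample_size, bins, trials=200, seed=0):
    """Variance / mean^2 of the score over random subsets (assumed definition)."""
    rng = random.Random(seed)
    scores = [
        diversity_score(rng.sample(data, sample_size), bins)
        for _ in range(trials)
    ]
    m = mean(scores)
    return pvariance(scores) / (m * m)


def clustered_points(n, dims=2, clusters=5, spread=0.05, seed=1):
    """Hypothetical test data: Gaussian clusters clamped to the unit hypercube."""
    rng = random.Random(seed)
    centers = [tuple(rng.random() for _ in range(dims)) for _ in range(clusters)]
    return [
        tuple(min(max(rng.gauss(c, spread), 0.0), 1.0) for c in rng.choice(centers))
        for _ in range(n)
    ]


if __name__ == "__main__":
    data = clustered_points(500)
    # Scan grid resolutions; the bin count with the largest relative
    # variance is the candidate "optimal" resolution in this sketch.
    for bins in (1, 2, 4, 8, 16, 32, 64, 128):
        print(bins, relative_variance(data, sample_size=50, bins=bins))
```

At one bin per axis every subset scores the square of its size, and at very fine resolutions nearly every point occupies its own cell, so the relative variance vanishes at both extremes and peaks in between. Storing only occupied cells in a `Counter` keeps the cost proportional to the number of points rather than the number of grid cells.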


History

  • Published In Issue January 28, 2002
  • Received September 3, 2001
