Diversity and Coverage of Structural Sublibraries Selected Using the SAGE and SCA Algorithms

Charles H. Reynolds,* Alexander Tropsha, Lori B. Pfahler,§ Ross Druker,§ Subhas Chakravorty,§ G. Ethiraj, and Weifan Zheng§#
The R. W. Johnson Pharmaceutical Research Institute, Welsh and McKean Roads, Spring House, Pennsylvania 19477, The Laboratory for Molecular Modeling, School of Pharmacy, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, and Rohm and Haas Company, 727 Norristown Road, Spring House, Pennsylvania 19477
J. Chem. Inf. Comput. Sci., 2001, 41 (6), pp 1470–1477
DOI: 10.1021/ci010041u
Publication Date (Web): October 12, 2001
Copyright © 2001 American Chemical Society
* Corresponding author phone: (215) 628-5675; e-mail: Creynol1@prius.jnj.com.
The R. W. Johnson Pharmaceutical Research Institute.
The University of North Carolina at Chapel Hill.
§ Rohm and Haas Company.
Present address: Merck, West Point, PA 19486.
# Present address: Glaxo-Smithkline, King of Prussia, PA.

Abstract

It is often impractical to synthesize and test all compounds in a large exhaustive chemical library. Herein, we discuss rational approaches to selecting representative subsets of virtual libraries that help direct experimental synthetic efforts for diverse library design. We compare the performance of two stochastic sampling algorithms, Simulated Annealing Guided Evaluation (SAGE; Zheng, W.; Cho, S. J.; Waller, C. L.; Tropsha, A. J. Chem. Inf. Comput. Sci. 1999, 39, 738−746) and Stochastic Cluster Analysis (SCA; Reynolds, C. H.; Druker, R.; Pfahler, L. B. Lead Discovery Using Stochastic Cluster Analysis (SCA): A New Method for Clustering Structurally Similar Compounds. J. Chem. Inf. Comput. Sci. 1998, 38, 305−312), for their ability to select subsets that are both diverse and representative of the entire chemical library space. The SAGE and SCA algorithms were compared using u- and s-optimal metrics as an independent assessment of diversity and coverage. This comparison showed that both algorithms were capable of generating sublibraries in descriptor space that are diverse and give reasonable coverage (i.e., are representative) of the original full library. Tests were carried out using simulated two-dimensional data sets and a proprietary structural library of 27 000 compounds, represented by computed Molconn-Z descriptors. One of the key observations from this work is that the algorithmically simple SCA method is capable of selecting subsets that are comparable to those selected by the more computationally intensive SAGE method.
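As a purely illustrative sketch (not the SAGE or SCA algorithms, and not the authors' code), the idea of scoring a subset for coverage and diversity can be shown on a small two-dimensional data set. The abstract does not define the u- and s-optimal metrics, so the two functions below are assumptions based on a common space-filling formulation: a coverage score that averages each library point's distance to its nearest selected point, and a spread score that takes the harmonic mean of nearest-neighbor distances within the subset. The grid "library", the two example subsets, and all function names are hypothetical.

```python
import math


def nearest_distance(point, others):
    """Euclidean distance from `point` to the closest point in `others`."""
    return min(math.dist(point, other) for other in others)


def coverage_score(library, subset):
    """Mean distance from each library point to its nearest subset member.

    Assumed coverage measure: lower means every library point has a
    selected neighbor nearby.
    """
    return sum(nearest_distance(p, subset) for p in library) / len(library)


def spread_score(subset):
    """Harmonic mean of each member's distance to its nearest other member.

    Assumed diversity measure: higher means the selected points are not
    crowded together. Requires at least two distinct points.
    """
    inverse_sum = 0.0
    for i, p in enumerate(subset):
        rest = subset[:i] + subset[i + 1:]
        inverse_sum += 1.0 / nearest_distance(p, rest)
    return len(subset) / inverse_sum


# Hypothetical 2-D "descriptor space": a 10 x 10 grid of points.
library = [(float(x), float(y)) for x in range(10) for y in range(10)]

# A subset clumped in one corner versus an evenly spaced one.
clumped = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
spaced = [(2.0, 2.0), (2.0, 7.0), (7.0, 2.0), (7.0, 7.0)]

for name, subset in [("clumped", clumped), ("spaced", spaced)]:
    print(name, coverage_score(library, subset), spread_score(subset))
```

Under these assumed definitions, the evenly spaced subset scores better on both measures than the clumped one, which is the kind of behavior a selection algorithm would be expected to favor.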

History

  • Published In Issue November 26, 2001
  • Received April 23, 2001
