Recursive Median Partitioning for Virtual Screening of Large Databases

Jeffrey W. Godden, John R. Furr, and Jürgen Bajorath*§
Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc. (AMRI), 21 Corporate Circle, Albany, New York 12212-5098, AMRI Bothell Research Center (AMRI-BRC), 18804 North Creek Parkway, Bothell, Washington 98011, and Department of Biological Structure, University of Washington, Seattle, Washington 98195
J. Chem. Inf. Comput. Sci., 2003, 43 (1), pp 182–188
DOI: 10.1021/ci0203848
Publication Date (Web): January 8, 2003
Copyright © 2003 American Chemical Society

Albany Molecular Research, Inc. (AMRI).
AMRI Bothell Research Center (AMRI-BRC).
* Corresponding author phone: (425)424-7297; fax: (425)424-7299; e-mail: jurgen.bajorath@albmolecular.com. Address correspondence at AMRI-BRC.
§ University of Washington.

Abstract

Recently, we have introduced the median partitioning (MP) method for diversity selection and compound classification. The MP approach utilizes property descriptors with continuous value ranges, transforms these descriptors into a binary classification scheme by determining their medians in source databases, and divides database molecules in subsequent steps into populations above or below these medians. Having previously demonstrated the usefulness of MP for the classification of molecules according to biological activity, we have now gone a step further and extended the methodology for application in virtual screening. In these calculations, a series of bait molecules having desired activity is added to large compound databases, and subsequent iterations or recursions are carried out to reduce the number of candidate molecules until a small number of compounds are found in partitions enriched with bait molecules. For each recursion step, descriptor combinations are identified that copartition as many active molecules as possible. Descriptor selection is facilitated by application of a genetic algorithm (GA). The recursive MP approach (RMP) has been applied to five diverse biological activity classes in virtual screening of a database consisting of approximately 1.34 million molecules to which different types of active compounds were added. RMP analysis produced hit rates of up to 21%, dependent on the biological activity class, and led to an average 3600-fold improvement over random selection for the activity classes that were used as test cases.
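The partitioning and recursion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a NumPy descriptor matrix `X` (rows are molecules, columns are descriptors), a boolean mask `is_bait`, and a hand-supplied list `descriptor_sets` in place of the genetic algorithm that the paper uses for descriptor selection. The function names, the `min_baits` threshold, and the choice to recompute medians on the surviving molecules at each recursion are all assumptions made for illustration.

```python
import numpy as np


def median_partition(X):
    """Binary code per molecule: 1 where a descriptor value is above the column median."""
    medians = np.median(X, axis=0)
    return (X > medians).astype(int)


def recursive_mp(X, is_bait, descriptor_sets, min_baits=2):
    """Return indices of molecules left after recursive median partitioning.

    descriptor_sets holds one list of column indices per recursion (assumed
    given here; the paper selects descriptor combinations with a GA).
    min_baits is a hypothetical threshold for calling a partition bait-enriched.
    """
    keep = np.arange(X.shape[0])  # indices of surviving molecules
    for cols in descriptor_sets:
        if keep.size == 0:
            break
        # Assumption: medians are recomputed on the current survivors.
        codes = median_partition(X[keep][:, cols])
        # Each unique bit pattern is one partition.
        _, part_id = np.unique(codes, axis=0, return_inverse=True)
        part_id = part_id.ravel()
        # Count bait molecules in every partition.
        baits_per_part = np.bincount(part_id, weights=is_bait[keep].astype(float))
        enriched = baits_per_part >= min_baits
        # Keep only molecules that share a partition with enough baits.
        keep = keep[enriched[part_id]]
    return keep


def enrichment_factor(n_hits, n_selected, n_actives_total, n_database):
    """Hit rate in the selected set divided by the hit rate of random selection."""
    hit_rate = n_hits / n_selected
    random_rate = n_actives_total / n_database
    return hit_rate / random_rate
```

Calling `recursive_mp(X, is_bait, [[0, 1], [2, 3]])` would run two recursions, each splitting the survivors into at most four partitions. `enrichment_factor` expresses the kind of fold improvement over random selection that the abstract reports, with the hit rate taken as the fraction of actives among the selected compounds.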

History

  • Published In Issue January 27, 2003
  • Received September 27, 2002
