Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics

Sebastian G. Rohrer and Knut Baumann*
Institute of Pharmaceutical Chemistry, Beethovenstrasse 55, Braunschweig University of Technology, 38106 Braunschweig, Germany
J. Chem. Inf. Model., 2008, 48 (4), pp 704–718
DOI: 10.1021/ci700099u
Publication Date (Web): April 2, 2008
Copyright © 2008 American Chemical Society
* Phone: +49-531-3912751. Fax: +49-531-3912799. E-mail: k.baumann@tu-braunschweig.de.

Abstract

A common finding of many reports evaluating ligand-based virtual screening methods is that validation results vary considerably with the benchmark data set used. It is widely assumed that these data set-specific effects are caused by the redundancy, self-similarity, and cluster structure inherent to those data sets. These phenomena manifest themselves in the data sets’ representation in descriptor space, which is termed the data set topology. A methodology for the characterization of data set topology based on spatial statistics is introduced. The method is nonparametric and can deal with arbitrary distributions of descriptor values. With this methodology, it is possible to associate differences in virtual screening performance on different data sets with differences in data set topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark data sets with a more favorable topology. Finally, it is shown that the composition of some benchmark data sets causes topologies that lead to overoptimistic validation results even in very “simple” descriptor spaces. Spatial statistics analysis as proposed here facilitates the detection of such biased data sets and may provide a tool for the future design of unbiased benchmark data sets.

History

  • Published In Issue April 28, 2008
  • Article ASAP April 02, 2008
  • Received: March 20, 2007
