Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics

Abstract

A common finding of many reports evaluating ligand-based virtual screening methods is that validation results vary considerably from one benchmark data set to another. It is widely assumed that these data-set-specific effects are caused by the redundancy, self-similarity, and cluster structure inherent to those data sets. These phenomena manifest themselves in the data sets’ representation in descriptor space, which is termed the data set topology. A methodology for characterizing data set topology based on spatial statistics is introduced. The method is nonparametric and can deal with arbitrary distributions of descriptor values. With this methodology, differences in virtual screening performance on different data sets can be associated with differences in data set topology. Moreover, the better virtual screening performance of certain descriptors can be explained by their ability to represent the benchmark data sets with a more favorable topology. Finally, it is shown that the composition of some benchmark data sets produces topologies that lead to overoptimistic validation results even in very “simple” descriptor spaces. The spatial statistics analysis proposed here facilitates the detection of such biased data sets and may provide a tool for the future design of unbiased benchmark data sets.
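The abstract does not spell out which spatial statistics are used. The sketch below illustrates the general idea and makes several assumptions: descriptors are Euclidean vectors, and two nearest-neighbor distribution functions that are standard in spatial statistics are applied. G(t) is the fraction of actives whose nearest *other* active lies within distance t. F(t) is the fraction of decoys whose nearest active lies within t. The helper names and the summary score (the sum of G(t) − F(t) over a distance grid) are hypothetical and may not match the article's exact definitions. If G rises well before F, the actives are clumped relative to the decoys. That is the kind of topology that can make similarity searching look overly successful.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def nearest_active_distances(
    actives: List[Point], queries: List[Point], queries_are_actives: bool
) -> List[float]:
    """Euclidean distance from each query point to its nearest active.

    When the queries are the actives themselves, a point may not be its
    own nearest neighbor (this requires at least two actives).
    """
    distances = []
    for i, q in enumerate(queries):
        candidates = (
            math.dist(q, a)
            for j, a in enumerate(actives)
            if not (queries_are_actives and i == j)
        )
        distances.append(min(candidates))
    return distances


def ecdf(distances: List[float], ts: List[float]) -> List[float]:
    """Empirical cumulative distribution of the distances at each threshold t."""
    n = len(distances)
    return [sum(d <= t for d in distances) / n for t in ts]


def g_and_f(actives: List[Point], decoys: List[Point], ts: List[float]):
    """Hypothetical helper: G(t) from active-active, F(t) from decoy-active distances."""
    g = ecdf(nearest_active_distances(actives, actives, True), ts)
    f = ecdf(nearest_active_distances(actives, decoys, False), ts)
    return g, f


def clumping_score(g: List[float], f: List[float]) -> float:
    """Illustrative summary (an assumption): sum of G(t) - F(t) over the grid.

    Large positive values mean actives lie much closer to each other than
    decoys lie to actives, i.e. a topology that favors similarity search.
    """
    return sum(gi - fi for gi, fi in zip(g, f))


if __name__ == "__main__":
    # Toy 2-D "descriptor space": a tight cluster of actives, distant decoys.
    actives = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
    decoys = [(10.0, 10.0), (10.0, 11.0)]
    ts = [0.5, 1.0, 2.0, 20.0]
    g, f = g_and_f(actives, decoys, ts)
    print(g, f, clumping_score(g, f))
```

For this toy input the script prints `[0.0, 1.0, 1.0, 1.0] [0.0, 0.0, 0.0, 1.0] 2.0`. The actives reach G = 1 at t = 1, while the decoys do not reach F = 1 until t = 20. The brute-force search is O(n·m) and is chosen for clarity; larger data sets would normally use a spatial index such as a k-d tree.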
Citing Articles
This article has been cited by 7 ACS Journal articles (5 most recent appear below).

REPROVIS-DB: A Benchmark System for Ligand-Based Virtual Screening Derived from Reproducible Prospective Applications
Peter Ripphausen, Anne Mai Wassermann, and Jürgen Bajorath. Journal of Chemical Information and Modeling 2011, 51 (10), 2467-2473.
Benchmark calculations are essential for the evaluation of virtual screening (VS) methods. Typically, classes of known active compounds taken from the medicinal chemistry literature are divided into reference molecules (search templates) and potential ...

Development of a Method To Consistently Quantify the Structural Distance between Scaffolds and To Assess Scaffold Hopping Potential
Ruifang Li, Dagmar Stumpfe, Martin Vogt, Hanna Geppert, and Jürgen Bajorath. Journal of Chemical Information and Modeling 2011, 51 (10), 2507-2514.
We introduce a method to determine a structural distance between any pair of molecular scaffolds. The development of this approach was motivated by the need to accurately evaluate scaffold hopping studies in virtual screening and medicinal chemistry and ...

DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening — A Versatile Tool for Benchmarking Docking Programs and Scoring Functions
Simon M. Vogel, Matthias R. Bauer, and Frank M. Boeckler. Journal of Chemical Information and Modeling 2011, 51 (10), 2650-2665.
For widely applied in silico screening techniques success depends on the rational selection of an appropriate method. We herein present a fast, versatile, and robust method to construct demanding evaluation kits for objective in silico screening (DEKOIS). ...

Current Trends in Ligand-Based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation
Hanna Geppert, Martin Vogt, and Jürgen Bajorath. Journal of Chemical Information and Modeling 2010, 50 (2), 205-216.

Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data
Sebastian G. Rohrer and Knut Baumann. Journal of Chemical Information and Modeling 2009, 49 (2), 169-184.
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of ...
History
- Published in Issue: April 28, 2008
- Article ASAP: April 02, 2008
- Received: March 20, 2007