Article
A Hierarchical Clustering Approach for Large Compound Libraries
Abstract
A modified version of the k-means clustering algorithm was developed that is able to analyze large compound libraries. A distance threshold determined by plotting the sum of radii of leaf clusters was used as a termination criterion for the clustering process. Hierarchical trees were constructed that can be used to obtain an overview of the data distribution and inherent cluster structure. The approach is also applicable to ligand-based virtual screening with the aim of generating preferred screening collections or focused compound libraries. Retrospective analysis of two activity classes was performed: inhibitors of caspase 1 [interleukin 1 (IL1) cleaving enzyme, ICE] and glucocorticoid receptor ligands. The MDL Drug Data Report (MDDR) and Collection of Bioactive Reference Analogues (COBRA) databases served as the compound pool, for which binary trees were produced. Molecules were encoded by all Molecular Operating Environment 2D descriptors and topological pharmacophore atom types. Individual clusters were assessed for their purity and enrichment of actives belonging to the two ligand classes. Significant enrichment was observed in individual branches of the cluster tree. After clustering a combined database of MDDR, COBRA, and the SPECS catalog, it was possible to retrieve MDDR ICE inhibitors with new scaffolds using COBRA ICE inhibitors as seeds. A Java implementation of the clustering method is available via the Internet (http://www.modlab.de).
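The abstract describes binary trees grown by k-means, with splitting stopped by a distance threshold chosen from the sum of leaf-cluster radii. The following is a minimal Python sketch of that general idea, not the authors' Java implementation. Several details are assumptions made for illustration: the radius is taken as the largest Euclidean distance from a cluster's centroid to a member, a node is split while its radius exceeds the threshold, and all function names (`two_means`, `build_tree`, `leaf_radius_sum`) and the toy data are hypothetical.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def radius(points):
    """Assumed definition: largest distance from the centroid to a member."""
    c = centroid(points)
    return max(math.dist(c, p) for p in points)

def two_means(points, rng, iterations=20):
    """Split points into two groups with plain k-means (k = 2)."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller keeps the node as a leaf
        centers = [centroid(g) for g in groups]
    return groups

def build_tree(points, max_radius, rng):
    """Recursively bisect a cluster while its radius exceeds max_radius."""
    node = {"points": points, "children": []}
    if len(points) > 1 and radius(points) > max_radius:
        left, right = two_means(points, rng)
        if left and right:
            node["children"] = [build_tree(left, max_radius, rng),
                                build_tree(right, max_radius, rng)]
    return node

def leaves(node):
    """Collect the leaf clusters of the binary tree."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

def leaf_radius_sum(node):
    """Sum of leaf-cluster radii, the quantity plotted to pick a threshold."""
    return sum(radius(leaf["points"]) for leaf in leaves(node))

# Toy descriptor vectors: two well-separated groups in two dimensions.
rng = random.Random(0)
data = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(20)]
data += [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(20)]

# Scan candidate thresholds and report the tree size and leaf radius sum.
for threshold in (10.0, 1.0, 0.2):
    tree = build_tree(data, threshold, random.Random(1))
    print(threshold, len(leaves(tree)), round(leaf_radius_sum(tree), 3))
```

Scanning several thresholds and plotting the resulting radius sums against them is one way a cutoff could be read off a curve. The sketch uses only the standard library so that it stays self-contained; a real compound library would call for vectorized distance calculations.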
Citing Articles
This article has been cited by 10 ACS Journal articles (5 most recent appear below).

Comparison of Combinatorial Clustering Methods on Pharmacological Data Sets Represented by Machine Learning-Selected Real Molecular Descriptors
Oscar Miguel Rivera-Borroto, Yovani Marrero-Ponce, José Manuel García-de la Vega, and Ricardo del Corazón Grau-Ábalo. Journal of Chemical Information and Modeling 2011, 51 (12), 3036-3049.
Cluster algorithms play an important role in diversity related tasks of modern chemoinformatics, with the widest applications being in pharmaceutical industry drug discovery programs. The performance of these grouping strategies depends on various factors ...

Toward an Improved Clustering of Large Data Sets Using Maximum Common Substructures and Topological Fingerprints
Alexander Böcker. Journal of Chemical Information and Modeling 2008, 48 (11), 2097-2107.
A new clustering algorithm was developed that is able to group large data sets with more than 100,000 molecules according to their chemotypes. The algorithm preclusters a data set using a fingerprint version of the hierarchical k-means algorithm. ...

Development of a Spectral Clustering Method for the Analysis of Molecular Data Sets
Mark L. Brewer. Journal of Chemical Information and Modeling 2007, 47 (5), 1727-1733.
A spectral clustering method is presented and applied to two-dimensional molecular structures, where it has been found particularly useful in the analysis of screening data. The method provides a means to quantify (1) the degree of intermolecular ...

Clustering and Rule-Based Classifications of Chemical Structures Evaluated in the Biological Activity Space
Ansgar Schuffenhauer, Nathan Brown, Peter Ertl, Jeremy L. Jenkins, Paul Selzer, and Jacques Hamon. Journal of Chemical Information and Modeling 2007, 47 (2), 325-336.
Classification methods for data sets of molecules according to their chemical structure were evaluated for their biological relevance, including rule-based, scaffold-oriented classification methods and clustering based on molecular descriptors. Three data ...

Radial Clustergrams: Visualizing the Aggregate Properties of Hierarchical Clusters
Dimitris K. Agrafiotis, Deepak Bandyopadhyay, and Michael Farnum. Journal of Chemical Information and Modeling 2007, 47 (1), 69-75.
A new radial space-filling method for visualizing cluster hierarchies is presented. The method, referred to as a radial clustergram, arranges the clusters into a series of layers, each representing a different level of the tree. It uses adjacency of nodes ...
History
- Published In Issue July 25, 2005
- Received January 3, 2005