A Hierarchical Clustering Approach for Large Compound Libraries

Alexander Böcker, Swetlana Derksen, Elena Schmidt, Andreas Teckentrup, and Gisbert Schneider*
Johann Wolfgang Goethe-Universität, Institut für Organische Chemie und Chemische Biologie, Marie-Curie-Str. 11, D-60439 Frankfurt, Germany, and Boehringer Ingelheim Pharma GmbH & Co. KG, Department of Lead Discovery, D-88397 Biberach a.d. Riss, Germany
J. Chem. Inf. Model., 2005, 45 (4), pp 807–815
DOI: 10.1021/ci0500029
Publication Date (Web): May 3, 2005
Copyright © 2005 American Chemical Society

Abstract

A modified version of the k-means clustering algorithm was developed that can analyze large compound libraries. A distance threshold, determined by plotting the sum of radii of leaf clusters, was used as the termination criterion for the clustering process. Hierarchical trees were constructed that can be used to obtain an overview of the data distribution and inherent cluster structure. The approach is also applicable to ligand-based virtual screening with the aim of generating preferred screening collections or focused compound libraries. Retrospective analysis of two activity classes was performed: inhibitors of caspase 1 [interleukin 1 (IL1) cleaving enzyme, ICE] and glucocorticoid receptor ligands. The MDL Drug Data Report (MDDR) and Collection of Bioactive Reference Analogues (COBRA) databases served as the compound pool, for which binary trees were produced. Molecules were encoded by all Molecular Operating Environment 2D descriptors and topological pharmacophore atom types. Individual clusters were assessed for their purity and enrichment of actives belonging to the two ligand classes. Significant enrichment was observed in individual branches of the cluster tree. After clustering a combined database of MDDR, COBRA, and the SPECS catalog, it was possible to retrieve MDDR ICE inhibitors with new scaffolds using COBRA ICE inhibitors as seeds. A Java implementation of the clustering method is available via the Internet (http://www.modlab.de).
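
The abstract does not spell out how k-means was modified, so the following is only a minimal sketch of the general idea, not the authors' Java implementation. It assumes a bisecting scheme (k = 2 at every split, giving a binary tree), Euclidean distance, and a cluster radius defined as the largest distance from the centroid to any member. All names (`build_tree`, `two_means`, `sum_of_leaf_radii`, `threshold`) are hypothetical and chosen for illustration.

```python
import math
import random
from dataclasses import dataclass, field


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def radius(points, center):
    # Assumed definition: largest distance from the centroid to any member.
    return max(math.dist(p, center) for p in points)


def two_means(points, rng, iterations=20):
    """Plain Lloyd iteration with k = 2; returns two lists of points."""
    centers = rng.sample(list(points), 2)
    groups = [list(points), []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            d0, d1 = math.dist(p, centers[0]), math.dist(p, centers[1])
            groups[0 if d0 <= d1 else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. duplicate seed points
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups


@dataclass
class Node:
    points: list
    center: list
    radius: float
    children: list = field(default_factory=list)


def build_tree(points, threshold, rng=None):
    """Split recursively until a cluster's radius is at most `threshold`."""
    rng = rng or random.Random(0)
    center = centroid(points)
    node = Node(list(points), center, radius(points, center))
    if node.radius > threshold and len(points) > 1:
        left, right = two_means(points, rng)
        if left and right:  # keep the node as a leaf if the split failed
            node.children = [build_tree(left, threshold, rng),
                             build_tree(right, threshold, rng)]
    return node


def leaves(node):
    """Collect the leaf clusters of the tree."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def sum_of_leaf_radii(node):
    """Quantity one could plot against the threshold to choose a cutoff."""
    return sum(leaf.radius for leaf in leaves(node))
```

Under these assumptions, one would call `build_tree(descriptors, threshold)` for a range of threshold values and plot `sum_of_leaf_radii` against the threshold to look for a suitable cutoff. How the authors actually read the threshold off that plot is not stated in the abstract.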

Citing Articles

This article has been cited by 10 ACS Journal articles (5 most recent appear below).

  • Comparison of Combinatorial Clustering Methods on Pharmacological Data Sets Represented by Machine Learning-Selected Real Molecular Descriptors

    Oscar Miguel Rivera-Borroto, Yovani Marrero-Ponce, José Manuel García-de la Vega, and Ricardo del Corazón Grau-Ábalo
    Journal of Chemical Information and Modeling 2011, 51 (12), 3036-3049

    Cluster algorithms play an important role in diversity related tasks of modern chemoinformatics, with the widest applications being in pharmaceutical industry drug discovery programs. The performance of these grouping strategies depends on various factors ...

  • Toward an Improved Clustering of Large Data Sets Using Maximum Common Substructures and Topological Fingerprints

    Alexander Böcker
    Journal of Chemical Information and Modeling 2008, 48 (11), 2097-2107

    A new clustering algorithm was developed that is able to group large data sets with more than 100,000 molecules according to their chemotypes. The algorithm preclusters a data set using a fingerprint version of the hierarchical k-means algorithm. ...

  • Development of a Spectral Clustering Method for the Analysis of Molecular Data Sets

    Mark L. Brewer
    Journal of Chemical Information and Modeling 2007, 47 (5), 1727-1733

    A spectral clustering method is presented and applied to two-dimensional molecular structures, where it has been found particularly useful in the analysis of screening data. The method provides a means to quantify (1) the degree of intermolecular ...

  • Clustering and Rule-Based Classifications of Chemical Structures Evaluated in the Biological Activity Space

    Ansgar Schuffenhauer, Nathan Brown, Peter Ertl, Jeremy L. Jenkins, Paul Selzer, and Jacques Hamon
    Journal of Chemical Information and Modeling 2007, 47 (2), 325-336

    Classification methods for data sets of molecules according to their chemical structure were evaluated for their biological relevance, including rule-based, scaffold-oriented classification methods and clustering based on molecular descriptors. Three data ...

  • Radial Clustergrams: Visualizing the Aggregate Properties of Hierarchical Clusters

    Dimitris K. Agrafiotis, Deepak Bandyopadhyay, and Michael Farnum
    Journal of Chemical Information and Modeling 2007, 47 (1), 69-75

    A new radial space-filling method for visualizing cluster hierarchies is presented. The method, referred to as a radial clustergram, arranges the clusters into a series of layers, each representing a different level of the tree. It uses adjacency of nodes ...

History

  • Published In Issue July 25, 2005
  • Received January 3, 2005
