Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups

Peter Ertl*
Novartis Pharma AG, Molecular Simulation Group, WKL-125.14.20, CH-4002 Basel, Switzerland
J. Chem. Inf. Comput. Sci., 2003, 43 (2), pp 374–380
DOI: 10.1021/ci0255782
Publication Date (Web): December 6, 2002
Copyright © 2003 American Chemical Society

Abstract

A large set of more than 3 million molecules was processed to find all the organic substituents contained in the set and to identify the most common ones. During the analysis, 849 574 unique substituents were found. Extrapolated to the number of known organic molecules, this result suggests that about 3.1 million substituents are known. Based on these findings, the size of virtual organic chemistry space accessible using currently known synthetic methods is estimated to be between 10^20 and 10^24 molecules. The extracted substituents were characterized by calculated electronic, hydrophobic, steric, and hydrogen bonding properties as well as by the drug-likeness index. Various possible applications of such a large database of drug-like substituents characterized by calculated properties are discussed and illustrated by reference to a Web-based tool for automatic identification of bioisosteric groups.
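
The counting step described in the abstract (collecting unique substituents and ranking the most common ones) might be sketched as follows. This is only an illustrative sketch, not the author's implementation: it assumes the substituents have already been extracted from each molecule and written as canonical strings, the function name `count_substituents` is hypothetical, counting each substituent once per molecule is an assumption, and the toy input is invented for illustration rather than taken from the paper.

```python
from collections import Counter


def count_substituents(substituents_per_molecule):
    """Count in how many molecules each substituent string occurs.

    Assumption: each inner list holds the already-extracted substituents
    of one molecule, written as canonical strings so that identical
    groups compare equal.
    """
    counts = Counter()
    for substituents in substituents_per_molecule:
        # A set makes a substituent repeated within one molecule count once.
        counts.update(set(substituents))
    return counts


# Illustrative toy input (hypothetical, not data from the paper).
toy_molecules = [
    ["[*]C", "[*]OC"],
    ["[*]C", "[*]C", "[*]Cl"],
    ["[*]OC", "[*]C"],
]

counts = count_substituents(toy_molecules)
unique_total = len(counts)          # number of unique substituents found
most_common = counts.most_common(2)  # the two most frequent substituents

print(unique_total)
print(most_common)
```

On a real collection the same tally would give both the number of unique substituents and their frequency ranking; whether occurrences should be counted per molecule or in total is a design choice left open here.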



History

  • Published In Issue March 24, 2003
  • Received August 14, 2002
