Research Article
Cheminformatics Analysis of Organic Substituents: Identification of the Most Common Substituents, Calculation of Substituent Properties, and Automatic Identification of Drug-like Bioisosteric Groups
Abstract
A large set of more than 3 million molecules was processed to find all the organic substituents contained in the set and to identify the most common ones. During the analysis, 849 574 unique substituents were found. Extrapolated to the number of known organic molecules, this result suggests that about 3.1 million substituents are known. Based on these findings, the size of virtual organic chemistry space accessible using currently known synthetic methods is estimated to be between 10^20 and 10^24 molecules. The extracted substituents were characterized by calculated electronic, hydrophobic, steric, and hydrogen bonding properties as well as by the drug-likeness index. Various possible applications of such a large database of drug-like substituents characterized by calculated properties are discussed and illustrated by reference to a Web-based tool for automatic identification of bioisosteric groups.
History
- Published In Issue March 24, 2003
- Received August 14, 2002