Exploiting Multiple Descriptor Sets in QSAR Studies
Abstract

A quantitative structure–activity relationship (QSAR) is a model relating a specific biological response to the chemical structures of compounds. There are many descriptor sets available to characterize chemical structure, raising the question of how to choose among them or how to use all of them for training a QSAR model. Making efficient use of all sets of descriptors is particularly problematic when active compounds are rare among the assay response data. We consider various strategies to make use of the richness of multiple descriptor sets when assay data are poor in active compounds. Comparisons are made using data from four bioassays, each with five sets of molecular descriptors. The recommended method takes all available descriptors from all sets and uses an algorithm to partition them into groups called phalanxes. Distinct statistical models are trained, each based on only the descriptors in one phalanx, and the models are then averaged in an ensemble of models. By giving the descriptors a chance to contribute in different models, the recommended method uses more of the descriptors in model averaging. This results in better ranking of active compounds to identify a shortlist of drug candidates for development.
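The ensemble idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how phalanxes are formed or which base model is used, so the random partition (`partition_descriptors`) and the k-nearest-neighbour scorer (`knn_score`) below are stand-in assumptions, and all function names are hypothetical. The sketch shows only the overall shape: split the pooled descriptors into disjoint groups, fit one model per group, average the model scores, and rank compounds by the averaged score.

```python
import math
import random


def partition_descriptors(n_descriptors, n_phalanxes, seed=0):
    """Assumed placeholder: randomly split column indices into disjoint groups.

    The paper uses a dedicated algorithm to form phalanxes; a random
    partition is used here only to keep the sketch self-contained.
    """
    idx = list(range(n_descriptors))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::n_phalanxes]) for i in range(n_phalanxes)]


def knn_score(train_X, train_y, x, cols, k=3):
    """Assumed base model: fraction of actives among the k nearest
    training compounds, with distance measured on the columns in `cols` only."""
    neighbours = sorted(
        (math.dist([row[c] for c in cols], [x[c] for c in cols]), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in neighbours[:k]) / k


def ensemble_scores(train_X, train_y, test_X, phalanxes, k=3):
    """Average the per-phalanx scores for each test compound."""
    scores = []
    for x in test_X:
        per_model = [knn_score(train_X, train_y, x, cols, k) for cols in phalanxes]
        scores.append(sum(per_model) / len(per_model))
    return scores


def rank_compounds(scores):
    """Return test-compound indices ordered from most to least likely active."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this sketch every descriptor belongs to exactly one group, so each one contributes to some model in the average. The top of the list returned by `rank_compounds` plays the role of the shortlist of candidates.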



