Genetic Programming for the Induction of Decision Trees to Model Ecotoxicity Data
Abstract
Automatic induction of decision trees and production rules from data to develop structure−activity models for toxicity prediction has recently received much attention. The majority of methodologies reported in the literature are based upon recursive partitioning, employing greedy searches to choose the best splitting attribute and value at each node. These approaches can be successful; however, a greedy search will necessarily miss regions of the search space. Recent literature has demonstrated the applicability of genetic programming to decision tree induction to overcome this problem. This paper presents a variant of this novel approach that uses fewer mutation options and a simpler fitness function. Its utility in inducing decision trees for ecotoxicity data is demonstrated via a case study of two data sets, which gives improved accuracy and generalization ability over a popular decision tree inducer.
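The contrast between a greedy split search and an evolutionary search over whole trees can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method of this paper: the toy XOR-like data, the Gini impurity criterion, the tuple encoding of trees, training accuracy as the fitness function, a single subtree-replacement mutation, and the names `greedy_split`, `fitness`, `mutate`, and `evolve`. Crossover is omitted for brevity.

```python
import random

# Toy data, invented for illustration: two descriptors scaled to [0, 1]
# and an XOR-like class label that no single split can separate.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def greedy_split(X, y):
    """Best single (attribute, threshold, weighted impurity) split."""
    best = None
    for a in range(len(X[0])):
        for t in sorted({row[a] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[a] <= t]
            right = [lab for row, lab in zip(X, y) if row[a] > t]
            if not left or not right:
                continue  # a split must send rows to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (a, t, score)
    return best

# A tree is a class label (leaf) or a tuple (attribute, threshold, left, right).
def predict(tree, row):
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if row[a] <= t else right
    return tree

def fitness(tree, X, y):
    """Assumed fitness: fraction of training rows classified correctly."""
    return sum(predict(tree, r) == lab for r, lab in zip(X, y)) / len(y)

def random_tree(rng, depth, n_attrs, classes):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(classes)
    return (rng.randrange(n_attrs), rng.random(),
            random_tree(rng, depth - 1, n_attrs, classes),
            random_tree(rng, depth - 1, n_attrs, classes))

def mutate(tree, rng, n_attrs, classes):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, 2, n_attrs, classes)
    a, t, left, right = tree
    if rng.random() < 0.5:
        return (a, t, mutate(left, rng, n_attrs, classes), right)
    return (a, t, left, mutate(right, rng, n_attrs, classes))

def evolve(X, y, generations=30, pop_size=20, seed=0):
    """Mutation-only evolutionary search that keeps the fitter half each generation."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    n_attrs = len(X[0])
    pop = [random_tree(rng, 2, n_attrs, classes) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda tr: fitness(tr, X, y), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng, n_attrs, classes)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda tr: fitness(tr, X, y))

# A hand-written two-level tree that classifies the toy data perfectly.
xor_tree = (0, 0.5, (1, 0.5, 0, 1), (1, 0.5, 1, 0))
best_tree = evolve(X, y)
```

On this toy data every single split leaves both branches evenly mixed, so a greedy criterion has no reason to prefer the split that would lead to the perfect two-level tree `xor_tree`. A search that scores whole trees, as `evolve` does, is not restricted in that way, although nothing guarantees that it finds the best tree in a given run.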
† Department of Chemical Engineering, University of Leeds.
* Corresponding author phone: +44 113 343 2427; fax: +44 113 343 2405; e-mail: [email protected].
‡ School of Civil Engineering, University of Leeds.
§ AstraZeneca UK Ltd.
‖ Centre of Ecology and Hydrology.