Decision Forest: Combining the Predictions of Multiple Independent Decision Tree Models
Abstract
Techniques for combining the results of multiple classification models into a single prediction have been investigated for many years. In earlier applications, the models to be combined were developed by altering the training set. These so-called resampling techniques, however, risk reducing the predictivity of the individual models and/or overfitting the noise in the data, which may cause the composite model to predict worse than the individual models. In this paper, we propose a novel approach, named Decision Forest, that combines multiple Decision Tree models, each developed using a unique set of descriptors. When models of similar predictive quality are combined using the Decision Forest method, prediction quality is consistently and significantly improved over the individual models in both the training and testing steps. An example is presented for predicting the binding affinity of 232 chemicals to the estrogen receptor.
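The core idea in the abstract, growing several trees where each one may only use descriptors that no earlier tree has used and then pooling their outputs, can be illustrated with a small from-scratch sketch. This is not the authors' implementation. It assumes a toy Gini-impurity tree with a fixed depth limit, assumes the trees are combined by averaging their leaf probabilities (the abstract does not state the combining rule), and omits the check that only trees of similar predictive quality are kept. All function names (`build_tree`, `decision_forest`, `forest_predict`, etc.) are hypothetical.

```python
from collections import Counter
from statistics import mean


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) that most lowers weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, features, depth=0, max_depth=3):
    """Grow a small binary tree; each leaf stores P(class == 1)."""
    split = best_split(X, y, features) if depth < max_depth else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in li], [y[i] for i in li], features, depth + 1, max_depth),
        "right": build_tree([X[i] for i in ri], [y[i] for i in ri], features, depth + 1, max_depth),
    }


def used_features(tree):
    """Set of descriptor indices the tree actually splits on."""
    if "leaf" in tree:
        return set()
    return {tree["feature"]} | used_features(tree["left"]) | used_features(tree["right"])


def predict_proba(tree, row):
    """Walk one tree down to a leaf and return its P(class == 1)."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def decision_forest(X, y, n_trees=3, max_depth=3):
    """Grow trees on disjoint descriptor sets: descriptors used by a tree are removed."""
    available = set(range(len(X[0])))
    trees = []
    for _ in range(n_trees):
        if not available:
            break
        tree = build_tree(X, y, sorted(available), max_depth=max_depth)
        used = used_features(tree)
        if not used:  # no remaining descriptor gives a useful split
            break
        trees.append(tree)
        available -= used
    return trees


def forest_predict(trees, row, cutoff=0.5):
    """Assumed combining rule: average the trees' probabilities, then threshold."""
    p = mean(predict_proba(t, row) for t in trees)
    return int(p >= cutoff), p


# Toy data: 4 chemicals x 4 descriptors, binary activity labels.
X = [[0, 5, 0, 1], [0, 3, 0, 2], [1, 4, 1, 1], [1, 6, 1, 2]]
y = [0, 0, 1, 1]
trees = decision_forest(X, y)
print([sorted(used_features(t)) for t in trees])  # [[0], [2], [1]]
print(forest_predict(trees, [1, 6, 1, 2]))  # (1, 1.0)
```

Because each tree is restricted to descriptors the others have not used, the trees tend to make different errors, which is the property that lets the averaged prediction outperform any single tree without resampling the training set.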
* Corresponding author. Phone: (870)543-7142; fax: (870)543-7662. Corresponding address: NCTR, 3900 NCTR Road, HFT 20, Jefferson, AR 72079.
† National Center for Toxicological Research.
‡ Northrop Grumman Information Technology.
Cited By
This article is cited by 139 publications.
- Sugunadevi Sakkiah, Carmine Leggett, Bohu Pan, Wenjing Guo, Luis G. Valerio, Jr., Huixiao Hong. Development of a Nicotinic Acetylcholine Receptor nAChR α7 Binding Activity Prediction Model. Journal of Chemical Information and Modeling 2020, 60 (4) , 2396-2404. https://doi.org/10.1021/acs.jcim.0c00139
- Yangmei Zhang, Tuan Van Vu, Junying Sun, Jianjun He, Xiaojing Shen, Weili Lin, Xiaoye Zhang, Junting Zhong, Wenkang Gao, Yaqiang Wang, Tzung May Fu, Yaping Ma, Weijun Li, Zongbo Shi. Significant Changes in Chemistry of Fine Particles in Wintertime Beijing from 2007 to 2017: Impact of Clean Air Actions. Environmental Science & Technology 2020, 54 (3) , 1344-1352. https://doi.org/10.1021/acs.est.9b04678
- Beatrice Pecoraro, Marco Tutone, Ewelina Hoffman, Victoria Hutter, Anna Maria Almerico, Matthew Traynor. Predicting Skin Permeability by Means of Computational Approaches: Reliability and Caveats in Pharmaceutical Studies. Journal of Chemical Information and Modeling 2019, 59 (5) , 1759-1771. https://doi.org/10.1021/acs.jcim.8b00934
- Abdul Karim, Avinash Mishra, M. A. Hakim Newton, Abdul Sattar. Efficient Toxicity Prediction via Simple Features Using Shallow Neural Networks and Decision Trees. ACS Omega 2019, 4 (1) , 1874-1888. https://doi.org/10.1021/acsomega.8b03173
- Leihong Wu, Zhichao Liu, Scott Auerbach, Ruili Huang, Minjun Chen, Kristin McEuen, Joshua Xu, Hong Fang, and Weida Tong . Integrating Drug’s Mode of Action into Quantitative Structure–Activity Relationships for Improved Prediction of Drug-Induced Liver Injury. Journal of Chemical Information and Modeling 2017, 57 (4) , 1000-1006. https://doi.org/10.1021/acs.jcim.6b00719
- Hui Wen Ng, Stephen W. Doughty, Heng Luo, Hao Ye, Weigong Ge, Weida Tong, and Huixiao Hong . Development and Validation of Decision Forest Model for Estrogen Receptor Binding Prediction of Chemicals Using Large Data Sets. Chemical Research in Toxicology 2015, 28 (12) , 2343-2351. https://doi.org/10.1021/acs.chemrestox.5b00358
- Youjun Xu, Ziwei Dai, Fangjin Chen, Shuaishi Gao, Jianfeng Pei, and Luhua Lai . Deep Learning for Drug-Induced Liver Injury. Journal of Chemical Information and Modeling 2015, 55 (10) , 2085-2093. https://doi.org/10.1021/acs.jcim.5b00238
- Yunierkis Pérez-Castillo, Cosmin Lazar, Jonatan Taminau, Mathy Froeyen, Miguel Ángel Cabrera-Pérez, and Ann Nowé . GA(M)E-QSAR: A Novel, Fully Automatic Genetic-Algorithm-(Meta)-Ensembles Approach for Binary Classification in Ligand-Based Drug Design. Journal of Chemical Information and Modeling 2012, 52 (9) , 2366-2386. https://doi.org/10.1021/ci300146h
- Jian Jiao, Shi-Miao Tan, Rui-Ming Luo, and Yan-Ping Zhou . A Robust Boosting Regression Tree with Applications in Quantitative Structure−Activity Relationship Studies of Organic Compounds. Journal of Chemical Information and Modeling 2011, 51 (4) , 816-828. https://doi.org/10.1021/ci100429u
- Kirk Simmons, John Kinney, Aaron Owens, Daniel A. Kleier, Karen Bloch, Dave Argentar, Alicia Walsh and Ganesh Vaidyanathan . Practical Outcomes of Applying Ensemble Machine Learning Classifiers to High-Throughput Screening (HTS) Data Analysis and Screening. Journal of Chemical Information and Modeling 2008, 48 (11) , 2196-2206. https://doi.org/10.1021/ci800164u
- Huixiao Hong, Qian Xie, Weigong Ge, Feng Qian, Hong Fang, Leming Shi, Zhenqiang Su, Roger Perkins and Weida Tong . Mold2, Molecular Descriptors from 2D Structures for Chemoinformatics and Toxicoinformatics. Journal of Chemical Information and Modeling 2008, 48 (7) , 1337-1344. https://doi.org/10.1021/ci800038f
- Craig L. Bruce,, James L. Melville,, Stephen D. Pickett, and, Jonathan D. Hirst. Contemporary QSAR Classifiers Compared. Journal of Chemical Information and Modeling 2007, 47 (1) , 219-227. https://doi.org/10.1021/ci600332j
- Meir Glick,, Jeremy L. Jenkins,, James H. Nettles,, Hamilton Hitchings, and, John W. Davies. Enrichment of High-Throughput Screening Data with Increasing Levels of Noise Using Support Vector Machines, Recursive Partitioning, and Laplacian-Modified Naive Bayesian Classifiers. Journal of Chemical Information and Modeling 2006, 46 (1) , 193-200. https://doi.org/10.1021/ci050374h
- Frances V. Buontempo,, Xue Zhong Wang,, Mulaisho Mwense,, Nigel Horan,, Anita Young, and, Daniel Osborn. Genetic Programming for the Induction of Decision Trees to Model Ecotoxicity Data. Journal of Chemical Information and Modeling 2005, 45 (4) , 904-912. https://doi.org/10.1021/ci049652n
- Vladimir Svetnik,, Ting Wang,, Christopher Tong,, Andy Liaw,, Robert P. Sheridan, and, Qinghua Song. Boosting: An Ensemble Learning Tool for Compound Classification and QSAR Modeling. Journal of Chemical Information and Modeling 2005, 45 (3) , 786-799. https://doi.org/10.1021/ci0500379
- Robert Kirk DeLisle and, Steven L. Dixon. Induction of Decision Trees via Evolutionary Programming. Journal of Chemical Information and Computer Sciences 2004, 44 (3) , 862-870. https://doi.org/10.1021/ci034188s
- Jörg K. Wegner,, Holger Fröhlich, and, Andreas Zell. Feature Selection for Descriptor Based Classification Models. 1. Theory and GA-SEC Algorithm. Journal of Chemical Information and Computer Sciences 2004, 44 (3) , 921-930. https://doi.org/10.1021/ci0342324
- Arja Asikainen,, Juhani Ruuskanen, and, Kari Tuppurainen. Spectroscopic QSAR Methods and Self-Organizing Molecular Field Analysis for Relating Molecular Structure and Estrogenic Activity. Journal of Chemical Information and Computer Sciences 2003, 43 (6) , 1974-1981. https://doi.org/10.1021/ci034110b
- Vladimir Svetnik,, Andy Liaw,, Christopher Tong,, J. Christopher Culberson,, Robert P. Sheridan, and, Bradley P. Feuston. Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling. Journal of Chemical Information and Computer Sciences 2003, 43 (6) , 1947-1958. https://doi.org/10.1021/ci034160g
- Jindřich Charvát, Aleš Procházka, Matěj Fričl, Oldřich Vyšata, Lucie Himmlová. Diffuse reflectance spectroscopy in dental caries detection and classification. Signal, Image and Video Processing 2020, 14 (5) , 1063-1070. https://doi.org/10.1007/s11760-020-01640-4
- Ulf W. Liebal, An N. T. Phan, Malvika Sudhakar, Karthik Raman, Lars M. Blank. Machine Learning Applications for Mass Spectrometry-Based Metabolomics. Metabolites 2020, 10 (6) , 243. https://doi.org/10.3390/metabo10060243
- Kamel Mansouri, Nicole Kleinstreuer, Ahmed M. Abdelaziz, Domenico Alberga, Vinicius M. Alves, Patrik L. Andersson, Carolina H. Andrade, Fang Bai, Ilya Balabin, Davide Ballabio, Emilio Benfenati, Barun Bhhatarai, Scott Boyer, Jingwen Chen, Viviana Consonni, Sherif Farag, Denis Fourches, Alfonso T. García-Sosa, Paola Gramatica, Francesca Grisoni, Chris M. Grulke, Huixiao Hong, Dragos Horvath, Xin Hu, Ruili Huang, Nina Jeliazkova, Jiazhong Li, Xuehua Li, Huanxiang Liu, Serena Manganelli, Giuseppe F. Mangiatordi, Uko Maran, Gilles Marcou, Todd Martin, Eugene Muratov, Dac-Trung Nguyen, Orazio Nicolotti, Nikolai G. Nikolov, Ulf Norinder, Ester Papa, Michel Petitjean, Geven Piir, Pavel Pogodin, Vladimir Poroikov, Xianliang Qiao, Ann M. Richard, Alessandra Roncaglioni, Patricia Ruiz, Chetan Rupakheti, Sugunadevi Sakkiah, Alessandro Sangion, Karl-Werner Schramm, Chandrabose Selvaraj, Imran Shah, Sulev Sild, Lixia Sun, Olivier Taboureau, Yun Tang, Igor V. Tetko, Roberto Todeschini, Weida Tong, Daniela Trisciuzzi, Alexander Tropsha, George Van Den Driessche, Alexandre Varnek, Zhongyu Wang, Eva B. Wedebye, Antony J. Williams, Hongbin Xie, Alexey V. Zakharov, Ziye Zheng, Richard S. Judson. CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity. Environmental Health Perspectives 2020, 128 (2) , 027002. https://doi.org/10.1289/EHP5580
- Paras Chaudhary, Somya Jain, Adwitiya Sinha. Sustainable Approach for Forest Fire Prediction. 2020,,, 456-469. https://doi.org/10.1007/978-981-15-4451-4_36
- Devin Hunt, Megan Branson, Elizabeth Putnam, Mark Pershouse. Toxicoinformatics today. 2020,,, 51-62. https://doi.org/10.1016/B978-0-12-813724-6.00004-9
- Natalia Sizochenko, Michael Syzochenko, Natalja Fjodorova, Bakhtiyor Rasulev, Jerzy Leszczynski. Evaluating genotoxicity of metal oxide nanoparticles: Application of advanced supervised and unsupervised machine learning techniques. Ecotoxicology and Environmental Safety 2019, 185 , 109733. https://doi.org/10.1016/j.ecoenv.2019.109733
- Ramya Akula, Zachary Wieselthier, Laura Martin, Ivan Garibay. Forecasting the Success of Television Series using Machine Learning. 2019,,, 1-8. https://doi.org/10.1109/SoutheastCon42311.2019.9020419
- Stuart K. Grange, David C. Carslaw. Using meteorological normalisation to detect interventions in air quality time series. Science of The Total Environment 2019, 653 , 578-588. https://doi.org/10.1016/j.scitotenv.2018.10.344
- Sugunadevi Sakkiah, Rebecca Kusko, Weida Tong, Huixiao Hong. Applications of Molecular Dynamics Simulations in Computational Toxicology. 2019,,, 181-212. https://doi.org/10.1007/978-3-030-16443-0_10
- Tomás Sherwen, Rosie J. Chance, Liselotte Tinel, Daniel Ellis, Mat J. Evans, Lucy J. Carpenter. A machine-learning-based global sea-surface iodide distribution. Earth System Science Data 2019, 11 (3) , 1239-1262. https://doi.org/10.5194/essd-11-1239-2019
- Wangshu Tan, Gang Zhao, Yingli Yu, Chengcai Li, Jian Li, Ling Kang, Tong Zhu, Chunsheng Zhao. Method to retrieve cloud condensation nuclei number concentrations using lidar measurements. Atmospheric Measurement Techniques 2019, 12 (7) , 3825-3839. https://doi.org/10.5194/amt-12-3825-2019
- Ye S. Kang, Chan S. Ryu, Sae R. Jun, Si H. Jang, Jun W. Park, Hye Y. Song, Tapash K. Sarkar, Seong H. Kim, Won S. Lee. Distinguishing between closely related species of Allium and of Brassicaceae by narrowband hyperspectral imagery. Biosystems Engineering 2018, 176 , 103-113. https://doi.org/10.1016/j.biosystemseng.2018.10.003
- Gabriel Idakwo, Joseph Luttrell, Minjun Chen, Huixiao Hong, Zhaoxian Zhou, Ping Gong, Chaoyang Zhang. A review on machine learning methods for in silico toxicity prediction. Journal of Environmental Science and Health, Part C 2018, 36 (4) , 169-191. https://doi.org/10.1080/10590501.2018.1537118
- Sugunadevi Sakkiah, Wenjing Guo, Bohu Pan, Rebecca Kusko, Weida Tong, Huixiao Hong. Computational prediction models for assessing endocrine disrupting potential of chemicals. Journal of Environmental Science and Health, Part C 2018, 36 (4) , 192-218. https://doi.org/10.1080/10590501.2018.1537132
- Hongyu Zhou, Zheng Dong, Peng Tao. Recognition of protein allosteric states and residues: Machine learning approaches. Journal of Computational Chemistry 2018, 39 (20) , 1481-1490. https://doi.org/10.1002/jcc.25218
- Cagatay Catal, Akhan Akbulut. Automatic energy expenditure measurement for health science. Computer Methods and Programs in Biomedicine 2018, 157 , 31-37. https://doi.org/10.1016/j.cmpb.2018.01.015
- Hui Wen Ng, Carmine Leggett, Sugunadevi Sakkiah, Bohu Pan, Hao Ye, Leihong Wu, Chandrabose Selvaraj, Weida Tong, Huixiao Hong. Competitive docking model for prediction of the human nicotinic acetylcholine receptor α7 binding of tobacco constituents. Oncotarget 2018, 9 (24) , 16899-16916. https://doi.org/10.18632/oncotarget.24458
- Chandrabose Selvaraj, Sugunadevi Sakkiah, Weida Tong, Huixiao Hong. Molecular dynamics simulations and applications in computational toxicology and nanotoxicology. Food and Chemical Toxicology 2018, 112 , 495-506. https://doi.org/10.1016/j.fct.2017.08.028
- M. Karimi, H. Karami, M. Gholami, H. Khatibzadehazad, N. Moslemi. Priority index considering temperature and date proximity for selection of similar days in knowledge-based short term load forecasting method. Energy 2018, 144 , 928-940. https://doi.org/10.1016/j.energy.2017.12.083
- Huixiao Hong, Jieqiang Zhu, Minjun Chen, Ping Gong, Chaoyang Zhang, Weida Tong. Quantitative Structure–Activity Relationship Models for Predicting Risk of Drug-Induced Liver Injury in Humans. 2018,,, 77-100. https://doi.org/10.1007/978-1-4939-7677-5_5
- Supratik Kar, Kunal Roy, Jerzy Leszczynski. Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling. 2018,,, 141-169. https://doi.org/10.1007/978-1-4939-7899-1_6
- S. Thakkar, R. Perkins, H. Hong, W. Tong. Computational Toxicology. 2018,,, 327-350. https://doi.org/10.1016/B978-0-12-801238-3.64317-9
- Ryo Sugawara, Jiawei Huang, Kazuki Takashima, Taku Komura, Yoshifumi Kitarmura. Random-forest-based initializer for solving inverse problem in 3D motion tracking systems. 2018,,, 1-2. https://doi.org/10.1145/3281505.3283393
- Stuart K. Grange, David C. Carslaw, Alastair C. Lewis, Eirini Boleti, Christoph Hueglin. Random forest meteorological normalisation models for Swiss PM10 trend analysis. Atmospheric Chemistry and Physics 2018, 18 (9) , 6223-6239. https://doi.org/10.5194/acp-18-6223-2018
- Huixiao Hong, Shraddha Thakkar, Minjun Chen, Weida Tong. Development of Decision Forest Models for Prediction of Drug-Induced Liver Injury in Humans Using A Large Set of FDA-approved Drugs. Scientific Reports 2017, 7 (1) https://doi.org/10.1038/s41598-017-17701-7
- Sugunadevi Sakkiah, Chandrabose Selvaraj, Ping Gong, Chaoyang Zhang, Weida Tong, Huixiao Hong. Development of estrogen receptor beta binding prediction model using large sets of chemicals. Oncotarget 2017, 8 (54) , 92989-93000. https://doi.org/10.18632/oncotarget.21723
- , , Anthony J. Winder, Susanne Siemonsen, Fabian Flottmann, Jens Fiehler, Nils D. Forkert. Comparison of classification methods for voxel-based prediction of acute ischemic stroke outcome following intra-arterial intervention. 2017,,, 101344B. https://doi.org/10.1117/12.2254118
- G.J. Myatt, L.D. Beilke, K.P. Cross. In Silico Tools and their Application. 2017,,, 156-176. https://doi.org/10.1016/B978-0-12-409547-2.12379-0
- Kunal Roy, Supratik Kar. Importance of Applicability Domain of QSAR Models. 2017,,, 1012-1043. https://doi.org/10.4018/978-1-5225-1762-7.ch039
- D. Asturiol, S. Casati, A. Worth. Consensus of classification trees for skin sensitisation hazard prediction. Toxicology in Vitro 2016, 36 , 197-209. https://doi.org/10.1016/j.tiv.2016.07.014
- Huixiao Hong, Diego Rua, Sugunadevi Sakkiah, Chandrabose Selvaraj, Weigong Ge, Weida Tong. Consensus Modeling for Prediction of Estrogenic Activity of Ingredients Commonly Used in Sunscreen Products. International Journal of Environmental Research and Public Health 2016, 13 (10) , 958. https://doi.org/10.3390/ijerph13100958
- Heng Luo, Hao Ye, Hui Wen Ng, Sugunadevi Sakkiah, Donna L. Mendrick, Huixiao Hong. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides. Scientific Reports 2016, 6 (1) https://doi.org/10.1038/srep32115
- Kamel Mansouri, Ahmed Abdelaziz, Aleksandra Rybacka, Alessandra Roncaglioni, Alexander Tropsha, Alexandre Varnek, Alexey Zakharov, Andrew Worth, Ann M. Richard, Christopher M. Grulke, Daniela Trisciuzzi, Denis Fourches, Dragos Horvath, Emilio Benfenati, Eugene Muratov, Eva Bay Wedebye, Francesca Grisoni, Giuseppe F. Mangiatordi, Giuseppina M. Incisivo, Huixiao Hong, Hui W. Ng, Igor V. Tetko, Ilya Balabin, Jayaram Kancherla, Jie Shen, Julien Burton, Marc Nicklaus, Matteo Cassotti, Nikolai G. Nikolov, Orazio Nicolotti, Patrik L. Andersson, Qingda Zang, Regina Politi, Richard D. Beger, Roberto Todeschini, Ruili Huang, Sherif Farag, Sine A. Rosenberg, Svetoslav Slavov, Xin Hu, Richard S. Judson. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environmental Health Perspectives 2016, 124 (7) , 1023-1033. https://doi.org/10.1289/ehp.1510267
- Huixiao Hong, Benjamin Harvey, Giuseppe Palmese, Joseph Stanzione, Hui Ng, Sugunadevi Sakkiah, Weida Tong, Joshua Sadler. Experimental Data Extraction and in Silico Prediction of the Estrogenic Activity of Renewable Replacements for Bisphenol A. International Journal of Environmental Research and Public Health 2016, 13 (7) , 705. https://doi.org/10.3390/ijerph13070705
- Huixiao Hong, Jie Shen, Hui Ng, Sugunadevi Sakkiah, Hao Ye, Weigong Ge, Ping Gong, Wenming Xiao, Weida Tong. A Rat α-Fetoprotein Binding Activity Prediction Model to Facilitate Assessment of the Endocrine Disruption Potential of Environmental Chemicals. International Journal of Environmental Research and Public Health 2016, 13 (4) , 372. https://doi.org/10.3390/ijerph13040372
- Parviz Shahbazikhah, John H. Kalivas, Erik Andries, Trevor O'Loughlin. Using the L 1 norm to select basis set vectors for multivariate calibration and calibration updating. Journal of Chemometrics 2016, 30 (3) , 109-120. https://doi.org/10.1002/cem.2778
- Huixiao Hong, Minjun Chen, Hui Wen Ng, Weida Tong. QSAR Models at the US FDA/NCTR. 2016,,, 431-459. https://doi.org/10.1007/978-1-4939-3609-0_18
- Shu Mao, Hui Wen Ng, Michael Orr, Heng Luo, Hao Ye, Weigong Ge, Weida Tong, Huixiao Hong. Homology Model and Ligand Binding Interactions of the Extracellular Domain of the Human α4β2 Nicotinic Acetylcholine Receptor. Journal of Biomedical Science and Engineering 2016, 09 (01) , 41-100. https://doi.org/10.4236/jbise.2016.91005
- Saeed Yousefinejad, Bahram Hemmateenejad. Chemometrics tools in QSAR/QSPR studies: A historical perspective. Chemometrics and Intelligent Laboratory Systems 2015, 149 , 177-204. https://doi.org/10.1016/j.chemolab.2015.06.016
- Naveen Khatri, Viney Lather, A K Madan. Diverse models for anti-HIV activity of purine nucleoside analogs. Chemistry Central Journal 2015, 9 (1) https://doi.org/10.1186/s13065-015-0109-0
- John H. Kalivas, Károly Héberger, Erik Andries. Sum of ranking differences (SRD) to ensemble multivariate calibration model merits for tuning parameter selection and comparing calibration methods. Analytica Chimica Acta 2015, 869 , 21-33. https://doi.org/10.1016/j.aca.2014.12.056
- Antonio Lavecchia. Machine-learning approaches in drug discovery: methods and applications. Drug Discovery Today 2015, 20 (3) , 318-331. https://doi.org/10.1016/j.drudis.2014.10.012
- Ke Liu, Xiaojing Chen, Limin Li, Huiling Chen, Xiukai Ruan, Wenbin Liu. A consensus successive projections algorithm – multiple linear regression method for analyzing near infrared spectra. Analytica Chimica Acta 2015, 858 , 16-23. https://doi.org/10.1016/j.aca.2014.12.033
- Mayu Sakurada, Takehisa Yairi, Yuta Nakajima, Naoki Nishimura, Devi Parikh. Semantic classification of spacecraft's status: integrating system intelligence and human knowledge. 2015,,, 81-84. https://doi.org/10.1109/ICOSC.2015.7050783
- Somayeh Pirhadi, Fereshteh Shiri, Jahan B. Ghasemi. Multivariate statistical analysis methods in QSAR. RSC Advances 2015, 5 (127) , 104635-104665. https://doi.org/10.1039/C5RA10729F
- Kunal Roy, Supratik Kar. Importance of Applicability Domain of QSAR Models. 2015,,, 180-211. https://doi.org/10.4018/978-1-4666-8136-1.ch005
- Antonina Danylenko, Welf Lowe. Merging Classifiers of Different Classification Approaches. 2014,,, 706-715. https://doi.org/10.1109/ICDMW.2014.64
- J.P. Doucet, A. Doucet-Panaye. Structure–activity relationship study of trifluoromethylketone inhibitors of insect juvenile hormone esterase: Comparison of several classification methods. SAR and QSAR in Environmental Research 2014, 25 (7) , 589-616. https://doi.org/10.1080/1062936X.2014.919959
- P.K. Schmieder, R.C. Kolanczyk, M.W. Hornung, M.A. Tapper, J.S. Denny, B.R. Sheedy, H. Aladjov. A rule-based expert system for chemical prioritization using effects-based chemical categories. SAR and QSAR in Environmental Research 2014, 25 (4) , 253-287. https://doi.org/10.1080/1062936X.2014.898691
- Huixiao Hong, Roger Perkins, Leming Shi, Hong Fang, Donna Mendrick, Weida Tong. Molecular Biomarkers for Personalized Medicine. 2014,,, 607-644. https://doi.org/10.1201/b15465-15
- Yankun Li, Jing Jing. A consensus PLS method based on diverse wavelength variables models for analysis of near-infrared spectra. Chemometrics and Intelligent Laboratory Systems 2014, 130 , 45-49. https://doi.org/10.1016/j.chemolab.2013.10.005
- Hui Wen Ng, Wenqian Zhang, Mao Shu, Heng Luo, Weigong Ge, Roger Perkins, Weida Tong, Huixiao Hong. Competitive molecular docking approach for predicting estrogen receptor subtype α agonists and antagonists. BMC Bioinformatics 2014, 15 (Suppl 11) , S4. https://doi.org/10.1186/1471-2105-15-S11-S4
- Minjun Chen, Huixiao Hong, Hong Fang, Reagan Kelly, Guangxu Zhou, Jürgen Borlak, Weida Tong. Quantitative Structure-Activity Relationship Models for Predicting Drug-Induced Liver Injury Based on FDA-Approved Drug Labeling Annotation and Using a Large Collection of Drugs. Toxicological Sciences 2013, 136 (1) , 242-249. https://doi.org/10.1093/toxsci/kft189
- Jie Shen, Lei Xu, Hong Fang, Ann M. Richard, Jeffrey D. Bray, Richard S. Judson, Guangxu Zhou, Thomas J. Colatsky, Jason L. Aungst, Christina Teng, Steve C. Harris, Weigong Ge, Susie Y. Dai, Zhenqiang Su, Abigail C. Jacobs, Wafa Harrouk, Roger Perkins, Weida Tong, Huixiao Hong. EADB: An Estrogenic Activity Database for Assessing Potential Endocrine Activity. Toxicological Sciences 2013, 135 (2) , 277-291. https://doi.org/10.1093/toxsci/kft164
- A. K. Madan, Sanjay Bajaj, Harish Dureja. Classification Models for Safe Drug Molecules. 2013,,, 99-124. https://doi.org/10.1007/978-1-62703-059-5_5
- Hong Fang, Huixiao Hong, Zhichao Liu, Roger Perkins, Reagan Kelly, John Beresney, Weida Tong, Bruce A. Fowler. Omics Biomarkers in Risk Assessment. 2013,,, 195-213. https://doi.org/10.1016/B978-0-12-396461-8.00013-0
- Parviz Shahbazikhah, John H. Kalivas. A consensus modeling approach to update a spectroscopic calibration. Chemometrics and Intelligent Laboratory Systems 2013, 120 , 142-153. https://doi.org/10.1016/j.chemolab.2012.06.006
- Kao-Shing Hwang, Yu-Jen Chen, Wei-Cheng Jiang, Tsung-Wen Yang. Induced states in a decision tree constructed by Q-learning. Information Sciences 2012, 213 , 39-49. https://doi.org/10.1016/j.ins.2012.06.009
- Pulan Yu, David J Wild. Fast rule-based bioactivity prediction using associative classification mining. Journal of Cheminformatics 2012, 4 (1) https://doi.org/10.1186/1758-2946-4-29
- Eugene Myshkin, Richard Brennan, Tatiana Khasanova, Tatiana Sitnik, Tatiana Serebriyskaya, Elena Litvinova, Alexey Guryanov, Yuri Nikolsky, Tatiana Nikolskaya, Svetlana Bureeva. Prediction of Organ Toxicity Endpoints by QSAR Modeling Based on Precise Chemical-Histopathology Annotations. Chemical Biology & Drug Design 2012, 80 (3) , 406-416. https://doi.org/10.1111/j.1747-0285.2012.01411.x
- Patricia Ruiz, Gino Begluitti, Terry Tincher, John Wheeler, Moiz Mumtaz. Prediction of Acute Mammalian Toxicity Using QSAR Methods: A Case Study of Sulfur Mustard and Its Breakdown Products. Molecules 2012, 17 (8) , 8982-9001. https://doi.org/10.3390/molecules17088982
- Kuo-Wei Hsu. Hybrid ensembles of decision trees and artificial neural networks. 2012,,, 25-29. https://doi.org/10.1109/CyberneticsCom.2012.6381610
- Kao-Shing Hwang, Yu-Jen Chen, Chun-Ju Wu. Fusion of Multiple Behaviors Using Layered Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 2012, 42 (4) , 999-1004. https://doi.org/10.1109/TSMCA.2012.2183349
- Faizan Sahigara, Kamel Mansouri, Davide Ballabio, Andrea Mauri, Viviana Consonni, Roberto Todeschini. Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. Molecules 2012, 17 (5) , 4791-4810. https://doi.org/10.3390/molecules17054791
- Felix Hammann, Juergen Drewe. Decision tree models for data mining in hit discovery. Expert Opinion on Drug Discovery 2012, 7 (4) , 341-352. https://doi.org/10.1517/17460441.2012.668182
- Michael Krein, Tao-Wei Huang, Lisa Morkowchuk, Dimitris K. Agrafiotis, Curt M. Breneman. Developing Best Practices for Descriptor-Based Property Prediction: Appropriate Matching of Datasets, Descriptors, Methods, and Expectations. 2012,,, 33-64. https://doi.org/10.1002/9783527645121.ch2
- Huixiao Hong, Svetoslav Slavov, Weigong Ge, Feng Qian, Zhenqiang Su, Hong Fang, Yiyu Cheng, Roger Perkins, Leming Shi, Weida Tong. Mold 2 Molecular Descriptors for QSAR. 2012,,, 65-109. https://doi.org/10.1002/9783527645121.ch3
- Brooks McPhail, Yunfeng Tie, Huixiao Hong, Bruce A. Pearce, Laura K. Schnackenberg, Weigong Ge, Luis G. Valerio, James C. Fuscoe, Weida Tong, Dan A. Buzatu, Jon G. Wilkes, Bruce A. Fowler, Eugene Demchuk, Richard D. Beger. Modeling Chemical Interaction Profiles: I. Spectral Data-Activity Relationship and Structure-Activity Relationship Models for Inhibitors and Non-inhibitors of Cytochrome P450 CYP3A4 and CYP2D6 Isozymes. Molecules 2012, 17 (3) , 3383-3406. https://doi.org/10.3390/molecules17033383
- Yunfeng Tie, Brooks McPhail, Huixiao Hong, Bruce A. Pearce, Laura K. Schnackenberg, Weigong Ge, Dan A. Buzatu, Jon G. Wilkes, James C. Fuscoe, Weida Tong, Bruce A. Fowler, Richard D. Beger, Eugene Demchuk. Modeling Chemical Interaction Profiles: II. Molecular Docking, Spectral Data-Activity Relationship, and Structure-Activity Relationship Models for Potent and Weak Inhibitors of Cytochrome P450 CYP3A4 Isozyme. Molecules 2012, 17 (3) , 3407-3460. https://doi.org/10.3390/molecules17033407
- N. Sukumar, Michael P. Krein, Mark J. Embrechts. Predictive Cheminformatics in Drug Discovery: Statistical Modeling for Analysis of Micro-array and Gene Expression Data. 2012,,, 165-194. https://doi.org/10.1007/978-1-61779-965-5_9
- M. Gerdes, D. Scholz. Fuzzy condition monitoring of recirculation fans and filters. CEAS Aeronautical Journal 2011, 2 (1-4) , 81-87. https://doi.org/10.1007/s13272-011-0021-9
- Marlene Castro-Melchor, Huong Le, Wei-Shou Hu. Transcriptome Data Analysis for Cell Culture Processes. 2011,,, 27-70. https://doi.org/10.1007/10_2011_116
- Xincai Luo, Jennifer R. Krumrine, Ashok B. Shenvi, M. Edward Pierson, Peter R. Bernstein. Calculation and application of activity discriminants in lead optimization. Journal of Molecular Graphics and Modelling 2010, 29 (3) , 372-381. https://doi.org/10.1016/j.jmgm.2010.07.005
- João D. Ferreira, Francisco M. Couto, . Semantic Similarity for Automatic Classification of Chemical Compounds. PLoS Computational Biology 2010, 6 (9) , e1000937. https://doi.org/10.1371/journal.pcbi.1000937
- Jianping Huang, Hong Fang, Xiaohui Fan. Decision forest for classification of gene expression data. Computers in Biology and Medicine 2010, 40 (8) , 698-704. https://doi.org/10.1016/j.compbiomed.2010.06.004
- X Fan, E K Lobenhofer, M Chen, W Shi, J Huang, J Luo, J Zhang, S J Walker, T-M Chu, L Li, R Wolfinger, W Bao, R S Paules, P R Bushel, J Li, T Shi, T Nikolskaya, Y Nikolsky, H Hong, Y Deng, Y Cheng, H Fang, L Shi, W Tong. Consistency of predictive signature genes and classifiers generated using different microarray platforms. The Pharmacogenomics Journal 2010, 10 (4) , 247-257. https://doi.org/10.1038/tpj.2010.34
- Aiying Chang, Tiejun Wu, Bao Xin. Notice of Retraction: The prediction of pulverized coal ignition property based on piecewise least squares support vector machine. 2010,,, 251-254. https://doi.org/10.1109/ICCSIT.2010.5563669
- ALISON G. BOYER. Consistent Ecological Selectivity through Time in Pacific Island Avian Extinctions. Conservation Biology 2010, 24 (2) , 511-519. https://doi.org/10.1111/j.1523-1739.2009.01341.x
- Huixiao Hong, Federico Goodsaid, Leming Shi, Weida Tong. Molecular biomarkers: a US FDA effort. Biomarkers in Medicine 2010, 4 (2) , 215-225. https://doi.org/10.2217/bmm.09.81
- Matthew R. Kunz, Joshua Ottaway, John H. Kalivas, Erik Andries. Impact of standardization sample design on Tikhonov regularization variants for spectroscopic calibration maintenance and transfer. Journal of Chemometrics 2010, 24 (3-4) , 218-229. https://doi.org/10.1002/cem.1302
- Wangdong Ni, Steven D. Brown, Ruilin Man. Data fusion in multivariate calibration transfer. Analytica Chimica Acta 2010, 661 (2) , 133-142. https://doi.org/10.1016/j.aca.2009.12.026
- Xue Xu, Wei Yang, Yan Li, Yonghua Wang. Discovery of estrogen receptor modulators: a review of virtual screening and SAR efforts. Expert Opinion on Drug Discovery 2010, 5 (1) , 21-31. https://doi.org/10.1517/17460440903490395
- Simone Mocellin, John F. Thompson, Sandro Pasquali, Maria C. Montesco, Pierluigi Pilati, Donato Nitti, Robyn P. Saw, Richard A. Scolyer, Jonathan R. Stretch, Carlo R. Rossi. Sentinel Node Status Prediction by Four Statistical Models. Annals of Surgery 2009, 250 (6) , 964-969. https://doi.org/10.1097/SLA.0b013e3181b07ffd
- Chao Y. Ma, Xue Z. Wang. Inductive data mining based on genetic programming: Automatic generation of decision trees from data for process historical data analysis. Computers & Chemical Engineering 2009, 33 (10) , 1602-1616. https://doi.org/10.1016/j.compchemeng.2009.04.005
- Xueguang SHAO, Da CHEN, Heng XU, Zhichao LIU, Wensheng CAI. Improving the Robustness and Stability of Partial Least Squares Regression for Near-infrared Spectral Analysis. Chinese Journal of Chemistry 2009, 27 (7) , 1328-1332. https://doi.org/10.1002/cjoc.200990222
- Ralph L. Kodell, Bruce A. Pearce, Songjoon Baek, Hojin Moon, Hongshik Ahn, John F. Young, James J. Chen. A model-free ensemble method for class prediction with application to biomedical decision making. Artificial Intelligence in Medicine 2009, 46 (3) , 267-276. https://doi.org/10.1016/j.artmed.2008.11.001
- Peter Willett. Similarity methods in chemoinformatics. Annual Review of Information Science and Technology 2009, 43 (1) , 1-117. https://doi.org/10.1002/aris.2009.1440430108
- Eddie Y. T. Ma, Stefan C. Kremer. Neural Grammar Networks. 2009,,, 67-96. https://doi.org/10.1007/978-3-642-04003-0_4
- Mark A. Pershouse, Melisa Bunderson Schelvan, Amy Erbe, Corbin Schwanke, Elizabeth Putnam. Toxicoinformatics Today. 2009,,, 49-55. https://doi.org/10.1016/B978-0-12-373593-5.00004-5
- A. Roncaglioni, N. Piclin, M. Pintore, E. Benfenati. Binary classification models for endocrine disrupter effects mediated through the estrogen receptor. SAR and QSAR in Environmental Research 2008, 19 (7-8) , 697-733. https://doi.org/10.1080/10629360802550606
- Lior Rokach. An evolutionary algorithm for constructing a decision forest: Combining the classification of disjoints decision trees. International Journal of Intelligent Systems 2008, 23 (4) , 455-482. https://doi.org/10.1002/int.20277
- Suzanne Smit, Huub C.J. Hoefsloot, Age K. Smilde. Statistical data processing in clinical proteomics. Journal of Chromatography B 2008, 866 (1-2) , 77-88. https://doi.org/10.1016/j.jchromb.2007.10.042
- Igor V. Tetko. Associative Neural Network. 2008,,, 180-197. https://doi.org/10.1007/978-1-60327-101-1_10
- Hojin Moon, Hongshik Ahn, Ralph L. Kodell, Songjoon Baek, Chien-Ju Lin, James J. Chen. Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artificial Intelligence in Medicine 2007, 41 (3) , 197-207. https://doi.org/10.1016/j.artmed.2007.07.003
- Quan Liao, Jianhua Yao, Shengang Yuan. Prediction of mutagenic toxicity by combination of Recursive Partitioning and Support Vector Machines. Molecular Diversity 2007, 11 (2) , 59-72. https://doi.org/10.1007/s11030-007-9057-5
- Da Chen, Wensheng Cai, Xueguang Shao. Removing uncertain variables based on ensemble partial least squares. Analytica Chimica Acta 2007, 598 (1) , 19-26. https://doi.org/10.1016/j.aca.2007.07.023
- Edwin J. Matthews, Naomi L. Kruhlak, R. Daniel Benz, Julian Ivanov, Gilles Klopman, Joseph F. Contrera. A comprehensive model for reproductive and developmental toxicity hazard identification: II. Construction of QSAR models to predict activities of untested chemicals. Regulatory Toxicology and Pharmacology 2007, 47 (2) , 136-155. https://doi.org/10.1016/j.yrtph.2006.10.001
- H. Van de Waterbeemd. In Silico Models to Predict Oral Absorption. 2007, 669-697. https://doi.org/10.1016/B0-08-045044-X/00145-0
- X. Z. Wang, F. V. Buontempo, A. Young, D. Osborn. Induction of decision trees using genetic programming for modelling ecotoxicity data: adaptive discretization of real-valued endpoints. SAR and QSAR in Environmental Research 2006, 17 (5) , 451-471. https://doi.org/10.1080/10629360600933723
- Chris Williams. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Molecular Diversity 2006, 10 (3) , 311-332. https://doi.org/10.1007/s11030-006-9039-z
- Igor V. Tetko, Pierre Bruneau, Hans-Werner Mewes, Douglas C. Rohrer, Gennadiy I. Poda. Can we estimate the accuracy of ADME–Tox predictions? Drug Discovery Today 2006, 11 (15-16) , 700-707. https://doi.org/10.1016/j.drudis.2006.06.013
- J. Devillers, N. Marchand-Geneste, A. Carpy, J. M. Porcher. SAR and QSAR modeling of endocrine disruptors. SAR and QSAR in Environmental Research 2006, 17 (4) , 393-412. https://doi.org/10.1080/10629360600884397
- Zhenqiang Su, Weida Tong, Leming Shi, Xueguang Shao, Wensheng Cai. A Partial Least Squares‐Based Consensus Regression Method for the Analysis of Near‐Infrared Complex Spectral Data of Plant Samples. Analytical Letters 2006, 39 (9) , 2073-2083. https://doi.org/10.1080/00032710600724088
- Volker Schnecke, Jonas Boström. Computational chemistry-driven decision making in lead generation. Drug Discovery Today 2006, 11 (1-2) , 43-50. https://doi.org/10.1016/S1359-6446(05)03703-7
- Brian B. Goldman, W. Patrick Walters. Chapter 8 Machine Learning in Computational Chemistry. 2006, 127-140. https://doi.org/10.1016/S1574-1400(06)02008-1
- Arja Asikainen, Mikko Kolehmainen, Juhani Ruuskanen, Kari Tuppurainen. Structure-based classification of active and inactive estrogenic compounds by decision tree, LVQ and kNN methods. Chemosphere 2006, 62 (4) , 658-673. https://doi.org/10.1016/j.chemosphere.2005.04.115
- Weida Tong, Hong Fang, Qian Xie, Huixiao Hong, Leming Shi, Roger Perkins, Uwe Scherf, Federico Goodsaid, Felix Frueh. Gaining Confidence on Molecular Classification through Consensus Modeling and Validation. Toxicology Mechanisms and Methods 2006, 16 (2-3) , 59-68. https://doi.org/10.1080/15376520600558259
- Qing-hua Hu, Ming-yang Wang, Da-ren Yu. Construct Rough Decision Forests Based on Sequentially Data Reduction. 2006, 2284-2289. https://doi.org/10.1109/ICMLC.2006.258674
- H. Hong, W. Tong, Q. Xie, H. Fang, R. Perkins. An in silico ensemble method for lead discovery: decision forest. SAR and QSAR in Environmental Research 2005, 16 (4) , 339-347. https://doi.org/10.1080/10659360500203022
- Qian Xie, Luke D Ratnasinghe, Huixiao Hong, Roger Perkins, Ze-Zhong Tang, Nan Hu, Philip R Taylor, Weida Tong. Decision Forest Analysis of 61 Single Nucleotide Polymorphisms in a Case-Control Study of Esophageal Cancer; a novel method. BMC Bioinformatics 2005, 6 (S2) https://doi.org/10.1186/1471-2105-6-S2-S4
- Tatiana I. Netzeva, Andrew P. Worth, Tom Aldenberg, Romualdo Benigni, Mark T.D. Cronin, Paola Gramatica, Joanna S. Jaworska, Scott Kahn, Gilles Klopman, Carol A. Marchant, Glenn Myatt, Nina Nikolova-Jeliazkova, Grace Y. Patlewicz, Roger Perkins, David W. Roberts, Terry W. Schultz, David T. Stanton, Johannes J.M. van de Sandt, Weida Tong, Gilman Veith, Chihae Yang. Current Status of Methods for Defining the Applicability Domain of (Quantitative) Structure-Activity Relationships. Alternatives to Laboratory Animals 2005, 33 (2) , 155-173. https://doi.org/10.1177/026119290503300209
- Qing-Hua Hu, Da-Ren Yu, Ming-Yang Wang. Constructing Rough Decision Forests. 2005, 147-156. https://doi.org/10.1007/11548706_16
- Weida Tong, Qian Xie, Huixiao Hong, Hong Fang, Leming Shi, Roger Perkins, Emanuel F. Petricoin. Using Decision Forest to Classify Prostate Cancer Samples on the Basis of SELDI-TOF MS Data: Assessing Chance Correlation and Prediction Confidence. Environmental Health Perspectives 2004, 112 (16) , 1622-1627. https://doi.org/10.1289/txg.7109
- Leming Shi, Weida Tong, Federico Goodsaid, Felix W Frueh, Hong Fang, Tao Han, James C Fuscoe, Daniel A Casciano. QA/QC: challenges and pitfalls facing the microarray community and regulatory agencies. Expert Review of Molecular Diagnostics 2004, 4 (6) , 761-777. https://doi.org/10.1586/14737159.4.6.761
- Huixiao Hong, Weida Tong, Roger Perkins, Hong Fang, Qian Xie, Leming Shi. Multiclass Decision Forest—A Novel Pattern Recognition Method for Multiclass Classification in Microarray Data Analysis. DNA and Cell Biology 2004, 23 (10) , 685-694. https://doi.org/10.1089/dna.2004.23.685
- Weida Tong, Qian Xie, Huixiao Hong, Leming Shi, Hong Fang, Roger Perkins. Assessment of Prediction Confidence and Domain Extrapolation of Two Structure-Activity Relationship Models for Predicting Estrogen Receptor Binding Activity. Environmental Health Perspectives 2004, 112 (12) , 1249-1254. https://doi.org/10.1289/ehp.7125
- Daniel C Weaver. Applying data mining techniques to library design, lead generation and lead optimization. Current Opinion in Chemical Biology 2004, 8 (3) , 264-270. https://doi.org/10.1016/j.cbpa.2004.04.005
- A.H. Asikainen, J. Ruuskanen, K.A. Tuppurainen. Performance of (consensus) kNN QSAR for predicting estrogenic activity in a large diverse set of organic compounds. SAR and QSAR in Environmental Research 2004, 15 (1) , 19-32. https://doi.org/10.1080/1062936032000169642
- Vladimir Svetnik, Andy Liaw, Christopher Tong, Ting Wang. Application of Breiman’s Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules. 2004, 334-343. https://doi.org/10.1007/978-3-540-25966-4_33
- Weida Tong, Xiaoxi Cao, Stephen Harris, Hongmei Sun, Hong Fang, James Fuscoe, Angela Harris, Huixiao Hong, Qian Xie, Roger Perkins, Leming Shi, Dan Casciano. ArrayTrack--supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environmental Health Perspectives 2003, 111 (15) , 1819-1826. https://doi.org/10.1289/ehp.6497