Symbolic, Neural, and Bayesian Machine Learning Models for Predicting Carcinogenicity of Chemical Compounds

Dennis Bahler* and Brian Stone
Artificial Intelligence Laboratory, Department of Computer Science, North Carolina State University, Raleigh, North Carolina 27695-8206
Carol Wellington
Department of Mathematics and Computer Science, Shippensburg University, Shippensburg, Pennsylvania 17257-2299
Douglas W. Bristol
National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709
J. Chem. Inf. Comput. Sci., 2000, 40 (4), pp 906–914
DOI: 10.1021/ci990116i
Publication Date (Web): July 24, 2000
Copyright © 2000 American Chemical Society
* To whom correspondence should be addressed. Telephone: 919-515-3369. E-mail: bahler@ncsu.edu.

Abstract

Experimental programs have been underway for several years to determine the environmental effects of chemical compounds and mixtures. Among these programs is the National Toxicology Program (NTP) on rodent carcinogenicity. Because these experiments are costly and time-consuming, the rate at which test articles (i.e., chemicals) can be tested is limited. The ability to predict the outcome of the analysis at various points in the process would facilitate informed decisions about the allocation of testing resources. To assist human experts in organizing an empirical testing regime, and to try to shed light on mechanisms of toxicity, we constructed toxicity models using various machine learning and data mining methods, both existing and those of our own devising. These models took the form of decision trees, rule sets, neural networks, rules extracted from trained neural networks, and Bayesian classifiers. As a training set, we used recent results from rodent carcinogenicity bioassays conducted by the NTP on 226 test articles. We performed 10-way cross-validation on each of our models to approximate their expected error rates on unseen data. The data set consists of physical−chemical parameters of test articles, alerting chemical substructures, Salmonella mutagenicity assay results, subchronic histopathology data, and information on route, strain, and sex/species for 744 individual experiments. These results contribute to the ongoing process of evaluating and interpreting the data collected from chemical toxicity studies.
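
The 10-way cross-validation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`k_fold_indices`, `train_majority`, `cross_validated_error`) are hypothetical, and the majority-class learner is a placeholder standing in for the decision trees, rule sets, neural networks, and Bayesian classifiers named above. The abstract does not say how the folds were drawn, so a simple shuffled split is assumed.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into k near-equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_majority(train_x, train_y):
    """Placeholder learner (assumption): always predict the most common label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority


def cross_validated_error(features, labels, train, k=10, seed=0):
    """Estimate the error rate on unseen data by pooling errors over k held-out folds."""
    errors = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        model = train([features[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        errors += sum(1 for i in held_out if model(features[i]) != labels[i])
    return errors / len(labels)
```

Any learner with the same `train(features, labels) -> predict` shape could be passed in place of `train_majority`. Each item is held out exactly once, so the pooled error approximates performance on data the model has not seen.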


History

  • Published In Issue July 24, 2000
  • Received September 9, 1999
