Random Forest Prediction of Mutagenicity from Empirical Physicochemical Descriptors

Qing-You Zhang and João Aires-de-Sousa*
CQFB and REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
J. Chem. Inf. Model., 2007, 47 (1), pp 1–8
DOI: 10.1021/ci050520j
Publication Date (Web): December 5, 2006
Copyright © 2007 American Chemical Society

Abstract

Fast-to-calculate empirical physicochemical descriptors were investigated for their ability to predict mutagenicity (positive or negative Ames test) from the molecular structure. Fast methods are highly desirable for screening large libraries of compounds. Global molecular descriptors and MOLMAP descriptors of bond properties were used to train random forests. Error percentages as low as 15% and 16% were achieved for an external test set of 472 compounds and for the training set of 4083 structures, respectively. High sensitivity and specificity were observed. Random forests were able to associate meaningful probabilities with the predictions and to explain the predictions in terms of similarities between query structures and compounds in the training set.
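The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not say how the probabilities or similarities were computed. The sketch assumes the conventions commonly used with random forests: the class probability is the fraction of trees voting for a class, and the similarity (proximity) between two structures is the fraction of trees in which both fall into the same terminal node. All function names, labels, votes, and leaf indices below are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Error rate, sensitivity and specificity for binary labels (1 = mutagenic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "error": (fp + fn) / len(pairs),   # fraction of misclassified compounds
        "sensitivity": tp / (tp + fn),     # mutagens correctly flagged
        "specificity": tn / (tn + fp),     # non-mutagens correctly cleared
    }


def vote_probability(tree_votes: Sequence[int]) -> float:
    """Assumed convention: probability of class 1 = fraction of trees voting 1."""
    return sum(tree_votes) / len(tree_votes)


def proximity(leaves_a: Sequence[int], leaves_b: Sequence[int]) -> float:
    """Assumed convention: fraction of trees where both samples share a terminal node."""
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return shared / len(leaves_a)


# Hypothetical toy data: 8 test compounds, a 5-tree forest.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)

# Votes of the 5 trees for one query structure (1 = mutagenic).
query_probability = vote_probability([1, 1, 0, 1, 1])

# Terminal-node index reached in each tree by the query and by two
# hypothetical training compounds "A" and "B".
query_leaves = [3, 7, 2, 5, 1]
training_leaves = {"A": [3, 7, 4, 5, 1], "B": [0, 7, 9, 8, 6]}
proximities = {name: proximity(query_leaves, leaves)
               for name, leaves in training_leaves.items()}

# The most similar training compound can be shown alongside the prediction.
most_similar = max(proximities, key=proximities.get)
```

In this toy setup, the training compound with the highest proximity to the query is the one that would be reported to explain the prediction, in the spirit of the similarity-based explanations mentioned in the abstract.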

History

  • Published In Issue January 22, 2007
  • Received November 28, 2005
