General Melting Point Prediction Based on a Diverse Compound Data Set and Artificial Neural Networks

M. Karthikeyan
Information Division, National Chemical Laboratory, Pune - 411 008, India
Robert C. Glen and Andreas Bender*
Unilever Centre for Molecular Informatics, Chemistry Department, University of Cambridge, Cambridge CB2 1EW, United Kingdom
J. Chem. Inf. Model., 2005, 45 (3), pp 581–590
DOI: 10.1021/ci0500132
Publication Date (Web): May 3, 2005
Copyright © 2005 American Chemical Society
* Corresponding author phone: +44 (1223) 763 073; fax: +44 (1223) 763 076; e-mail: andreas.bender@cantab.net.

Abstract

We report the development of a robust and general model for the prediction of melting points. It is based on a diverse data set of 4173 compounds and employs a large number of 2D and 3D descriptors to capture physicochemical and other graph-based molecular properties. Dimensionality reduction is performed by principal component analysis, and a fully connected feed-forward back-propagation artificial neural network is used for model generation. The melting point is a fundamental physicochemical property of a molecule that is governed both by single-molecule properties and by intermolecular interactions arising from packing in the solid state. It is therefore difficult to predict, and previous melting point models have been developed only for smaller, clearly defined compound sets. Here we derive the first general model that covers a comparatively large and relevant part of organic chemical space. The final model is based on 2D descriptors, which were found to contain more relevant information than the 3D descriptors calculated. Internal random validation of the model achieves a correlation coefficient of R² = 0.661 with an average absolute error of 37.6 °C. The model is internally consistent, with a test-set correlation coefficient of Q² = 0.658 (average absolute error 38.2 °C) and an internal validation set correlation coefficient of Q² = 0.645 (average absolute error 39.8 °C). Additional validation was performed on an external data set of 277 drugs, on which a correlation coefficient of Q² = 0.662 (average absolute error 32.6 °C) was achieved, showing the ability of the model to generalize. Compared to an earlier model for predicting the melting points of druglike compounds, our model exhibits slightly improved performance, despite covering a much larger chemical space. The remaining model error is attributed to properties that are not captured by single-molecule descriptors, namely inter- and intramolecular interactions and crystal packing; examples of outliers and the reasons for them are given.
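The abstract describes the modelling pipeline only at a high level: descriptors, principal component analysis (PCA) for dimensionality reduction, a fully connected feed-forward network trained by back-propagation, and evaluation by R²/Q² and average absolute error. The following is a minimal sketch of that kind of pipeline in Python with NumPy, under loudly stated assumptions. Synthetic data stands in for the real descriptors and melting points. The helper names (`pca_fit`, `pca_transform`, `FeedForwardNet`), the number of components, the single tanh hidden layer, the learning rate and the training schedule are hypothetical choices, not the authors' settings. R²/Q² are taken here as the squared Pearson correlation between predicted and experimental values, which matches the abstract's wording "correlation coefficient"; 1 − PRESS/SS is another common definition.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit an SVD-based PCA projection on standardized descriptors (hypothetical helper)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant descriptors carry no information
    Z = (X - mean) / std
    # Rows of Vt are the principal axes, sorted by decreasing variance
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    score_std = (Z @ components.T).std(axis=0)  # used to whiten scores for the network
    return mean, std, components, score_std


def pca_transform(X, params):
    """Project descriptors onto the fitted components and whiten the scores."""
    mean, std, components, score_std = params
    return ((X - mean) / std) @ components.T / score_std


class FeedForwardNet:
    """Fully connected net: inputs -> tanh hidden layer -> linear output (assumed architecture)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        return (self.h @ self.W2 + self.b2).ravel()

    def train(self, X, y, epochs=3000, lr=0.1):
        """Full-batch gradient descent on 0.5 * mean squared error via back-propagation."""
        losses = []
        for _ in range(epochs):
            err = self.forward(X) - y
            losses.append(0.5 * np.mean(err ** 2))
            g_out = err[:, None] / len(y)                      # dLoss/d(output)
            g_hid = (g_out @ self.W2.T) * (1.0 - self.h ** 2)  # chain rule through tanh
            self.W2 -= lr * (self.h.T @ g_out)
            self.b2 -= lr * g_out.sum(axis=0)
            self.W1 -= lr * (X.T @ g_hid)
            self.b1 -= lr * g_hid.sum(axis=0)
        return losses


def squared_correlation(y_true, y_pred):
    """R^2 / Q^2 taken as the squared Pearson correlation (one common convention)."""
    return np.corrcoef(y_true, y_pred)[0, 1] ** 2


def average_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


# Synthetic stand-in data (NOT the paper's data): 30 correlated "descriptors"
# driven by 3 latent factors, and a nonlinear "melting point" in degrees C
rng = np.random.default_rng(42)
latent = rng.normal(size=(600, 3))
X = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(600, 30))
y = 150 + 60 * np.tanh(latent[:, 0] - 0.5 * latent[:, 1]) + rng.normal(0, 10, 600)

X_train, X_test, y_train, y_test = X[:450], X[450:], y[:450], y[450:]
params = pca_fit(X_train, n_components=5)      # fit PCA on training data only
y_mean, y_std = y_train.mean(), y_train.std()  # scale targets for stable training

net = FeedForwardNet(n_in=5, n_hidden=8)
losses = net.train(pca_transform(X_train, params), (y_train - y_mean) / y_std)
y_pred = net.forward(pca_transform(X_test, params)) * y_std + y_mean

q2_test = squared_correlation(y_test, y_pred)
aae_test = average_absolute_error(y_test, y_pred)
print(f"test Q2 = {q2_test:.3f}, average absolute error = {aae_test:.1f} °C")
```

Fitting the PCA projection and the target scaling on the training split only, then applying them unchanged to the held-out split, keeps the test statistics independent of the test data, which is the separation that Q² values on test and external sets are meant to reflect.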

History

  • Received January 12, 2005
  • Published in issue May 23, 2005
