Development of Quantitative Structure−Activity Relationship and Classification Models for a Set of Carbonic Anhydrase Inhibitors

Brian E. Mattioni and Peter C. Jurs*
Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802
J. Chem. Inf. Comput. Sci., 2002, 42 (1), pp 94–102
DOI: 10.1021/ci0100696
Publication Date (Web): November 27, 2001
Copyright © 2002 American Chemical Society
* Corresponding author phone: (814) 865-3739; fax: (814) 865-3314; e-mail: pcj@psu.edu.

Abstract

Mathematical models are developed to find quantitative structure−activity relationships that correlate chemical structure and inhibition toward three carbonic anhydrase (CA) isozymes: CA I, II, and IV. Numerical descriptors are generated to encode important topological, geometric, and electronic features of molecular structure. After descriptor generation, multiple linear regression and computational neural network (CNN) analyses are performed on various descriptor subsets to find superior models for prediction. Committees of five CNNs were used to average the final predicted values for the 142-compound data set. For inhibitors of CA I, an 8−5−1 CNN committee produced a training set rms error of 0.105 log Ki (r² = 0.994) and a prediction set rms error of 0.208 log Ki (r² = 0.980). Training and prediction set rms errors of 0.140 log Ki (r² = 0.992) and 0.231 log Ki (r² = 0.971), respectively, were produced by a 9−5−1 CNN committee for inhibitors of CA II. For prediction of CA IV inhibitors, an 8−5−1 CNN committee produced training and prediction set rms errors of 0.147 log Ki (r² = 0.992) and 0.211 log Ki (r² = 0.991), respectively. In addition, classification models were built using k-nearest neighbor (kNN) analysis to solve two- and three-class problems for inhibitors of CA IV. A three-descriptor classification model proved superior in labeling compounds as active or inactive inhibitors for the two-class problem, giving training and prediction set classification rates of 100% and 87.1%, respectively. For the three-class (active/moderate/inactive) problem, a five-descriptor model was deemed optimal, producing a training set classification rate of 98.8% and a prediction set rate of 79.0%.
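
The quantities reported in the abstract (committee-averaged predictions, rms error, and kNN classification rate) can be illustrated with a minimal Python sketch. This is not the authors' software: the function names are hypothetical, the numbers are invented for illustration, Euclidean distance and the choice of k are assumptions, and the neural networks themselves are not modeled (only the averaging of their outputs is shown).

```python
from collections import Counter
from math import dist, sqrt


def committee_average(member_predictions):
    """Average per-compound predictions across committee members.

    member_predictions: one equal-length list of predictions per network.
    """
    n_members = len(member_predictions)
    return [sum(values) / n_members for values in zip(*member_predictions)]


def rms_error(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training points.

    Assumes Euclidean distance in descriptor space; on a tied vote the
    label seen first among the nearest neighbors is returned.
    """
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def percent_correct(true_labels, predicted_labels):
    """Percentage of compounds assigned to their true class."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


if __name__ == "__main__":
    # Invented predictions (log Ki) from three hypothetical committee members.
    members = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.3], [0.8, 2.2, 2.7]]
    observed = [1.1, 2.0, 2.9]
    averaged = committee_average(members)
    print("committee prediction:", averaged)
    print("rms error:", rms_error(observed, averaged))

    # Invented two-descriptor training set for a two-class problem.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["inactive"] * 3 + ["active"] * 3
    queries = [(0.2, 0.1), (5.5, 5.2)]
    predicted = [knn_classify(points, labels, q) for q in queries]
    print("classification rate:", percent_correct(["inactive", "active"], predicted))
```

Averaging over several networks reduces the influence of any one network's random starting weights on the reported prediction, which is the usual motivation for a committee.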

History

  • Published In Issue January 28, 2002
  • Received July 21, 2001
