Database Mining Using Soft Computing Techniques. An Integrated Neural Network−Fuzzy Logic−Genetic Algorithm Approach

Thomas R. Cundari* and Marco Russo
Department of Chemistry, Computational Research on Materials Institute (CROMIUM), The University of Memphis, Memphis, Tennessee 38152-6060, and Department of Physics, Corpo A, Stanza 18, University of Messina, C. da Papardo, Salita Sperone 31, Villaggio Sant'Agata 98166 (ME), National Institute of Nuclear Physics (INFN) Section of Catania, Corso Italia 57, 95127 (CT), Italy
J. Chem. Inf. Comput. Sci., 2001, 41 (2), pp 281–287
DOI: 10.1021/ci0000068
Publication Date (Web): January 26, 2001
Copyright © 2001 American Chemical Society
* To whom correspondence should be addressed.

Thomas R. Cundari: The University of Memphis.

Marco Russo: University of Messina.

Abstract

Two different soft computing (SC) techniques (a competitive learning neural network and an integrated neural network−fuzzy logic−genetic algorithm approach) are employed in the analysis of a database subset obtained from the Cambridge Structural Database. The chemical problem chosen for study concerns the relationship between various metric parameters in transition metal imido (LnM=NZ, Z = carbon-based substituent) complexes and the chemical consequences of such relationships. The SC techniques confirmed and quantified the suspected relationship between the metal−nitrogen bond length and the metal−nitrogen−substituent bond angle for transition metal imidos: increased metal−nitrogen−carbon angles correlate with shortened metal−nitrogen distances. The mining effort also yielded an unexpected correlation between the NC distance and the MNC angle: shorter NC distances correlate with larger MNC angles. A fuzzy inference system is used to construct an MNred−NC−MNC hypersurface. This hypersurface suggests a complicated interdependence among NC, MNred, and the angle subtended by these two bonds. Major portions of the hypersurface are also very flat in regions where MNC approaches linearity. The relationships are further seen to be influenced by whether the imido substituent is an alkyl or aryl group. Computationally, the present results are of particular interest in two respects. First, SC classification was able to isolate an “outlier” cluster. Identifying outliers is important because they may correspond to unreported experimental errors in the database or to novel chemical entities, both of which warrant further investigation. Second, the SC database mining not only confirmed and quantified a suspected relationship (MNred versus MNC) within the data but also yielded a trend that was not suspected (NC versus MNC).
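
For readers unfamiliar with competitive learning, the following is a minimal sketch of a generic winner-take-all clustering loop of the kind the abstract refers to. It is not the authors' implementation: the function names, learning rate, number of units, and the synthetic two-feature data (standing in for standardized bond-length and bond-angle descriptors) are all assumptions made for illustration, and no Cambridge Structural Database values are used.

```python
import math
import random

def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda k: math.dist(prototypes[k], x))

def competitive_learning(data, n_units=3, lr=0.1, epochs=50, seed=0):
    """Winner-take-all competitive learning; returns the prototype vectors."""
    rng = random.Random(seed)
    # Initialise prototypes from randomly chosen samples.
    prototypes = [list(x) for x in rng.sample(data, n_units)]
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            w = nearest(prototypes, x)  # the closest unit wins
            # Only the winning unit moves toward the sample.
            prototypes[w] = [p + lr * (xi - p) for p, xi in zip(prototypes[w], x)]
    return prototypes

# Synthetic, standardised two-feature points (hypothetical; NOT CSD values).
rng = random.Random(1)
centers = [(-2.0, -2.0), (2.0, 2.0), (2.0, -2.0)]
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in centers for _ in range(40)]

prototypes = competitive_learning(data)
labels = [nearest(prototypes, x) for x in data]
cluster_sizes = [labels.count(k) for k in range(len(prototypes))]
print(prototypes)
print(cluster_sizes)
```

In a sketch like this, a unit that ends up with only a handful of members would be a natural candidate for the kind of "outlier" cluster mentioned above, although how the published work identified its outlier cluster is not specified in the abstract.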


History

  • Published In Issue March 26, 2001
  • Received January 30, 2000
