Correlation-Based Framework for Extraction of Insights from Quantum Chemistry Databases: Applications for Nanoclusters
    Journal of Chemical Information and Modeling

    Cite this: J. Chem. Inf. Model. 2021, 61, 3, 1125–1135
    Published March 8, 2021
    The amount of quantum chemistry (QC) data is increasing year by year due to the continuous increase of computational power and the development of new algorithms. However, in most cases, our atom-level knowledge of molecular systems has been obtained by manual data analyses based on selected descriptors. In this work, we introduce a data mining framework to accelerate the extraction of insights from QC datasets, which starts with a featurization process that converts atomic features into molecular properties (AtoMF). Then, it employs correlation coefficients (Pearson, Spearman, and Kendall) to investigate the relationship between the AtoMF features and a target property. We applied our framework to investigate three nanocluster systems, namely, PtₙTM₅₅₋ₙ, CeₙZr₁₅₋ₙO₃₀, and (CHₙ + mH)/TM₁₃. We found several interesting and consistent insights using the Spearman and Kendall correlation coefficients, indicating that they are suitable for our approach; in contrast, our results indicate that the Pearson coefficient is very sensitive to outliers and should not be used. Moreover, we highlight problems that can occur during this analysis and discuss how to handle them. Finally, we make available a new Python package that implements the proposed QC data mining framework, which can be used as is or modified to include new features.
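The two steps described in the abstract can be illustrated with a minimal sketch. It is not the Quandarium implementation: the aggregation rule (a plain mean over atoms), the helper names, and all numbers are assumptions made up for illustration. The coefficients are written out in pure Python so the example has no dependencies; `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `scipy.stats.kendalltau` are the usual library equivalents.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-b, computed by comparing every pair of samples."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        elif dx == 0 and dy != 0:
            only_x_tied += 1
        elif dy == 0 and dx != 0:
            only_y_tied += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


# Hypothetical per-atom values for eight structures of different sizes.
atomic_features = [
    [1.0, 1.0], [1.0, 3.0], [3.0, 3.0, 3.0], [4.0],
    [4.0, 6.0], [6.0, 6.0], [7.0, 7.0, 7.0], [8.0],
]

# Assumed AtoMF-style step: one molecular feature per structure (mean over atoms).
features = [mean(atoms) for atoms in atomic_features]

# Synthetic target property: monotonic in the feature, with one extreme outlier.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0]

r_p = pearson(features, target)
r_s = spearman(features, target)
r_k = kendall(features, target)
print(f"Pearson  = {r_p:.3f}")
print(f"Spearman = {r_s:.3f}")
print(f"Kendall  = {r_k:.3f}")
```

In this synthetic case the ordering of the samples is perfectly preserved, so both rank-based coefficients equal 1, while the single outlier pulls the Pearson coefficient down to roughly 0.58. This is the kind of outlier sensitivity the abstract refers to.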

    Supporting Information

    The Supporting Information is available free of charge at

    • QC datasets, description of the features, complementary analysis of the results, and a description of the Quandarium package (PDF)


