Correlation-Based Framework for Extraction of Insights from Quantum Chemistry Databases: Applications for Nanoclusters
- Johnatan Mucelini, São Carlos Institute of Chemistry, University of São Paulo, P. O. Box 780, 13560-970 São Carlos, SP, Brazil
- Marcos G. Quiles, Department of Science and Technology, Federal University of São Paulo, 12247-014 São José dos Campos, SP, Brazil
- Ronaldo C. Prati, Center for Mathematics, Computation and Cognition, Federal University of ABC, 09210-580 Santo André, SP, Brazil
- Juarez L. F. Da Silva* (Email: [email protected]), São Carlos Institute of Chemistry, University of São Paulo, P. O. Box 780, 13560-970 São Carlos, SP, Brazil
Abstract

The amount of quantum chemistry (QC) data is increasing year by year due to the continuous increase of computational power and the development of new algorithms. However, in most cases, our atom-level knowledge of molecular systems has been obtained by manual data analyses based on selected descriptors. In this work, we introduce a data mining framework to accelerate the extraction of insights from QC datasets. It starts with a featurization process that converts atomic features into molecular properties (AtoMF). Then, it employs correlation coefficients (Pearson, Spearman, and Kendall) to investigate the relationship between the AtoMF features and a target property. We applied our framework to investigate three nanocluster systems, namely, Pt_nTM_{55−n}, Ce_nZr_{15−n}O_30, and (CH_n + mH)/TM_13. We found several interesting and consistent insights using the Spearman and Kendall correlation coefficients, indicating that they are suitable for our approach; in contrast, our results indicate that the Pearson coefficient is very sensitive to outliers and should not be used. Moreover, we highlight problems that can occur during this analysis and discuss how to handle them. Finally, we make available a new Python package that implements the proposed QC data mining framework, which can be used as is or modified to include new features.
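The two stages described above (featurization, then correlation with a target property) can be illustrated with a minimal pure-Python sketch. It is not the authors' package or its API: the function names, the choice of a simple per-structure mean as the featurization, and all numbers are hypothetical and synthetic, chosen only to show why a rank-based coefficient can be more robust to a single outlier than the Pearson coefficient.

```python
import math

def featurize_mean(atomic_values):
    """Hypothetical featurization: collapse per-atom values into one
    per-structure feature by averaging (an assumption, not AtoMF itself)."""
    return sum(atomic_values) / len(atomic_values)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.
    No tie correction, so it is only appropriate for data without ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Synthetic per-atom values for eight hypothetical structures.
structures = [
    [1.0, 1.0], [1.5, 2.5], [3.0, 3.0, 3.0], [3.5, 4.5],
    [5.0, 5.0], [5.5, 6.5], [7.0, 7.0], [7.5, 8.5],
]
feature = [featurize_mean(s) for s in structures]

# Synthetic target property: monotonic in the feature, with one outlier
# (the last value) that preserves the ordering but not the linear trend.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]

r_pearson = pearson(feature, target)
r_spearman = spearman(feature, target)
r_kendall = kendall_tau_a(feature, target)
print(f"Pearson={r_pearson:.3f} Spearman={r_spearman:.3f} Kendall={r_kendall:.3f}")
```

In this toy case the ordering of the target follows the ordering of the feature exactly, so the rank-based coefficients stay at their maximum while the single extreme value pulls the Pearson coefficient down. For real datasets with ties, a tie-corrected variant such as Kendall tau-b would be the safer choice.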