Correlation-Based Framework for Extraction of Insights from Quantum Chemistry Databases: Applications for Nanoclusters

    Journal of Chemical Information and Modeling

    Cite this: J. Chem. Inf. Model. 2021, 61, 3, 1125–1135
    https://doi.org/10.1021/acs.jcim.0c01267
    Published March 8, 2021
    Copyright © 2021 American Chemical Society

    Abstract

    The amount of quantum chemistry (QC) data is increasing year by year due to the continuous increase in computational power and the development of new algorithms. However, in most cases, our atom-level knowledge of molecular systems has been obtained by manual data analysis based on selected descriptors. In this work, we introduce a data mining framework to accelerate the extraction of insights from QC datasets. It starts with a featurization process that converts atomic features into molecular properties (AtoMF). Then, it employs correlation coefficients (Pearson, Spearman, and Kendall) to investigate the relationship between the AtoMF features and a target property. We applied our framework to three nanocluster systems, namely, Pt_nTM_{55-n}, Ce_nZr_{15-n}O_30, and (CH_n + mH)/TM_13. We found several interesting and consistent insights using the Spearman and Kendall correlation coefficients, indicating that they are suitable for our approach; in contrast, our results indicate that the Pearson coefficient is very sensitive to outliers and should not be used. Moreover, we highlight problems that can occur during this analysis and discuss how to handle them. Finally, we make available a new Python package that implements the proposed QC data mining framework, which can be used as is or modified to include new features.
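
    The contrast between the three coefficients can be illustrated with a minimal sketch. It is not the API of the package described in the abstract: the function names and the toy data below are assumptions made for illustration only, and the coefficients are implemented from their textbook definitions using the Python standard library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, counting concordant and discordant pairs in O(n^2)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every count
            elif dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Hypothetical toy data (not taken from the article): one molecular-level
# feature value per structure and a target property that grows with it.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
target_clean = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 3.7, 4.5]
# Same data with the last target value replaced by a single extreme outlier.
target_outlier = target_clean[:-1] + [-40.0]

for label, target in [("clean", target_clean), ("outlier", target_outlier)]:
    print(label,
          round(pearson(feature, target), 3),
          round(spearman(feature, target), 3),
          round(kendall_tau_b(feature, target), 3))
```

    In this toy case, a single outlier is enough to make the Pearson coefficient negative, while the rank-based Spearman and Kendall coefficients stay positive because seven of the eight points still follow the increasing trend.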

    Supporting Information

    The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c01267.

    • QC datasets, description of the features, complementary analysis of the results, and a description of the Quandarium package (PDF)

    Cited By

    This article is cited by 3 publications.

    1. Fedor V. Ryzhkov, Yuliya E. Ryzhkova, Michail N. Elinson. Python in Chemistry: Physicochemical Tools. Processes 2023, 11 (10), 2897. https://doi.org/10.3390/pr11102897
    2. Lucas Rodrigues da Silva, Felipe Orlando Morais, João Paulo A. de Mendonça, Breno R.L. Galvão, Juarez L.F. Da Silva. Theoretical investigation of the stability of A55-B nanoalloys (A, B = Al, Cu, Zn, Ag). Computational Materials Science 2022, 215, 111805. https://doi.org/10.1016/j.commatsci.2022.111805
    3. Richard Liam Marchese Robinson, Haralambos Sarimveis, Philip Doganis, Xiaodong Jia, Marianna Kotzabasaki, Christiana Gousiadou, Stacey Lynn Harper, Terry Wilkins. Identifying diverse metal oxide nanomaterials with lethal effects on embryonic zebrafish using machine learning. Beilstein Journal of Nanotechnology 2021, 12, 1297–1325. https://doi.org/10.3762/bjnano.12.97
