Machine Learning Methods for Small Data Challenges in Molecular Science

  • Bozheng Dou
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Zailiang Zhu
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Ekaterina Merkurjev
    Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  • Lu Ke
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Long Chen
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Jian Jiang*
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
    Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
    *Email: [email protected]
  • Yueying Zhu
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Jie Liu
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Bengong Zhang
    Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  • Guo-Wei Wei*
    Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
    Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
    Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
    *Email: [email protected]
Cite this: Chem. Rev. 2023, 123, 13, 8736–8780
Publication Date (Web): June 29, 2023
https://doi.org/10.1021/acs.chemrev.3c00189
Copyright © 2023 American Chemical Society

    Abstract

    Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
