Leveraging Machine Learning for Size and Shape Analysis of Nanoparticles: A Shortcut to Electron Microscopy

Characterizing nanoparticles (NPs) is crucial in nanoscience because their physicochemical properties directly influence their behavior. Various experimental techniques exist to analyze the size and shape of NPs, each with its own advantages, limitations, sources of uncertainty, and resource requirements. One of them is electron microscopy (EM), often considered the gold standard, which offers visualization of the primary particles. However, EM can be expensive, is less accessible, and is difficult to apply under specific experimental conditions, such as observing dynamic processes or visualizing low-contrast particles. This study showcases the potential of machine learning in deriving EM parameters from cost-effective and dynamic techniques such as dynamic light scattering (DLS) and UV–vis spectroscopy. Our model successfully predicts the size and shape parameters of gold NPs based on DLS and UV–vis results. Furthermore, we demonstrate the practicality of our model in a situation in which conducting EM measurements is challenging: tracking, in situ, the synthesis of 100 nm gold NPs.


Machine Learning Model
We trained gradient-boosted decision trees as implemented in XGBoost, optimizing the hyperparameters over the grid in Table S1 with the tree-structured Parzen estimator (TPE) strategy. The hyperparameters were optimized on an inner train/test split. We used the jackknife+ technique [2] to obtain prediction intervals, as implemented in MAPIE [3], considering absolute conformity scores on a validation set. For train/test splits, we stratified on the target column, which we binned into four approximately equally populated bins. To estimate robustness, we performed the workflow ten times with different random seeds. The maximum depth of a tree is used to regulate overfitting because a higher depth enables the model to learn relationships that are overly tailored to a particular sample.
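The splitting and interval steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the XGBoost/MAPIE pipeline used in the study: a one-dimensional least-squares line stands in for the gradient-boosted model, the data are synthetic, and the function names (`quantile_bins`, `stratified_split`, `fit_line`, `jackknife_plus_interval`) are hypothetical. The interval follows the jackknife+ construction with absolute residuals as conformity scores.

```python
import math
import random


def quantile_bins(y, n_bins=4):
    """Label each target with one of n_bins approximately equally populated bins."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(y)
    return labels


def stratified_split(y, test_fraction=0.25, n_bins=4, seed=0):
    """Return (train, test) index lists, drawing test points from every bin."""
    rng = random.Random(seed)
    labels = quantile_bins(y, n_bins)
    train, test = [], []
    for b in range(n_bins):
        members = [i for i, label in enumerate(labels) if label == b]
        rng.shuffle(members)
        n_test = max(1, round(test_fraction * len(members)))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the real regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def jackknife_plus_interval(xs, ys, x_new, alpha=0.1):
    """Jackknife+ prediction interval at x_new using absolute residuals."""
    n = len(xs)
    lower, upper = [], []
    for i in range(n):
        # Refit without sample i, then score the held-out sample.
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residual = abs(ys[i] - model(xs[i]))
        prediction = model(x_new)
        lower.append(prediction - residual)
        upper.append(prediction + residual)
    lower.sort()
    upper.sort()
    k_lo = math.floor(alpha * (n + 1))       # 1-based rank for the lower bound
    k_hi = math.ceil((1 - alpha) * (n + 1))  # 1-based rank for the upper bound
    lo = lower[k_lo - 1] if k_lo >= 1 else -math.inf
    hi = upper[k_hi - 1] if k_hi <= n else math.inf
    return lo, hi


# Synthetic, illustrative data only (not measurements from the study).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

for seed in range(10):  # repeat the workflow with ten random seeds
    train, test = stratified_split(ys, seed=seed)
    x_tr, y_tr = [xs[i] for i in train], [ys[i] for i in train]
    lo, hi = jackknife_plus_interval(x_tr, y_tr, xs[test[0]], alpha=0.1)
```

Binning the continuous target before splitting keeps small and large particles represented in both subsets, which matters when the number of batches is small.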

100 nm Gold Particle Synthesis
First, 20 nm AuNPs were synthesised in a three-neck round-bottomed flask by adding 1 mL of HAuCl4 (25 mM) to 150 mL of sodium citrate (2.2 mM) at 100 °C. This reaction was run for 15 min until the solution turned a red-wine colour, which produced gold seeds of 10 nm and 3 × 10⁻¹² NP mL⁻¹. The solution was cooled down to 90 °C and, sequentially, 1 mL of sodium citrate (60 mM) and 1 mL of a HAuCl4 solution (25 mM) were injected with a delay time of approximately 2 min, followed by a 30 min reaction period. This step was repeated one more time. The dispersion was diluted by extracting 55 mL of dispersion and adding 53 mL of MilliQ water and 2 mL of sodium citrate (60 mM), and was then used as the seed dispersion to synthesise 100 nm particles. To the diluted dispersion, 1 mL of HAuCl4 (25 mM) was injected, followed by a 30 min waiting period. This process was repeated until the particles reached a size of 100 nm [4].
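The volume and gold bookkeeping of this protocol can be checked with a short calculation. The sketch below is an idealized estimate and not part of the original protocol: it assumes no evaporation, complete reduction of the injected HAuCl4, no new nucleation (constant particle number during a growth step), and spherical particles, so that the diameter scales with the cube root of the gold per particle. The function names are hypothetical.

```python
AU_PER_INJECTION_UMOL = 1.0 * 25.0  # 1 mL of 25 mM HAuCl4 corresponds to 25 µmol Au


def dilution_fraction(volume_before_ml, removed_ml):
    """Fraction of particles (and gold) left after removing part of the dispersion."""
    return (volume_before_ml - removed_ml) / volume_before_ml


def grown_diameter(d_before_nm, au_before_umol, au_added_umol):
    """Diameter after a growth step at constant particle number (cube-root scaling)."""
    ratio = (au_before_umol + au_added_umol) / au_before_umol
    return d_before_nm * ratio ** (1.0 / 3.0)


# Volume before dilution: 150 mL citrate + 1 mL HAuCl4 + two growth steps of 2 mL each.
volume_ml = 150.0 + 1.0 + 2 * (1.0 + 1.0)

# Extracting 55 mL and adding back 53 mL + 2 mL restores the volume
# but keeps only this fraction of the particles.
kept = dilution_fraction(volume_ml, removed_ml=55.0)

# Gold per particle needed to go from 10 nm to 100 nm, relative to the seed.
gold_ratio_10_to_100 = (100.0 / 10.0) ** 3

# Idealized diameter after the first growth injection onto 10 nm seeds.
d_after_first_injection = grown_diameter(10.0, AU_PER_INJECTION_UMOL, AU_PER_INJECTION_UMOL)
```

Under these assumptions, a tenfold increase in diameter requires a thousandfold increase in gold per particle, which is consistent with reducing the particle number by dilution between growth steps rather than relying on injections alone.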

Model Extrapolation
Additionally, we demonstrate that our model could even predict the min. Feret diameter of a particle category that it was not trained on: spongosomes. Spongosomes are a type of lipid-polymer hybrid nanoparticle that requires staining or cryo-EM for visualization. Nonetheless, the model extrapolated to these particles and correctly predicted their min. Feret diameter. Figure S8 a) and b) show the spectra, and Table S3 lists the parameters obtained via DLS and UV-Vis that were used to predict the TEM outcome, in

Figure S3. Spearman's rank correlation matrix of the parameters measured via DLS, UV-Vis, and TEM. A Spearman correlation of 1 signifies a perfect positive monotonic correlation, 0 implies no monotonic correlation, and −1 signifies a perfect inverse correlation between two parameters.
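For reference, Spearman's rank correlation is the Pearson correlation of the ranks of two variables. The following is a minimal sketch with made-up numbers, not the analysis code behind Figure S3; the function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Illustrative values only: a monotonic but non-linear relationship.
hydrodynamic_diameter = [22.0, 35.0, 48.0, 71.0, 110.0]
tem_diameter = [12.0, 20.0, 33.0, 60.0, 98.0]
rho = spearman(hydrodynamic_diameter, tem_diameter)
```

Because only ranks enter the calculation, any strictly increasing relationship gives a coefficient of 1, whether or not it is linear.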

Figure S4. Summary of the SHAP analysis to determine the feature importance for the model predicting the TEM min. Feret diameter based on DLS and UV-Vis. Each point in the plot represents a data point, and the grey vertical line indicates the average predicted Feret diameter. A negative SHAP value (shown on the abscissa) indicates that the feature lowers the predicted Feret diameter relative to the baseline, and a positive SHAP value indicates that it raises it. The features are ranked according to their impact on the prediction, with the most important feature at the top.
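The way SHAP values add up to a prediction can be illustrated with an exact Shapley calculation on a toy model. This is a simplified sketch, not the SHAP library and not the model of this study: features that are "absent" from a coalition are replaced by a single reference point rather than averaged over a background data set, and the toy model, feature values, and function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, reference):
    """Exact Shapley values of predict at x relative to a single reference point."""
    n = len(x)

    def value(present):
        # Features outside the coalition take their reference value.
        z = [x[j] if j in present else reference[j] for j in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


def toy_model(z):
    """Hypothetical predictor with an interaction between features 0 and 2."""
    return 0.8 * z[0] + 0.1 * z[1] + 0.01 * z[0] * z[2]


x = [60.0, 530.0, 5.0]          # illustrative feature values for one sample
reference = [40.0, 525.0, 3.0]  # illustrative baseline feature values
phi = shapley_values(toy_model, x, reference)
baseline = toy_model(reference)
```

The contributions sum to the difference between the prediction and the baseline, which is why points to the right of the vertical line in a SHAP summary plot correspond to features pushing the prediction above the baseline.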

Figure S5. Representative parity plots of a) TEM average size, b) TEM size standard deviation, c) surface area, d) particle perimeter, e) aspect ratio, and f) sphericity. The black line indicates perfect agreement between the predicted value (y-axis) and the measured value (x-axis).

Figure S8. Panel a) depicts a TEM micrograph of the spongosomes after staining, while panel b) illustrates the parameters obtained from DLS and UV-Vis analysis, which were used to predict the TEM outcome.

Table S1. Summary of the measured parameters obtained from DLS, UV-Vis, and TEM for each particle batch.

Table S2. Hyperparameter ranges, with short explanations [1], considered for the XGBoost models.

Table S3. Input parameters obtained by DLS and UV-Vis analysis to predict the TEM outcome during a 100 nm AuNP synthesis.