Information Entropy as a Reliable Measure of Nanoparticle Dispersity

Nanoparticle size impacts properties vital to applications ranging from drug delivery to diagnostics and catalysis. As such, evaluating nanoparticle size dispersity is of fundamental importance. Conventional approaches, such as standard deviation, usually require the nanoparticle population to follow a known distribution and are ill-equipped to deal with highly poly- or heterodisperse populations. Herein, we propose the use of information entropy as an alternative and assumption-free method for describing nanoparticle size distributions. This measure works equally well for mono-, poly-, and heterodisperse populations and represents an unbiased route to evaluation and optimization of nanoparticle synthesis. We provide intuitive software tools for analysis and supply guidelines for interpretation with respect to known standards.

Many characteristics of nanoparticles, including their biodistribution as well as catalytic, optical, and electric properties, depend on their spatial dimensions. 1−5 Yet, accurate reporting on the size distribution remains a challenge, 6,7 largely due to the absence of a reliable, widely applicable method of representing the dispersity of the population. While a histogram contains a full description of the particle size distribution, 8 working with a scalar measure of dispersity is advantageous to determine correlations with synthetic conditions or particle performance. 9 The need for a reliable descriptor of dispersity is particularly profound when optimizing syntheses to produce monodisperse particles. 10 Several recent studies have focused on using statistical methods to optimize the synthesis of a variety of monodisperse nanoparticles; however, limited success was achieved in identifying the experimental variables that determine dispersity. 11−15 Researchers have largely relied upon standard deviation for evaluating dispersity, but this measurement is only valid when applied to a normal distribution and may provide an insufficient representation of the sample. 6,8,16−19 Standard deviation is often referenced against the mean size to produce the unitless coefficient of variation (COV), which reflects the relative spread of the given distribution. Less commonly used metrics in nanoparticle science reporting include the percentile values, corresponding to the particle sizes that encompass 10, 50, and 90% of the total population (D_10, D_50, and D_90, respectively). 9,18,20−22 Using these percentile values, a representation of the dispersity can be calculated, (D_90 − D_10)/D_50, referred to as the span. 18,20 This is a more widely applicable measurement; it does not rely on a known distribution and can be employed even with polydisperse samples.
Like the COV, the span is a relative measure, which relates the spread of the population to the mean or median value. These metrics focus on describing the broadness of the distribution. Dispersity represents the inhomogeneity of particle sizes observed in a population; while this typically correlates with the spread, it is not equivalent. For example, a population of nanoparticles with two distinct sizes will demonstrate a span and standard deviation dependent on the difference between the two sizes; the heterogeneity of sizes remains constant.
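To make the distinction between spread and dispersity concrete, the following minimal sketch computes the COV and the span for two hypothetical two-size populations. The function names, the example sizes, and the use of number-weighted percentiles with NumPy's default interpolation are assumptions made for illustration only; they are not taken from the accompanying software.

```python
import numpy as np

def cov(sizes):
    """Coefficient of variation: sample standard deviation over the mean."""
    sizes = np.asarray(sizes, dtype=float)
    return float(sizes.std(ddof=1) / sizes.mean())

def span(sizes):
    """Span (D_90 - D_10) / D_50 from number-weighted percentiles (assumed)."""
    d10, d50, d90 = np.percentile(np.asarray(sizes, dtype=float), [10, 50, 90])
    return float((d90 - d10) / d50)

# Two hypothetical populations, each with exactly two sizes in equal numbers.
near = [10.0] * 50 + [20.0] * 50   # sizes 10 nm apart
far = [10.0] * 50 + [40.0] * 50    # sizes 30 nm apart

for name, pop in (("near", near), ("far", far)):
    print(name, "COV =", round(cov(pop), 3), "span =", round(span(pop), 3))
```

Both populations contain the same number of distinct sizes in the same proportions, yet the COV and the span grow with the separation between the two sizes.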
We propose the use of a modified version of the information entropy equation to accurately evaluate dispersity. Information entropy (H) was first proposed by Claude Shannon in 1948 to quantify the amount of information produced by a given process. 23 It is calculated by

$$H = -\sum_{i=1}^{n} p_i \ln p_i \qquad (1)$$

where p_i is the probability of outcome i and n is the number of possible outcomes. The entropy is often described as being analogous to the amount of information conveyed when the outcome of an event is observed. 24 For example, if the result of a process is absolutely certain, i.e., there is only one possible outcome (n = 1, p_1 = 1), no information is gained by observing the outcome as it is always the same and the entropy is found to be 0. As the uncertainty of the outcome increases, so does the entropy, and more information is revealed by the result.
This measure of entropy was soon adopted by a wide variety of areas of study outside of information theory including the study of species diversity, population genetics, molecular analysis, and finance. 25−28 The pertinence of information entropy to such diverse fields lies in its ability to encapsulate not only the number of subcategories but also the relative quantities observed. It is not restricted to known distributions and can be applied to any data set. These properties make information entropy an ideal candidate to measure dispersity in nanoparticle populations. Figure 1 illustrates an example of the entropy calculation. Here, 35 particles of 7 different possible sizes are sorted into appropriate bins (or intervals) of a histogram. The number of particles in each bin is divided by the total to determine the probability distribution.
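The calculation described above can be sketched in a few lines. The bin counts below are hypothetical (chosen only so that 35 particles fall into 7 bins; the actual counts of Figure 1 are not reproduced here), and the natural logarithm is an assumed choice of base.

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) from histogram bin counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    return float(-np.sum(p * np.log(p)))

# Hypothetical histogram: 35 particles sorted into 7 size bins.
example_counts = [2, 4, 7, 9, 7, 4, 2]
H_example = shannon_entropy(example_counts)
print("H =", round(H_example, 3))
```

The entropy is bounded by 0 (all particles in one bin) and ln 7 (particles spread evenly over all 7 bins); any other allocation of the 35 particles falls in between.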

■ METHODS
In order to implement information entropy as a reliable measure of nanoparticle dispersity, three properties are required: (1) A linear relationship between entropy and population dispersity, which would facilitate interpretation and aid implementation into statistical optimization methods. (2) Independence of the entropy from the mean particle size. (3) Discretized data. Hence, we propose the following modification to the entropy calculation to produce the nanoparticle entropy, E:

$$E = b \exp\left(-\sum_{i=1}^{n} p_i \ln p_i\right) \qquad (2)$$

where b is the constant bin width of the histogram. The response of the information entropy, H, to an increasing number of equally probable outcomes (i.e., p_i = p_j for all j ≠ i) displays a logarithmic trend (Figure S1). Therefore, the exponential, a monotonic function, was added in eq 2. The resulting nanoparticle entropy, E, increases linearly with dispersity. The nanoparticle entropy is independent of the mean particle size, another important characteristic. The same distribution will produce the same entropy regardless of the mean. Further information is found in the SI (Figures S1−S3). Please note that monodispersity criteria are based on the relative deviation from the mean particle size. 29 To this end, a normalized entropy (E_n) can be obtained by dividing E by the mean. Since the nanoparticle entropy, E, has the same units as the bin width, the normalized nanoparticle entropy, E_n, is dimensionless.
It is important to note that these entropy calculations require discrete data. While the size of nanoparticles is a continuous variable, both the imaging system and analysis method will impose a limit to the exactness of each measurement. This resolution becomes the bin width of a histogram, which effectively presents the nanoparticle distribution as a discrete data set. For an unbiased representation of the nanoparticle dispersity, the bin width must be included in the entropy calculation. The reasoning is as follows: a large bin width will result in fewer bins and therefore a lower entropy; a smaller bin width for the same population will have a larger number of bins and a proportionally larger entropy. By including the bin width in the calculation of the entropy, this variability is avoided (see also Figures S4 and S5). Note that eq 2 assumes the use of a constant bin width.
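As an illustration of the role of the bin width, the sketch below assumes that the nanoparticle entropy takes the form E = b·exp(H), with b the constant bin width and H the Shannon entropy of the binned data, and that E_n = E/mean. It is not the accompanying MATLAB or Excel implementation; the function names, the binning by `floor(size / b)`, and the synthetic Gaussian populations are choices made here for demonstration.

```python
import numpy as np

def nanoparticle_entropy(sizes, bin_width):
    """E = b * exp(H) for sizes discretized with a constant bin width b (assumed form)."""
    sizes = np.asarray(sizes, dtype=float)
    bins = np.floor(sizes / bin_width).astype(int)      # bin index of each particle
    _, counts = np.unique(bins, return_counts=True)     # occupied bins only
    p = counts / counts.sum()
    H = -np.sum(p * np.log(p))
    return float(bin_width * np.exp(H))

def normalized_entropy(sizes, bin_width):
    """E_n = E / mean size (dimensionless)."""
    return nanoparticle_entropy(sizes, bin_width) / float(np.mean(sizes))

# Synthetic populations: same shape (sigma = 1), different means.
rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, 20000)
pop_small, pop_large = 10.0 + z, 30.0 + z

E_fine = nanoparticle_entropy(pop_small, 0.1)     # narrow bins
E_coarse = nanoparticle_entropy(pop_small, 0.2)   # bins twice as wide
E_shifted = nanoparticle_entropy(pop_large, 0.1)  # larger mean, same spread
print(E_fine, E_coarse, E_shifted)
```

Under these assumptions, doubling the bin width or shifting the mean should leave E essentially unchanged, while E_n scales inversely with the mean.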
Entropy depends on the sample size and asymptotically approaches the true value with increasing population. While several methods have been proposed to deal with this issue, we implement here the quadratic extrapolation by Strong et al. for its simplicity and low computational cost. 30 This process relies on calculating E for the total population of M measurements and two subpopulations comprised of M/2 and M/4 measurements randomly selected from the main. This data is then fitted to eq 3, where x represents the sample size.
This method is powerful but requires sufficient data to adequately fit the quadratic. Figure 2 shows the results of the sample size correction for two different populations. For a given sample size, distributions with the characteristics of those shown in Figure 2a,c were randomly generated, and the nanoparticle entropy was calculated with and without sample size correction. This was repeated 100 times. Figure 2b,d shows the mean entropy with standard deviation as a function of the sample size for the respective populations. Without the sample size correction by quadratic extrapolation, at least 500 data points were required for population 1 and 900 for population 2 to achieve an entropy within 15% of the true value. With correction, these reduce to 100 and 150, respectively. On this account, we have developed a reliability index in the accompanying software (Matlab GUI and Excel Macro) to evaluate whether the sample size is sufficiently large for a reliable estimate of the entropy. We note that these sample size requirements are in line with sampling guidelines on conventional approaches. 8 Further details about the sample size correction and the reliability index can be found in the SI.
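The sample size correction can be sketched as follows. Two assumptions are made here and should be checked against the SI: the fitted function is taken to be E(x) = E_∞ + a/x + c/x², in the style of Strong et al., and E is assumed to have the form b·exp(H). With only three sample sizes (M, M/2, M/4) the fit reduces to solving a 3 × 3 linear system. The function names and the synthetic data are hypothetical, and the reliability index of the accompanying software is not reproduced.

```python
import numpy as np

def nanoparticle_entropy(sizes, bin_width):
    """E = b * exp(H) for sizes discretized with a constant bin width b (assumed form)."""
    bins = np.floor(np.asarray(sizes, dtype=float) / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return float(bin_width * np.exp(-np.sum(p * np.log(p))))

def extrapolated_entropy(sizes, bin_width, rng):
    """Extrapolate E to infinite sample size from M, M/2 and M/4 measurements."""
    sizes = np.asarray(sizes, dtype=float)
    M = sizes.size
    xs, Es = [], []
    for divisor in (1, 2, 4):
        m = M // divisor
        # Subpopulations are drawn at random, without replacement, from the full set.
        sub = sizes if divisor == 1 else rng.choice(sizes, size=m, replace=False)
        xs.append(float(m))
        Es.append(nanoparticle_entropy(sub, bin_width))
    # Assumed model: E(x) = E_inf + a / x + c / x**2 (three equations, three unknowns).
    A = np.array([[1.0, 1.0 / x, 1.0 / x**2] for x in xs])
    E_inf, _, _ = np.linalg.solve(A, np.array(Es))
    return float(E_inf)

rng = np.random.default_rng(1)
sample = rng.normal(10.0, 1.0, 8000)          # synthetic Gaussian population
E_raw = nanoparticle_entropy(sample, 0.1)
E_corrected = extrapolated_entropy(sample, 0.1, rng)
print(E_raw, E_corrected)
```

Because the three points are fitted exactly, noise in the subpopulation entropies propagates directly into the extrapolated value; this is one way to see why the method needs sufficient data.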
In order to relate the normalized entropy E_n to established definitions of size uniformity, we have developed evaluation criteria for monodispersity based on definitions for dispersions provided by the National Institute of Standards and Technology (NIST) and guidelines used in nanocluster catalysis. 7,29,31,32 The NIST requires that 90% of the particles must lie within ±5% of the mean for a population to be considered monodisperse. 29 In nanocluster analysis, a population is monodisperse if the standard deviation is ≤5% of the mean and near-monodisperse if it is ≤15% (i.e., COV = 0.05 and 0.15, respectively). 31,32 It is important to note that the linear relationship between E and σ shown in Figure S3 produces E = 4.12σ and thus E_n = 4.12σ/μ, which corresponds to limits of 0.206 and 0.618 for monodispersity and near-monodispersity, respectively (Table 1).
The NIST standard presupposes no particular distribution. If we assume the distribution described in the NIST definition is Gaussian, then, since 90% of a normal population lies within ±1.645σ of the mean, the following must be true: COV = 0.05/1.645. Using the linear relationship between E and σ, a limit of E_n = 0.125 follows. However, as the NIST guidelines do not specify a normal distribution, we evaluated the robustness of this limit for non-normal populations. A distribution was designed to maximize entropy within the limits of compliance with the NIST requirements; a schematic describing this shape is shown in Figure 3d. The range of 90% of the population (range_90) is set by the NIST requirement, but the total range (range_total) remains variable. We have therefore defined a variable r as the ratio of range_90 to range_total; this is equal to the number of bins that lie in range_90 divided by the total number of bins. The relationship between E_n and r is described in eq 4 and plotted in Figure 3. A detailed derivation of this equation can be found in the SI. Figure 3a shows a 3D surface map of eq 4. The majority of the surface exhibits a shallow gradient; overall, 95.9% of all combinations of n_90 and n_10 result in an E_n between 0.1 and 0.2 (r = 0.963 and 0.025, respectively). Deviating from a normal distribution resulted in little change in E_n. We therefore recommend using the cutoff of E_n = 0.125, below which populations can be reliably considered as highly monodisperse.
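The arithmetic behind these limits, and their use as a simple classifier, can be sketched as below. The thresholds 0.125, 0.206, and 0.618 and the factor 4.12 are taken from the text; the function names, the treatment of values exactly on a boundary, and the label "polydisperse" for populations above the highest limit are assumptions made for illustration.

```python
SIGMA_TO_E = 4.12  # slope of the linear E-vs-sigma relationship quoted in the text

def en_limit_from_cov(cov_limit):
    """Convert a COV limit into the equivalent E_n limit via E_n = 4.12 * COV."""
    return SIGMA_TO_E * cov_limit

# Limits quoted in the text.
E_N_NIST = 0.125          # NIST criterion under a Gaussian assumption
E_N_MONODISPERSE = 0.206  # COV <= 0.05
E_N_NEAR_MONO = 0.618     # COV <= 0.15

def classify(e_n):
    """Map a normalized entropy onto the categories discussed in the text."""
    if e_n < E_N_NIST:
        return "highly monodisperse"
    if e_n <= E_N_MONODISPERSE:
        return "monodisperse"
    if e_n <= E_N_NEAR_MONO:
        return "near-monodisperse"
    return "polydisperse"  # label assumed here; not defined in the text

print(round(en_limit_from_cov(0.05 / 1.645), 3))  # NIST-derived limit
print(classify(0.18))
```

An E_n of 0.18, for example, satisfies the nanocluster criterion for monodispersity but not the stricter NIST-derived cutoff.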

■ RESULTS AND DISCUSSION
In order to evaluate and benchmark this approach, six different data sets of oleylamine-capped gold nanoparticles were analyzed; see Figure 4. In each case, a representative transmission electron micrograph is displayed alongside the histogram obtained by image analysis. For each sample, the number of counts (≈2000−5000) is provided together with the mean size, standard deviation, and corresponding COV obtained by conventional analysis of the data sets. The absolute and normalized nanoparticle entropy for each nanoparticle population was obtained with the user-friendly macro (SI).
To check for normality, the apparent Gaussian distribution based on the mean and standard deviation calculated from the raw data was plotted alongside the histogram of the experimental data in Figure 4. While statistical tests for normality are abundant, these are often not suitable for large data sets. 33 Experimental data will never produce a perfectly normal distribution at large sample sizes. Hence, the null hypothesis that the data is normal would be rejected in almost all cases. As an alternative, the Gaussian derived from the raw mean and standard deviation was compared to the probability distribution of each sample shown in Figure 4. Goodness of fit statistics including the sum of squared errors (SSE), the coefficient of determination (R^2), and the root-mean-square error (RMSE) are summarized in the Supporting Information (Table S1).
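A comparison of this kind can be sketched as follows. The exact definitions used for Table S1 are not given in this section, so the choices below are assumptions: the Gaussian is evaluated at the bin centers and multiplied by the bin width to give a per-bin probability, R^2 is computed as 1 − SSE/SST, and RMSE as sqrt(SSE/number of bins). The function name and the synthetic populations are hypothetical.

```python
import numpy as np

def gaussian_fit_stats(sizes, bin_width):
    """Compare a histogram with the Gaussian built from the raw mean and std."""
    sizes = np.asarray(sizes, dtype=float)
    mu, sigma = sizes.mean(), sizes.std(ddof=1)

    # Constant-width bins that cover the full data range.
    start = np.floor(sizes.min() / bin_width) * bin_width
    n_bins = int(np.ceil((sizes.max() - start) / bin_width)) + 1
    edges = start + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(sizes, bins=edges)
    p_obs = counts / counts.sum()

    # Gaussian probability per bin, approximated as density at center times width.
    centers = 0.5 * (edges[:-1] + edges[1:])
    density = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p_fit = bin_width * density

    sse = float(np.sum((p_obs - p_fit) ** 2))
    sst = float(np.sum((p_obs - p_obs.mean()) ** 2))
    return {"SSE": sse, "R2": 1.0 - sse / sst, "RMSE": float(np.sqrt(sse / n_bins))}

rng = np.random.default_rng(2)
unimodal = rng.normal(10.0, 1.0, 5000)
bimodal = np.concatenate([rng.normal(8.0, 0.5, 2500), rng.normal(12.0, 0.5, 2500)])

stats_unimodal = gaussian_fit_stats(unimodal, 0.25)
stats_bimodal = gaussian_fit_stats(bimodal, 0.25)
print(stats_unimodal, stats_bimodal)
```

For the synthetic unimodal sample the apparent Gaussian tracks the histogram closely, whereas for the bimodal sample the Gaussian built from the raw mean and standard deviation places its maximum between the two peaks and fits poorly.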
Populations A and C demonstrate low SSE and RMSE as well as R^2 values close to 1. These values agree with a visual confirmation that populations A and C are reasonably well represented by the Gaussian and thus described by the mean μ and standard deviation σ. Under these conditions, the standard deviation is capable of representing the dispersity. It follows then that, for a given change in standard deviation, the same change should be observed in E; this is the case for populations A and C. The standard deviation increases by 149% between population A and population C, and a 145% increase was measured for E. Furthermore, the obtained COV = 0.045 and E_n of 0.18 for population A comply with the criteria of monodispersity as defined by Moser and co-workers while the population does not fully satisfy the NIST criteria. 32 In contrast, the other populations, B, D, E, and F, deviate significantly from the apparent Gaussian distribution. This can result in incorrect interpretation as the standard deviation no longer directly correlates with dispersity. As noted previously, dispersity is the measure of the inhomogeneity of the size distribution rather than the breadth, which is correlated to the standard deviation. For populations which adhere to the Gaussian, standard deviation correlates directly with dispersity; as the population diverges from normality, this relationship breaks down and standard deviation becomes a less reliable measure of dispersity. For example, populations with apparently similar standard deviations, i.e., B and C, can display a significantly divergent degree of dispersity. Population B comprises a main population and a minor population; this small secondary population contains so few particles that it has little impact on the dispersity but does influence standard deviation. A similar behavior is observed between populations D and E. Populations F and E are comparable in dispersity but show a much larger disparity in standard deviation.
Dispersity by definition is determined by the number of sizes observed and the relative quantities; in contrast, standard deviation depends on how these values are arranged around the mean.
As population E consists of one broad distribution, it results in a lower standard deviation than population F; the two peaks in population F shift the population density away from the mean, producing a larger standard deviation for a similar range (see also Figure S11 in the SI). Overall, discrepancies with the normal distribution (such as asymmetry or multiple modes, both commonly observed in nanoparticle populations) reduce the capacity of standard deviation to describe dispersity. Nanoparticle entropy, in contrast, remains a reliable metric irrespective of the type of distribution and reflects the significant differences between each population. This measure offers a clear pathway for optimization toward uniformity, i.e., by minimizing E or E_n.
The populations presented in Figure 4 were taken from a larger study in our lab investigating the effects of experimental conditions on the dispersity of the resulting populations. Multiple studies previously highlighted the role of the reaction time, which should appear as a significant variable with effects such as size focusing. 34 In our own analysis, when using COV as a measure of dispersity, the reaction time was not identified as significant (p = 0.2404). In contrast, E_n highlighted the role of the reaction time on the homogeneity of the particle size (p = 0.0009). The large difference between these two p-values is consistent with the imprecision of COV and similar methods when measuring dispersity of non-normal populations. By not using an exact measure of dispersity, the cumulative error obscures important synthetic variables, here the role of the reaction time on the particle size distribution. This example highlights the importance of using an appropriate metric.
Please note that the method can be used on any data set that provides a histogram of the nanoparticle population, e.g., nanoparticle tracking analysis, disc centrifugation analysis, analytical ultracentrifugation, small-angle X-ray scattering, or dynamic light scattering. For the latter ensemble-based techniques, the data can be presented in a suitable format by approximating the number of measurements. Please note that, as the number of measurements usually exceeds 10^5 particles, an order of magnitude estimation will be more than sufficient with little to no impact on the final calculated nanoparticle entropy.

■ CONCLUSION
To conclude, we propose the use of nanoparticle entropy as a reliable measure to evaluate dispersity in nanoparticle populations. This approach allows any type of distribution to be described, irrespective of being mono-, poly-, or heterodisperse. We envision this approach to be particularly useful for optimization protocols that are targeted toward achieving size uniformity. To this end, nanoparticle entropy represents a reliable descriptor in automated synthetic procedures leveraging on advanced statistical tools, including design of experiment and machine learning.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.chemmater.0c00539. The MATLAB code as well as a user-friendly Macro for Microsoft Excel (Windows and MacOS) are available via https://github.com/adrena-lab/Nanoparticle_Entropy. Further details regarding the response of entropy as a function of dispersity; measurement resolution and bin width; sample size correction; further details on the determination of the NIST monodispersity criteria; and statistics on the experimental data (PDF)