Deep Learning Neural Network Approach for Predicting the Sorption of Ionizable and Polar Organic Pollutants to a Wide Range of Carbonaceous Materials

Most contaminants of emerging concern are polar and/or ionizable organic compounds, whose removal from engineered and environmental systems is difficult. Carbonaceous sorbents include activated carbon, biochar, fullerenes, and carbon nanotubes, with applications such as drinking water filtration, wastewater treatment, and contaminant remediation. Tools for predicting the sorption of many emerging contaminants to these sorbents are lacking because existing models were developed for neutral compounds. Researchers and practitioners alike require a method that predicts sorption and thereby allows the appropriate sorbent to be selected for a given contaminant. Here, we present a widely applicable deep learning neural network approach that accurately predicted the conventionally used Freundlich isotherm fitting parameters log K_F and n (R² > 0.98 for log K_F and R² > 0.91 for n). The neural network models are based on parameters generally available for carbonaceous sorbents and/or parameters freely available from online databases. A freely accessible graphical user interface is provided.


INTRODUCTION
Persistent organic pollutants (POPs) are hydrophobic organic compounds that include the original 12 compounds regulated under the Stockholm Convention (the "dirty dozen"). These toxic compounds have been of special concern because of their longevity ("persistence") and their potential for long-range atmospheric transport. Over the past several decades, POPs and their environmental fate have been widely studied, and approaches to elucidate their fate in the natural environment have been developed. 1 Today, many contaminants of emerging concern are polar and/or ionizable organic compounds, including pesticides, pharmaceuticals, and personal care products. For example, in 2010, approximately 50% of the industrial chemicals falling under the European chemicals regulation (REACH) were ionizable organic compounds (27% acids, 14% bases, and 8% zwitterions). 2 A state-of-the-art approach to predict the sorption of neutral hydrophobic organic contaminants to a given material (sorbent) is the poly-parameter linear free-energy relationship (ppLFER). 3−6 The ppLFER concept for neutral compounds is based on the Abraham parameters E (excess molar refraction), S (dipolarity/polarizability), A (H-bond acidity), B (H-bond basicity), V (McGowan molar volume, cm³ mol⁻¹/100), and L (logarithm of the hexadecane−air partition coefficient). In addition, the sorption of organic compounds to carbonaceous sorbents is concentration-dependent (nonlinear), a factor that was recently introduced into ppLFERs for predicting the sorption of neutral organic compounds to activated carbon 4 and to soot. 5 Carbonaceous sorbent materials, such as activated carbon, soot, biochar, and carbon nanotubes (CNTs), have a wide range of applications, including drinking water filtration systems, wastewater treatment plants, and soil and sediment remediation. There are, however, major limitations to the use of conventional ppLFERs in the characterization of these systems.
Specifically, the development of each ppLFER requires a substantial number of experiments with a wide range of compounds, and every model is limited to the single sorbent for which it was developed. Thus, compared with existing models, the ability to predict contaminant sorption as a function of the properties of the sorbent would be of great advantage, as it would facilitate the selection of both the appropriate sorbent and its quantity for a given application.
Moreover, methods developed to predict the sorption of neutral compounds, such as ppLFERs, are not applicable to charged compounds. For such compounds, additional interactions occur depending on the speciation/dissociation of a given ionizable organic compound. These include electrostatic repulsion and attraction, charge-assisted H-bonding, cation bridging, cation−π bonding, and anion−π bonding. 7−10 These interactions cannot be accounted for in existing ppLFER concepts, which hinders the prediction of the environmental fate of charged compounds.
A simple sorbent-dependent model based on experimental data was recently proposed to predict the sorption of organic acids. 11 The model predicted sorption distribution coefficients using the pH-dependent lipophilicity parameter log D_OW and the specific surface area (SSA) of the carbonaceous sorbent. However, (i) acids are only one of the three types of ionizable organic compounds; (ii) the model could not satisfactorily predict literature data, likely because of differences in measurement protocols, including the measurement of the SSA; 12 and (iii) the concentration-dependent nonlinearity of sorption was not predicted.
To address these issues, we collected published data from over 10 years of experimental research. As the Freundlich isotherm fitting model, first published in 1907, 13 was the most widely applied model (66% of 210 papers collected), it was chosen as a target for prediction. To predict the Freundlich fitting parameters for the sorption of ionizable organic compounds to carbonaceous sorbents, a deep learning approach was developed and tested on independent literature data to validate the model's performance. The results showed that this newly developed method is able to predict the sorption of anionic, cationic, and zwitterionic ionizable organic compounds to carbonaceous sorbents and is therefore widely applicable. Moreover, it is based on parameters generally available for carbonaceous sorbents and additional compound descriptors that are freely available from online databases. A freely accessible graphical user interface is provided by the authors.

METHODS
2.1. Data Mining from the Literature. The literature from 2005 to 2019 was searched using three-word Scopus searches. Each search included one keyword for the sorbent (CNT, activated carbon, biochar, graphene, carbonaceous, or graphite), one keyword for the sorbate (polar or ionizable), and "sorption." Reviews and nonrelevant papers were excluded, which resulted in a list of 210 papers. Thereafter, only papers that included the Freundlich isotherm fit and reported the SSA as well as the C, H, and O contents were selected for further analysis, which resulted in a core database sourced from 47 publications. 11,14−59 Every sorbent−sorbate combination used in these publications received a separate line in the database. This resulted in 328 lines for negatively charged and polar compounds (Table S1 in the Supporting Information) and 139 lines for compounds with a positive charge (Table S2 in the Supporting Information). Each line contained information on the combination of one single sorbate and one single sorbent under one specific pH condition. If pH was not reported, a pH of 7 was assumed for the sorbate property calculations (i.e., log D_OW). For highly carbonized materials with a carbon content >90% and no measurable H content, an H content of 0.01% was assumed for the H/C calculations. The validity of this approach was tested by running the neural network with and without these data. The predictions based on the smaller data set did not differ substantially. However, they were associated with larger prediction errors and a narrower working range because fewer training items were available (data not shown).
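The two gap-filling rules above can be expressed as a small preprocessing step. They are a pH of 7 when no pH was reported, and 0.01% H for materials with >90% C and no measurable H. The sketch below is illustrative only. The record layout `DataLine` and the function `apply_gap_filling` are hypothetical names and are not part of the published database or the CFreuPred tool.

```python
from dataclasses import dataclass, replace
from typing import Optional

DEFAULT_PH = 7.0       # assumed when a publication did not report the pH
FILL_H_PERCENT = 0.01  # % H assumed for highly carbonized materials without measurable H

@dataclass(frozen=True)
class DataLine:
    """Hypothetical record: one sorbate, one sorbent, one pH condition."""
    c_percent: float
    h_percent: Optional[float]  # None or 0.0 when no H was measurable
    o_percent: float
    ssa_m2_per_g: float
    ph: Optional[float]         # None when the pH was not reported

def apply_gap_filling(line: DataLine) -> DataLine:
    """Return a copy of the data line with both gap-filling rules applied."""
    ph = DEFAULT_PH if line.ph is None else line.ph
    h = line.h_percent
    if line.c_percent > 90.0 and not h:  # `not h` covers both None and 0.0
        h = FILL_H_PERCENT  # allows a nonzero H/C ratio to be calculated
    return replace(line, h_percent=h, ph=ph)
```

For example, `apply_gap_filling(DataLine(95.0, None, 2.0, 800.0, None))` yields a line with pH 7.0 and 0.01% H, whereas lines with reported values pass through unchanged.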

2.2. Deep Learning Neural Network Setup. The feedforward neural network was trained using an automated Bayesian regularization technique 60,61 in which the weights and biases of the network are assumed to be random variables with specified distributions. The regularization parameters are then related to the variances of these distributions and are estimated using statistical techniques. The Bayesian regularization algorithm generally works best when the inputs and outputs are scaled approximately to the range between −1 and 1. K_F values span several orders of magnitude and lie far outside this range. Using log K_F instead of K_F as the target parameter therefore significantly improved the quality of the training. To further improve the model, outliers were excluded from the training data sets. Specifically, data lines in which n or log K_F fell below the 5th percentile or above the 95th percentile were removed.
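The sketch below shows the two preprocessing steps described above. It assumes that the training data are held as a list of rows with the hypothetical keys `log_KF` and `n`. The text states neither the percentile definition nor the scaling method. The sketch therefore assumes the linear-interpolation ("inclusive") definition of Python's `statistics.quantiles` and a linear min-max mapping to [−1, 1].

```python
import statistics
from typing import Dict, List, Sequence, Tuple

def percentile_bounds(values: Sequence[float]) -> Tuple[float, float]:
    """5th and 95th percentiles (linear interpolation, 'inclusive' definition)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 1st..99th
    return cuts[4], cuts[94]

def drop_outliers(rows: List[Dict[str, float]], keys=("log_KF", "n")) -> List[Dict[str, float]]:
    """Remove rows in which any target lies outside its 5th-95th percentile range."""
    bounds = {k: percentile_bounds([r[k] for r in rows]) for k in keys}
    return [r for r in rows if all(bounds[k][0] <= r[k] <= bounds[k][1] for k in keys)]

def scale_to_unit_range(values: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Linear min-max scaling of one column to [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # constant column: map every value to the midpoint
        return [(lo + hi) / 2.0] * len(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```

Here the log transform is applied before filtering and scaling, so the scaled target is log K_F rather than K_F.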
Overfitting is a common problem in neural network training. In an overfitted neural network, the error on the training set is driven to a small value, but presenting new data to the same network can result in large errors. This happens because the network has memorized the training examples rather than learning to generalize to new situations. The number of parameters in this study was considerably smaller than the total number of data points in the training set, so the risk of overfitting was small. In addition, network generalization was improved by training the neural network on the same data set 50 times. The same Bayesian regularization back-propagation technique was used in all training sessions. Each session started with different initial weights and biases and with a different division of the data into training (70%), validation (15%), and test (15%) sets. Because different starting conditions led to different solutions, the final estimates were obtained by averaging the outputs of all 50 trained networks. As a result, in most cases, the mean squared error of the averaged output was lower than that of the individual sessions. This multitraining technique thus improved network generalization and, with it, forecasting capability. It was particularly helpful for the small and noisy data set of compounds containing a positive charge.
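The multitraining scheme can be illustrated with a deliberately simple model. The scheme consists of repeated training on fresh random 70/15/15 divisions, followed by averaging of all outputs. In this sketch, a least-squares straight line (`fit_line`, hypothetical) stands in for one Bayesian-regularized network. Random initial weights therefore play no role, and the validation and test subsets are drawn but not used. Only the splitting and averaging logic mirrors the text.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def split_indices(n: int, rng: random.Random) -> Tuple[List[int], List[int], List[int]]:
    """Random 70 % / 15 % / 15 % division into training, validation and test indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train, n_val = round(0.70 * n), round(0.15 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b*x, a stand-in for one network training session."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def ensemble_predict(x, y, x_new, sessions: int = 50, seed: int = 0):
    """Train `sessions` models on different random splits and average their outputs."""
    rng = random.Random(seed)
    members = []
    for _ in range(sessions):
        train, _val, _test = split_indices(len(x), rng)  # the stand-in ignores val/test
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        members.append([a + b * xn for xn in x_new])
    averaged = [statistics.fmean(column) for column in zip(*members)]
    return averaged, members
```

Squared error is convex. The squared error of the averaged output can therefore never exceed the mean squared error of the individual sessions, which is consistent with the improvement described above.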
Because the computational costs of multiple training can be high, we implemented a parallel computation scheme to greatly reduce the training times. The computational time for a complete multitraining session on an Intel Core i7-9700K CPU with 32 GB RAM was under a minute using all eight CPU cores.
Environmental Science & Technology pubs.acs.org/est Article
2.3. Sensitivity Analysis. The variance-based global sensitivity analysis (GSA) of Sobol (2001) 62 was used to determine the importance of individual input parameters for the neural network predictions of log K_F and n. A local sensitivity analysis (LSA) varies a single parameter value at a time. In contrast, a GSA considers the variability of all input parameters over their full ranges simultaneously. GSA therefore offers a more rigorous way to elucidate the impact of each input parameter's variability while all other parameters also vary. We used a Latin hypercube sampler to generate 200,000 sample scenarios that uniformly covered the space of the input parameters. For each realization, input variables were perturbed at random within each parameter's range of variability in the full training data set. The fully trained model was evaluated for each of the 200,000 randomly generated realizations, which took about 20 min on the computational setup described above. The variability of each input parameter was assumed to follow a normal distribution defined by the mean and standard deviation of that parameter alone, and no correlation was assumed between different input parameters. The first-order Sobol indices (S_i) were then calculated from the GSA as described previously 63 and in Sobol and Levitan (1999). 64
2.4. Graphical User Interface. To maximize their range of applicability, the models for the graphical user interface were built on the complete data set (including the data lines previously excluded for validation). The interface is conceptually similar to previously developed graphical user interfaces. 65,66 The "CFreuPred" graphical user interface can import data from and export data to Excel or OpenOffice, which eases data transfer via the widely used ".xls" file format.

RESULTS AND DISCUSSION
3.1. Selection of Parameters. The classical presentation of the Freundlich equation is
$$q = K_F\,c_{aq}^{\,n}$$
where q is the sorbed concentration, c_aq is the equilibrium aqueous concentration, K_F is the Freundlich coefficient, and n is the Freundlich exponent describing the nonlinearity of sorption (n = 1 for linear sorption).
Four sorbent property parameters commonly reported in the literature and previously linked to sorption behavior were selected. 7,11,54,67,68 The carbon (C, %), hydrogen (H, %), and oxygen (O, %) contents of the sorbents as well as their SSA (m²/g) were sourced from the literature, and the molar ratios H/C and O/C were calculated. Of the >200 screened publications, 47 reported all of these parameters. 11,14−59 In addition, the experimental pH was used as a fifth parameter. For a given material, C serves as a proxy for homogeneity, SSA for porosity and accessible sorption sites, H/C for aromaticity, and O/C for polarity. The experimental pH is linked to the material's surface charge, with negative charge increasing with pH.
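The molar ratios follow from the reported mass percentages when each element's content is divided by its atomic mass. The following is a minimal sketch using standard atomic masses; the function name `molar_ratios` is hypothetical.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol

def molar_ratios(c_pct: float, h_pct: float, o_pct: float) -> tuple:
    """Convert mass percentages of C, H and O into molar H/C and O/C ratios."""
    mol_c = c_pct / ATOMIC_MASS["C"]
    mol_h = h_pct / ATOMIC_MASS["H"]
    mol_o = o_pct / ATOMIC_MASS["O"]
    return mol_h / mol_c, mol_o / mol_c
```

For a hypothetical biochar with 60% C, 3% H, and 20% O, `molar_ratios(60.0, 3.0, 20.0)` gives H/C ≈ 0.60 and O/C ≈ 0.25.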
Eight sorbate properties were selected to describe the molecular properties of ionizable and polar compounds. The five Abraham solute parameters (E, S, A, B, and V) were obtained from the freely accessible UFZ-LSER database. 69 The sixth Abraham parameter (L), describing the hexadecane−air distribution, was not used. A pH-independent hydrophobicity parameter is conceptually not applicable to ionizable organic compounds, which dissociate depending on the surrounding pH and whose hydrophobicity changes accordingly. Instead of L, the pH-dependent hydrophobicity parameter log D_OW was calculated at the experimental pH using the freely accessible ChemAxon online platform (chemicalize.com). When P (octanol−water partition coefficient of the neutral species) and P_i (octanol−water partition coefficient of the ionized species) are known, D_OW for acidic (anionic) compounds can be calculated as
$$D_{OW} = \frac{P + P_i\,10^{\,\mathrm{pH}-\mathrm{p}K_a}}{1 + 10^{\,\mathrm{pH}-\mathrm{p}K_a}}$$
where pK_a is the acid dissociation constant. In addition, we used the experimental pH and the dissociation constants of the ionizable organic compounds to calculate the abundance of ionized species present under a given condition using the Henderson−Hasselbalch equation.
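For a monoprotic acid, D_OW can be calculated as the speciation-weighted mean of P and P_i, with the anionic fraction given by the Henderson−Hasselbalch equation. The sketch assumes that log P, log P_i, and pK_a are known. It is a hand calculation for illustration, not the ChemAxon procedure used to obtain the log D_OW values in this study, and the function names are hypothetical.

```python
import math

def fraction_ionized_acid(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a monoprotic acid present as the anion."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def log_dow_acid(log_p_neutral: float, log_p_ion: float, ph: float, pka: float) -> float:
    """log D_OW of a monoprotic acid: speciation-weighted mean of P and P_i."""
    f_ion = fraction_ionized_acid(ph, pka)
    d_ow = (1.0 - f_ion) * 10.0 ** log_p_neutral + f_ion * 10.0 ** log_p_ion
    return math.log10(d_ow)
```

For a hypothetical acid with log P = 3, log P_i = −1, and pK_a = 4, the anion fraction at pH 7 is about 99.9%. The calculated log D_OW is about 0.04, roughly three log units below log P.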
Several attempts to train the neural network on all types of compounds combined did not yield meaningful results (data not shown). The most likely reason is that compounds containing a positive charge behave differently from polar and anionic compounds. For example, the hydrophobicity of acidic and polar compounds is generally positively linked to sorption. As the hydrophobicity of acidic compounds decreases with dissociation, sorption decreases as well. This can be explained in part by the electrostatic repulsion of the anions from the generally negatively charged surface functional groups of carbonaceous sorbents. In contrast, when cationic ionizable organic compounds become ionized and their hydrophobicity decreases, their positive charge can be electrostatically attracted to these negatively charged surface groups. We therefore subdivided the data set into (i) negatively charged and polar compounds and (ii) compounds containing a positive charge. Zwitterions, which can carry both charges, were grouped according to their speciation. A positively charged fraction of 0.001% was set as the threshold for placing a compound in the second group. Below 0.001% positively charged species, the contribution of the positive charge to overall sorption is most likely negligible. The two databases are provided as searchable xls Tables S1 and S2 in the Supporting Information.
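The grouping rule can be sketched for the simplest case of a monoprotic base, whose cationic fraction follows from the Henderson−Hasselbalch equation. For zwitterions, the positively charged fraction would instead come from a full speciation calculation. The function names and example pK_a values below are hypothetical.

```python
POSITIVE_THRESHOLD_PERCENT = 0.001  # threshold stated in the text

def percent_cationic_base(ph: float, pka_conjugate_acid: float) -> float:
    """Henderson-Hasselbalch: % of a monoprotic base present as the protonated cation."""
    return 100.0 / (1.0 + 10.0 ** (ph - pka_conjugate_acid))

def assign_group(percent_positive: float) -> str:
    """Data set (ii) if at least 0.001 % of the compound is positively charged."""
    if percent_positive >= POSITIVE_THRESHOLD_PERCENT:
        return "positive"           # compounds containing a positive charge
    return "negative_or_polar"      # negatively charged and polar compounds
```

A base with a conjugate-acid pK_a of 9 is about 99% cationic at pH 7 and falls into the second group. With a pK_a of 2 at pH 9, the cationic share is about 10⁻⁵%, so the compound would be modeled with the anion and polar data set.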
3.2. Predicting Sorption of Anions and Polar Compounds. The model was constructed on the basis of a feed-forward deep learning neural network (also known as a multilayered network of neurons) with 20 hidden layers between the input and output layers. These hidden layers capture the complex nonlinear relationships between the 12 input parameters (sorbent and sorbate descriptors from Section 3.1) and the two output parameters (log K_F and n). The neural network accurately predicted log K_F and n for the training data and covered a wide range of input parameters, as shown in Table 1 and Figure 1.
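The sketch below illustrates only the data flow: it passes 12 scaled descriptors through 20 hidden layers to the two outputs. Several choices are assumptions not stated in the text: the hidden-layer width (10 neurons), the tanh activations, the linear output layer, and the random initial weights. The untrained network's outputs are meaningless, and the Bayesian-regularized training is not shown.

```python
import math
import random
from typing import List

def init_layers(sizes: List[int], seed: int = 0):
    """Random weights and zero biases for consecutive layer sizes (illustrative only)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((w, [0.0] * n_out))
    return layers

def forward(layers, x: List[float]) -> List[float]:
    """Feedforward pass: tanh in every hidden layer, linear output layer."""
    a = x
    for k, (w, b) in enumerate(layers):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi for row, bi in zip(w, b)]
        a = z if k == len(layers) - 1 else [math.tanh(v) for v in z]
    return a

# 12 scaled inputs -> 20 hidden layers of (assumed) width 10 -> 2 outputs (scaled log K_F, n)
layers = init_layers([12] + [10] * 20 + [2])
example_output = forward(layers, [0.1] * 12)
```

In this sketch, the cation model of Section 3.3 would simply use `init_layers([13] + [10] * 20 + [2])`, that is, one additional input neuron.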
The 95% confidence interval for the prediction of log K_F shows that predictions of K_F are associated with errors of less than one order of magnitude. These errors are in the same range as, or lower than, those of state-of-the-art prediction models for single carbonaceous sorbents and neutral compounds. 4,5 This demonstrates the excellent performance of our model in predicting log K_F for polar and anionic compounds as a function of sorbent properties. Typically, for carbonaceous sorbents, the concentration dependence of sorption (nonlinearity) increases at high concentrations (i.e., n decreases). The slightly larger errors associated with the prediction of n (Figure 1) can therefore partially be explained by the strong dependence of sorption nonlinearity on the concentration range used to measure a sorption isotherm. The values obtained from the literature were derived from widely varying concentration ranges (ng/L to mg/L for the aqueous concentration c_aq). Given this variability, the performance of the model in predicting the exponent n can be considered very good.
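The link between n and the concentration dependence of sorption follows from the Freundlich isotherm, q = K_F c_aq^n. The distribution coefficient K_d = q/c_aq = K_F c_aq^(n−1) therefore decreases with increasing c_aq whenever n < 1. The following is a short sketch with a hypothetical sorbent−sorbate pair (log K_F = 3, n = 0.5, consistent units assumed).

```python
def freundlich_q(k_f: float, n: float, c_aq: float) -> float:
    """Sorbed concentration from the Freundlich isotherm, q = K_F * c_aq**n."""
    return k_f * c_aq ** n

def distribution_coefficient(k_f: float, n: float, c_aq: float) -> float:
    """K_d = q / c_aq = K_F * c_aq**(n - 1); for n < 1 it falls as c_aq rises."""
    return k_f * c_aq ** (n - 1.0)

# Hypothetical pair: log K_F = 3, n = 0.5
k_f, n = 10.0 ** 3, 0.5
for c in (0.01, 1.0, 100.0):
    print(c, freundlich_q(k_f, n, c), distribution_coefficient(k_f, n, c))
```

In this example, K_d changes by a factor of 10³ over six orders of magnitude in c_aq, a span comparable to the ng/L to mg/L range mentioned above. An error of one log unit in log K_F shifts q by a factor of 10 at every concentration.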
To validate the predictions, we randomly excluded 15 data lines (approximately 5% of the total data set; see Table S3 in the Supporting Information) from the training data set prior to neural network training. The prediction results for these independent data are shown in Figure 2 and confirm the good model performance for K_F and n of negatively charged and polar compounds.
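The agreement between observed and predicted values for held-out data lines can be summarized in two ways: with R², and with the share of K_F predictions that fall within one order of magnitude. The sketch assumes R² = 1 − SS_res/SS_tot. The text does not specify whether R² was computed this way or as a squared correlation coefficient. The function names are hypothetical.

```python
import statistics

def r_squared(observed, predicted) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fraction_within_one_log_unit(obs_log_kf, pred_log_kf) -> float:
    """Share of K_F predictions that are off by less than a factor of 10."""
    hits = sum(abs(o - p) < 1.0 for o, p in zip(obs_log_kf, pred_log_kf))
    return hits / len(obs_log_kf)
```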
The variance-based GSA of Sobol (2001) 62 was used to determine the importance of individual input parameters for the neural network predictions of log K_F and n. The first-order Sobol indices describing the global sensitivity of log K_F and n to the 12 input parameters are displayed in Figure 3. The most important sorbent parameters for the prediction of log K_F and n were the SSA and the sorbent aromaticity and polarity, as approximated by H/C and O/C. The most important compound properties for predicting log K_F were the degree of dissociation (A−%) and the pH-dependent hydrophobicity parameter log D_OW. The Abraham parameters for dipolarity/polarizability (S), H-bond basicity (B), and molar volume (V) were also important. The predictions of log K_F were similarly sensitive to sorbent and sorbate properties, whereas the prediction of n was largely driven (>80%) by the properties of the sorbent. This indicates that sorption was driven by interactions with specific sorption sites, which were consumed with increasing sorbent loading. Furthermore, the importance of SSA, H/C, and S indicated that π−π electron donor−acceptor interactions are a driving mechanism of sorption for negatively charged and polar compounds, as also reported in the literature. 7,9
3.3. Predicting Sorption of Cations and Zwitterions. The deep learning approach used for negatively charged and polar compounds was also applied to compounds containing a positive charge. To this end, the input layer was extended by one parameter accounting for the abundance (%) of positively charged species, resulting in a total of 13 input parameters. The neural network again predicted log K_F and n accurately over a wide working range of input parameters, as shown in Table 2 and Figure 4. Because the training data set was smaller, the model's working range was narrower than that for anions and polar compounds (see Tables 1 and 2).
Similar to the predictions for anions and polar compounds, n was associated with higher prediction errors, likewise explained by the high concentration dependence of sorption nonlinearity (see Section 3.2).
To validate the predictions, we again randomly excluded approximately 5% of the data lines (6 lines; see Table S4 in the Supporting Information) from the data set prior to neural network training. The results for these independent data confirmed the good predictions of both K_F and n for compounds containing a positive charge (Figure 5). However, the training data set for this model was much smaller than that for the model presented in Section 3.2. A larger body of literature data will likely further increase the accuracy and applicability range of the model.
The variance-based GSA was calculated in the same way as for anions and polar compounds. The importance of the individual input parameters for the prediction of log K_F and n was more evenly distributed (Figure 6). This indicates that no single sorption process describable by these parameters drove the sorption of compounds containing a positive charge. This finding agrees well with the literature, in which predicting the sorption of compounds with a positive charge is often viewed as more challenging than predicting that of negatively charged compounds. 7,8 The dipolarity/polarizability S was the only sorbate parameter with little to no significance for the sorption of compounds with a positive charge. In contrast, S was a sorbate property of high importance for predicting the sorption of polar and negatively charged compounds. This indicates that π electron donor−acceptor interactions are generally not the drivers of sorption for compounds containing a positive charge. Instead, the abundance of ionized, positively charged species was an important sorbate parameter for prediction, indicating that electrostatic attraction contributed substantially to sorption. Compounds containing a positive charge therefore exhibit a distinct sorption behavior that contrasts starkly with that of other organic compounds. Untangling this distinct behavior is an important challenge for future research, both to further improve predictive models and to enable the production of sorbents tailored for cations and zwitterions. In such studies, additional sorbent parameters such as the cation exchange capacity should be considered during sorbent characterization. Most published studies on the sorption of ionizable organic compounds have not reported cation- or anion-exchange capacities.
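The first-order Sobol indices used in Sections 2.3, 3.2, and 3.3 can be estimated by Monte Carlo sampling. The sketch below combines three elements: Latin hypercube sampling, independent normal marginals (as assumed in Section 2.3), and the first-order estimator proposed by Saltelli and co-workers (2010). The text cites Sobol (2001) and Sobol and Levitan (1999) but does not give the estimator, so this choice is an assumption. The sketch is checked on a toy linear model whose indices are known analytically (0.9 and 0.1), not on the trained networks. It uses 20,000 rather than 200,000 samples to keep the run short.

```python
import random
import statistics
from typing import Callable, List, Sequence

def latin_hypercube(n: int, d: int, rng: random.Random) -> List[List[float]]:
    """n points in [0, 1)^d with exactly one point per 1/n stratum in every dimension."""
    columns = []
    for _ in range(d):
        column = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(column)  # pair strata randomly across dimensions
        columns.append(column)
    return [list(point) for point in zip(*columns)]

def normal_sample(n: int, means, sds, rng: random.Random) -> List[List[float]]:
    """Latin hypercube sample mapped to independent normal marginals."""
    dists = [statistics.NormalDist(m, s) for m, s in zip(means, sds)]
    eps = 1e-12  # keep inv_cdf away from exactly 0 or 1
    return [[dist.inv_cdf(min(max(u, eps), 1.0 - eps)) for dist, u in zip(dists, point)]
            for point in latin_hypercube(n, len(means), rng)]

def first_order_sobol(model: Callable[[Sequence[float]], float], means, sds,
                      n: int = 20000, seed: int = 0) -> List[float]:
    """Saltelli-type Monte Carlo estimate of the first-order indices S_i."""
    rng = random.Random(seed)
    a = normal_sample(n, means, sds, rng)
    b = normal_sample(n, means, sds, rng)
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]
    var_y = statistics.pvariance(f_a + f_b)
    indices = []
    for i in range(len(means)):
        # A_B^(i): rows of A with column i replaced by the same column of B
        f_ab = [model(xa[:i] + [xb[i]] + xa[i + 1:]) for xa, xb in zip(a, b)]
        v_i = statistics.fmean(fb * (fab - fa) for fb, fab, fa in zip(f_b, f_ab, f_a))
        indices.append(v_i / var_y)
    return indices
```

For y = 3x₁ + x₂ with standard normal inputs, `first_order_sobol(lambda x: 3.0 * x[0] + x[1], [0.0, 0.0], [1.0, 1.0])` should return values close to 0.9 and 0.1.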
3.4. Potential Model Applications and Environmental Implications. Prerequisites for the design of efficient water purification systems or remediation strategies are easily accessible tools able to predict the sorption of emerging contaminants, which are often ionizable and polar compounds.
To address this need, we made use of the available literature to develop two neural-network-based models. Both accurately predicted the sorption of organic anions, cations, zwitterions, and polar compounds to a wide range of carbonaceous materials. The first model was tailored to predict the sorption of polar and negatively charged contaminants. The second was tailored to compounds containing a positive charge, including zwitterions. To account for the concentration dependence of organic contaminant sorption to carbonaceous sorbent materials, both models predict the Freundlich coefficient K_F as well as the exponent n. The models cover a very wide range of sorption scenarios and will thus be useful for scientists and practitioners in the fields of water purification and remediation. To make the models accessible to those who are not familiar with computational environments, we provide a graphical user interface as Supporting Information. To predict sorption for compound and sorbent combinations with properties outside the range of the current version, the models can be retrained with additional data, which will further improve their generalization and forecasting capabilities.
ASSOCIATED CONTENT
All data used in this study, including the training data sets (XLSX)
"CFreuPred" graphical user interface (ZIP)