Predicting Drug-Induced Cholestasis with the Help of Hepatic Transporters—An in Silico Modeling Approach

Cholestasis represents one out of three types of drug induced liver injury (DILI), which comprises a major challenge in drug development. In this study we applied a two-class classification scheme based on k-nearest neighbors in order to predict cholestasis, using a set of 93 two-dimensional (2D) physicochemical descriptors and predictions of selected hepatic transporters’ inhibition (BSEP, BCRP, P-gp, OATP1B1, and OATP1B3). In order to assess the potential contribution of transporter inhibition, we compared whether the inclusion of the transporters’ inhibition predictions contributes to a significant increase in model performance in comparison to the plain use of the 93 2D physicochemical descriptors. Our findings were in agreement with literature findings, indicating a contribution not only from BSEP inhibition but a rather synergistic effect deriving from the whole set of transporters. The final optimal model was validated via both 10-fold cross validation and external validation. It performs quite satisfactorily resulting in 0.686 ± 0.013 for accuracy and 0.722 ± 0.014 for area under the receiver operating characteristic curve (AUC) for 10-fold cross-validation (mean ± standard deviation from 50 iterations).


■ INTRODUCTION
Drug induced liver injury (DILI) is a major issue worldwide, both for patients and health providers. 1,2 It is one of the primary causes for attrition during clinical and preclinical studies and the main reason for drug withdrawal from the market. 3−6 DILI is divided into types, (i) hepatocellular, (ii) cholestatic, or (iii) mixed (hepatocellular and cholestatic), according to the type of liver damage and the clinical chemistry biomarker alterations. 7 The cholestatic and mixed hepatocellular and cholestatic type are the two most severe manifestations of DILI and yield almost half of the recorded cases of DILI. 8,9 Cholestatic liver injury, or more simply cholestasis, is the disruption of the bile flow, which might be either due to biliary tract obstruction or to complications in bile acid uptake. While the mechanistic basis for hepatocellular DILI is still a mystery for the majority of the cases, more knowledge exists for cholestatic DILI. There is growing evidence for a vast amount of cholestasis cases pinpointing the important role of hepatic transporters. 10 Hepatic transporters are classified into basolateral and canalicular ones. Basolateral transporters are responsible for the uptake of drugs and other endobiotics and xenobiotics from the blood, influencing the exposure of the hepatocyte to potential damage. Canalicular transporters regulate the hepatic clearance, as well as the secretion of bile salts and bile conjugates into bile. 10−15 Any disturbance of the transporters' physiological function may result in the accumulation of potentially harmful bile products that can finally cause cholestasis. 10 Figure 1 provides an overview on the respective location of hepatocyte transporters.
Several transporters' malfunctions have been associated with cholestasis. The most important one, due to its pivotal role in bile salts clearance, is the bile salt export pump (BSEP). 8,10,16−21 Apart from BSEP, there is evidence for the implication of other canalicular efflux transporters such as the multidrug resistance-associated protein 2 (MRP2), [8][9][10]22 breast cancer resistance protein (BCRP), 8−10 multidrug resistance protein 3 (MDR3), 8,10 and P-glycoprotein (P-gp). 8−10 MDR3 functions as an ATP-dependent phospholipid flippase, translocating phosphatidylcholine from the inner to the outer canalicular membrane. Canalicular phospholipids are then solubilized by canalicular bile salts to form mixed micelles, protecting cholangiocytes from the detergent properties of bile salts. While P-gp is also not transporting bile salts, it is implicated in cholestasis because of its large amount of substrates and inhibitors which cause drug−drug interactions that disrupt the smooth function of the hepatocyte. 10 The basolateral transporters play also an important role, both the uptake transporters, such as organic anion transporting polypeptides 1B1, 1B3, and 2B1 (OATP1B1, 1B3, and 2B1) 8−10 and sodium (Na + ) taurocolate cotransporter (NTCP), [8][9][10]23,24 and the efflux transporters, like multidrug resistance-associated protein 3 8−10 and 4 8−10,20 (MRP3 and MRP4). In particular, in cases of cholestasis, the basolateral uptake transporters NTCP and OATP1B1 have been found down-regulated. 10,11 However, in this case, OATP1B3 is upregulated as a compensatory mechanism for the elimination of xenobiotics from sinusoidal blood. 10,25,26 On the contrary, in cases of cholestasis, MRP3 and MRP4 are up-regulated to facilitate the efflux of the toxic bile salts out of the hepatocyte. 27 Thus, simultaneous inhibition of several of these transporters could induce drug toxicity due to inadequate elimination from the blood or increase the cholestatic effect due to accumulation of bile salts in the hepatocyte.
Consequently, drug-induced liver injury and cholestasis are important toxicity alerts to be considered in drug development. Interestingly, there are only a few computational studies for the prediction of cholestasis reported in literature. 28,29 With respect to the involvement of hepatic transporter, there are some in vitro studies correlating cholestasis with transporter inhibition, such as BSEP, 17,18,30 MRP3, MRP4, 20 and NTCP. 31 Also several in silico studies for the identification of potentially cholestatic compounds via modeling of transporters and then associating them with the cholestatic effect of their inhibitors have been conducted. A characteristic example is the study by Greupink et al. in 2012, who developed a pharmacophore approach for NTCP 24 in order to identify potentially NTCP inhibitors. Under the same principles, in 2014 Ritschel and colleagues 71 performed a 3D ligand-based pharmacophore model for BSEP inhibition. However, in most of these cases the amount of validated drugs is small and what is basically described is the association between transporter inhibition and cholestasis. Thus, as the respective is associated with cholestasis, it is assumed that an inhibitor is causing cholestasis. Most recently, Muller et al., 32 in order to model DILI, also modeled some more hepatotoxicity end points, including cholestasis. Moreover, Mulliner et al. 33 presented a multilevel modeling approach for DILI, where cholestasis was also included as a morphological hepatobiliary finding. However, examining the liver transporters contribution was not within the scope of their work. Finally, it is noteworthy to mention some multiscale modeling approaches for DILI. DILIsym (www. dilisym.com) is a mechanistic mathematical model of DILI, that has been used to investigate the effects of BSEP inhibition on drug-induced liver injury, 34 as well as on bile acid-mediated DILI. 35 Additionally, Sluka et al. developed a multiscale, livercentric in silico modeling framework for acetaminophen pharmacology and metabolism that can be extended in predicting hepatotoxicity due to acetaminophen overdosing. 36 In this study we present a classification scheme in order to predict cholestasis from a public data set, using physicochemical descriptors as well as predicted transporter inhibition profiles. For the latter we used our in house classification models for BSEP, 37 43 and the SIDER database v2 44,45 using the search terms: "drug-induced cholestasis" or "cholestasis". The retrieved publications 8,31,46,47 were then investigated manually for human data, i.e. compounds that are positive or negative for drug-induced cholestasis in humans. Those compoundsin principle drugswere added to those obtained from the SIDER v2 44,45 database. Unfortunately, cholestasis is an end point that is not widely examined in terms of experimental or in silico studies that would potentially guide us to big data sets. Thus, even though we were able to compile several drugs positive for cholestasis, there was almost no information in terms of the negatives. On the other hand, DILI in general is studied quite extensively and there are several respective data sets. Since choleastasis is a possible manifestation for DILI, we can consider safely that any compound negative for DILI will definitely be also negative for cholestasis. Thus, the negative compounds for DILI that we had compiled and curated in a previous work were also used as negatives for this study. The data set was carefully curated according to the following rules: • All inorganic compounds were removed based on their chemical formula in MOE 2014.09. 48 • Salt parts and compounds containing metals and/or rare or special atoms were removed and the chemical structures were standardized using the Standardiser tool created by Francis Atkinson. 49 • Duplicates and permanently charged compounds were removed using MOE 2014.09. 48 Here we have to note that stereoisomers, even if biologically can be considered as different compounds, were considered as duplicates, since they give the exactly same vector of descriptors. If two (or more) stereoisomers are of the same class, only one was kept. If they were of different classes, they were all removed. • 3D structures were generated using CORINA (version 3.4), 50 and their energy was minimized with MOE 2014.09, 48 using default settings, but changing the gradient to 0.05 RMS kcal/(mol A 2 ). In addition, the existing chirality was preserved. After these curation steps 152 compounds remained as positives for cholestasis. The negatives for DILI, and subsequently for cholestasis, were 466 compounds. However, when uniting the data, there were compounds with contradictory class assignments. These compounds were removed Figure 1. Transporters located on the hepatocyte. Blue symbols represent mainly the canalicular transporters, and red symbols, the basolateral ones. The arrows define the direction of transport. The transporters used in this study are presented within rectangular frames. MRP1−6 multidrug resistance-associated proteins 1−6, OSTα/OSTβ organic solute transporter, BSEP bile salt export pump, BCRP breast cancer resistance protein, MATE1 multidrug and toxin extrusion transporter 1, ABCG5/G8 ATP-binding cassette subfamily G member 5/8, MDR3 multidrug resistance protein 3, P-gp P-glycoprotein, ATP8B1 ATPase-aminophospholipid transporter, OATP organic anion transporting polypeptide, NTCP sodium (Na + ) taurocolate cotransporting polypeptide, OCT organic cation transporter 1, OAT organic anion transporter.

Journal of Chemical Information and Modeling
Article from the data set, yielding a data set of in total 578 compounds (131 positives and 447 negatives). The compiled data set is provided in the Supporting Information.
External Test Set. Recentlyand after having already compiled our training set for cholestasis and developed the respective modela data set covering multiple levels of hepatotoxicity was published by Mulliner and co-workers. 33 The data are hierarchically clustered by the authors into three levels of hepatotoxicity: level 0 corresponds to general hepatotoxicity, level 1 corresponds to clinical chemistry findings and morphological finding as distinguished parts of general hepatotoxicity, and level 2 discriminates both clinical chemistry and morphological findings into hepatocellular and hepatobiliary injury. We use only clinical data, i.e. the human data, of morphological findings for hepatobiliary injury as an external test set for validating the developed cholestasis model. Once more, we performed chemical data curation and removed the compounds overlapping with the training set, which led to 1347 compounds (230 positives and 1117 negatives) as an external test set.
Merged Data Set. We also merged the training with the test set and tried to generate a model based on the united data. The merged data set comprises 1904 compounds: 355 positives and 1549 negatives for cholestasis. Here we must note that the total number of compounds in the merged data set is not the same as the sum of the compounds of training set and test set. The reason is that when removing the overlapping compounds between training set and test set, all of the compounds were removed from the test set, since we had already selected the final model for cholestasis. From those overlapping compounds, 21 had contradictory class labels. This did not matter so much, since they were totally removed from the test set. However, when merging the two data sets for modeling, we did not want to decide regarding the class of those compounds.
Generation of Statistical Models. Molecular Descriptors. For both data sets, several types of molecular descriptors have been calculated, such as all 192 2D MOE descriptors, the 3D VolSurf series of descriptors, 32 as well as ECFPs (extended connectivity fingerprints; ECFP6), using RDKit (http://www. rdkit.org/). 51 The list of the final descriptors used for the proposed model is given in the Supporting Information (Table  S1). In addition to this, predicted hepatic transporter inhibition profiles were also included in the list of descriptors. The transporters investigated comprise BSEP, P-gp, BCRP, OATP1B1, and OATP1B3.
In particular, for basolateral transporters we calculated the predictions for four in silico classification models built upon PaDEL descriptors 52 for OATP1B1 and OATP1B3 39 inhibition. For obtaining the predictions we use the models' version implemented in eTOXlab, 53 an open source modeling framework for implementing predictive models. Out of each model we got a binary result: positive or negative. For each transporter we use the sum of these binary scores, denoted "Sum binary score". The sum binary score can take values between 0 (if all models predict the compound as negative) and 4 (if all models predict the compound as positive). For basolateral transporters, we used the continuous score obtained by the BSEP 37 inhibition prediction model. Float predictionscores were also retrieved for P-glycoprotein 38 and BCRPinhibition. 38 A more thorough description of the transporters inhibition models is provided in the Supporting Information (Table S2), where the size of the training set, the inhibition threshold of the training set, and the algorithm and performance of each individual model based on AUC values are provided.
Algorithms Used. The two-class classification models were built using the software package WEKA (version 3.7.12). 54 We investigated the performance of several base classifiers, such as logistic regression, tree methods (random forest and J48 tree), support vector machines (SMO in WEKA with polynomial, RBF, and Puk kernels), naïve Bayes, and k-nearest neighbors. Moreover, because the data set is slightly imbalanced, in order to equilibrate the effect of the majority class on model performance, we also applied the cost-sensitive meta-classifier MetaCost. 55 The cost matrix applied corresponds to the imbalance ratio of the data (3:1). Additionally, several metaclassifiers were explored for attribute selection (AttributeSe-lectedClassifier), as well as for improving the statistical performance, such as Bagging 56 and Boosting. 57,58 Model Validation. The models were originally validated via 10-fold cross validation, which is considered a quite trustworthy method of validation. 59 The best modelsaccording to 10-fold cross-validation evaluationwere further validated via using the data set by Mulliner. 33 Subsequently, for the best obtained models, 50 iterations were performed by changing the crossvalidation seed (for splitting the data within cross validation) and the respective performance parameters were calculated. In order to compare whether the inclusion of the transporters predictions in the descriptors set improves significantly the model's performance, a two-sample t test was performed in R. 60 The statistics metrics taken into consideration were accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), area under the curve (AUC), precision and weighted average precision. Apart from the conventional metrics of accuracy, sensitivity, and specificity, we also use AUC, since it is a measure of the models capability to rank the predictions, while it decouples classifiers from class imbalance and error costs. Moreover receiver operating characteristic (ROC) graphs are very useful for visualization of the models results. 61 The MCC value, considering its formula, takes into account all values of the confusion matrix: Where, TP is true positives, FP is false positives, TN is true negatives, and FN is false negatives. Thus, it is considered more balanced and informative than the column-or linewise metrics. 62 Weighted average precision is the average precision obtained for the two classes but weighted from the total number of instances of the classes. 54 It is a quite helpful parameter in multiclass classification problems, as well as for imbalanced data sets where the number of negatives is greater than the number of positives. Especially for the latter case, due to the definition of precision [PPV = TP/(TP + FP)], its value for the positive class would be low, which not necessarily means that the total performance of the model is bad. Of course, since we are dealing with a toxicity classification problem, like cholestasis, the metrics that is of particular interest and that should by no means drop below 0.5 is sensitivity or true positive rate. Defining Applicability Domain of the Models. In order to be confident regarding the validity of the models we used, we investigated the coverage of the transporters models for the cholestasis data. Additionally, we checked how reliable the

Journal of Chemical Information and Modeling
Article predictions of the cholestasis model for the cholestasis test set are. The applicability domain was checked on KNIME with the Enalos nodes 63,64 that compute the applicability domain on the basis of the Euclidean distances. 65 The number of compounds within the model's applicability domain for each model and for each cholestasis data set is provided in the Supporting Information (Table S3).

■ RESULTS AND DISCUSSION
Generation of a Cholestasis Classification Model. Several combinations of descriptors and classifiers were investigated and the optimal classification model was selected on the basis of the results of 10-fold cross validation. With respect to the classifier, the best results were obtained using as base classifier IBkthe k-nearest neighbors implementation in WEKAwith k = 5. The meta-classifier MetaCost was also applied, with the application of the cost matrix [0.0, 1.0; 3.0, 0.0], i.e. weighting the minority class 3 times more than the majority class, in order to cope with the slightly imbalanced training set. 2D MOE descriptors were performing better than fingerprints and/or VolSurf descriptors, especially for sensitivity, MCC and AUC. Combining the VolSurf descriptors with 2D MOE descriptors also did not provide any significant improvement of the results. From the whole set of 2D MOE descriptors we decided to use a subset of 93 interpretable descriptors that give almost the same performance compared to using all 2D MOE descriptors. Apart from the 93 2D descriptors, we also included the predicted transporter inhibition profiles. In order to assess the importance and significance of this additional information individually, we used them in different combinations: all transporters, only BSEP, all transporters excluding either BSEP, or P-gp, or BCRP, or the OATPs. This led to in total seven models ( Table 1).
It is interesting to mention that we also exchanged the training-test set roles between the two data sets and tried to generate a model for cholestasis based on the bigger data set from the work of Mulliner et al. 33 However, even though the results for 10-fold cross validation were equivalent to the model we propose (generated on the compiled training set of 578 compounds), the results for the external validation of the 578 compounds were rather poor (results not shown). Thus, we decided to stay with our original model. Furthermore, we tried to generate a cholestasis model on the merged data for the subset of 93 2D MOE descriptors, with or without the transporters interaction profiles. Interestingly, the k-nearest neighbor settings (k = 5), which gave quite satisfactory results for 10-fold cross validation while modeling either the training or the test set standalone, did not have the same effect for the united data. For the merged data set SVM (SMO implementation in WEKA) using a polynomial kernel, with exponent equal to 2, performs better. The use of MetaCost with a cost matrix of [0.0, 1.0; 5.0, 0.0], due to the new imbalance ratio of the data, is also necessary. Additionally, under these settings, the performance of the model is significantly better when using the transporters predictions as additional descriptors. The obtained performance of this model, as well as the respective p-values of the performed two-sample paired t test out of 50 iterations, is presented in the Supporting Information (Table S4).

Journal of Chemical Information and Modeling
Article Inspecting the obtained results in Table 1, it becomes obvious that the best settings for the model for 10-fold cross validation are achieved with the inclusion of all transporter inhibition predictions in the list of descriptors. Nevertheless, this is not the case for the external validation, where including predicted inhibitor profiles for all transporters yields lower accuracy and specificity values, while sensitivity remains almost the same. Interestingly, the use of BSEP inhibition prediction stand-alone does not seem to be sufficient. There is a drop in the statisticsespecially for sensitivityin comparison to the use of the whole set of transporter predictions, both for 10-fold cross-validation and for the external test set.
Statistical Analysis of Transporter Predictions on the Model's Performance. In order to assess if the predicted transporter inhibition profiles indeed statistically significantly improve the models, we performed 50 iterations of 10-fold cross validation followed by a two sample t test on the performance parameters. For this we used the models with 2D MOE descriptors, 2D MOE plus all transporters, 2D MOE plus BSEP, and 2D MOE plus all transporters without BSEP ( Table  2).
Analyzing the p-values for the pairwise comparisons (Supporting Information Table S5) the main conclusion is that indeed the use of liver transporter inhibition predictions contributes significantly to the models performance when compared to the use of only 2D physicochemical descriptors. Interestingly, it is not only the BSEP inhibition contribution, which matters. On the contrary, when only BSEP predictions are used, there is only a slight increase in sensitivity and specificity of the model, while sensitivity decreases. The other way round, if all transporters predictions are used, sensitivity increases, but accuracy and specificity reach their peak only after the inclusion of BSEP predictions. This suggests that BSEP apparently contains important information. Nevertheless, it does not contain all the important information, despite the fact that it is the most widely discussed transporter in literature with respect to cholestasis. 8,10,16−21 A possible explanation for this interesting result could be the different thresholds required for BSEP inhibition to cause cholestasis versus the threshold of the BSEP inhibition model. For the BSEP model, every compounds with an IC 50 > 50 μM was classified as noninhibitors, while IC 50 < 10 μM was the threshold for characterizing a compound as inhibitor. However, for the development of cholestasis, an IC 50 < 300 μM has been reported as sufficient. 17 Thus, potentially predicted noninhibitors could actually be cholestatic compounds. The other transporters included in our study are in general not well described in literature via experimental procedures, but they are rather pinpointed due to the fact that they are transporting bile salts or bile conjugates (with the exception of P-gp, whose role is attributed mainly to drug−drug interactions). Thus, our study gives extra weight to literature indications concerning BCRP, P-gp, OATP1B1, and 1B3.
It was quite curious that even though the transporters predictions significantly increase the performance both for the model built on the training set of 578 compounds and for the model trained with the merged data set of 1904 compounds (but with a different base classifier); this was not the case for the prediction on the test set. The performance of all statistics metrics decrease, when transporters predictions are used as descriptors. We are unable to provide a solid explanation for this phenomenon. We can only speculate a different way of class assignment between the two data sets, since they are coming from different sources. Another plausible explanation could be the fact that the external validation set had a contradiction rate of almost 20% regarding the class labels of those compounds shared with the training set (71 out of 419 shared compounds had contradictory class labels). We assume that this is due to the drawbacks of the toxicity reporting system: under-reporting, 66−68 voluntarily basis, 68−70 difficulties to obtain the data, which are often proprietary, 66 as well as the lack of the prerequisite of a causal relationship between drug and adverse event. 68 In any case, despite these contradictions between the training and the test set, the model retained its satisfactory performance.
Furthermore, we would like to mention that there is also experimental evidence for the implication of the basolateral efflux transporters MRP3 and MRP4, [8][9][10]20 as well as for the canalicular efflux transporter MRP2. [8][9][10]22 We are aware of this fact, but for these transporters there are currently not sufficient data available to develop high quality models that can be further used for contributing to our cholestasis model. Additionally, we should pinpoint the fact that we are using predictions for the inhibition of transporters rather than real in vitro data. This could potentially accumulate some additional error in the final predictions of the cholestasis model. In any case, despite any deteriorating factors, our final in silico models for cholestasis were extensively validated with 10-fold cross validation and statistical tests. Furthermore, the external validation set was of a significant size being even bigger than the training set. For both cases the results were quite satisfactory. Moreover, for both training sets (regular and merged one) there is a trend indicating the importance of transporter predictions in the development of cholestasis. The performance of the classification models from which we obtained the predictions provides us with enough confidence to use them in our input matrix. Moreover, the percentage of the cholestasis data that are within the applicability domain of the transporters models ranges between 93.1% and 99.5% (Table S3, Supporting Information), which is quite satisfactory and further enhances our confidence in using predicted transporter interaction profiles as descriptors.

■ CONCLUSIONS
In this study we present a two-class classification model for the prediction of cholestasis (or cholestatic DILI) based on a public data set of 578 compounds. The model is incorporating information both from 2D physicochemical descriptors, as well as predictions of inhibition of the hepatic transporters BSEP, BCRP, P-gp, OATP1B1, and OATP1B3. The performance of the resulting model is rather satisfactory and is validated both via 50 iterations of different 10-fold cross validations, as well as an external test set of over 1300 compounds. 33 Our results demonstrate that adding transporter predictions as additional descriptors to the list of 2D physicochemical descriptors is improving model performance. This is in alignment with evidence from literature which shows that inhibition of selected hepatic transporters contributes to cholestasis.
Interestingly, the increase in model performance cannot be attributed solely to BSEP inhibition, which is the transporter that is most correlated to cholestasis in literature. The performance increases only when the whole list of transporter inhibition predictions is included. This result points toward a rather synergistic effect of several transporters, including the less elucidated role of OATPs, BCRP, and P-gp in cholestasis.

Article
Our study is the first of its kind regarding combining physicochemical descriptors and predicted transporter information in order to predict cholestasis. This provides a useful extension to previous approaches for the prediction of cholestasis.
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00518.
List with the subset of 93 2D MOE descriptors (Table  S1), a brief description of the models used for the transporters predictions (Table S2), the level of applicability domain coverage of the transporters models on cholestasis data and the cholestasis models for the cholestasis test set (Table S3), the performance of the model trained on the merged data set of 1904 compounds (Table S4), and the p-values of the pairwise statistical comparison of the models (Table S5)