In Silico Identification of Potential Thyroid Hormone System Disruptors among Chemicals in Human Serum and Chemicals with a High Exposure Index

Data on toxic effects are largely missing, which limits the prevailing understanding of the risks of industrial chemicals. Thyroid hormone (TH) system disruption includes interference with the life cycle of the thyroid hormones and may occur in various organs. In the current study, high-throughput screening data available for 14 putative molecular initiating events of adverse outcome pathways related to disruption of the TH system were used to develop 19 in silico models for identification of potential TH system-disrupting chemicals. The conformal prediction framework, with Random Forest as the underlying algorithm, was used as a wrapper for the models, which allows the desired confidence level to be set and the error rate of predictions to be controlled. The trained models were then applied to two different databases: (i) an in-house database comprising xenobiotics identified in human blood and (ii) currently used chemicals registered in the Swedish Product Register that have been predicted to have a high exposure index to consumers. The application of these models showed that, overall, fewer of the currently used chemicals were predicted as active than of the chemicals identified in human blood. Chemicals of specific concern for TH disruption were identified from both databases based on their predicted activity.
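As a rough illustration of how a conformal predictor turns model scores into prediction regions at a chosen significance level, the sketch below implements a class-conditional (Mondrian) inductive conformal classifier in plain Python. It is not the implementation used in the study: the Mondrian variant, the nonconformity measure (one minus the predicted class probability), the function names and the calibration values are all assumptions made for illustration, and the probabilities stand in for the output of an underlying model such as a Random Forest.

```python
from collections import Counter

ACTIVE, INACTIVE = 1, 0


def nonconformity(prob_active, label):
    """Assumed nonconformity measure: 1 - predicted probability of `label`."""
    prob_label = prob_active if label == ACTIVE else 1.0 - prob_active
    return 1.0 - prob_label


def class_p_values(calibration, prob_active):
    """Return {label: p-value} for one compound.

    calibration: list of (true_label, predicted_prob_active) pairs from a
    held-out calibration set (hypothetical values in this sketch).
    """
    p_values = {}
    for label in (INACTIVE, ACTIVE):
        # Mondrian step: compare only against calibration compounds of this class.
        cal_scores = [nonconformity(p, y) for y, p in calibration if y == label]
        test_score = nonconformity(prob_active, label)
        n_at_least = sum(score >= test_score for score in cal_scores)
        p_values[label] = (n_at_least + 1) / (len(cal_scores) + 1)
    return p_values


def prediction_region(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    labels = {label for label, p in p_values.items() if p > significance}
    if labels == {ACTIVE}:
        return "active"
    if labels == {INACTIVE}:
        return "inactive"
    if labels == {ACTIVE, INACTIVE}:
        return "both"
    return "empty"


# Hypothetical calibration set: (true label, predicted probability of "active").
calibration = [(1, 0.9), (1, 0.8), (1, 0.7), (1, 0.6),
               (0, 0.1), (0, 0.2), (0, 0.3), (0, 0.4)]
test_probs = [0.85, 0.5, 0.05]

for significance in (0.15, 0.3):
    regions = Counter(
        prediction_region(class_p_values(calibration, p), significance)
        for p in test_probs
    )
    print(significance, dict(regions))
```

Lowering the significance level widens the regions, so more compounds fall into "both", while raising it produces more single-label and "empty" predictions. Note that with n calibration compounds per class the smallest attainable p-value is 1/(n + 1), so strict levels such as 0.02 only become informative with larger calibration sets than the toy one used here.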


S1. Dataset preparation
Thyroperoxidase (TPO): The TPO inhibition dataset was taken from Friedman et al. 1, a study in which 1074 ToxCast chemicals were screened at a single concentration using the Amplex UltraRed TPO assay for their potential to inhibit TPO. In brief, the screening revealed 314 potent TPO inhibitors that elicited a decrease of more than 20% in maximal TPO activity. These were then divided into three categories ("highly selective", "low selective" and "non-selective") as defined by Friedman et al. 1, and the non-selective compounds that decreased TPO activity by less than 22% were removed to obtain a more specific dataset of TPO inhibitors.
Iodothyronine deiodinases 1, 2, 3 (DIOs): Datasets for inhibition of the three deiodinases (DIO1, DIO2 and DIO3) were taken from Olker et al. 2, who screened 1800 chemicals from the ToxCast database for their potency to inhibit DIO1, DIO2 and DIO3 in single-concentration experiments. The 240 potent inhibitors were then tested in concentration-response mode. Putative deiodinase inhibitors were defined as those confirmed in concentration-response screening and those showing more than 20% inhibition in the single-concentration screening.
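The labelling rule for putative deiodinase inhibitors can be sketched as below. This is only an illustration under stated assumptions: the rule is read as a union (confirmed in concentration-response screening, or more than 20% inhibition at a single concentration), and the record layout, field names and example values are hypothetical rather than taken from the Olker et al. dataset.

```python
def is_putative_inhibitor(single_conc_inhibition_pct,
                          confirmed_in_concentration_response,
                          threshold_pct=20.0):
    """Assumed reading of the rule: either criterion is sufficient."""
    return (confirmed_in_concentration_response
            or single_conc_inhibition_pct > threshold_pct)


# Hypothetical screening records for one deiodinase assay.
records = [
    {"name": "chem_A", "inhibition_pct": 65.0, "confirmed": True},
    {"name": "chem_B", "inhibition_pct": 27.5, "confirmed": False},
    {"name": "chem_C", "inhibition_pct": 20.0, "confirmed": False},
    {"name": "chem_D", "inhibition_pct": 3.1, "confirmed": False},
]

# Binary activity labels (1 = putative inhibitor, 0 = inactive).
labels = {
    r["name"]: int(is_putative_inhibitor(r["inhibition_pct"], r["confirmed"]))
    for r in records
}
print(labels)
```

Because the threshold is strict ("more than 20%"), a compound at exactly 20% inhibition is labelled inactive in this sketch unless it was confirmed in concentration-response screening.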
Sodium/iodide symporter (NIS): Data for NIS inhibition were taken from Wang et al. 3, who screened 293 unique chemicals selected from the ToxCast Phase I library in the radioactive iodide uptake (RAIU) assay. The library contains environmental contaminants, mostly pesticides and antimicrobials. After the initial single-concentration testing, full dose-response experiments were performed for 136 chemicals, of which 90 were eliminated due to cytotoxicity.

S2. Information on Human Blood Data Base (HBDB) and Swedish Product Register (SE-PR) lists
HBDB: The HBDB used in the current study is an in-house database described in detail elsewhere 4 . It consists of 440 anthropogenic organic chemicals found in human blood worldwide and reported in studies published between 2000 and 2020. The 419 compounds that had clearly defined structures and could be processed with RDKit were used for prediction in this study. Pharmaceuticals, endogenous compounds and metals are not included in the HBDB, nor are articles written in languages other than English or Swedish. SMILES, CAS numbers and other identifiers were collected for all chemicals. The database is not considered exhaustive, as individual chemicals could have been missed with the general search terms used (we refer to the original publication by Engelhardt et al. 4 for details). The HBDB consists mainly of nonpolar chemicals, with 83% halogenated structures and 69% aromatic structures, of which 11% are phenolic. The chemical groups in the HBDB are listed in Table S3.
SE-PR: Anyone manufacturing or importing products to Sweden has to provide information on chemicals and chemical products to the Swedish Chemicals Agency. The information is stored in the SE-PR, which contains information on, e.g., quantity, product category and sector of use, and whether the product is available to consumers. To predict the potential of the chemicals in products to reach different human and environmental media, an exposure index (EI) was developed and applied to the SE-PR 5 . The target exposure matrices in the EI include surface water, soil, air, sewage treatment plant and human (occupational and consumer). The calculation of the index utilizes use categories, quantities, consumer availability, hazard labelling, and the number of products containing the compound. An evaluation of the calculation resulted in a slightly updated calculation, which is described in detail elsewhere 6 .
The SE-PR dataset used here contains organic chemicals with the highest exposure index to consumers (>6) and consists of 937 individual entries.
Table S1. Performance of thyroid-specific conformal prediction models with significance levels (SL) of 0.02, 0.05, 0.15 and 0.3. nAp, nIAp, nBp, nEp: number of compounds predicted as "active", "inactive", "both", and "empty"; TPR: true positive rate; FPR: false positive rate; TDR(A, B): true discovery rate for "active" and "both" regions; NPV: negative predictive value.
Table S2. General CP models performance (all tested SLs). nAp, nIAp, nBp, nEp: number of compounds predicted as "active", "inactive", "both", and "empty"; TPR: true positive rate; FPR: false positive rate; TDR(A, B): true discovery rate for "active" and "both" regions; NPV: negative predictive value.
Figure S1. Efficiency, true discovery rate of "active" prediction region (TDR.A), true positive rate (TPR) and false positive rate (FPR) of general toxicity models.
Figure S2.
Figure S6. Chemical space of HBDB and SE-PR investigated with principal component analysis (PCA).
The dimensionality is reduced to two dimensions, which explain 35% of the total variation in the data. Compounds from the HBDB and SE-PR lists predicted to be active at the 90% confidence level in at least 9 thyroid-specific models are shown as orange and violet diamonds, respectively. All other compounds from the lists are shown in green and olive. Notably, the compounds most often predicted active in the two sets are dissimilar: the HBDB actives (orange diamonds) form a cluster, with a couple of exceptions, whereas the most active compounds from SE-PR (violet diamonds) generally tend to separate from the rest of the SE-PR compounds (light-green circles).
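The selection of the highlighted compounds (predicted active at the 90% confidence level in at least 9 thyroid-specific models) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "predicted active" means a single-label "active" region at a significance level of 0.10, and the compound names, the number of models and the p-values are hypothetical.

```python
def is_active_only(p_active, p_inactive, significance=0.10):
    """True if the prediction region at this significance level is {"active"}."""
    return p_active > significance and p_inactive <= significance


def count_active_models(per_model_p_values, significance=0.10):
    """Count the models that place the compound in the "active" region."""
    return sum(
        is_active_only(p_active, p_inactive, significance)
        for p_active, p_inactive in per_model_p_values
    )


def select_compounds_of_concern(predictions, min_models=9, significance=0.10):
    """Return the names of compounds predicted active in >= min_models models."""
    return sorted(
        name
        for name, p_values in predictions.items()
        if count_active_models(p_values, significance) >= min_models
    )


# Hypothetical per-model (p_active, p_inactive) pairs for two compounds.
predictions = {
    "compound_A": [(0.60, 0.02)] * 10 + [(0.05, 0.70)] * 4,
    "compound_B": [(0.60, 0.02)] * 8 + [(0.40, 0.30)] * 6,
}

print(select_compounds_of_concern(predictions))
```

In this toy input, compound_B is left out because only 8 of its predictions are single-label "active"; the remaining 6 fall into the "both" region and are not counted under the assumption stated above.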