Closing the Organofluorine Mass Balance in Marine Mammals Using Suspect Screening and Machine Learning-Based Quantification

High-resolution mass spectrometry (HRMS)-based suspect and nontarget screening has identified a growing number of novel per- and polyfluoroalkyl substances (PFASs) in the environment. However, without analytical standards, the fraction of overall PFAS exposure accounted for by these suspects remains ambiguous. Fortunately, recent developments in ionization efficiency (IE) prediction using machine learning offer the possibility to quantify suspects lacking analytical standards. In the present work, a gradient boosted tree-based model for predicting log IE in negative mode was trained and then validated using 33 PFAS standards. The root-mean-square errors were 0.79 (for the entire test set) and 0.29 (for the 7 PFASs in the test set) log IE units. Thereafter, the model was applied to samples of liver from pilot whales (n = 5; East Greenland) and white beaked dolphins (n = 5, West Greenland; n = 3, Sweden) which contained a significant fraction (up to 70%) of unidentified organofluorine and 35 unquantified suspect PFASs (confidence level 2–4). IE-based quantification reduced the fraction of unidentified extractable organofluorine to 0–27%, demonstrating the utility of the method for closing the fluorine mass balance in the absence of analytical standards.


Table of Contents
Table S1: Detailed information on the individual marine mammals sampled.
Table S2: Information for targeted PFAS and isotopically labelled internal standards, recovery and limits of quantification.
Table S3: Eluent information for the ion chromatography part of the EOF analysis by CIC.
Table S4: Inclusion list for suspect screening.
Table S5: The variable importance of 10 most influential PaDEL descriptors in the final model.
Table S6: All target PFAS with homologue series chemicals with -CF2-difference.
Table S7: All target PFAS with homologue series chemicals with -C2F4-difference.
Table S11: Fluorine mass balance calculations: known and unknown EOF.

Chemicals and reagents
Acetonitrile (≥99%, Chromasolv™) was from Honeywell (France). EnviCarb (Supelclean™) came from Sigma Aldrich. Stainless steel beads came from Next Advance©. Argon and oxygen gases used during CIC analysis were of purity grade 5.0, and fluoride standard (1000 mg/L) was from Thermo Scientific. Water was purified with a Millipore purification system that had a resistance of <18 MΩ/cm (Milli-Q water). Ammonium acetate and methanol (99.8%, LiChrosolv®) were from Merck (Darmstadt, Germany).

Sample preparation for organofluorine mass balance
Approximately 0.5 g of liver was combined with 4 mL of acetonitrile and 7-8 stainless steel beads (ø 4.8 mm) and homogenized in a bead blender (SPEX SamplePrep 1600 MiniG®) for 5 min at 1500 rpm. Samples were then centrifuged at 2000 rpm for 5 min (Centrifuge 5810, Eppendorf, Hamburg), and the supernatant was transferred to a new 13 mL PP tube. The extraction was repeated by adding another 4 mL of acetonitrile, vortexing and centrifuging again. The new supernatant was added to the tube containing the first supernatant. Combined extracts were concentrated to 1 mL under a stream of nitrogen in a water bath at 40 °C (TurboVap LV Evaporator, Biotage). Concentrated extracts were then weighed and added to a 1.7 mL Eppendorf tube containing 25 mg EnviCarb and 50 μL acetic acid. The samples were vortexed and centrifuged for 10 min at 10 000 rpm (Galaxy 14D, Microcentrifuge, VWR), then split into two aliquots of 250 μL each:
Aliquot 1: 250 μL of supernatant destined for UPLC-HRMS analysis was transferred to another Eppendorf tube and spiked with 1 ng of IS mix. NH4OAc (250 μL, 4 mM in water) was added to the extract as a buffer, and the extract was vortexed and stored at -20 °C. Upon analysis, extracts were brought to room temperature, vortexed, and transferred to LC vials.
Aliquot 2: 250 μL of supernatant destined for CIC analysis was transferred to another Eppendorf tube and stored at -20 °C. ISs were not added to this aliquot, since the fluorine in the isotopically labelled standards would influence the results. Upon analysis, extracts were brought to room temperature and vortexed if needed.

Extractable organofluorine analysis
EOF measurements were carried out with a Thermo-Mitsubishi CIC using a previously described method. Extracts (200 μL per sample) were transferred into ceramic containers ("boats") which contained glass wool for fluid dispersion. Boats were prebaked prior to analysis after being cleaned with soap, water and a basic solution. Samples were combusted slowly in a horizontal furnace (HF-210, Mitsubishi) at 1100 °C under a flow of oxygen (400 mL/min), argon (200 mL/min), and water vapor mixed with argon (100 mL/min) for 5 minutes. Combustion gases were absorbed in MilliQ water during the combustion process with a gas absorber unit (GA-210, Mitsubishi). 200 μL of the absorption solution was then injected onto an ion chromatograph (Dionex Integrion HPIC, Thermo Fisher Scientific) equipped with an anion exchange column (2 x 50 mm guard column (Dionex IonPac AS19-4μm) and 2 x 250 mm analytical column (Dionex IonPac AS19-4μm)) operated at 35 °C. A gradient of aqueous hydroxide mobile phase was ramped from 8 mM to 100 mM at 0.25 mL/min (Table S3) and fluoride was detected using a conductivity detector.
A standard calibration curve was prepared with a range of 0.05 to 100 μg F/mL and subsequently used for quantification within the linear range (R² > 0.97). The mean fluoride concentration from procedural blanks was subtracted from samples before quantification.
The method detection limit (MDL) was calculated using the standard deviation of F concentrations from procedural blanks (n = 3, each batch).
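The quantification steps described above (linear calibration, blank subtraction, and an MDL derived from the blank standard deviation) can be sketched as follows. This is a minimal illustration and not the authors' processing script: the calibration and blank values are invented, the function names are hypothetical, and the multiplier of 3 applied to the blank standard deviation is an assumption, since the text states only that the standard deviation was used.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fluoride_conc(signal, slope, intercept):
    """Invert the calibration line to get a concentration in ug F/mL."""
    return (signal - intercept) / slope

# Hypothetical calibration standards (ug F/mL) and detector responses.
std_conc = [0.05, 0.5, 5.0, 50.0, 100.0]
std_signal = [0.1, 1.0, 10.0, 100.0, 200.0]
slope, intercept = fit_line(std_conc, std_signal)

# Hypothetical procedural blanks (n = 3 per batch, as in the text).
blank_signals = [0.20, 0.24, 0.22]
blank_conc = [fluoride_conc(s, slope, intercept) for s in blank_signals]
mean_blank = mean(blank_conc)

# ASSUMPTION: MDL = 3 x SD of the blanks; the multiplier is not given in the text.
MDL_FACTOR = 3
mdl = MDL_FACTOR * stdev(blank_conc)

# Blank-corrected concentration for a hypothetical sample response.
sample_conc = fluoride_conc(4.22, slope, intercept) - mean_blank
```

A sample would then be reported only if its blank-corrected concentration exceeds `mdl`.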
A Q-Exactive™ ultra high mass resolution (UHMR) hybrid Quadrupole-Orbitrap™ mass spectrometer was used in alternating Full Scan (FS) and Data-Dependent (DD) MS2 mode. In FS, the scan range was 200-1800 m/z and the resolution was 120 000 (full width at half maximum at m/z 200). The Automatic Gain Control (AGC) target was 3 × 10⁶. The maximum injection time was 250 ms.
Electrospray ionization was operated in negative mode; sheath gas flow rate was 30 arbitrary units (AU), aux gas flow rate was 10 AU, sweep gas flow rate was 0 AU, spray voltage was 3.7 kV, capillary temperature was 350 °C, S-lens RF level was 50 AU, and aux gas heater temperature was 350 °C.
For each analyte, the concentration of the lowest calibration standard with a signal of at least 10 000 counts per second (cps) and a signal-to-noise ratio higher than 3 was used as the method limit of quantification (method LOQ). The mean extract volume (1045.8 μL) and mean sample mass (0.54 g) were used to determine the LOQ in liver samples.
If a compound was found to be present in the blanks, its concentration in the blanks plus three times the standard deviation in the blanks was used in place of the concentration of the lowest suitable calibration level to determine the LOQ with equation (1).
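As a rough illustration of the LOQ determination described above, the sketch below converts an extract-level LOQ into a liver-based LOQ using the mean extract volume and mean sample mass given in the text. It is not a reproduction of equation (1): the function names, the pg/μL unit for calibration levels, the use of the mean blank concentration, and the `dilution_factor` parameter (defaulting to 1.0) are all assumptions made for the example.

```python
from statistics import mean, stdev

MEAN_EXTRACT_VOLUME_UL = 1045.8  # mean extract volume from the text
MEAN_SAMPLE_MASS_G = 0.54        # mean sample mass from the text

def extract_loq(lowest_cal_pg_per_ul, blank_pg_per_ul=None):
    """Extract-level LOQ in pg/uL.

    If the compound was detected in blanks, use blank mean + 3 x SD
    (ASSUMPTION: 'concentration in blanks' is taken as the mean);
    otherwise use the lowest suitable calibration level.
    """
    if blank_pg_per_ul:
        return mean(blank_pg_per_ul) + 3 * stdev(blank_pg_per_ul)
    return lowest_cal_pg_per_ul

def liver_loq_ng_per_g(loq_pg_per_ul,
                       volume_ul=MEAN_EXTRACT_VOLUME_UL,
                       mass_g=MEAN_SAMPLE_MASS_G,
                       dilution_factor=1.0):
    """Convert an extract-level LOQ (pg/uL) to ng per g of liver."""
    total_pg = loq_pg_per_ul * dilution_factor * volume_ul
    return total_pg / 1000.0 / mass_g  # pg -> ng, then per gram

# Hypothetical analyte without blank contamination: lowest level 0.01 pg/uL.
loq_clean = liver_loq_ng_per_g(extract_loq(0.01))  # about 0.019 ng/g

# Hypothetical analyte detected in three procedural blanks.
loq_blank_based = extract_loq(0.01, blank_pg_per_ul=[0.02, 0.03, 0.04])
```

The `dilution_factor` argument is left explicit so that any dilution between the concentrated extract and the injected solution can be accounted for if the calibration levels refer to the injected solution.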

Confidence levels
Confidence Levels (CLs) were assigned to suspects according to the scale proposed by Schymanski et al. (see overview in the SI).23 Briefly, CL 1 is a structure confirmed by comparison with a reference analytical standard. CL 2 is a probable structure and is subdivided into CL 2a, which is assigned when the MS2 spectrum is matched with literature or a library, and CL 2b, which in this study has been assigned to those suspects that are part of a homologue series for which retention times increase with increasing chain length, but for which no or only a few fragments are observed. CL 3 is a tentative candidate for which one or more possible structures are proposed. CL 4 refers to an unequivocal molecular formula and CL 5 to an exact m/z of interest. Refer to accompanying Excel file.

Refer to accompanying Excel file.
Table S10: Suspect concentrations in ng/g, sum of suspect concentrations (∑Suspects (ng/g)), suspect concentrations in ng F/g and sum of suspect concentrations in fluorine equivalents (∑FSuspects (ng F/g)).
Refer to accompanying Excel workbook.
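The conversion from compound concentrations (ng/g) to fluorine equivalents (ng F/g), and the subsequent split of EOF into known and unknown fractions, can be sketched as below. This is a minimal example under stated assumptions: the function names are hypothetical, the input concentrations are invented, the neutral-acid molar mass is used for PFOS, and negative unknown fractions are clamped to zero, which is a choice made for the example rather than something stated in the text.

```python
FLUORINE_ATOMIC_MASS = 18.998  # g/mol

def fluorine_equivalents(conc_ng_per_g, n_fluorine, molar_mass):
    """Convert a compound concentration (ng/g) to ng F/g."""
    return conc_ng_per_g * n_fluorine * FLUORINE_ATOMIC_MASS / molar_mass

def unidentified_eof_fraction(eof_ng_f_per_g, known_ng_f_per_g):
    """Fraction of EOF not explained by quantified PFAS.

    ASSUMPTION: values below zero (known F exceeding EOF) are clamped to 0.
    """
    unknown = max(eof_ng_f_per_g - known_ng_f_per_g, 0.0)
    return unknown / eof_ng_f_per_g

# Example: 10 ng/g PFOS (C8HF17O3S, 17 F atoms, about 500.13 g/mol).
pfos_f = fluorine_equivalents(10.0, n_fluorine=17, molar_mass=500.13)  # about 6.46 ng F/g

# Hypothetical sample: EOF = 100 ng F/g, targets = 60 ng F/g, suspects = 20 ng F/g.
fraction_targets_only = unidentified_eof_fraction(100.0, 60.0)
fraction_with_suspects = unidentified_eof_fraction(100.0, 60.0 + 20.0)
```

In this invented example, adding the suspect contribution lowers the unidentified fraction from 40% to 20%, which mirrors the kind of reduction the mass balance is meant to capture.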
Table S11: Fluorine mass balance calculations: known and unknown EOF.
Refer to accompanying Excel workbook.

Figure S1: Schematic of sample preparation for fluorine mass balance determination.

Figure S2: Comparison between quantification with A) a homologue one -C2F4- unit smaller (light blue) or larger (dark blue) than the respective suspect and B) concentration estimations using predicted ionization efficiency for the same set of chemicals with a leave-one-out approach. The mean fold errors in concentration were 2.6 and 3.1 when the smaller or larger homologue was used for quantification, respectively. The mean fold error for quantification with the model was 2.7, compared to a mean fold error over all homologues of 2.8. The difference between quantification with the model and quantification with a homologue series approach was not statistically significant (Wilcoxon signed rank test; p = 0.26). Similarly, the 10 target PFAS in a homologue series differing by one -CF2- unit resulted in a mean fold concentration error of 2.2×, while the logIE modelling approach had a mean fold concentration error of 2.5×. Quantification with a smaller homologue (fold error of 2.2×) and with a larger homologue (fold error of 2.3×) performed essentially the same (Figure 4E and 4F).
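The fold errors quoted in this caption can be illustrated with a short sketch. It assumes the common definition in which the fold error is the ratio of the larger to the smaller of the estimated and reference concentrations, so that over- and underestimation are penalised symmetrically and a perfect estimate gives 1; the caption itself does not spell out the definition, and the values below are invented.

```python
from statistics import mean

def fold_error(estimated, reference):
    """Symmetric fold error: always >= 1, equal to 1 for a perfect estimate."""
    return max(estimated / reference, reference / estimated)

def mean_fold_error(estimated, reference):
    """Mean fold error over paired estimated and reference concentrations."""
    return mean(fold_error(e, r) for e, r in zip(estimated, reference))

# Hypothetical concentrations (pg/uL): estimates vs. standard-based values.
estimated = [2.0, 0.5, 3.0]
reference = [1.0, 1.0, 1.0]
mfe = mean_fold_error(estimated, reference)  # fold errors 2, 2 and 3
```

Paired per-compound fold errors from two approaches (for example, model-based and homologue-based) are the kind of input a Wilcoxon signed rank test would compare.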

Figure S3: PFAS profiles in marine mammal liver samples. GD: Greenlandic dolphins, PW: Greenlandic pilot whales, SD: Swedish dolphins. Profiles are similar among all samples, with the notable differences that 7:3 FTCA is present only in SD and that FOSA accounts for a higher percentage in these samples.

Figure S6: Extracted ion chromatograms of n:3 fluorotelomer carboxylic acids (n:3 FTCAs) in a liver sample of a Swedish dolphin (SD2). Single scan dropouts are due to the triggering of MS/MS scans.

Figure S17: Target and model quantification in three spiked liver samples. Concentrations in pg/μL in final extract.

Figure S18: Predicted concentration in pg/μL for different isomers of Cl-PFNA, with the chlorine position ordered from closest to furthest from the carboxyl functional group (left to right). Even though there is variability, predicted concentrations stay within the same order of magnitude.

Figure S19: Predicted concentration in pg/μL for different isomers of ether-PFOS, with the ether position ordered from furthest to closest to the sulfonic acid functional group (left to right). Even though there is variability, predicted concentrations stay within the same order of magnitude.

Figure S20: The model application domain was visually assessed with principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE) analysis to ensure that the suspect chemicals are similar to the chemicals used in training. The results for targets, suspects and the non-PFAS included in Liigand et al. for A) the first and second principal components, B) the first and third principal components and C) the t-SNE analysis confirmed that the targets added to the training set overlapped with the suspects and that the ionization efficiency prediction model is therefore applicable to suspect PFAS.

Table S1: Detailed information on the marine mammals sampled. Empty cells indicate unknown information. NMR: Swedish Museum of Natural History; ACES: Department of Environmental Science at Stockholm University; CITES: Convention on International Trade in Endangered Species of Wild Fauna and Flora.

Table S2: Information for targeted PFAS. Some compounds did not have a direct internal standard; therefore, the internal standard with the closest possible structure was used. IS: internal standard, SD: standard deviation, LOQ: limit of quantification. *: indicates targets not found in any liver sample.

Table S3: Eluent information for the ion chromatography part of the EOF analysis by CIC.

Table S4: Inclusion list for suspect screening.

Table S5: The relative variable importance and details of the 10 most influential PaDEL descriptors for the final model based on combined training and test set data.

Table S6: All target PFAS with homologue series chemicals with -CF2-difference.

Table S7: All target PFAS with homologue series chemicals with -C2F4-difference.