DNA Adductomics for the Biological Effect Assessment of Contaminant Exposure in Marine Sediments

Exposure to chemical pollution can induce genetic and epigenetic alterations, developmental changes, and reproductive disorders, leading to population declines in polluted environments. These effects are triggered by chemical modifications of DNA nucleobases (DNA adducts) and epigenetic dysregulation. However, linking DNA adducts to the pollution load in situ remains challenging, and the lack of evidence-based DNA adductome response to pollution hampers the development and application of DNA adducts as biomarkers for environmental health assessment. Here, we provide the first evidence for pollution effects on the DNA modifications in wild populations of Baltic sentinel species, the amphipod Monoporeia affinis. A workflow based on high-resolution mass spectrometry to screen and characterize genomic DNA modifications was developed, and its applicability was demonstrated by profiling DNA modifications in the amphipods collected in areas with varying pollution loads. Then, the correlations between adducts and the contaminants level (polycyclic aromatic hydrocarbons (PAHs), trace metals, and pollution indices) in the sediments at the collection sites were evaluated. A total of 119 putative adducts were detected, and some (5-me-dC, N6-me-dA, 8-oxo-dG, and dI) were structurally characterized. The DNA adductome profiles, including epigenetic modifications, differed between the animals collected in areas with high and low contaminant levels. Furthermore, the correlations between the adducts and PAHs were similar across the congeners, indicating possible additive effects. Also, high-mass adducts had significantly more positive correlations with PAHs than low-mass adducts. By contrast, correlations between the DNA adducts and trace metals were stronger and more variable than for PAHs, indicating metal-specific effects. These associations between DNA adducts and environmental contaminants provide a new venue for characterizing genome-wide exposure effects in wild populations and apply DNA modifications in the effect-based assessment of chemical pollution.

, using ACN and MeOH as organic phase, respectively. S12 Fig. S7. The heatmap of Pearson correlation among the adducts. S13

S18
LIST OF TABLES Table S1. Summary of environmental conditions in the study areas, the Bothnian Sea and Northern Baltic Proper, during the sampling period (January 2018). Table S2. Studies that provided contaminant concentration data to support the classification of the study sites as moderately-to-heavily contaminated and reference sites (low contamination levels). S19 S20 Table S3. Generalized Linear Model output. S21 Table S4. PERMANOVA output for evaluating effects of basin (Basin) and contamination status (Status) on the concentrations of 10 metals and 11 PAH congeners in sediments. Table S5. HPLC methods tested for high mass adducts using BPDE-dG.  Table S7. Unpaired t-test comparing adduct-PAH correlation coefficients between the high-and low-mass adducts. CSV file S3. SIMPER outcome for selecting features responsible for 80% of the dissimilarities in the DNA adducts between the contaminated and reference sites. The average values for each adduct in the samples classified as Contaminated and Reference, the average squared distance, the ratio of the average contribution divided by the standard deviation (SD) of those contributions across all pairs of samples, and the relative contribution and the cumulative percentages are shown.

S3
Note S1. Chemical evaluation of contamination level in the sampled sediments

Sampling
Sediment samples were collected on three occasions: (1) January 2018, as a part of the amphipod collection campaign in BS and NBP coordinated by the Effect Screening Study and SNMMP; (2) January 2012, as a part of the sediment contamination survey within SNMMP in BS and NBP; and (3) January 2011, within the recipient control study in BS ( Table 1). The environmental conditions during sampling in the study areas are shown in Table S1. We selected 19 sites with well-characterized sediment contamination status, 9 in BS and 10 in the NBP (Fig. 1).
We used a benthic sled set for all field collections to sample the upper 2-3 cm of the sediment; thus, the measured concentrations represent the conditions in the uppermost sediment layer where the amphipods reside. The macrofauna was removed by sieving with 0.5-mm mesh, and the sediment was frozen at -20 °C. 1

Chemical analyses
Aliquots of 20 g (wet mass) of each sediment sample was used to determine TOC values after drying and homogenization. We used a Flash 2000 (Thermo Scientific) elemental analyser at Stockholm University, Department of Ecology, Environment and Plant Science, and expressed TOC values as % of the dry weight (dwt). The percent moisture in sediment was determined by calculating the weight lost during the drying procedure and the values reported.
Aliquots of 50 g of the frozen sediment were used for PAHs (11 priority congeners) 2,3 at the Swedish Environmental Research Institute (IVL), Sweden and 1 g for metal (As, Cd, Co, Cr, Cu, Hg, Ni, Pb, V and Zn) analyses at ALS Scandinavia, Sweden. Both laboratories have accreditation for their analytical services. Details on methods, detection limits, quality control and other information can be found on the company website (https://www.ivl.se/english/ivl/our-offer/our-services/analysis-ofsoil-and-sediment.html). The results were expressed as a concentration in ng g -1 dwt.
For the metal analysis in sediment samples, an aliquot of approximately 1 g of homogenized wet sediment was dried at 105 °C to constant weight. The metals (As, Cd, Co, Cr, Cu, Ni, Pb, Zn, and Hg) were analyzed by ICP-SFMS, ICP-AES and AFS at ALS Scandinavia; all analytical methods were accredited (SWEDAC). Details on methods, detection limits, quality control and other information can be found on the company website (https://www.alsglobal.se/en/environment/trace-metal-analysis). The results were expressed as a concentration in mg kg -1 dwt.

Data analysis
Classification principles. The chemical contaminant data were used to classify sampling sites into two categories (1) reference sites, where contaminant levels are relatively low, and the sediments are likely non-toxic and (2) contaminated sites, with at least one contaminant being at the potentially harmful levels ( Table S2). As PAH and metal concentration data were available for all sampling sites, these two contaminant groups were used in the classification.
For the total assessment of the metal contamination degree, we used the Tomlinson Pollution Load Index (PLI) calculated as a geometric average of the concentration factors 4 for the nine metals (As, Cd, Co, Cr, Cu, Hg, Ni, Pb and Zn): CF is the concentration factor for each metal calculated as: Cmetal is the measured concentration of the metal (mg/kg dwt) and Cbaseline is the regional reference value for this metal 2 ; n is the number of the metals included in PLI calculations.
As a criterion for the PAH contamination threshold for DNA adduct formation, we used a reported value of 290 µg/kg dwt established by a hockey-stick regression model relating DNA adduct formation in fish to sediment PAH concentration; 5 in this study, a mixture similar to the PAH composition in the test sediments from the Baltic Sea was used. The model approach was chosen to facilitate our site classification and identify exposure levels at which detectable and biologically relevant increases in DNA damage would be expected to occur in the amphipods. The alternative approach based on hazard quotients for individual PAHs and an overall hazard index was considered suboptimal because it would provide an estimate of the worst-case hazard of the sediment PAH to benthic organisms. 6 However, the latter would be toxicologically inconsistent with the hypothesis that this type of DNA damage is a precursor to more overt pathological conditions.
The site was classified as contaminated when either the PLI value exceeded 1 (i.e., due to the heavy metal pollution) or the sum of the PAH congeners (11 congeners measured in all sediments) exceeded 290 µg/kg dwt (due to PAH pollution). As the number of the PAH congeners was relatively low, our total PAH levels are likely underestimated, and, thus, some of the sites assigned as reference could have low-to-moderate PAH levels in sediments.

Statistical evaluation.
To compare contaminant levels for LmPAHs and HmPAHs, and PLI between the basins, we used Generalized Linear Models with exponential distribution and reciprocal link as implemented in JMP (version 15). Moreover, the complete datasets for PAH and metal concentrations were analyzed by main and pairwise permutational analysis of variance (PERMANOVA) in PRIMER v.6.4 7 with 999 permutations at a significance level of p<0.05. The concentrations were log(x+1) transformed and used to calculate Euclidean similarity matrices before the analysis. The homogeneity of dispersion test (PERMDISP) was carried out based on calculated distances to centroids to assess whether significant PERMANOVA results were based on location or dispersion effects. The PERMANOVA design included basin (BS and NBP) and contamination status (contaminated and reference sites) as factors; the interaction was also considered for testing whether the contaminants contributed differently in different basins. The model outcome was visualized by PCO plots. Finally, a similarity percentage (SIMPER) analysis was run to determine contaminant categories contributing most to the differences between the reference and contaminated sites.

Contaminant assessment
The contaminant levels were variable across the sites, and hot spots of contamination were identified for PAHs and metals (Cu, Zn, Pb, As, Cr and Hg). Moreover, some locations exceeded acceptable values provided by the sediment quality guidelines. Thus, a high level of contamination in some areas and high variability in the contaminants were observed even at a small spatial scale (Fig. S1).
Using total PAH concentrations and PLI values based on the metal concentration for all 19 sites, we classified them into reference (3 and 6 for BS and NBP, respectively) and contaminated (5 in each basin) sites ( Fig. 1 and Table 1). Notably, stations used as relatively unpolluted in the SNMMP (6020, 6022, and 6025) were classified as contaminated due to both PLI and total PAH levels exceeding the threshold values.
Using total concentrations of the low-(LmPAHs) and high-mass PAHs (HmPAHs) and metals, we found no significant basin × contaminant status interaction and no significant basin effect on the contaminant levels (Table S3). There was high variability in the contaminant levels between the sites as indicated by the Principle Coordinate Analysis (PCO; Fig. S3) following PERMANOVA that identified Contamination status effect as the only significant factor. The results of PERMDISP showed that the dispersion was not significantly different between the groups (Table S4); thus, the observed effect was due to the distance between the centroids and their location. The ordination biplot (Fig. S3) confirmed that metal contamination was mainly associated with US5 and, to some extent, SU4 sites, whereas high levels of hydrocarbons were associated with the sites 57-58 and Norrsundet. Thus, the difference between the sites assigned as Contaminated and Reference was significant and not affected by Basin. Moreover, all contaminant groups (i.e., metals, LmPAHs, and HmPAHs ) were significantly higher in the sediments classified as contaminated (Fig. S2).
The metals were primarily responsible for 80% of the dissimilarity between the contaminated and reference sites as suggested by SIMPER analysis following the PERMANOVA results (Table S4 and Fig. S4), with the most influential metals being Cu, Zn and Pb. A set of PAHs contributing to the dissimilarity between the site categories included both HmPAHs (BKF, BAA and ICDP) and LmPAHs (PA, PYR, BBF, and FLU), each contributing about 3.5%.

Note S2. Optimization of LC-HRMS for adduct detection.
Two LC-HRMS methods were used for detecting the nucleoside adducts, one for low-mass adducts (up to m/z 350), and the other for high-mass adducts (350-600 m/z). The method for low-mass adducts was presented in our earlier work. 8 For high-mass adducts, optimization of the analytical methodology was performed using BPDE-dG (100 fmol/μL) as a standard.
The optimization was based on having a suitable chromatographic elution and detecting the adduct (if unknown) by the FS/DIA approach. Five HPLC gradients were tested on the Ascentis Express F5 column (Table S5). Also, two mobile phase systems were tested; water and acetonitrile (ACN) with 0.1% formic acid and water and methanol (MeOH) with 0.1% formic acid. The MS detection method included FS coupled with DIA with an inclusion list for high-mass range adducts 350-600 m/z.
Concerning chromatography, it was important that BPDE-dG, a hydrophobic adduct, eluted late in the program to allow for relatively less hydrophobic adducts that eluted earlier to be separated. Several elution methods were tested ( Table S5). Two of the HPLC methods (methods 1 and 2; Table S5) tested with ACN as an organic phase resulted in a late elution of BPDE-dG at the washing step, where the percentage of B was at its maximum. At this point, BPDE-dG elutes with probable matrix impurities in the column, which can result in matrix effects such as ion suppression, decreasing the sensitivity and, thus, the response of the analyte. Another method used with ACN included a 28 min gradient (method 3; Table S5), where BPDE-dG eluted at 13.67 min, and the washing step started at 20 min. When employing HPLC method 5 (Table S5) with MeOH as the mobile phase, the analyte eluted at 20 min while the washing step of the program started at 22 min (Fig. S6). As MeOH was used as the organic phase, the analyte eluted later due to the lower eluotropic strength of MeOH compared to ACN. Although the analyte eluted later in the gradient, it still occurred before the washing step. Moreover, as the same mobile phase with MeOH was used for the LC-MS analysis of the low-mass adducts, this method was selected as an optimal HPLC method for the high-mass nucleoside adducts.
BPDE-dG has a protonated molecular ion, [M+H] + , with m/z 570.1989. The specific fragment ion of BPDE-dG, i.e., the nucleobase-adduct, obtained by the neutral loss of deoxyribose gives an m/z 454.1511. Therefore, this fragment ion should be present in the MS2 spectrum and the molecular ion in MS1 spectrum for BPDE-dG to be selected by the nLossFinder program as a possible adduct. Both ions were present in their respective spectrum with relatively high intensities and high mass accuracy (± 5 ppm; method 5, Table S5; Fig. S6); also seen were other characteristic fragments of BPDE-dG. Therefore, this method was selected for the screening of high-mass adducts.

Fig. S1. Classification results based on the total PAH concentration (SumPAH; µg/kg dwt) and Tomlinson Pollution Load Index (PLI, unitless) for metals in the Bothnian Sea (BS) and the Northern Baltic Proper (NBP).
The SumPAH and PLI values are shown for the sites that were classified as reference (Status: 0) and contaminated (Status: 1). The horizontal lines indicate threshold values for SumPAH (equals 290 µg/kg dwt; blue) and PLI (equals 1; red). Sampling sites are listed on the X-axis.

Fig. S2. Concentrations of LmPAHs and HmPAHs (µg/kg dwt) and total metal concentrations (Sum Metals; mg/kg dwt) in the Bothnian Sea (BS) and the Northern Baltic Proper (NBP) sites that were classified as Reference and Contaminated.
The distribution of each dataset is shown: the maximum, minimum, median, and the range on either side of the median.

Fig. S3. PCO biplot of reference (open symbols; BS0 and NBP0) and contaminated (filled symbols; BS1 and NBP1) sites in each basin.
The analysis was based on the Euclidean similarity matrix calculated from log-transformed concentration values for PAHs and metals. The sites are shown as blue (Bothnian Sea, BS) and red (Northern Baltic Proper, NBP) dots.

Fig. S4. Contribution of specific contaminants to the dissimilarity between the contaminated and reference sites according to SIMPER analysis based on the Euclidean matrices of the normalized data (left Y-axis).
The cut-off was set to 80% of the cumulative percentage explained by the data (right Y-axis). The metals contributed most (3.6% -9.9%) to the overall dissimilarity, followed by a set of HmPAHs (shown in bold) and LmPAHs (in Italics), with each congener contributing about 3.5%. Retention time using the employed LC-condition is shown on the X-axis, m/z of detected precursor ion on the Y-axis, and the circle size corresponds to the adduct's peak area (normalized to dG, as peak areaadduct ×100/peak area-dG).  (Table S5).

Fig. S6. Chromatogram of BPDE-dG (100 fmol/μL) standard solution analyzed with HPLC method 3 (A) and method 5 (B), using ACN and MeOH as organic phase, respectively
Note the EIC of precursor ion (not observed with method 3) and the five fragments, including the nucleobase adduct fragment (m/z 454.1511), and the common fragment representing protonated deoxyribose (m/z 117.0545), with ±3 ppm mass accuracy. For structures of the fragments, see (Motwani 2020). 9 A B S13 Fig. S7. The heatmap of Pearson correlations among the adducts. The colour shade in each square block indicates the strength and the direction of the correlation (positive or negative) between a pair of adducts. LMs, low-mass adducts; HMs, high-mass adducts. Note the lower frequency of the positive correlations across the LMs than the HMs, and even lower LM-HM correlations.

LMs
HMs LMs HMs Fig. S8. The heatmap of significant Pearson correlations between the relative adduct abundance and the contaminants. (A) the low-mass adducts and (B) the high-mass adducts. Each dot is a significant (p < 0.05) correlation with colour and shade, indicating the correlation direction (positive in blue and negative in red) and its strength (Pearson r: -0.62 to 0.25), with more intense shading and bigger circle size for stronger correlations. B A S15 Fig. S9. Pearson correlation coefficients for the adducts (low-mass: green; high-mass: red) and PAH concentrations for 11 congeners. In all congeners, LmPAHs, HmPAHs and the total PAHs, the correlation coefficient values were higher for the high-mass adducts than the low-mass adducts. The box-and-whiskers graphs show the minimum and maximum values (line extending parallel to the box), the lower and upper quartile (extremes of the box), the median (black bar) and the mean (yellow dot).The red line shows the optimal cut-off.

Fig. S11. Area under the receiver operating characteristic (ROC) curve (AUC) using individual adducts identified as predictors in the OPLS-DA model.
The 45-degree diagonal represents the expected projection of a random classifier with no predictive value. The X-axis represents the false positive rate, while the Y-axis represents the true positive rate. For each of the 11 discriminating putative adducts (the adduct name is shown on top), the left panel is the AUC and the corresponding confidence interval, and on the right, the boxplot with the normalized values for the adducts in Contaminated (red) and Reference (green) stations, with the red line showing the optimal cut-off. The box-and-whiskers graphs show the minimum and maximum values (line extending parallel to the box), the lower and upper quartile (extremes of the box), the median (black bar) and the mean (yellow dot).