Proteogenomic Workflow Reveals Molecular Phenotypes Related to Breast Cancer Mammographic Appearance

Proteogenomic approaches have enabled the generation of novel information levels when compared to single-omics studies, although they are burdened by extensive experimental efforts. Here, we improved a data-independent acquisition mass spectrometry proteogenomic workflow to reveal distinct molecular features related to mammographic appearances in breast cancer. Our results reveal splicing processes detectable at the protein level and highlight quantitation and pathway complementarity between RNA and protein data. Furthermore, we confirm previously detected enrichments of molecular pathways associated with estrogen receptor-dependent activity and provide novel evidence of epithelial-to-mesenchymal activity in mammography-detected spiculated tumors. Several transcript–protein pairs displayed radically different abundances depending on the overall clinical properties of the tumor. These results demonstrate that there are differentially regulated protein networks in clinically relevant tumor subgroups, which in turn alter both cancer biology and the abundance of biomarker candidates and drug targets.


Supporting Information - Table of Contents

Supplemental Tables
- Table S1. Clinical characteristics of breast cancer cohorts.
- Table S2. Clinical and histopathological information per sample. (XLSX)
- Table S10. Correlations between RNA and protein data by ER status (Sample Set 1 correlations within ER positive and ER negative tumors; RNA-DDA and RNA-DIA).

Supplemental Figures
- Figure S1. Computational workflow generated to analyze the DIA dataset (schematic overview of DIA computational workflow).
- Figure S2. Scheme of DIA search and overview of separate library search results (DIA search workflow and summary of identifications).
- Figure S3. Impact of sample preparation method on proteomic identifications (Library peptide/protein identification contribution dependent on sample preparation).
- Figure S4. Sample-level peptide and protein identifications within proteomic layers.
- Figure S5. Evaluation of p-value distribution out of RNA-protein correlation analyses (RNA-protein analyses in Sample Set 1).
- Figure S6. Correlation distribution between RNA and proteomic data in the validation dataset (RNA-protein analyses in Sample Set 2).
- Figure S7. Correlation of RNA-protein correlations between DDA and DIA data layers (Sample-wise RNA-DDA vs RNA-DIA correlations).
- Figure S8. Significant transcript-protein pairs pathway-wise annotation and correlations (Correlation of RNA-proteins across significantly enriched pathways for ER status analysis).
- Figure S9. Analysis of immunohistochemical stainings (Histopathological and immunohistochemical evaluation of tissues out of Sample Set 1).
- Figure S10. Confirmation of integrated findings in the validation dataset (RNA-protein comparison at the level of differential expression and pathway enrichment for ER status and appearance).
- Figure S11. Evaluation of biomarker signatures (RNA-protein correlation across biomarker signatures and FDA drug targets).
- Figure S12. RNA-protein correlations based on tumor subgroup (RNA-protein correlation within ER status and appearance tumors).
- Figure S13. Pathway representation of transcript-protein pairs displaying discordant correlations between ER positive and negative tumors.
- Figure S14. Pathway representation of transcript-protein pairs displaying discordant correlations between tumor appearance groups.
- Figure S15. Co-regulation clusters in ER positive and negative tumor samples out of the DDA data layer (Summary of step-wise definition of protein co-regulation groups across ER positive and negative tumors at the DDA level).
- Figure S16. Co-regulation clusters in ER positive and negative tumor samples out of the DIA data layer (Summary of step-wise definition of protein co-regulation groups across ER positive and negative tumors at the DIA level).

Supplementary Figures

Figure S1. Computational workflow generated to analyze the DIA dataset.
Computational workflows (see Methods for details) were established in this study to generate spectral libraries (A) for DIA data search (B), assess differential transcript usage at the proteomic level (C), and detect and quantify SAVs

Figure S5. Evaluation of p-value distribution out of RNA-protein correlation analyses.
Following our correlation analyses between matching transcripts and proteins (reported in Fig. 2), we evaluated the distribution of the p-values associated with the negative and positive correlation coefficients. We noticed that the uncorrected p-values from negative Spearman correlations seemed to follow a background-like distribution (left panels). These p-values became non-significant after Benjamini-Hochberg correction (right panels), suggesting that they might not be statistically meaningful or that the effect size might be too small.

Figure S9. Analysis of immunohistochemical stainings.
Results for the KI67 clinical cutoff (≥ 30% positive cells) are depicted as contingency tables in panel C.
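The Benjamini-Hochberg correction mentioned for the RNA-protein correlation p-values can be illustrated with a minimal sketch. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study; the sketch only shows how nominally significant p-values can lose significance after adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical uncorrected p-values (illustrative only, not study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

n_raw_significant = sum(p < 0.05 for p in raw)
n_adj_significant = sum(q < 0.05 for q in adj)
print(adj, n_raw_significant, n_adj_significant)
```

In this toy input, four p-values fall below 0.05 before correction and two remain below 0.05 afterwards.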

Figure S10. Confirmation of integrated findings in the validation dataset.
Panels A and B display the scaled Log2Ratios of all transcript-protein pairs for ER status and appearance, respectively (significant differential expression at the RNA level is marked by larger, filled dots; concordance and discordance between RNA and protein layers are shown in green and purple, respectively). GSEA analyses were performed on RNA and DIA data layers for ER and spiculation statuses using the

Figure S15. Co-regulation clusters in ER positive and negative tumor samples out of the DDA data layer.
In order to define the most represented pathways in our DDA MS data, we generated Spearman correlation matrices within the ER positive and ER negative tumor groups (A). The elbow method was used to define the minimum number Process.

Figure S16. Co-regulation clusters in ER positive and negative tumor samples out of the DIA data layer.
Following our analysis of clusters for DIA data, we generated Spearman correlation matrices within the ER positive and ER negative tumor groups (A) out of the DIA data layer. The elbow method was used to define the minimum number of clusters for both sample groups (ER positive: 13 clusters; ER negative: 13 clusters; panel B). Highly similar clusters based on Panther over-representation test-derived GOBP annotation were merged after calculating the distances between them in the ER positive (C) and negative (D) subgroups, and by selecting a new minimum number of protein clusters (E).
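As an illustration of the elbow method referred to above, the sketch below picks the number of clusters at which a decreasing cost curve bends most sharply, using the point farthest from the straight line joining the first and last points. This is one common heuristic and an assumption here: the caption does not state which quantity was plotted or which elbow criterion was applied. The function name `elbow_index` and the cost values are hypothetical.

```python
import math


def elbow_index(costs):
    """Index of the point farthest from the chord joining the first and last points."""
    n = len(costs)
    if n < 3:
        return 0
    x0, y0 = 0.0, float(costs[0])
    x1, y1 = float(n - 1), float(costs[-1])
    chord = math.hypot(x1 - x0, y1 - y0)
    best_i, best_d = 0, -1.0
    for i, y in enumerate(costs):
        # Perpendicular distance from the point (i, y) to the chord.
        d = abs((y1 - y0) * (i - x0) - (x1 - x0) * (y - y0)) / chord
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Hypothetical within-cluster cost for k = 1..8 clusters (not study data).
ks = [1, 2, 3, 4, 5, 6, 7, 8]
costs = [100.0, 60.0, 35.0, 20.0, 17.0, 15.0, 14.0, 13.5]
best_k = ks[elbow_index(costs)]
print(best_k)
```

The selected index does not change if either axis is rescaled, because rescaling multiplies every point's distance to the chord by the same factor.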

Figure S17. Protein cluster regulation dependent on Estrogen Receptor status out of the DDA data layer.
Co-regulated protein clusters in ER positive (left) and negative (right) tumors (see Fig S14) were extracted from the DDA data, annotated with GOBP terms, condensed, and visualized in Cytoscape (panel A). Edge thickness and length relate to cluster distance (Euclidean), node color relates to the scaled mean intensity of all proteins in each cluster, and node size depends on the number of proteins in each cluster.
Panel B shows the correlation to mRNA of each protein per cluster for ER positive and negative tumors. Panel C displays the distributions of the differences in Spearman correlation coefficients between ER positive and negative tumors across clusters of co-regulation (FDA drug targets only).
Abbreviations: DDA: Data Dependent Acquisition; ER: Estrogen Receptor; FDA: Food and Drug Administration.
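The quantity summarized in panel C, the difference in RNA-protein Spearman correlation between ER positive and ER negative tumors, can be sketched for a single transcript-protein pair as follows. This is a minimal standard-library illustration under stated assumptions: the helper names `average_ranks` and `spearman` and all abundance values are hypothetical, and the study's own implementation is not described in this section.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical abundances for one transcript-protein pair (not study data).
er_pos_rna = [1.0, 2.0, 3.0, 4.0, 5.0]
er_pos_protein = [2.1, 2.9, 3.5, 4.8, 5.0]
er_neg_rna = [1.0, 2.0, 3.0, 4.0, 5.0]
er_neg_protein = [3.0, 1.0, 2.0, 5.0, 4.0]

rho_pos = spearman(er_pos_rna, er_pos_protein)
rho_neg = spearman(er_neg_rna, er_neg_protein)
delta_rho = rho_pos - rho_neg
print(rho_pos, rho_neg, delta_rho)
```

Repeating this per pair and collecting `delta_rho` within each co-regulation cluster would give the kind of distribution that panel C describes.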