Modular Reorganization of Signaling Networks during the Development of Colon Adenoma and Carcinoma

Network science is an emerging tool in systems biology and oncology, providing novel, system-level insight into the development of cancer. The aim of this project was to study the signaling networks in the process of oncogenesis to explore the adaptive mechanisms taking part in the cancerous transformation of healthy cells. For this purpose, colon cancer proved to be an excellent candidate, as its preliminary phase, adenoma, has a long evolution time. In our work, transcriptomic data were collected from normal colon, colon adenoma, and colon cancer samples to calculate link (i.e., network edge) weights as approximate proxies for protein abundances, and the link weights were included in the Human Cancer Signaling Network. Here we show that the adenoma phase clearly differs from the normal and cancer states in terms of a more scattered link weight distribution and an enlarged network diameter. Modular analysis shows the rearrangement of the apoptosis- and cell-cycle-related modules, whose pathway enrichment analysis supports the relevance of targeted therapy. Our work enriches the system-wide assessment of cancer development, showing specific changes for the adenoma state.


Supporting Texts
Text S1. Checking the robustness of the results by adding noise to the data
Microarray data are well known to be noisy. To improve data quality, we collected data from over 100 colon samples for each state (normal, adenoma, carcinoma), and we chose to analyze parameters that do not require precise measurements. In addition, we tested the robustness of our results by randomly raising one half of the abundances by 5% and lowering the other half by 5%. We found that our most relevant results are robust to this noise (see Figures S10 and S11 and Tables S2 and S4).
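The perturbation described in Text S1 can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `add_noise`, the fixed random seed, and the example abundance values are assumptions introduced here.

```python
import random


def add_noise(abundances, fraction=0.05, seed=0):
    """Raise a random half of the abundances by `fraction`, lower the rest.

    `abundances` maps a protein name to its abundance. The seed is fixed
    only to make this sketch reproducible (an assumption, not from the text).
    """
    rng = random.Random(seed)
    names = sorted(abundances)
    rng.shuffle(names)
    raised = set(names[: len(names) // 2])  # random half to be elevated
    return {
        name: value * (1 + fraction) if name in raised else value * (1 - fraction)
        for name, value in abundances.items()
    }


# Hypothetical abundances, for illustration only.
example = {"EGFR": 120.0, "KRAS": 80.0, "TP53": 200.0, "APC": 40.0}
noisy = add_noise(example)
```

Link weights would then be recomputed from the noisy abundances and the analyses repeated on the resulting network.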

Text S2. The algorithm for the generation of EntOpt images
Entropy-based visualisation (http://apps.cytoscape.org/apps/entoptlayout) was extensively used in this research to visualise the important changes among the normal, adenoma and carcinoma networks with a link weight sensitive method. The algorithm for the optimal usage of the EntOpt Layout program was described by a member of our research group, Andrea Császár.
i. Each network was first visualised by the built-in Prefuse Force Directed Layout program of Cytoscape.
ii. The maximal run time was set to be 50 000 seconds.
iii. Finally, the program was run four times, with different settings:

Text S3. Investigating the modular overlap changes
The ModuLand plugin is able to detect highly overlapping modules, as well as to calculate the effective number and degree of modules. After comparing these between the three networks, we found that although there are interesting differences, they can be attributed to the differing number of modules: after normalization to the same number of modules, the differences in the overlap values disappeared. Calculating the exact number of modules is a current challenge in network science; thus these results (Figures S16-19) are interesting, but not reliable enough.

Text S4. The method for the examination of targeted and immunotherapy-related pathways
Targeted therapy- and immunotherapy-related pathways were analysed with the help of the Gene Ontology Consortium resource, by selecting the proteins belonging to the appropriate GO terms. In the case of EGFR and VEGFR inhibitors, the terms 'epidermal growth factor receptor signaling pathway' and 'vascular endothelial growth factor receptor signaling pathway' were used, respectively. In the case of immunotherapy, MSI status and mismatch repair proteins are known to be important predictors of therapy efficacy; therefore, the proteins of the GO term 'mismatch repair' were chosen for analysis.
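The pathway-level summary used in this text (removing duplicates from a GO term protein list, intersecting it with the dataset, and taking the median abundance) can be sketched as follows. The function name `pathway_median_abundance` and all example values are hypothetical; the sketch assumes the GO term list has already been downloaded and filtered for human proteins.

```python
from statistics import median


def pathway_median_abundance(go_proteins, abundances):
    """Median abundance of the GO term proteins that occur in the dataset.

    `go_proteins` is the (possibly duplicated) protein list of one GO term;
    `abundances` maps protein names in the dataset to their abundances.
    Returns None when no protein of the term is present in the dataset.
    """
    unique = set(go_proteins)                         # remove duplicates
    shared = [p for p in unique if p in abundances]   # overlap with dataset
    if not shared:
        return None
    return median(abundances[p] for p in shared)


# Hypothetical protein list and abundances, for illustration only.
go_term = ["EGFR", "GRB2", "EGFR", "SOS1"]
dataset = {"EGFR": 120.0, "GRB2": 60.0, "KRAS": 80.0}
result = pathway_median_abundance(go_term, dataset)
```

In this toy example only EGFR and GRB2 are shared with the dataset, so the median is taken over those two values.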
As a next step, after filtering for human proteins, the list of proteins belonging to the appropriate GO term was downloaded, and duplicates were removed. Then the overlap with our dataset was calculated, and the abundances of the remaining proteins were looked up. The median abundance was then calculated for each pathway. The modular affiliation of these proteins and their relations with the strongest and weakest links in the network were also analysed.

Figure S1. The hierarchy of modules in Level 1, calculated with the ModuLand plugin of Cytoscape, with normal link weights
This figure was generated with the ModuLand program. After finding the functional modules in the network, the program generates a hierarchy between them, using the modular link weights. The modules are highlighted with different colors, and the links between them are depicted with grey lines. Normal link weights were used in this calculation.

Figure S2. The hierarchy of modules in Level 1, calculated with the ModuLand plugin of Cytoscape, with adenoma link weights
This figure was generated with the ModuLand program. After finding the functional modules in the network, the program generates a hierarchy between them, using the modular link weights. The modules are highlighted with different colors, and the links between them are depicted with grey lines. Adenoma link weights were used in this calculation.

Figure S3. The hierarchy of modules in Level 1, calculated with the ModuLand plugin of Cytoscape, with carcinoma link weights
This figure was generated with the ModuLand program. After finding the functional modules in the network, the program generates a hierarchy between them, using the modular link weights. The modules are highlighted with different colors, and the links between them are depicted with grey lines. Carcinoma link weights were used in this calculation.

Figure S4. The distribution of the logarithmic link weights
The distribution of the logarithmic link weights demonstrates the large median and standard deviation of the adenoma network, similarly to the box plot and distribution of the logarithmic and non-logarithmic values (see the main text).

Figure S5. Number of links in different bins of data
To further demonstrate the differences among the link weight distributions of the three networks (see the main text and Figure S6), the link weight data were binned according to non-logarithmic link weight cutoff values. The binning also shows that the adenoma network has the most links among the very small and very large link weights.
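The binning behind this figure can be sketched as below. The cutoff values (50, 100, 500) are an assumption borrowed from the small and medium ranges named in the Figure S7 legend; the actual bins of Figure S5 are not stated here, and the function name and example weights are hypothetical.

```python
from bisect import bisect_right


def count_links_per_bin(weights, cutoffs):
    """Count link weights per bin, given ascending cutoff values.

    Bin i holds weights w with cutoffs[i-1] <= w < cutoffs[i]; the first
    bin is open below and the last bin is open above.
    """
    counts = [0] * (len(cutoffs) + 1)
    for w in weights:
        counts[bisect_right(cutoffs, w)] += 1
    return counts


# Hypothetical link weights and cutoffs, for illustration only.
weights = [10, 49.9, 50, 75, 100, 250, 500, 900]
counts = count_links_per_bin(weights, cutoffs=[50, 100, 500])
```

Running the same function on the normal, adenoma, and carcinoma link weights with identical cutoffs gives directly comparable counts per bin.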

Figure S6. Probability density function of the logarithmic link weights
The probability density function of the logarithmic link weight data also demonstrates subtle differences between the distribution of the three networks.

Figure S7. Cumulative distribution and box plot of the non-logarithmic link weights
The cumulative distribution of the non-logarithmic link weights shows the same differences as with the logarithmic values (see the main text). Two areas among the small link weights (defined as link weights under 50) and medium link weights (defined as link weights between 100 and 500) are highlighted.

The box plot of the non-logarithmic link weights demonstrates the large median and standard deviation of the adenoma network, similarly to the box plot of the logarithmic values (see the main text). The median and p values (paired Wilcoxon test) are highlighted.

Figure S8. Cumulative distribution of the abundances
The cumulative distribution of the protein abundances shows the same main characteristics as the link weight distribution (see the main text). Because the average abundance was used in place of a few missing values, a small bulge appears around the average value. This correction did not affect the analysis significantly.
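The correction for missing values mentioned above can be sketched as follows. It is assumed here that the average is taken over all observed abundances of the same network state; the text does not specify this, and the function name and example values are hypothetical.

```python
from statistics import mean


def impute_missing(abundances):
    """Replace missing abundances (None) with the mean of the observed ones."""
    observed = [v for v in abundances.values() if v is not None]
    fill = mean(observed)  # assumed: average over all observed proteins
    return {k: (fill if v is None else v) for k, v in abundances.items()}


# Hypothetical abundances with one missing value, for illustration only.
raw = {"A": 10.0, "B": None, "C": 30.0}
filled = impute_missing(raw)
```

Every imputed protein receives the same value, which is why a small bulge appears around the average in the cumulative distribution.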

Figure S9. Cumulative distribution of weighted degrees
The cumulative distribution of the weighted degrees shows less pronounced differences than the link weight and abundance distributions (see the main text and Figures S6 and S8). In the area of small and large link weights, the adenoma network has the largest cumulative probability.

Figure S10. Cumulative distribution of link weights with additional 5% noise
The noisy network was created as described in Text S1. The results are robust to noise, as the adenoma network has the largest cumulative probability among the small link weights, and the normal network among the medium link weights (see the main text for context). The results are also significant (paired Wilcoxon test, p<0.0001 for the normal-adenoma, adenoma-carcinoma, and normal-carcinoma pairs).

Figure S11. Box plot of link weights with additional 5% noise
The noisy network was created as described in Text S1. The results are robust to noise, as the adenoma network has the largest, and the carcinoma network has the smallest standard deviation (see the main text for context).

Figure S12. The EntOpt image of the unweighted Human Cancer Signaling Network
The image was made with the EntOpt Layout program, as described above. The entropy calculations were conducted without the use of link weights. The nodes are highlighted with the color yellow.

Figure S13. The EntOpt image of the network with normal weights
The image was made with the EntOpt Layout program, as described above. The entropy calculations were conducted with the use of the normal colon link weights. The nodes are highlighted with the color yellow.

Figure S14. The EntOpt image of the network with adenoma weights
The image was made with the EntOpt Layout program, as described above. The entropy calculations were conducted with the use of the colon adenoma link weights. The nodes are highlighted with the color yellow.

Figure S15. The EntOpt image of the network with carcinoma weights
The image was made with the EntOpt Layout program, as described above. The entropy calculations were conducted with the use of the colon carcinoma link weights. The nodes are highlighted with the color yellow.

Figure S16. Change of the ModuLand overlap values based on the different number of modules with logarithmic link weights
We investigated the modular overlap changes; however, we found that they depend strictly on the number of modules, which cannot be determined precisely. Therefore, we did not include this result in the main text.

Figure S17. Change of the ModuLand overlap values based on the different number of modules with nonlogarithmic link weights
We investigated the modular overlap changes; however, we found that they depend strictly on the number of modules, which cannot be determined precisely. Therefore, we did not include this result in the main text.

Figure S18. Cumulative distribution of the effective degree of modules
The effective degree of modules was calculated using the ModuLand plugin (see Materials and Methods) from the weighted degree measure of the nodes on the first hierarchical level, each representing a module of the original network. We found no significant differences between the normal, adenoma and carcinoma networks, indicating that the strength of the links between the modules is even. Statistical analysis was performed using the Wilcoxon test (pN-A = 0.9723, pN-C = 0.1101, pA-C = 0.1314).

Figure S19. Cumulative distribution of the normalized modular overlap
After normalizing the overlap values to the number of modules, the difference between the normal network and the adenoma and carcinoma networks almost disappeared, indicating that a higher number of modules in the network increases the probability of modular overlap.

Figure S20. EGFR-, VEGFR-signaling and mismatch repair related nodes in the normal network
The EntOpt image of the normal network, with the different pathways colored. Nodes in the EGFR pathway are green, those in the VEGFR pathway are red, and those in the mismatch repair pathway are blue. These nodes do not appear to form distinct modules, as they are strongly intertwined with the center of the network.

Figure S21. EGFR-, VEGFR-signaling and mismatch repair related nodes in the adenoma network
The EntOpt image of the adenoma network, with the different pathways colored. Nodes in the EGFR pathway are green, those in the VEGFR pathway are red, and those in the mismatch repair pathway are blue. These nodes do not appear to form distinct modules, as they are strongly intertwined with the center of the network.

Figure S22. EGFR-, VEGFR-signaling and mismatch repair related nodes in the carcinoma network
The EntOpt image of the carcinoma network, with the different pathways colored. Nodes in the EGFR pathway are green, those in the VEGFR pathway are red, and those in the mismatch repair pathway are blue. These nodes do not appear to form distinct modules, as they are strongly intertwined with the center of the network.

In this research, five GEO data series were processed, which contained normal colon, colon adenoma and adenocarcinoma gene expression data. The distribution of the number of samples is shown by the table above.

The apoptosis-related modules are highlighted with red. The cell-cycle-related modules are highlighted with blue.

Tables S5 and S6. The relevant changes in the strongest and weakest 1% of the links
Modules highlighted with blue belong to the cell cycle regulation process; modules highlighted with red belong to the apoptosis regulation process. Only the link weights belonging to the apoptosis- or cell-cycle-related modules are listed here.