Analysis of the results of a label-free mass spectrometry experiment on paraffin-embedded samples, stratified as follows:
In the previous analysis, filtering the data by coverage score proved to be a poor choice, because the majority of the proteins had low coverage values. The same check was repeated here. The summary statistics of the coverage score for each sample are shown in the table below; together with the plots, they show similar behaviour regarding the coverage.
| Statistic | Fib1 | Fib2 | Fib3 | Fib4 | Fib5 | RBC1 | RBC2 | RBC3 | RBC4 | RBC5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Min. | 0.18000 | 0.07000 | 0.0600 | 0.28000 | 0.19000 | 0.19000 | 0.04000 | 0.08000 | 0.30000 | 0.13000 |
| 1st Qu. | 4.73000 | 4.53500 | 4.3800 | 4.38000 | 4.30000 | 5.03000 | 5.01000 | 4.71000 | 5.61750 | 5.02000 |
| Median | 10.17000 | 10.07500 | 9.9800 | 9.52500 | 9.42000 | 10.94000 | 11.69000 | 10.95000 | 11.06500 | 10.90000 |
| Mean | 15.71651 | 15.68055 | 15.3679 | 15.47617 | 14.15617 | 16.54819 | 15.68757 | 15.77537 | 16.03192 | 15.75879 |
| 3rd Qu. | 21.55000 | 23.03000 | 20.6500 | 21.46000 | 19.35000 | 22.54000 | 20.87000 | 21.42250 | 21.89000 | 20.62000 |
| Max. | 97.28000 | 91.55000 | 94.3700 | 93.88000 | 94.37000 | 100.00000 | 100.00000 | 100.00000 | 100.00000 | 100.00000 |
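As a rough illustration of how a per-sample summary like the one above can be produced, here is a minimal sketch using only the Python standard library. The function name `six_number_summary` and the toy coverage values are hypothetical and are not taken from the experiment. The `inclusive` quantile method interpolates linearly between the observed minimum and maximum, which should correspond to the default quantile definition in R, although that equivalence is an assumption here.

```python
import statistics


def six_number_summary(values):
    """Return Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for one sample."""
    # "inclusive" treats the data as spanning the full range of the sample
    # and interpolates linearly between neighbouring order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical coverage scores (%), NOT values from the experiment.
toy_coverage = [0.5, 4.0, 10.0, 22.0, 95.0]
summary = six_number_summary(toy_coverage)
```

A mean well above the median, as in the table, is what a long right tail of a few high-coverage proteins produces, which is why a hard coverage cut-off would discard most of the data.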
The Venn diagram shows that 101 proteins overlap between the RBC (Set_1) and FIB (Set_2) groups, and this intersection is used for the comparison between groups. A further 60 proteins are unique to the RBC samples, and 115 proteins are unique to the FIB samples.
The correlation matrices of the co-abundant proteins were used to create the adjacency matrices needed for the network analysis. Here we have three distinct datasets: i) fibrin samples with the proteins exclusive to the FIB group (115 proteins), ii) red blood cell samples with the proteins exclusive to the RBC group (60 proteins), and iii) all samples with the 101 proteins that are common to both groups.
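The three regions of the Venn diagram are plain set operations. The sketch below uses hypothetical protein identifiers (`P1` to `P5`) and a hypothetical helper name, `venn_counts`; it only illustrates the partition into shared and group-specific proteins.

```python
def venn_counts(set_1, set_2):
    """Split two protein sets into shared, only-in-first and only-in-second."""
    shared = set_1 & set_2   # proteins detected in both groups
    only_1 = set_1 - set_2   # proteins unique to the first group
    only_2 = set_2 - set_1   # proteins unique to the second group
    return shared, only_1, only_2


# Hypothetical identifiers, NOT proteins from the experiment.
rbc_proteins = {"P1", "P2", "P3", "P4"}
fib_proteins = {"P3", "P4", "P5"}

shared, rbc_only, fib_only = venn_counts(rbc_proteins, fib_proteins)
```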
The correlation matrix above was used to extract the abundance profiles of the 50 most highly correlated proteins, with a correlation threshold of ±0.80. The network analysis was then carried out and resulted in the graph below. The node size is proportional to the degree (how connected the protein is); red edges represent positive correlations and blue edges represent negative correlations. The optimal community structure of the graph was calculated by maximising the modularity score. This analysis resulted in the six communities represented by the node colours and shown in the table below.
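To make the thresholding step concrete, the following sketch builds a signed edge list from pairwise Pearson correlations and counts the degree of each node. It is a simplified stand-in, not the code used for this report: the function names, the toy abundance profiles and the choice of `>=` at the threshold are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_network(abundance, threshold=0.80):
    """Keep protein pairs with |r| >= threshold (assumed cut-off rule)."""
    edges = []
    degree = {name: 0 for name in abundance}
    for a, b in combinations(abundance, 2):
        r = pearson(abundance[a], abundance[b])
        if abs(r) >= threshold:
            colour = "red" if r > 0 else "blue"  # sign decides edge colour
            edges.append((a, b, colour))
            degree[a] += 1
            degree[b] += 1
    return edges, degree


# Hypothetical abundance profiles across five samples.
toy_abundance = {
    "prot_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "prot_B": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with prot_A
    "prot_C": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as prot_A rises
    "prot_D": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear trend
}
edges, degree = correlation_network(toy_abundance)
```

In this toy case `prot_D` ends up isolated with degree zero, so it would be drawn as the smallest node.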
The same method used to construct the co-expression network for the RBC samples was applied to the FIB dataset, in this case using the 115 proteins unique to the FIB samples. The optimal community analysis resulted in three communities, listed in the table below.
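The modularity score that the community analysis maximises can be written down in a few lines. The sketch below only scores a given partition of an unweighted, undirected graph; it does not search for the best partition, and it is not the implementation used for the networks above. The function name and the toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    membership = {
        node: idx for idx, group in enumerate(communities) for node in group
    }
    internal = [0] * len(communities)    # edges with both ends in community
    degree_sum = [0] * len(communities)  # total degree of community members
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )


# Toy graph: two triangles joined by a single bridge edge (c-d).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
q_split = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the toy graph into its two triangles scores higher than keeping everything in one community, which is the behaviour the optimal community analysis relies on.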
Here I used a linear model approach to assess differential abundance/expression between the two groups; this analysis identified 58 differentially abundant proteins. The table below shows the metrics of the top-ranked proteins from the linear model fit.
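For a single protein, a two-group linear model reduces to comparing group means. The sketch below fits y = b0 + b1 * (sample is in group B) by ordinary least squares and returns the effect b1 with its t-statistic. The report does not state which package or model variant was used, so this plain, unmoderated fit, the function name, the direction of the contrast and the toy values are all assumptions.

```python
import math
from statistics import fmean


def two_group_fit(group_a, group_b):
    """OLS fit of y = b0 + b1 * is_group_b for one protein.

    Returns (b1, t-statistic of b1, residual degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = fmean(group_a), fmean(group_b)
    effect = mean_b - mean_a  # b1: difference in group means
    rss = sum((y - mean_a) ** 2 for y in group_a) + sum(
        (y - mean_b) ** 2 for y in group_b
    )
    dof = n_a + n_b - 2
    std_error = math.sqrt(rss / dof * (1 / n_a + 1 / n_b))
    return effect, effect / std_error, dof


# Hypothetical log2 abundances for one protein, NOT measured values.
fib_values = [10.0, 10.2, 9.8, 10.1, 9.9]
rbc_values = [12.0, 12.3, 11.7, 12.1, 11.9]

effect, t_stat, dof = two_group_fit(fib_values, rbc_values)
```

On a log2 scale, `effect` is the log fold change between groups; repeating the fit per protein and ranking by the statistic gives a table like the one referred to above.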
The scores of the differentially abundant proteins were used to perform the co-expression analysis for the comparison between groups. The heatmap below shows the differentially abundant proteins, which form clusters that correspond to the sample groups. A co-expression matrix was generated and used to construct the co-expression network. The node colours are based on the optimal community analysis, which is shown below in Table 4. Only edges with an absolute correlation greater than 0.80 were used to plot the graph.
Pathway enrichment analysis was conducted on the differentially abundant proteins related to RBC and FIB using the InterMineR R package. Pathways were tested for over-representation among the proteins whose fold change was associated with RBC and with FIB, relative to what is expected by chance, and a p-value was computed for each pathway. The plots below represent the top 10 enriched pathways for the aforementioned communities; you can hover over the bars for p-value information.
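One common way to compute an over-representation p-value is the upper tail of a hypergeometric distribution. The sketch below shows that calculation for a single pathway. It is an illustration under stated assumptions, not a description of what InterMineR does internally; the function name and the toy counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def over_representation_p(hits, list_size, pathway_size, background_size):
    """P(X >= hits) when list_size proteins are drawn at random from a
    background of background_size, of which pathway_size are in the pathway."""
    upper = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 4 selected proteins fall in a 5-protein pathway,
# out of a background of 20 proteins.
p_value = over_representation_p(
    hits=3, list_size=4, pathway_size=5, background_size=20
)
```

A small p-value here means that seeing at least that many pathway members in the protein list would be unlikely if the list were a random draw from the background.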