Introduction

The results examined in this analysis come from a label-free mass spectrometry experiment. Raw spectra are supplied to a BLAST-type search engine that looks for the proteins that best explain the observed spectra and reports various measures of confidence around each estimate.

QC Filtering

When performing this analysis, I was asked to provide some measure of quality control, which was difficult to do without the raw spectra. The usual protocol for estimating false discovery rates is to submit a few dummy queries, in the form of jumbled sequences, at the end of the spectra search; the number of hits acquired for these gives some measure of the quality of the spectra and of the experiment overall. Given this, some filtering steps would ultimately be useless. For example, examine this plot of coverage versus sequence score, along with density estimates:

In all of these plots, there is a roughly linear relationship between score and coverage, so it would not make sense to filter on coverage: the proteins with the highest scores usually have high coverage as well. An arbitrary cut-off of coverage > 70 would exclude most of the proteins in the sample. Play around with this interactive plot to see the relationship between coverage and score.
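As a rough numerical companion to the plots, the sketch below checks the same two points on a toy table: how strongly score and coverage move together, and how few proteins survive a coverage > 70 cut-off. The values and the column names `score` and `coverage` are made up for illustration and are not taken from the experiment.

```r
# Toy values for illustration only; not taken from the experiment.
proteins <- data.frame(
  score    = c(600, 250, 120, 60, 30, 15, 8, 4),
  coverage = c(82, 55, 41, 30, 22, 15, 9, 5)
)

# Strength of the linear relationship between score and coverage.
r <- cor(proteins$score, proteins$coverage)

# Fraction of proteins that would survive a coverage > 70 cut-off.
kept <- mean(proteins$coverage > 70)

# Do the survivors coincide with the top-scoring proteins?
survivors    <- proteins[proteins$coverage > 70, ]
top_by_score <- head(proteins[order(-proteins$score), ], nrow(survivors))
same_set     <- setequal(survivors$score, top_by_score$score)
```

In this toy case the coverage filter keeps only the protein that already has the highest score, so it adds nothing beyond a score ranking while discarding everything else.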

Complete plots of coverage density can be seen below; the vast majority of proteins detected by the protein search have coverage below 50.

Based on these plots, I think it is a bad idea to filter on coverage: low coverage does not mean the proteins in question are absent.
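Had decoy hits been available, the false discovery rate mentioned at the start of this section could have been estimated along the following lines. This is only a sketch under assumptions: the scores are invented, the `decoy` flag marks hits against the jumbled sequences, and the ratio of decoy to target hits above a threshold is one common convention for the estimate, not necessarily what the search software reports.

```r
# Hypothetical search-engine scores; decoy == TRUE marks jumbled sequences.
hits <- data.frame(
  score = c(95, 90, 84, 77, 70, 62, 55, 41, 33, 20),
  decoy = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Estimated FDR at a score threshold: decoy hits / target hits at or above it.
estimate_fdr <- function(hits, threshold) {
  passing  <- hits[hits$score >= threshold, ]
  n_target <- sum(!passing$decoy)
  n_decoy  <- sum(passing$decoy)
  if (n_target == 0) return(NA_real_)  # nothing passes, estimate undefined
  n_decoy / n_target
}

fdr_at_60 <- estimate_fdr(hits, 60)
fdr_at_30 <- estimate_fdr(hits, 30)
```

Lowering the threshold lets more decoys through, so the estimate rises; a score cut-off would normally be chosen so that the estimate stays below an agreed level.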

Network analysis

Rosanna used a very useful tool called STRING to carry out some exploratory analyses. We can try some in-house network analysis using a concept borrowed from gene regulatory networks: coexpression correlation. Essentially, the question asked here is whether, across samples, the score of any given protein \(i\) is correlated with the score of another protein \(j\). Correlation here is measured by Pearson’s correlation coefficient, which can be illustrated by the following:

Here, the correlation between two sets of points is shown. To build a relatively confident protein correlation matrix, we only consider protein pairs whose correlation coefficient is greater than 0.8. The first step in building this matrix was to get common lists of proteins using Venny.
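To make the correlation measure concrete, the sketch below computes Pearson's coefficient by hand for two proteins and compares it with R's built-in `cor()`. The scores are the P02549 and P11277 rows of the common-score table shown in the differential analysis section; the helper name `pearson` is my own.

```r
# Scores across the 7 included samples (rbc1-3, fib1-4), copied from the
# common-score table in the differential analysis section.
x <- c(603.26, 555.22, 589.77, 202.52, 122.55, 66.96, 106.59)  # P02549
y <- c(529.80, 469.59, 451.29,  94.44,  79.44, 49.95,  83.54)  # P11277

# Pearson's r: covariance of the centred values over the product of their norms.
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

r_manual  <- pearson(x, y)
r_builtin <- cor(x, y, method = "pearson")

# Would this pair be kept under the 0.8 cut-off?
passes_cutoff <- r_manual > 0.8
```

Both proteins score high in the RBC samples and low in the fibrin samples, so the pair clears the 0.8 cut-off comfortably.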

These common sets are the proteins that overlap between all samples except the paraffin-embedded samples. The scores in the paraffin samples were lower, and they did not have many detected proteins either. However, the following plots may suggest they merit inclusion, as at least the RBC paraffin sample appears quite similar to the other RBC samples.

However, note the scale of the 2 plots; in addition, the paraffin sample contains half the number of proteins the RBC sample has. For now we can exclude the paraffin samples, as including one lower-quality sample with the others may detract from the coexpression methods.

Using Venny, we can get a list of 145 overlapping proteins across our 7 included samples.
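Venny is a web tool, but the same overlap can be reproduced in R for checking purposes. The sketch below is a minimal illustration, assuming each sample's detected accessions are held in a named list; the list name `detected` and its shortened contents are hypothetical.

```r
# Hypothetical per-sample accession lists (a real run has one per sample).
detected <- list(
  rbc1 = c("P02549", "P11277", "P35579", "P16157"),
  rbc2 = c("P02549", "P11277", "P16157", "P01024"),
  fib1 = c("P02549", "P16157", "P01024", "P11277")
)

# Accessions present in every list, i.e. the centre of the Venn diagram.
common <- Reduce(intersect, detected)

# Number of overlapping proteins.
n_common <- length(common)
```

`Reduce(intersect, ...)` scales to any number of samples, which avoids the limit on the number of lists a Venn diagram tool can handle.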

## [1] 385
##      Accession Score_rbc1 Score_rbc2 Score_rbc3
## 1       P02549     603.26     555.22     589.77
## 2       P11277     529.80     469.59     451.29
## 3       P35579     252.14     243.18     109.26
## 4       P16157     272.12     265.50     253.84
## 5       P01024     283.21     218.57     341.97
## 6       Q60FE5     182.62     166.18     117.20
## 7       Q9Y490     170.79     178.82     107.57
## 8       P04114      88.78      57.12     317.58
## 9       P04040     160.11     165.76     187.03
## 10      Q4VB86      88.57     111.50      78.08
## 11      P18206      84.96      82.30      48.70
## 12      P01023     109.50      80.01     197.52
## 13      P55072      76.30      81.12      75.02
## 14      P16452      75.38      73.45      59.41
## 15      P68871    2270.40    2191.33    1843.45
## 16      P02787      93.97      73.75     157.86
## 17      P02647     107.60      80.23     139.91
## 18      P02730     104.66     112.66     155.54
## 19      P02751      32.48      60.67       8.96
## 20      P69905    2603.91    2561.08    2047.56
## 21      Q13228      92.21      73.10      74.43
## 22      P02042    1294.86    1164.09    1073.23
## 23      P11142      73.19      77.98      77.60
## 24      P26038      40.62      36.37      22.96
## 25      P06727      59.43      52.24      79.93
## 26      P01009     107.58      57.60     128.13
## 27      P53396      21.81      30.63      17.60
## 28      P35612      37.17      44.20      62.14
## 29      P00918     124.98     100.40     109.52
## 30      Q00610      42.55      39.96      70.72
## 31      Q86UX7      46.11      44.62      20.83
## 32      P60709     101.21     110.86      81.91
## 33      P07384      51.55      42.10      41.09
## 34      P02671      72.40      67.00      58.89
## 35      P04406      83.13      66.95      74.40
## 36      P07900      36.60      45.92      64.91
## 37      P00558      57.17      52.22      41.40
## 38      Q5HYB6      88.10      85.83      56.40
## 39  A0A2R8YGX3      52.31      60.42      26.76
## 40      P00915     206.18     191.51     257.80
## 41      P00450      43.40      38.94     101.25
## 42      P00488      59.27      52.78      70.77
## 43      P50395      36.86      39.50      81.62
## 44      P04196      40.02      56.42      44.23
## 45      P30041      65.49      66.23      68.40
## 46      P31948      34.73      34.56      53.45
## 47      P21980      35.03      33.82      36.95
## 48      P22314      43.11      39.22      57.53
## 49      P62258      38.63      36.54      44.13
## 50      Q86VP6      37.57      33.18      40.18
## 51      Q08495      32.12      36.64      32.21
## 52      P06733      38.31      36.42      31.39
## 53      P00738      51.85      45.34      34.12
## 54      P32119     101.89      99.22      86.93
## 55      E7EV99      47.37      41.93      39.16
## 56  A0A0A0MSI0      44.34      43.92      38.57
## 57      P00352      46.73      36.19      55.22
## 58      P02675      52.78      31.03      21.55
## 59      P00491      30.47      43.43      54.69
## 60      P17987      33.21      39.91      32.17
## 61      P28289      51.45      45.81      46.97
## 62      Q01518      22.63      26.24       9.01
## 63      Q00013      15.07      39.28      35.31
## 64      P08238      26.35      37.04      49.33
## 65      P01857      79.31      50.32     115.95
## 66      P00338      21.94      23.01      12.76
## 67      P07195      45.81      36.27      41.51
## 68      P68032      50.25      54.32      58.93
## 69      O43707      34.99      36.18      24.62
## 70      P69892     269.65     258.89     209.79
## 71      P00739      30.08      29.61      24.59
## 72      P08514      43.23      31.83      21.50
## 73      Q9UQ80      11.78      26.94      22.50
## 74      O14818      25.39      35.69      20.94
## 75      P23526      38.83      25.50      29.54
## 76      Q9BQE3      39.72      37.03      13.73
## 77      P00734      21.62      39.01      45.17
## 78      O75083      22.69      26.22      17.05
## 79  A0A0A0MS51      31.67      39.21      52.13
## 80      P63104      31.45      34.86      38.40
## 81      P13798      31.49      32.39      62.39
## 82      P02649      26.64      28.19      15.30
## 83      Q9BT78      35.60      22.43      21.61
## 84      P48506      31.88      29.44      49.59
## 85      P0DMV9      21.21      17.61      35.02
## 86      P60842      19.51      26.96      25.58
## 87      P19105      22.69      32.57       7.06
## 88      Q9UNZ2      28.16      27.91      42.38
## 89      P07738      21.95      25.77      40.47
## 90      P25786      21.17      30.66      26.49
## 91      P27105      26.72      29.05      27.07
## 92      P37802      20.52      20.54      21.31
## 93      Q9H4B7      25.63      25.53      27.87
## 94      P40227      24.94      21.79      31.43
## 95      P07996      22.23      27.54      15.98
## 96      P02766      45.46      46.44      59.90
## 97      P54578      23.66      23.88      34.36
## 98      Q5T985      32.31      19.73      68.93
## 99      Q32Q12      37.10      29.95      21.02
## 100     P30153      21.07      16.82      26.91
## 101     P52209      24.96      25.83      23.69
## 102     P01011      18.12      15.57      37.41
## 103     P17174      20.94      18.14       9.37
## 104     P30043      74.48      49.57     107.18
## 105     P11021      18.06      16.41      13.16
## 106     P62333      12.15      21.93       8.13
## 107     Q13200      10.87      21.76      30.26
## 108     P68366      34.19      33.36       8.59
## 109     P38606      12.87      10.63      13.41
## 110     F5H345       9.36      18.18      12.55
## 111     E9PM69      19.53      17.86      26.45
## 112     E7EPV7      46.08      37.74      40.19
## 113     P31946      25.45      21.23      29.43
## 114     P53004       8.24      22.70      26.05
## 115     P07451      18.87      14.14      22.03
## 116     P10909      31.60      27.06      48.69
## 117     Q92905       8.06      14.07      17.57
## 118     P01859      58.01      39.56      94.20
## 119     Q14624      41.63      21.87      72.66
## 120     P01042       7.84      15.36      17.91
## 121     P09960      15.56      16.05      33.63
## 122     P28074      23.26      27.00      21.41
## 123     Q15257      22.58      17.01      21.30
## 124     P31939      32.68      15.57      24.32
## 125     P62826      36.51      21.54      21.34
## 126     P61224      23.34      23.16      16.81
## 127     P37837      23.60      24.31      21.22
## 128     Q99832      26.58      13.65      32.25
## 129     P50990      18.26      17.99      32.83
## 130     E7EQ12      16.30      21.93      23.18
## 131     P01871      23.93      27.39      53.00
## 132     G5E9F8      22.47      24.49      36.11
## 133     P54725      11.45      12.12      15.99
## 134     P61981      17.89      15.34      22.45
## 135     P27348      17.25      16.77      13.18
## 136     P01008      21.50       5.94      29.42
## 137     P61204      22.91      13.82      20.54
## 138     P11166      27.36      33.17      26.24
## 139     P62805      18.15      22.49       1.82
## 140     P13716      53.02      31.08      45.52
## 141     P05155      14.07      13.99      23.80
## 142     P01876      24.59      27.88      19.58
## 143     P00568      21.55      21.58      31.75
## 144     Q99733      14.26      13.01      19.80
## 145     P30101      12.15      11.02       4.03
## 146     P30086      21.43      13.61      15.75
## 147     P48426      18.90      17.88      15.68
## 148     P22061      16.13      15.92      11.67
## 149     P07737      12.80      19.71      10.69
## 150     P62195       6.07      14.52       8.47
## 151     P25787      21.65      26.45      25.93
## 152     P49721      26.19      14.72      10.92
## 153     O43242      16.87       8.99      12.03
## 154     Q16401       7.58      15.42      16.89
## 155     Q06323      16.81      16.65      17.89
## 156     Q9UL46      23.71      23.52      15.54
## 157     P49247      23.81      10.84      32.63
## 158     P02743      11.97      17.56      18.01
## 159     P50991      27.36      15.93      28.91
## 160     P60174      29.98      25.35      25.43
## 161     Q9C0C9       8.36      13.26      30.88
## 162     H7BZ94       9.61       7.19       6.67
## 163     C9JEU5      38.36      24.52      26.96
## 164     F5H265      27.79      26.39       9.02
## 165     P02765      47.49      52.95      74.15
## 166     P01834      41.26      43.86      74.29
## 167     P04217      18.63      13.56      33.13
## 168     Q9UKV8       4.04       8.67       7.48
## 169     O95782       8.96       5.16       7.72
## 170     P27797      10.36      11.58       5.65
## 171     P16152      11.02      12.66      20.00
## 172     P08603      20.93      13.91      39.58
## 173     Q16531      10.46       9.40      10.71
## 174     P14625       9.45      15.76       6.93
## 175     P50502      18.76      14.60      20.21
## 176     Q13630      17.73      16.04      16.60
## 177     P11413       7.70      11.17      18.38
## 178     P02008      43.30      41.99      63.40
## 179     P02790      26.09      10.67      27.34
## 180     Q14974      24.32      16.01      24.79
## 181     P17858      20.07      12.03      19.56
## 182     Q15691       7.78       8.86       8.00
## 183     Q8WUM4      17.56      11.39       6.39
## 184     P18669      17.41      12.88      11.65
## 185     P24666      17.45      20.45       8.09
## 186     P62937      18.23      20.47      22.27
## 187     P35998      10.45       8.87      15.58
## 188     P20618      15.91      20.67      23.05
## 189     P28070      12.17      13.44      21.52
## 190     O00231       3.81       4.56       8.30
## 191     Q9BWD1       7.48      11.33       4.84
## 192     P25325      10.40      12.15      18.22
## 193     P29401      18.56      10.83      21.82
## 194     P04004      20.12      14.30      26.46
## 195     C9J0K6      11.74      13.57      18.06
## 196     H3BPK3      23.89      21.47      22.25
## 197     M0R0Y2      11.36      17.37      19.24
## 198 A0A0C4DGZ5       0.00       6.09       2.46
## 199 A0A087WYS1       5.15       9.21      12.18
## 200 A0A0G2JMB2      17.98      21.54      20.16
## 201 A0A2R8Y5T7      17.68      13.72      17.86
## 202     B1ALA9       8.28      14.23      18.15
## 203     Q04917       6.97       8.04      12.80
## 204     O95336      12.29       8.34      11.05
## 205     P19652      10.59       8.16      19.54
## 206     P02656      12.37      10.74       9.94
## 207     P0DP25      28.51      15.09      23.74
## 208     Q13618       5.25       5.57      18.29
## 209     Q5TDH0       9.64       7.00      14.74
## 210     Q9NY33      20.48      12.24      26.89
## 211     P00740      15.12      17.83      20.12
## 212     P31150      17.30      10.76      36.14
## 213     Q9HC38      10.86       9.77       9.54
## 214     P36959      10.59       9.77       2.92
## 215     P00390       3.85       9.07      19.64
## 216     P09105      10.72      13.10      18.33
## 217     P05546       7.14       6.28      19.49
## 218     P01861      23.64      12.21      49.61
## 219     P13645       2.27      10.42      15.92
## 220     P30613      12.18       9.91      27.61
## 221     P43034       6.26       5.21       0.00
## 222     Q9NTK5      12.80      17.52      14.73
## 223     Q9GZP4      17.52      11.05      11.47
## 224     P00747      11.67      13.02      25.96
## 225     Q6XQN6      16.83      10.99      14.50
## 226     P62191       9.65      13.48      15.07
## 227     Q15404       4.02       8.33       3.83
## 228     O95810      14.30      12.72       2.84
## 229     O75368      11.54      11.36      18.83
## 230     Q9Y4E8      12.02       8.55      25.03
## 231     P45974      36.17      11.93      45.24
## 232     J3KQ34       9.00       5.85       4.53
## 233     E9PLD0      10.14       9.01      10.62
## 234     I3L0N3       4.63       3.77      13.47
## 235     P02774      20.09       7.00      30.09
## 236     X6RA14      14.82       8.94      19.73
## 237 A0A087WW66       7.94       4.30      18.47
## 238     P02763      12.54       7.27      41.42
## 239     P43652       5.43       2.38      19.04
## 240     Q9NZD4       6.23       8.23      13.74
## 241     P20073      12.46       9.52       8.04
## 242     P02655      14.11       8.60       8.11
## 243     P05090       0.00       2.79       3.68
## 244     O14791       2.38       6.81       9.62
## 245     Q5VW32       9.32       8.21       5.96
## 246     P04003      11.87       9.20      21.85
## 247     P52907       7.84       8.41      27.13
## 248     P00751      15.40       4.77      25.26
## 249     Q9Y2V2       6.45       9.53       5.71
## 250     O00299       6.23       6.51       9.62
## 251     P01031       7.11       2.83      48.24
## 252     P31146       4.69       5.73       5.82
## 253     P00742       2.01       7.39      11.86
## 254     Q9Y3I1       9.94       8.46       9.02
## 255     Q9H479       8.93      12.33       5.38
## 256     P07954       2.07       4.79       4.24
## 257     P48507       5.87       7.52      10.65
## 258     P09211      16.90       7.89      23.05
## 259     P16403       7.20       4.70       4.85
## 260     P00492       7.03       8.20      19.48
## 261     P19827      27.47      10.92      41.92
## 262     P04264       6.97       4.33      22.40
## 263     Q04760       9.56       6.25       7.43
## 264     Q5VVQ6       3.19       3.19      18.62
## 265     Q15365       3.10       5.16       5.09
## 266     Q96G03      14.01       6.03      17.01
## 267     P08567       7.61       8.93       6.35
## 268     P13796       2.73       5.46      26.73
## 269     P22891       0.00       3.06       4.60
## 270     P28072       5.71       8.11       7.98
## 271     Q99436       4.37       2.98      10.62
## 272     P51665       0.00       1.70       6.93
## 273     P13489      16.59       9.78      10.61
## 274     Q9Y265       8.72       8.84       6.06
## 275     Q9BSL1       6.57       8.31       7.35
## 276     Q9UIA9      10.14       2.11      13.51
## 277     Q9UK55       7.60       9.08       6.25
## 278     C9JVE2       2.78       6.42       2.14
## 279     B4E3S0      10.37       9.59       6.64
## 280     H0YLA4       2.22       2.94       7.83
## 281     E7EM64       9.95       6.28      12.48
## 282     F5H5V4      10.45      11.72      16.09
## 283     K7ERI9       9.75       4.63      13.25
## 284     R4GN98       8.66       4.51       6.67
## 285     I3L397      21.49      15.04      24.04
## 286     P54727       4.16       2.02       6.88
## 287     P49189       4.94       2.34       4.74
## 288     P35858       2.05       2.06      11.16
## 289     P01019       9.44       5.14      15.86
## 290     P05089       4.58       1.70      11.95
## 291     P60953       6.61       4.30       5.95
## 292     Q96DG6       2.41       2.15       4.66
## 293     P07360       6.44       5.96      13.68
## 294     P02748      13.30       5.44      18.49
## 295     P02775       9.50       9.54      10.67
## 296     O76003       1.99       3.81      10.01
## 297     P14770       2.32       2.22       2.37
## 298     Q6B0K9       5.83       5.37       6.40
## 299     P55010       2.25       4.83       5.24
## 300     P30740       7.47       4.75       4.13
## 301     Q15181      13.54       6.32      18.23
## 302     Q9BS40       2.29       6.23       7.00
## 303     Q9GZT8       3.20       4.92       3.22
## 304     O75340       4.46       2.57       4.55
## 305     P23284       2.62       4.45       0.00
## 306     P43686       8.83       0.00       6.28
## 307     P28066       9.67       7.88      15.58
## 308     O00487       2.98       0.00       1.62
## 309     Q15008       5.14       0.00      10.75
## 310     O15067       8.64       4.56       7.38
## 311     Q07960       5.76       6.03       6.87
## 312     P26447       5.51       5.15       9.31
## 313     P00441      10.03       6.85       9.42
## 314     P62328      17.20      14.98       0.00
## 315     O14980       3.13       1.91      10.64
## 316     Q5QPM9       3.05       6.00       9.19
## 317     D6RF62      10.04       4.05      12.11
## 318     Q5T2B5       5.83       4.92      12.74
## 319     F8W9S7       2.51       2.07      11.74
## 320     H0Y3Y9       2.36       2.36       6.52
## 321     I3L0S1       6.40       6.01      10.70
## 322     H0YJS4       3.95       4.56       3.02
## 323     H3BRV9       5.79       4.67       7.29
## 324     E5RIW3      12.34       5.31      10.18
## 325     H3BMU1       0.00       6.58       2.20
## 326     C9JJ34       5.48       5.07       3.09
## 327 A0A182DWH7       2.16       5.30       2.02
## 328     K7ER96       6.87       4.10      10.06
## 329     F8W0K0       0.00       3.96       1.96
## 330 A0A2Q2TTZ9       5.13       8.13      17.40
## 331     S4R460       8.63       7.93      14.60
## 332     P01619       6.84       6.50      15.26
## 333 A0A0A0MSV6       7.59       5.40       2.83
## 334     P04424       0.00       0.00       4.31
## 335     O75531       2.97       2.02       2.48
## 336     P02745       3.17       2.37       4.59
## 337     P02747       5.33       2.92       5.66
## 338     P20851       2.24       2.14       1.89
## 339     Q5TEZ5       2.42       2.55       2.35
## 340     Q8IUI8       2.25       1.99       6.53
## 341     P00167       6.58       2.71       7.35
## 342     P00748       0.00       2.00       6.07
## 343     Q01469       0.00       1.83       2.02
## 344     P04921       6.86       4.21       3.95
## 345     Q9UBQ7       2.47       0.00       3.53
## 346     Q9Y5Z4       1.91       0.00       0.00
## 347     O00505       2.96       2.95       2.80
## 348     O00410       5.08       2.72      12.69
## 349     P35527       0.00       0.00       1.73
## 350     Q7Z494       0.00       0.00       2.14
## 351     Q8NGB2       0.00       0.00       0.00
## 352     Q92882       1.60       0.00       0.00
## 353     Q9NRX4       9.17       0.00       5.29
## 354     P02776       6.46       2.03       6.33
## 355     O43598       7.30       3.21       4.15
## 356     P05387       2.43       3.25       6.97
## 357     P61956       3.34       2.86       2.99
## 358     P10599       8.75       3.71       7.28
## 359     Q14166       1.64       0.00       0.00
## 360     P68036       6.42       3.85       7.76
## 361     O75348       2.17       2.29       2.35
## 362     G3XAM2       1.85       1.92       5.74
## 363     Q5T0D2       1.86       0.00       5.81
## 364     E5RHK0       2.51       0.00       0.00
## 365     E7EQ47       1.90       2.36       5.01
## 366     I3L0K2       3.62       2.80       3.79
## 367     H0YJC6      17.39       9.23       2.33
## 368     H7C5G1       1.62       0.00       2.39
## 369     F5GY80       5.68       2.20       6.09
## 370     E7EWE1       2.25       0.00       0.00
## 371     D6RAW0       2.31       2.25       2.48
## 372     H7BXY6       0.00       0.00       0.00
## 373     J3QS45       2.78       2.70       2.62
## 374     K7EN45       3.09       2.88       5.95
## 375     K7ENY4       3.24       2.68       7.07
## 376 A0A075B6K5       3.74       3.50       3.89
## 377 A0A0B4J1U7       1.74       0.00       5.30
## 378 A0A075B6R9       1.67       0.00       3.28
## 379 A0A0U1RR22       0.00       2.29       0.00
## 380     P01624       1.97       0.00       7.11
## 381     K7ELW5       2.56       2.27       2.76
## 382 A0A0U1RQV5       1.81       1.73       2.56
## 383 A0A075B6Z2       0.00       1.64       1.67
## 384 A0A0A0MRZ8       2.56       2.33       2.41
## 385     H0YDX6       2.55       2.00       3.03

We can then implement some basic QC and take the 50 proteins with the highest scores from the common list for our coexpression matrix.
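One way to pick the highest-scoring proteins is sketched below. It assumes "highest score" means the highest mean score across samples, which is my assumption rather than something fixed by the analysis, and it uses only the first six rows of the table above with `n_top` set to 3 instead of 50.

```r
# First six rows of the common RBC table shown above.
scores <- data.frame(
  Accession  = c("P02549", "P11277", "P35579", "P16157", "P01024", "Q60FE5"),
  Score_rbc1 = c(603.26, 529.80, 252.14, 272.12, 283.21, 182.62),
  Score_rbc2 = c(555.22, 469.59, 243.18, 265.50, 218.57, 166.18),
  Score_rbc3 = c(589.77, 451.29, 109.26, 253.84, 341.97, 117.20)
)

# Assumption: rank proteins by their mean score across samples.
score_cols        <- grep("^Score_", names(scores), value = TRUE)
scores$mean_score <- rowMeans(scores[, score_cols])

n_top <- 3  # the report uses 50
top   <- head(scores[order(scores$mean_score, decreasing = TRUE), ], n_top)
top_accessions <- as.character(top$Accession)
```

Ranking by the maximum or median score instead would only require swapping `rowMeans` for the corresponding row-wise summary.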

We can use this correlation matrix of coexpressed proteins to create an adjacency matrix, in which every correlation over a certain value is encoded as a 1, that is, an edge between one node and another.
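The thresholding step can be sketched as follows, using four proteins from the common-score table in the differential analysis section and the 0.8 cut-off mentioned earlier. The object names are my own, and zeroing the diagonal (no self-edges) is an assumption about how the network was built.

```r
# Rows are proteins, columns are the 7 included samples.
scores <- rbind(
  P02549 = c(603.26, 555.22, 589.77, 202.52, 122.55,  66.96, 106.59),
  P11277 = c(529.80, 469.59, 451.29,  94.44,  79.44,  49.95,  83.54),
  P35579 = c(252.14, 243.18, 109.26, 137.74, 421.31, 391.85, 203.19),
  P16157 = c(272.12, 265.50, 253.84,  81.45,  52.32,  50.58,  53.31)
)
colnames(scores) <- c("rbc1", "rbc2", "rbc3", "fib1", "fib2", "fib3", "fib4")

# cor() correlates columns, so transpose to get protein-by-protein values.
cor_mat <- cor(t(scores))

# Encode every correlation above the cut-off as an edge (1), otherwise 0.
adj <- (cor_mat > 0.8) * 1
diag(adj) <- 0  # assumption: drop self-edges
```

Here P02549 and P11277 end up connected, while P35579, which scores higher in the fibrin samples, is not linked to P02549.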

The resulting network can then be visualised here - http://rpubs.com/shaneoconnell96/gene_name_corr_matrix_rbc

We can see from this network that community analysis highlights 3 distinct groups. This gives us an overall picture of which proteins are coexpressed with which others; the protocol can guide individual queries of different bioinformatics databases based on which proteins seem to play an important role in the network. For example, although the network is sparse, the yellow group here appears to be the only link between the denser red and blue groups: they have no direct link without these intermediary proteins (in this network).
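For readers who want to reproduce the grouping step, the sketch below runs community detection on a toy adjacency matrix with igraph. The node names and edges are invented, and `cluster_fast_greedy()` is used as a deterministic stand-in; which community algorithm produced the groups in the linked network is not stated here.

```r
library(igraph)

# Toy network: two fully connected groups joined only through one bridge node.
nodes <- c("A1", "A2", "A3", "A4", "bridge", "B1", "B2", "B3", "B4")
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[1:4, 1:4] <- 1
adj[6:9, 6:9] <- 1
adj["bridge", c("A1", "B1")] <- 1
adj[c("A1", "B1"), "bridge"] <- 1
diag(adj) <- 0

# Build an undirected graph from the 0/1 adjacency matrix.
g <- graph_from_adjacency_matrix(adj, mode = "undirected", diag = FALSE)

# Stand-in community detection (greedy modularity optimisation).
communities <- cluster_fast_greedy(g)
groups <- membership(communities)

# Table in the same shape as the group listings in this report.
group_table <- data.frame(id = names(groups), label = names(groups),
                          group = as.vector(groups))
```

In this toy network the bridge node is absorbed into one of the two dense groups rather than forming its own, so the number of groups depends on both the algorithm and the network.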

The group memberships can be viewed here:

##          id    label group
## 1      HBA2     HBA2     1
## 2       HBB      HBB     1
## 3       HBD      HBD     1
## 4     SPTA1    SPTA1     2
## 5      SPTB     SPTB     2
## 6        C3       C3     3
## 7      ANK1     ANK1     1
## 8      HBG2     HBG2     1
## 9      MYH9     MYH9     1
## 10      CA1      CA1     3
## 11     FLNA     FLNA     1
## 12     TLN1     TLN1     1
## 13      CAT      CAT     3
## 14      CA2      CA2     2
## 15      A2M      A2M     3
## 16    APOA1    APOA1     3
## 17 SERPINA1 SERPINA1     3
## 18   SLC4A1   SLC4A1     3
## 19    PRDX2    PRDX2     1
## 20     ACTB     ACTB     1
## 21       TF       TF     3
## 22 SELENBP1 SELENBP1     2
## 23     APOB     APOB     3
## 24      VCL      VCL     1
## 25    GAPDH    GAPDH     2
## 26      VCP      VCP     1
## 27    EPB42    EPB42     1
## 28    BLVRB    BLVRB     3
## 29    HSPA8    HSPA8     3
## 30      FGA      FGA     1
## 31    PRDX6    PRDX6     3
## 32    F13A1    F13A1     3
## 33     PGK1     PGK1     1
## 34     ALAD     ALAD     2
## 35      FGB      FGB     1
## 36       HP       HP     1
## 37    CAPN1    CAPN1     2
## 38    TMOD1    TMOD1     2
## 39    ACTC1    ACTC1     3
## 40     AHSG     AHSG     3
## 41  ALDH1A1  ALDH1A1     3
## 42   FERMT3   FERMT3     1

The fibrin network can then be visualised here - http://rpubs.com/shaneoconnell96/genematfib As we can see, there is a lot more going on here from a network standpoint: there are more sparse interaction networks and, overall, a higher number of closed groups, for instance the 2 leftmost groups. There could be many biological interpretations for this.

##          id    label group
## 1       ALB      ALB     4
## 2       HBB      HBB     4
## 3      HBA1     HBA1     4
## 4       HBD      HBD     4
## 5        C3       C3     5
## 6      APOB     APOB     5
## 7        TF       TF     4
## 8       A2M      A2M     5
## 9       FGB      FGB     5
## 10      FGA      FGA     5
## 11    APOA1    APOA1     5
## 12      C4B      C4B     5
## 13 SERPINA1 SERPINA1     5
## 14    SPTA1    SPTA1     3
## 15       CP       CP     5
## 16     FLNA     FLNA     1
## 17    F13A1    F13A1     2
## 18     AHSG     AHSG     5
## 19       F2       F2     6
## 20     TLN1     TLN1     1
## 21     MYH9     MYH9     1
## 22      PLG      PLG     4
## 23       C5       C5     5
## 24    ITIH4    ITIH4     5
## 25      CFH      CFH     4
## 26    ITIH1    ITIH1     5
## 27     ACTB     ACTB     1
## 28      FN1      FN1     1
## 29      HPX      HPX     4
## 30       HP       HP     2
## 31      TTR      TTR     4
## 32     SPTB     SPTB     3
## 33      CAT      CAT     4
## 34      CA1      CA1     5
## 35      CLU      CLU     5
## 36     ANK1     ANK1     4
## 37      HPR      HPR     5
## 38 SERPINC1 SERPINC1     4
## 39    IGLL5    IGLL5     4
## 40      HRG      HRG     6
## 41 SERPINA3 SERPINA3     3
## 42       GC       GC     4

Finally, the entire network can be visualised here - http://rpubs.com/shaneoconnell96/commonnetworkgenenames This is a really interesting network diagram: there is one dense cluster of entirely closed-off proteins in the blue group. You can have a look at this data frame to see which proteins/genes are interacting with each other in the blue group:

##          id    label group
## 1      HBA1     HBA1     1
## 2       HBB      HBB     1
## 3       HBD      HBD     1
## 4     SPTA1    SPTA1     1
## 5      SPTB     SPTB     1
## 6        C3       C3     3
## 7      ANK1     ANK1     1
## 8      MYH9     MYH9     2
## 9       CA1      CA1     1
## 10     TLN1     TLN1     2
## 11      CAT      CAT     1
## 12      CA2      CA2     1
## 13      A2M      A2M     3
## 14    APOA1    APOA1     3
## 15 SERPINA1 SERPINA1     3
## 16   SLC4A1   SLC4A1     1
## 17    PRDX2    PRDX2     1
## 18     ACTB     ACTB     2
## 19       TF       TF     3
## 20 SELENBP1 SELENBP1     1
## 21     APOB     APOB     3
## 22      VCL      VCL     2
## 23    GAPDH    GAPDH     1
## 24      VCP      VCP     1
## 25    BLVRB    BLVRB     1
## 26      FGA      FGA     3
## 27    PRDX6    PRDX6     1
## 28    F13A1    F13A1     2
## 29     PGK1     PGK1     1
## 30      FGB      FGB     3
## 31       HP       HP     2
## 32    CAPN1    CAPN1     1
## 33    ACTC1    ACTC1     2
## 34     AHSG     AHSG     3
## 35   FERMT3   FERMT3     2
## 36     LDHB     LDHB     1
## 37      TTR      TTR     3
## 38       CP       CP     3
## 39   ITGA2B   ITGA2B     2
## 40     UBA1     UBA1     1
## 41    ITIH4    ITIH4     3
## 42      MSN      MSN     2
## 43      HRG      HRG     2
## 44     AHCY     AHCY     1
## 45    YWHAE    YWHAE     1
## 46     ENO1     ENO1     2
##  [1] "HBA1"     "HBB"      "HBD"      "SPTA1"    "SPTB"     "ANK1"    
##  [7] "CA1"      "CAT"      "CA2"      "SLC4A1"   "PRDX2"    "SELENBP1"
## [13] "GAPDH"    "VCP"      "BLVRB"    "PRDX6"    "PGK1"     "CAPN1"   
## [19] "LDHB"     "UBA1"     "AHCY"     "YWHAE"

This is especially interesting because this entirely closed-off group contains elements from both the RBC and FIB networks. The total network may represent a more confident estimate of coexpression, as it used more data points to calculate the correlation estimates.

We can perform some enrichment analysis on this blue group versus the rest of the groups in the network to examine what biological processes they are associated with.

The proteins in this group appear to be significantly enriched for RBC-related biological pathways. We can also examine all of the other protein groups:

This enrichment result suggests the other groups are involved in more fibrin-related biological processes, which is a really interesting network result from using common proteins.
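The idea behind an enrichment result like this can be sketched as an over-representation test for a single pathway. The group size (22) and network size (46) match the listings above, but the pathway counts are hypothetical, and the enrichment tool actually used may apply a different background or multiple-testing correction.

```r
group_total        <- 22  # proteins in the blue group (listed above)
network_total      <- 46  # proteins in the whole network (listed above)
group_in_pathway   <- 12  # hypothetical: blue-group proteins in the pathway
network_in_pathway <- 14  # hypothetical: all network proteins in the pathway

# Probability of seeing at least this many pathway members in the group
# if group membership were unrelated to the pathway (hypergeometric tail).
p_hyper <- phyper(group_in_pathway - 1,
                  network_in_pathway,
                  network_total - network_in_pathway,
                  group_total,
                  lower.tail = FALSE)

# The same test expressed as a one-sided Fisher's exact test on a 2x2 table.
rest_in_pathway <- network_in_pathway - group_in_pathway
tab <- matrix(
  c(group_in_pathway, group_total - group_in_pathway,
    rest_in_pathway,  (network_total - group_total) - rest_in_pathway),
  nrow = 2,
  dimnames = list(c("in_pathway", "not_in_pathway"), c("group", "rest"))
)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

With many pathways tested at once, the resulting p-values would normally be adjusted, for example with `p.adjust()`.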

We can now perform some differential protein expression analysis.

Differential analysis

Below is a data frame containing all of the common scores we want to incorporate into our analysis.

##   Accession common_rbc1 common_rbc2 common_rbc3 common_fib1 common_fib2 common_fib3 common_fib4
## 1    P02549      603.26      555.22      589.77      202.52      122.55       66.96      106.59
## 2    P11277      529.80      469.59      451.29       94.44       79.44       49.95       83.54
## 3    P35579      252.14      243.18      109.26      137.74      421.31      391.85      203.19
## 4    P16157      272.12      265.50      253.84       81.45       52.32       50.58       53.31
## 5    P01024      283.21      218.57      341.97      623.36      346.62      405.86      518.68
## 6    Q9Y490      170.79      178.82      107.57      150.36      334.64      362.43      241.09
##    Accession common_rbc1 common_rbc2 common_rbc3 common_fib1 common_fib2 common_fib3 common_fib4
## 17    P69905     2603.91     2561.08     2047.56      835.77      497.98      704.33      575.92
## 12    P68871     2270.40     2191.33     1843.45     1268.70      617.27      848.73      711.13
## 19    P02042     1294.86     1164.09     1073.23      638.26      318.43      354.44      355.07
## 1     P02549      603.26      555.22      589.77      202.52      122.55       66.96      106.59
## 2     P11277      529.80      469.59      451.29       94.44       79.44       49.95       83.54
## 5     P01024      283.21      218.57      341.97      623.36      346.62      405.86      518.68
## 4     P16157      272.12      265.50      253.84       81.45       52.32       50.58       53.31
## 3     P35579      252.14      243.18      109.26      137.74      421.31      391.85      203.19
## 32    P00915      206.18      191.51      257.80       87.85       64.97       61.27       88.26
## 6     Q9Y490      170.79      178.82      107.57      150.36      334.64      362.43      241.09
## 8     P04040      160.11      165.76      187.03       89.24       44.60       54.65       37.48
## 24    P00918      124.98      100.40      109.52       38.32       19.72       22.19       21.00
## 10    P01023      109.50       80.01      197.52      431.36      210.03      261.57      354.39
## 14    P02647      107.60       80.23      139.91      264.21      122.51      158.51      234.28
## 22    P01009      107.58       57.60      128.13      217.49      123.64      126.51      205.27
## 15    P02730      104.66      112.66      155.54       63.40       50.07       19.73       53.45
## 42    P32119      101.89       99.22       86.93       71.05       48.94       58.81       48.41
## 26    P60709      101.21      110.86       81.91      107.79      233.55      251.59      154.39
## 13    P02787       93.97       73.75      157.86      482.88      188.63      284.39      334.06
## 18    Q13228       92.21       73.10       74.43       27.44        3.90        7.68        8.74
## 7     P04114       88.78       57.12      317.58      609.99      142.85      107.29      454.10
## 9     P18206       84.96       82.30       48.70       54.76      163.73      195.58       96.21
## 29    P04406       83.13       66.95       74.40       40.36       46.45       49.27       36.91
## 46    P01857       79.31       50.32      115.95      189.36      157.70      210.14      159.70
## 11    P55072       76.30       81.12       75.02       21.28       26.60       15.03       16.80
## 64    P30043       74.48       49.57      107.18       40.69        7.22        5.46       18.15
## 28    P02671       72.40       67.00       58.89      303.02      240.69      403.82      603.68
## 37    P30041       65.49       66.23       68.40       30.43       18.10       13.99       24.47
## 21    P06727       59.43       52.24       79.93      107.22       83.29       97.78      120.24
## 34    P00488       59.27       52.78       70.77      175.15      166.17      208.39      156.19
## 68    P01859       58.01       39.56       94.20      154.58      118.15      151.99      129.74
## 31    P00558       57.17       52.22       41.40       24.95       18.70       27.86        5.95
## 43    P02675       52.78       31.03       21.55      365.84       84.63      257.63      757.23
## 41    P00738       51.85       45.34       34.12      100.51      110.08      191.65       98.77
## 27    P07384       51.55       42.10       41.09        4.63       27.18       16.14        6.71
## 48    P68032       50.25       54.32       58.93       55.54      138.21      186.96       89.44
## 84    P02765       47.49       52.95       74.15      170.38      154.38      146.58      185.46
## 25    Q86UX7       46.11       44.62       20.83       36.85       56.86       59.39       49.48
## 47    P07195       45.81       36.27       41.51       26.80       32.70       28.50       29.50
## 62    P02766       45.46       46.44       59.90       95.94       24.01       58.88       62.44
## 33    P00450       43.40       38.94      101.25      192.99       52.88       69.34      155.26
## 50    P08514       43.23       31.83       21.50       34.19       71.94       55.56       45.10
## 38    P22314       43.11       39.22       57.53        5.99        5.92        3.47        6.37
## 69    Q14624       41.63       21.87       72.66      126.37       31.10       67.42      121.68
## 85    P01834       41.26       43.86       74.29      223.22       98.68      120.69      172.86
## 20    P26038       40.62       36.37       22.96        6.18       61.93       50.71        6.76
## 36    P04196       40.02       56.42       44.23       78.31       79.66       80.28       64.14
## 51    P23526       38.83       25.50       29.54        8.19        3.65        0.00        2.38
## 39    P62258       38.63       36.54       44.13       13.98       11.74       12.45       11.37
## 40    P06733       38.31       36.42       31.39       24.88       56.34       54.71       35.00

Firstly, we must examine how the samples separate - a formality to test for batch effects. Now we can visualise how different the samples look compared to one another:

## Warning: Removed 12 rows containing non-finite values (stat_boxplot).

Ideally, the samples to be analysed would be as similar to each other as possible, so we will apply a normalisation step. First, we can get a visual overview of the proteins most variable across conditions. We will also use gene symbols here instead of UniProt IDs to make the plot more readily interpretable.
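The report does not say which normalisation was applied, so the following is only a minimal sketch of one common choice: a log2 transform followed by median-centring of each sample, then ranking proteins by their variance across samples. The values are the first six rows of the common-score table shown earlier; treating zeros as missing is an assumption, made because log2(0) is -Inf and scores of 0.00 do occur in the full table.

```r
# First six rows of the common-score table shown earlier in this report.
scores <- matrix(
  c(603.26, 555.22, 589.77, 202.52, 122.55,  66.96, 106.59,
    529.80, 469.59, 451.29,  94.44,  79.44,  49.95,  83.54,
    252.14, 243.18, 109.26, 137.74, 421.31, 391.85, 203.19,
    272.12, 265.50, 253.84,  81.45,  52.32,  50.58,  53.31,
    283.21, 218.57, 341.97, 623.36, 346.62, 405.86, 518.68,
    170.79, 178.82, 107.57, 150.36, 334.64, 362.43, 241.09),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("P02549", "P11277", "P35579", "P16157", "P01024", "Q9Y490"),
    c("common_rbc1", "common_rbc2", "common_rbc3",
      "common_fib1", "common_fib2", "common_fib3", "common_fib4")
  )
)

# log2 transform; zeros are set to NA first so they do not become -Inf.
log_scores <- log2(replace(scores, scores == 0, NA))

# Median-centre each sample (column) so every sample has a median of zero.
# This is an assumed method, not necessarily the one used in the report.
sample_medians <- apply(log_scores, 2, median, na.rm = TRUE)
normalised <- sweep(log_scores, 2, sample_medians, FUN = "-")

# Rank proteins by variance across samples, most variable first.
protein_var <- apply(normalised, 1, var, na.rm = TRUE)
print(sort(protein_var, decreasing = TRUE))
```

Median-centring only shifts each sample; it leaves the spread untouched, so a method such as quantile normalisation may be preferable if the sample distributions differ in shape.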

Here, we can see a few clusters of variable proteins / genes across the samples. We can obtain a list of these after we normalise and perform our statistical test.
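The column names in the results that follow (logFC, AveExpr, t, P.Value, adj.P.Val, B) match those produced by the limma package, so the statistical test can plausibly be sketched as below. The exact calls are not shown in this report; the expression matrix here is simulated, the protein names are hypothetical, and the contrast name is illustrative.

```r
library(limma)  # Bioconductor package; assumed to be installed

# Simulated log2 expression: 20 hypothetical proteins x 7 samples (3 RBC, 4 FIB).
set.seed(42)
expr <- matrix(rnorm(20 * 7, mean = 12, sd = 0.5), nrow = 20,
               dimnames = list(paste0("prot", 1:20),
                               c(paste0("rbc", 1:3), paste0("fib", 1:4))))
expr[1:5, 1:3] <- expr[1:5, 1:3] + 3  # make the first five proteins higher in RBC

# Design matrix with one column per clot type (no intercept).
group <- factor(c(rep("rbc", 3), rep("fib", 4)), levels = c("rbc", "fib"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit per-protein linear models, then test the RBC minus FIB contrast
# with empirical Bayes moderated t-statistics.
fit <- lmFit(expr, design)
contrast_mat <- makeContrasts(rbc.vs.fib = rbc - fib, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast_mat))

# Counts of Down / NotSig / Up proteins, and the top-ranked proteins.
print(summary(decideTests(fit2)))
top <- topTable(fit2, coef = "rbc.vs.fib", number = 10)
print(top)
```

The moderated t-statistic borrows variance information across proteins, which helps when, as here, there are only three or four samples per group.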

##  [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"     
##  [5] "cov.coefficients" "pivot"            "rank"             "Amean"           
##  [9] "method"           "design"
##        rbc.vs.fibc
## Down            19
## NotSig          70
## Up              36
##           logFC  AveExpr         t      P.Value    adj.P.Val        B
## ADD2   3.523710 10.33885  9.755782 4.322803e-06 0.0003809351 4.814930
## SPTB   3.012327 13.97215  9.201347 7.013079e-06 0.0003809351 4.305464
## UBA1   3.379121 10.41227  8.908779 9.142443e-06 0.0003809351 4.119141
## ANK1   2.514001 13.39419  8.578292 1.244253e-05 0.0003888290 3.728493
## HBA1   2.241807 16.72707  7.984419 2.218940e-05 0.0004675932 3.081652
## CA2    2.518498 12.15356  7.718634 2.906292e-05 0.0004675932 2.887596
## VCP    2.310769 11.74943  7.704539 2.948764e-05 0.0004675932 2.856833
## PLG   -2.390628 12.18991 -7.690224 2.992596e-05 0.0004675932 2.873334
## YWHAE  1.992553 10.97619  7.458367 3.811730e-05 0.0005294070 2.646976
## SPTA1  2.663962 14.44386  6.810118 7.732511e-05 0.0009133175 1.814079

##         logFC  AveExpr        t      P.Value    adj.P.Val        B Accession
## ADD2 3.523710 10.33885 9.755782 4.322803e-06 0.0003809351 4.814930      ADD2
## SPTB 3.012327 13.97215 9.201347 7.013079e-06 0.0003809351 4.305464      SPTB
## UBA1 3.379121 10.41227 8.908779 9.142443e-06 0.0003809351 4.119141      UBA1
## ANK1 2.514001 13.39419 8.578292 1.244253e-05 0.0003888290 3.728493      ANK1
## HBA1 2.241807 16.72707 7.984419 2.218940e-05 0.0004675932 3.081652      HBA1
## CA2  2.518498 12.15356 7.718634 2.906292e-05 0.0004675932 2.887596       CA2

The above plots demonstrate the differentially expressed proteins / genes across the 7 samples, comparing RBC clots to FIB clots. The table shows the genes with their respective log2 fold changes for the RBC vs. FIB contrast: the larger the absolute value, the more dramatic the change in the expression level of the protein between conditions, with positive values indicating higher levels in RBC clots and negative values indicating higher levels in FIB clots. We can see if these results match up with what we would expect to see in different clots through a gene ontology enrichment analysis below. We can begin by using the top 30 proteins sorted by expression:
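To make the log2 fold change concrete, here is a small worked example in base R using the raw common scores for accession P02549 from the table above. It is computed on unnormalised scores, so it illustrates the calculation only and is not expected to reproduce the logFC values in the results table.

```r
# Scores for accession P02549, taken from the common-score table above.
rbc <- c(603.26, 555.22, 589.77)
fib <- c(202.52, 122.55, 66.96, 106.59)

# log2 fold change for RBC vs. FIB: difference of the mean log2 scores.
log_fc <- mean(log2(rbc)) - mean(log2(fib))

# Positive values mean higher in RBC; negative values mean higher in FIB.
direction <- if (log_fc > 0) "higher in RBC" else "higher in FIB"

# A log2 fold change of x corresponds to a 2^x-fold ratio.
cat(sprintf("logFC = %.2f (%s), about %.1f-fold\n",
            log_fc, direction, 2^abs(log_fc)))
```

Because the scale is logarithmic, each unit step doubles the ratio: a logFC of 1 is a 2-fold difference, 2 is 4-fold and 3 is 8-fold.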

##      identifier                                            description
## 1 R-HSA-1247673 Erythrocytes take up oxygen and release carbon dioxide
## 2 R-MMU-3371511                                        HSF1 activation
## 3 R-HSA-3371511                                        HSF1 activation
## 4 R-HSA-1237044 Erythrocytes take up carbon dioxide and release oxygen
## 5 R-HSA-1480926                        O2/CO2 exchange in erythrocytes
## 6         04141            Protein processing in endoplasmic reticulum
##                  pValue count populationAnnotationCount
## 1 6.7295002869789135E-6     4                         9
## 2  7.490093104871675E-6     4                         8
## 3 1.7534187128697088E-5     4                        12
## 4 1.8962733317353597E-5     4                        13
## 5 1.8962733317353597E-5     4                        13
## 6  6.260353449041497E-4     8                       331
##     identifier                                    description
## 1        04610            Complement and coagulation cascades
## 2 R-HSA-114608                        Platelet degranulation 
## 3  R-HSA-76005   Response to elevated platelet cytosolic Ca2+
## 4 R-HSA-140877    Formation of Fibrin Clot (Clotting Cascade)
## 5  R-HSA-76002 Platelet activation, signaling and aggregation
## 6 R-MMU-114608                        Platelet degranulation 
##                   pValue count populationAnnotationCount
## 1  8.184659933607973E-40    26                       145
## 2  9.711568423889633E-17    14                       127
## 3 1.1350383573899567E-16    14                       132
## 4  1.747126239887467E-15    10                        39
## 5    2.9098273677178E-14    15                       259
## 6  7.843940449044366E-14    12                       122

We can see from these resulting plots that typical pathways appear for each enrichment: the differentially expressed proteins in the RBC samples were significantly enriched for erythrocyte and small molecule transport systems, and the differentially expressed proteins in the Fibrin samples were significantly associated with various platelet and coagulation pathways.
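Enrichment p-values of this kind are often based on a hypergeometric (one-sided Fisher) test, which can be sketched in base R as follows. The tool that produced the tables above is not named and may use a different test. The hit count (26) and term size (145) come from the "Complement and coagulation cascades" row, while the background size and query list size are placeholders assumed for illustration, so the resulting p-value will not match the table.

```r
# From the enrichment table: 26 query proteins annotated to a term
# that has 145 annotated proteins in the background population.
hits_in_query <- 26
term_size     <- 145

# ASSUMPTIONS (not given in the report): background and query list sizes.
background_size <- 20000
query_size      <- 60

# P(X >= hits_in_query) under the hypergeometric null of random sampling.
p_value <- phyper(hits_in_query - 1, term_size,
                  background_size - term_size, query_size,
                  lower.tail = FALSE)

# Fold enrichment: observed fraction of hits over the background fraction.
fold_enrichment <- (hits_in_query / query_size) / (term_size / background_size)

print(p_value)
print(fold_enrichment)
```

The choice of background matters: using only the proteins detectable in the experiment, rather than the whole proteome, usually gives more conservative p-values.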

STRING results can be visualised here for Fibrin related proteins - https://string-db.org/cgi/network.pl?taskId=fyhfSIJPuVah and here for RBC related proteins - https://string-db.org/cgi/network.pl?taskId=NlK4wpf2wxs5

Conclusion

This analysis showed some interesting results with respect to network dynamics between Fibrin and RBC rich clots. Using a common set of proteins found across all samples, most of the expected biological pathways associated with the two clot types were found to be enriched, such as platelet pathways for Fibrin and molecule transport for RBC. This result suggests that the proteins detected by the MS/MS search were relevant and in line with what we expect. Examination of the STRING results provided above is also interesting: both interaction networks had significantly more connections than would be expected from a random list of proteins, which indicates that the experiment pulled out biologically relevant information. The analyses presented here will serve as a basis for further hypothesis generation and interrogation.

Further experiments could be improved by using a larger sample size, which would allow different models and techniques to be applied to glean more biological insights. The methods presented here would also benefit, as more samples would reduce the number of spurious associations picked up by all methods.