The results examined in this analysis come from a label-free mass spectrometry experiment. Raw spectra are supplied to a BLAST-type search engine that looks for the proteins that best explain the observed spectra, and reports various measures of confidence around that estimate.
When performing this analysis, I was asked to provide some measure of quality control, which was difficult to do without the raw spectra. The usual protocol for estimating false discovery rates is to append a few dummy queries, in the form of jumbled sequences, to the end of the spectra search; the number of hits acquired for these gives some measure of the quality of the spectra and of the experiment overall. In this setting, some filtering steps would ultimately be useless. For example, examine this plot of coverage versus sequence score, along with density estimates:
In all of these plots, there is a roughly linear relationship between score and coverage, so it would not make sense to filter on coverage: the proteins with the highest scores usually have high coverage as well. An arbitrary cutoff of coverage > 70 would exclude most of the proteins in the sample. Play around with this interactive plot to see the relationship between coverage and score.
Complete plots of coverage density can be seen below; the vast majority of proteins detected by the protein search have coverage below 50.
Based on these plots, I think it is a bad idea to filter on coverage: low coverage does not mean that the proteins in question are absent.
Rosanna used a very useful tool called STRING to carry out some exploratory analyses. We can also try some in-house network analysis using a concept borrowed from gene regulatory networks: coexpression correlation. Essentially, the question asked here is whether, across samples, the score of any given protein \(i\) is correlated with the score of another protein \(j\). Correlation here is measured by Pearson’s correlation coefficient, which can be illustrated by the following:
## Warning in x + seq(1, 7, 1): longer object length is not a multiple of shorter
## object length
Here, the correlation between two sets of points is shown. To build a relatively confident protein correlation matrix, we only consider pairs of proteins whose correlation coefficient is greater than 0.8. The first step in building this matrix was to get common lists of proteins using Venny.
These common sets are the proteins that overlap between all samples except the paraffin embedded samples. The scores in the paraffin samples were lower, and these samples did not have many detected proteins either. However, the following plots may suggest they merit inclusion, as at least the RBC paraffin sample appears quite similar to the other RBC samples.
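To make the correlation measure and the 0.8 cutoff concrete, here is a minimal sketch in base R. It uses the scores of two proteins (P02549 and P11277) as printed in the common-score table later in this report; the helper name `pearson_manual` is introduced here purely for illustration.

```r
# Scores for two proteins across the 7 included samples (rbc1-3, fib1-4),
# copied from the common-score table printed later in this report.
p02549 <- c(603.26, 555.22, 589.77, 202.52, 122.55, 66.96, 106.59)
p11277 <- c(529.80, 469.59, 451.29, 94.44, 79.44, 49.95, 83.54)

# Pearson's r written out: the summed cross-products of deviations from the
# mean, divided by the square root of the product of the summed squares.
# (pearson_manual is a hypothetical helper, not part of the original analysis.)
pearson_manual <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_manual(p02549, p11277)
r_builtin <- cor(p02549, p11277, method = "pearson")

# Apply the 0.8 cutoff used in this report.
passes_cutoff <- r_builtin > 0.8

print(c(manual = r_manual, builtin = r_builtin))
print(passes_cutoff)
```

The hand-written version and `cor()` should agree; in practice `cor()` applied to the whole score matrix gives every pairwise coefficient at once.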
## Warning: Removed 12 rows containing non-finite values (stat_ydensity).
## Warning: Removed 118 rows containing missing values (geom_point).
## Warning: Removed 7 rows containing non-finite values (stat_ydensity).
## Warning: Removed 66 rows containing missing values (geom_point).
However, note the scale of the 2 plots. In addition, the paraffin sample contains half the number of proteins that the RBC sample has. For now we can exclude the paraffin samples, as including one lower quality sample with the others may detract from the coexpression methods.
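The overlap step was done with Venny, a web tool, but the same operation can be sketched in R. The per-sample accession lists below are toy values chosen for illustration, not the real detected sets, and the object names are assumptions.

```r
# Hypothetical per-sample lists of detected accessions (toy values).
detected <- list(
  rbc1 = c("P02549", "P11277", "P35579", "P16157"),
  rbc2 = c("P02549", "P11277", "P16157", "P01024"),
  fib1 = c("P02549", "P11277", "P01024", "P16157")
)

# Fold intersect() over the list to keep only the accessions
# that were detected in every sample.
common <- Reduce(intersect, detected)

print(common)
print(length(common))
```

Dropping a sample (for example a paraffin one) only requires removing its element from the list before the `Reduce()` call.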
Using Venny, we can get a list of 145 overlapping proteins across our 7 included samples.
## [1] 385
## Accession Score_rbc1 Score_rbc2 Score_rbc3
## 1 P02549 603.26 555.22 589.77
## 2 P11277 529.80 469.59 451.29
## 3 P35579 252.14 243.18 109.26
## 4 P16157 272.12 265.50 253.84
## 5 P01024 283.21 218.57 341.97
## 6 Q60FE5 182.62 166.18 117.20
## 7 Q9Y490 170.79 178.82 107.57
## 8 P04114 88.78 57.12 317.58
## 9 P04040 160.11 165.76 187.03
## 10 Q4VB86 88.57 111.50 78.08
## 11 P18206 84.96 82.30 48.70
## 12 P01023 109.50 80.01 197.52
## 13 P55072 76.30 81.12 75.02
## 14 P16452 75.38 73.45 59.41
## 15 P68871 2270.40 2191.33 1843.45
## 16 P02787 93.97 73.75 157.86
## 17 P02647 107.60 80.23 139.91
## 18 P02730 104.66 112.66 155.54
## 19 P02751 32.48 60.67 8.96
## 20 P69905 2603.91 2561.08 2047.56
## 21 Q13228 92.21 73.10 74.43
## 22 P02042 1294.86 1164.09 1073.23
## 23 P11142 73.19 77.98 77.60
## 24 P26038 40.62 36.37 22.96
## 25 P06727 59.43 52.24 79.93
## 26 P01009 107.58 57.60 128.13
## 27 P53396 21.81 30.63 17.60
## 28 P35612 37.17 44.20 62.14
## 29 P00918 124.98 100.40 109.52
## 30 Q00610 42.55 39.96 70.72
## 31 Q86UX7 46.11 44.62 20.83
## 32 P60709 101.21 110.86 81.91
## 33 P07384 51.55 42.10 41.09
## 34 P02671 72.40 67.00 58.89
## 35 P04406 83.13 66.95 74.40
## 36 P07900 36.60 45.92 64.91
## 37 P00558 57.17 52.22 41.40
## 38 Q5HYB6 88.10 85.83 56.40
## 39 A0A2R8YGX3 52.31 60.42 26.76
## 40 P00915 206.18 191.51 257.80
## 41 P00450 43.40 38.94 101.25
## 42 P00488 59.27 52.78 70.77
## 43 P50395 36.86 39.50 81.62
## 44 P04196 40.02 56.42 44.23
## 45 P30041 65.49 66.23 68.40
## 46 P31948 34.73 34.56 53.45
## 47 P21980 35.03 33.82 36.95
## 48 P22314 43.11 39.22 57.53
## 49 P62258 38.63 36.54 44.13
## 50 Q86VP6 37.57 33.18 40.18
## 51 Q08495 32.12 36.64 32.21
## 52 P06733 38.31 36.42 31.39
## 53 P00738 51.85 45.34 34.12
## 54 P32119 101.89 99.22 86.93
## 55 E7EV99 47.37 41.93 39.16
## 56 A0A0A0MSI0 44.34 43.92 38.57
## 57 P00352 46.73 36.19 55.22
## 58 P02675 52.78 31.03 21.55
## 59 P00491 30.47 43.43 54.69
## 60 P17987 33.21 39.91 32.17
## 61 P28289 51.45 45.81 46.97
## 62 Q01518 22.63 26.24 9.01
## 63 Q00013 15.07 39.28 35.31
## 64 P08238 26.35 37.04 49.33
## 65 P01857 79.31 50.32 115.95
## 66 P00338 21.94 23.01 12.76
## 67 P07195 45.81 36.27 41.51
## 68 P68032 50.25 54.32 58.93
## 69 O43707 34.99 36.18 24.62
## 70 P69892 269.65 258.89 209.79
## 71 P00739 30.08 29.61 24.59
## 72 P08514 43.23 31.83 21.50
## 73 Q9UQ80 11.78 26.94 22.50
## 74 O14818 25.39 35.69 20.94
## 75 P23526 38.83 25.50 29.54
## 76 Q9BQE3 39.72 37.03 13.73
## 77 P00734 21.62 39.01 45.17
## 78 O75083 22.69 26.22 17.05
## 79 A0A0A0MS51 31.67 39.21 52.13
## 80 P63104 31.45 34.86 38.40
## 81 P13798 31.49 32.39 62.39
## 82 P02649 26.64 28.19 15.30
## 83 Q9BT78 35.60 22.43 21.61
## 84 P48506 31.88 29.44 49.59
## 85 P0DMV9 21.21 17.61 35.02
## 86 P60842 19.51 26.96 25.58
## 87 P19105 22.69 32.57 7.06
## 88 Q9UNZ2 28.16 27.91 42.38
## 89 P07738 21.95 25.77 40.47
## 90 P25786 21.17 30.66 26.49
## 91 P27105 26.72 29.05 27.07
## 92 P37802 20.52 20.54 21.31
## 93 Q9H4B7 25.63 25.53 27.87
## 94 P40227 24.94 21.79 31.43
## 95 P07996 22.23 27.54 15.98
## 96 P02766 45.46 46.44 59.90
## 97 P54578 23.66 23.88 34.36
## 98 Q5T985 32.31 19.73 68.93
## 99 Q32Q12 37.10 29.95 21.02
## 100 P30153 21.07 16.82 26.91
## 101 P52209 24.96 25.83 23.69
## 102 P01011 18.12 15.57 37.41
## 103 P17174 20.94 18.14 9.37
## 104 P30043 74.48 49.57 107.18
## 105 P11021 18.06 16.41 13.16
## 106 P62333 12.15 21.93 8.13
## 107 Q13200 10.87 21.76 30.26
## 108 P68366 34.19 33.36 8.59
## 109 P38606 12.87 10.63 13.41
## 110 F5H345 9.36 18.18 12.55
## 111 E9PM69 19.53 17.86 26.45
## 112 E7EPV7 46.08 37.74 40.19
## 113 P31946 25.45 21.23 29.43
## 114 P53004 8.24 22.70 26.05
## 115 P07451 18.87 14.14 22.03
## 116 P10909 31.60 27.06 48.69
## 117 Q92905 8.06 14.07 17.57
## 118 P01859 58.01 39.56 94.20
## 119 Q14624 41.63 21.87 72.66
## 120 P01042 7.84 15.36 17.91
## 121 P09960 15.56 16.05 33.63
## 122 P28074 23.26 27.00 21.41
## 123 Q15257 22.58 17.01 21.30
## 124 P31939 32.68 15.57 24.32
## 125 P62826 36.51 21.54 21.34
## 126 P61224 23.34 23.16 16.81
## 127 P37837 23.60 24.31 21.22
## 128 Q99832 26.58 13.65 32.25
## 129 P50990 18.26 17.99 32.83
## 130 E7EQ12 16.30 21.93 23.18
## 131 P01871 23.93 27.39 53.00
## 132 G5E9F8 22.47 24.49 36.11
## 133 P54725 11.45 12.12 15.99
## 134 P61981 17.89 15.34 22.45
## 135 P27348 17.25 16.77 13.18
## 136 P01008 21.50 5.94 29.42
## 137 P61204 22.91 13.82 20.54
## 138 P11166 27.36 33.17 26.24
## 139 P62805 18.15 22.49 1.82
## 140 P13716 53.02 31.08 45.52
## 141 P05155 14.07 13.99 23.80
## 142 P01876 24.59 27.88 19.58
## 143 P00568 21.55 21.58 31.75
## 144 Q99733 14.26 13.01 19.80
## 145 P30101 12.15 11.02 4.03
## 146 P30086 21.43 13.61 15.75
## 147 P48426 18.90 17.88 15.68
## 148 P22061 16.13 15.92 11.67
## 149 P07737 12.80 19.71 10.69
## 150 P62195 6.07 14.52 8.47
## 151 P25787 21.65 26.45 25.93
## 152 P49721 26.19 14.72 10.92
## 153 O43242 16.87 8.99 12.03
## 154 Q16401 7.58 15.42 16.89
## 155 Q06323 16.81 16.65 17.89
## 156 Q9UL46 23.71 23.52 15.54
## 157 P49247 23.81 10.84 32.63
## 158 P02743 11.97 17.56 18.01
## 159 P50991 27.36 15.93 28.91
## 160 P60174 29.98 25.35 25.43
## 161 Q9C0C9 8.36 13.26 30.88
## 162 H7BZ94 9.61 7.19 6.67
## 163 C9JEU5 38.36 24.52 26.96
## 164 F5H265 27.79 26.39 9.02
## 165 P02765 47.49 52.95 74.15
## 166 P01834 41.26 43.86 74.29
## 167 P04217 18.63 13.56 33.13
## 168 Q9UKV8 4.04 8.67 7.48
## 169 O95782 8.96 5.16 7.72
## 170 P27797 10.36 11.58 5.65
## 171 P16152 11.02 12.66 20.00
## 172 P08603 20.93 13.91 39.58
## 173 Q16531 10.46 9.40 10.71
## 174 P14625 9.45 15.76 6.93
## 175 P50502 18.76 14.60 20.21
## 176 Q13630 17.73 16.04 16.60
## 177 P11413 7.70 11.17 18.38
## 178 P02008 43.30 41.99 63.40
## 179 P02790 26.09 10.67 27.34
## 180 Q14974 24.32 16.01 24.79
## 181 P17858 20.07 12.03 19.56
## 182 Q15691 7.78 8.86 8.00
## 183 Q8WUM4 17.56 11.39 6.39
## 184 P18669 17.41 12.88 11.65
## 185 P24666 17.45 20.45 8.09
## 186 P62937 18.23 20.47 22.27
## 187 P35998 10.45 8.87 15.58
## 188 P20618 15.91 20.67 23.05
## 189 P28070 12.17 13.44 21.52
## 190 O00231 3.81 4.56 8.30
## 191 Q9BWD1 7.48 11.33 4.84
## 192 P25325 10.40 12.15 18.22
## 193 P29401 18.56 10.83 21.82
## 194 P04004 20.12 14.30 26.46
## 195 C9J0K6 11.74 13.57 18.06
## 196 H3BPK3 23.89 21.47 22.25
## 197 M0R0Y2 11.36 17.37 19.24
## 198 A0A0C4DGZ5 0.00 6.09 2.46
## 199 A0A087WYS1 5.15 9.21 12.18
## 200 A0A0G2JMB2 17.98 21.54 20.16
## 201 A0A2R8Y5T7 17.68 13.72 17.86
## 202 B1ALA9 8.28 14.23 18.15
## 203 Q04917 6.97 8.04 12.80
## 204 O95336 12.29 8.34 11.05
## 205 P19652 10.59 8.16 19.54
## 206 P02656 12.37 10.74 9.94
## 207 P0DP25 28.51 15.09 23.74
## 208 Q13618 5.25 5.57 18.29
## 209 Q5TDH0 9.64 7.00 14.74
## 210 Q9NY33 20.48 12.24 26.89
## 211 P00740 15.12 17.83 20.12
## 212 P31150 17.30 10.76 36.14
## 213 Q9HC38 10.86 9.77 9.54
## 214 P36959 10.59 9.77 2.92
## 215 P00390 3.85 9.07 19.64
## 216 P09105 10.72 13.10 18.33
## 217 P05546 7.14 6.28 19.49
## 218 P01861 23.64 12.21 49.61
## 219 P13645 2.27 10.42 15.92
## 220 P30613 12.18 9.91 27.61
## 221 P43034 6.26 5.21 0.00
## 222 Q9NTK5 12.80 17.52 14.73
## 223 Q9GZP4 17.52 11.05 11.47
## 224 P00747 11.67 13.02 25.96
## 225 Q6XQN6 16.83 10.99 14.50
## 226 P62191 9.65 13.48 15.07
## 227 Q15404 4.02 8.33 3.83
## 228 O95810 14.30 12.72 2.84
## 229 O75368 11.54 11.36 18.83
## 230 Q9Y4E8 12.02 8.55 25.03
## 231 P45974 36.17 11.93 45.24
## 232 J3KQ34 9.00 5.85 4.53
## 233 E9PLD0 10.14 9.01 10.62
## 234 I3L0N3 4.63 3.77 13.47
## 235 P02774 20.09 7.00 30.09
## 236 X6RA14 14.82 8.94 19.73
## 237 A0A087WW66 7.94 4.30 18.47
## 238 P02763 12.54 7.27 41.42
## 239 P43652 5.43 2.38 19.04
## 240 Q9NZD4 6.23 8.23 13.74
## 241 P20073 12.46 9.52 8.04
## 242 P02655 14.11 8.60 8.11
## 243 P05090 0.00 2.79 3.68
## 244 O14791 2.38 6.81 9.62
## 245 Q5VW32 9.32 8.21 5.96
## 246 P04003 11.87 9.20 21.85
## 247 P52907 7.84 8.41 27.13
## 248 P00751 15.40 4.77 25.26
## 249 Q9Y2V2 6.45 9.53 5.71
## 250 O00299 6.23 6.51 9.62
## 251 P01031 7.11 2.83 48.24
## 252 P31146 4.69 5.73 5.82
## 253 P00742 2.01 7.39 11.86
## 254 Q9Y3I1 9.94 8.46 9.02
## 255 Q9H479 8.93 12.33 5.38
## 256 P07954 2.07 4.79 4.24
## 257 P48507 5.87 7.52 10.65
## 258 P09211 16.90 7.89 23.05
## 259 P16403 7.20 4.70 4.85
## 260 P00492 7.03 8.20 19.48
## 261 P19827 27.47 10.92 41.92
## 262 P04264 6.97 4.33 22.40
## 263 Q04760 9.56 6.25 7.43
## 264 Q5VVQ6 3.19 3.19 18.62
## 265 Q15365 3.10 5.16 5.09
## 266 Q96G03 14.01 6.03 17.01
## 267 P08567 7.61 8.93 6.35
## 268 P13796 2.73 5.46 26.73
## 269 P22891 0.00 3.06 4.60
## 270 P28072 5.71 8.11 7.98
## 271 Q99436 4.37 2.98 10.62
## 272 P51665 0.00 1.70 6.93
## 273 P13489 16.59 9.78 10.61
## 274 Q9Y265 8.72 8.84 6.06
## 275 Q9BSL1 6.57 8.31 7.35
## 276 Q9UIA9 10.14 2.11 13.51
## 277 Q9UK55 7.60 9.08 6.25
## 278 C9JVE2 2.78 6.42 2.14
## 279 B4E3S0 10.37 9.59 6.64
## 280 H0YLA4 2.22 2.94 7.83
## 281 E7EM64 9.95 6.28 12.48
## 282 F5H5V4 10.45 11.72 16.09
## 283 K7ERI9 9.75 4.63 13.25
## 284 R4GN98 8.66 4.51 6.67
## 285 I3L397 21.49 15.04 24.04
## 286 P54727 4.16 2.02 6.88
## 287 P49189 4.94 2.34 4.74
## 288 P35858 2.05 2.06 11.16
## 289 P01019 9.44 5.14 15.86
## 290 P05089 4.58 1.70 11.95
## 291 P60953 6.61 4.30 5.95
## 292 Q96DG6 2.41 2.15 4.66
## 293 P07360 6.44 5.96 13.68
## 294 P02748 13.30 5.44 18.49
## 295 P02775 9.50 9.54 10.67
## 296 O76003 1.99 3.81 10.01
## 297 P14770 2.32 2.22 2.37
## 298 Q6B0K9 5.83 5.37 6.40
## 299 P55010 2.25 4.83 5.24
## 300 P30740 7.47 4.75 4.13
## 301 Q15181 13.54 6.32 18.23
## 302 Q9BS40 2.29 6.23 7.00
## 303 Q9GZT8 3.20 4.92 3.22
## 304 O75340 4.46 2.57 4.55
## 305 P23284 2.62 4.45 0.00
## 306 P43686 8.83 0.00 6.28
## 307 P28066 9.67 7.88 15.58
## 308 O00487 2.98 0.00 1.62
## 309 Q15008 5.14 0.00 10.75
## 310 O15067 8.64 4.56 7.38
## 311 Q07960 5.76 6.03 6.87
## 312 P26447 5.51 5.15 9.31
## 313 P00441 10.03 6.85 9.42
## 314 P62328 17.20 14.98 0.00
## 315 O14980 3.13 1.91 10.64
## 316 Q5QPM9 3.05 6.00 9.19
## 317 D6RF62 10.04 4.05 12.11
## 318 Q5T2B5 5.83 4.92 12.74
## 319 F8W9S7 2.51 2.07 11.74
## 320 H0Y3Y9 2.36 2.36 6.52
## 321 I3L0S1 6.40 6.01 10.70
## 322 H0YJS4 3.95 4.56 3.02
## 323 H3BRV9 5.79 4.67 7.29
## 324 E5RIW3 12.34 5.31 10.18
## 325 H3BMU1 0.00 6.58 2.20
## 326 C9JJ34 5.48 5.07 3.09
## 327 A0A182DWH7 2.16 5.30 2.02
## 328 K7ER96 6.87 4.10 10.06
## 329 F8W0K0 0.00 3.96 1.96
## 330 A0A2Q2TTZ9 5.13 8.13 17.40
## 331 S4R460 8.63 7.93 14.60
## 332 P01619 6.84 6.50 15.26
## 333 A0A0A0MSV6 7.59 5.40 2.83
## 334 P04424 0.00 0.00 4.31
## 335 O75531 2.97 2.02 2.48
## 336 P02745 3.17 2.37 4.59
## 337 P02747 5.33 2.92 5.66
## 338 P20851 2.24 2.14 1.89
## 339 Q5TEZ5 2.42 2.55 2.35
## 340 Q8IUI8 2.25 1.99 6.53
## 341 P00167 6.58 2.71 7.35
## 342 P00748 0.00 2.00 6.07
## 343 Q01469 0.00 1.83 2.02
## 344 P04921 6.86 4.21 3.95
## 345 Q9UBQ7 2.47 0.00 3.53
## 346 Q9Y5Z4 1.91 0.00 0.00
## 347 O00505 2.96 2.95 2.80
## 348 O00410 5.08 2.72 12.69
## 349 P35527 0.00 0.00 1.73
## 350 Q7Z494 0.00 0.00 2.14
## 351 Q8NGB2 0.00 0.00 0.00
## 352 Q92882 1.60 0.00 0.00
## 353 Q9NRX4 9.17 0.00 5.29
## 354 P02776 6.46 2.03 6.33
## 355 O43598 7.30 3.21 4.15
## 356 P05387 2.43 3.25 6.97
## 357 P61956 3.34 2.86 2.99
## 358 P10599 8.75 3.71 7.28
## 359 Q14166 1.64 0.00 0.00
## 360 P68036 6.42 3.85 7.76
## 361 O75348 2.17 2.29 2.35
## 362 G3XAM2 1.85 1.92 5.74
## 363 Q5T0D2 1.86 0.00 5.81
## 364 E5RHK0 2.51 0.00 0.00
## 365 E7EQ47 1.90 2.36 5.01
## 366 I3L0K2 3.62 2.80 3.79
## 367 H0YJC6 17.39 9.23 2.33
## 368 H7C5G1 1.62 0.00 2.39
## 369 F5GY80 5.68 2.20 6.09
## 370 E7EWE1 2.25 0.00 0.00
## 371 D6RAW0 2.31 2.25 2.48
## 372 H7BXY6 0.00 0.00 0.00
## 373 J3QS45 2.78 2.70 2.62
## 374 K7EN45 3.09 2.88 5.95
## 375 K7ENY4 3.24 2.68 7.07
## 376 A0A075B6K5 3.74 3.50 3.89
## 377 A0A0B4J1U7 1.74 0.00 5.30
## 378 A0A075B6R9 1.67 0.00 3.28
## 379 A0A0U1RR22 0.00 2.29 0.00
## 380 P01624 1.97 0.00 7.11
## 381 K7ELW5 2.56 2.27 2.76
## 382 A0A0U1RQV5 1.81 1.73 2.56
## 383 A0A075B6Z2 0.00 1.64 1.67
## 384 A0A0A0MRZ8 2.56 2.33 2.41
## 385 H0YDX6 2.55 2.00 3.03
We can then implement some basic QC and take the 50 proteins with the highest scores from the common list for our coexpression matrix.
## Warning: Ignoring unknown parameters: colour
We can use this correlation matrix of coexpressed proteins to create an adjacency matrix, in which every correlation over a certain value is encoded as a 1, i.e. an edge between one node and another.
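A minimal base R sketch of this thresholding step follows. The score matrix is made up for illustration, and the cutoff of 0.8 is taken from the correlation threshold mentioned earlier in this report; whether negative correlations were also kept as edges is not stated, so only positive ones are encoded here.

```r
# Toy score matrix: rows are proteins, columns are samples (values made up).
scores <- rbind(
  protA = c(10, 12, 11, 30, 32, 31, 29),
  protB = c(11, 13, 12, 33, 30, 34, 31),
  protC = c(50, 48, 52, 20, 22, 19, 21),
  protD = c( 5,  9,  4,  8,  6,  7,  5)
)

# cor() correlates columns, so transpose to get protein-by-protein values.
corr <- cor(t(scores), method = "pearson")

# Encode an edge (1) wherever the correlation exceeds the cutoff, else 0.
cutoff <- 0.8
adj <- (corr > cutoff) * 1
diag(adj) <- 0  # a protein is not linked to itself

print(round(corr, 2))
print(adj)
```

The resulting 0/1 matrix can be passed to a graph library to draw the network.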
The resulting network can then be visualised here - http://rpubs.com/shaneoconnell96/gene_name_corr_matrix_rbc
We can see from this network that there are 3 distinct groups highlighted by community analysis. This essentially gives us an overall picture of which proteins are coexpressed with others. Individual queries of different bioinformatics databases can be facilitated by this protocol, based on which proteins seem to be playing an important role in this network. For example, although this is a sparse network, the yellow group appears to be the only link between the denser red and blue groups: without these intermediary proteins they have no direct link (in this network).
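As a rough sketch of how community analysis of this kind can be run with igraph: the community detection algorithm used for the published networks is not stated, so `cluster_fast_greedy()` is an assumption here, and the toy graph (two triangles joined through a single node) only mimics the situation where one group links two denser ones.

```r
library(igraph)

# Toy undirected network: two triangles joined only through node "D".
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "E", "E", "F"),
  to   = c("B", "C", "C", "D", "E", "F", "G", "G"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Greedy modularity optimisation (an assumed choice of algorithm).
comm <- cluster_fast_greedy(g)
memb <- setNames(as.integer(membership(comm)), V(g)$name)
print(memb)

# Articulation points are nodes whose removal disconnects the graph,
# i.e. the nodes that are the only link between parts of the network.
bridges <- as_ids(articulation_points(g))
print(bridges)
```

The membership vector plays the role of the `group` column in the tables of this report.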
The group memberships can be viewed here:
## id label group
## 1 HBA2 HBA2 1
## 2 HBB HBB 1
## 3 HBD HBD 1
## 4 SPTA1 SPTA1 2
## 5 SPTB SPTB 2
## 6 C3 C3 3
## 7 ANK1 ANK1 1
## 8 HBG2 HBG2 1
## 9 MYH9 MYH9 1
## 10 CA1 CA1 3
## 11 FLNA FLNA 1
## 12 TLN1 TLN1 1
## 13 CAT CAT 3
## 14 CA2 CA2 2
## 15 A2M A2M 3
## 16 APOA1 APOA1 3
## 17 SERPINA1 SERPINA1 3
## 18 SLC4A1 SLC4A1 3
## 19 PRDX2 PRDX2 1
## 20 ACTB ACTB 1
## 21 TF TF 3
## 22 SELENBP1 SELENBP1 2
## 23 APOB APOB 3
## 24 VCL VCL 1
## 25 GAPDH GAPDH 2
## 26 VCP VCP 1
## 27 EPB42 EPB42 1
## 28 BLVRB BLVRB 3
## 29 HSPA8 HSPA8 3
## 30 FGA FGA 1
## 31 PRDX6 PRDX6 3
## 32 F13A1 F13A1 3
## 33 PGK1 PGK1 1
## 34 ALAD ALAD 2
## 35 FGB FGB 1
## 36 HP HP 1
## 37 CAPN1 CAPN1 2
## 38 TMOD1 TMOD1 2
## 39 ACTC1 ACTC1 3
## 40 AHSG AHSG 3
## 41 ALDH1A1 ALDH1A1 3
## 42 FERMT3 FERMT3 1
The fibrin network can be visualised at http://rpubs.com/shaneoconnell96/genematfib and, as we can see, there is a lot more going on here from a network standpoint: there are more sparse interaction networks and, overall, a higher number of closed groups, for instance the 2 leftmost groups. There could be many biological interpretations for this.
## id label group
## 1 ALB ALB 4
## 2 HBB HBB 4
## 3 HBA1 HBA1 4
## 4 HBD HBD 4
## 5 C3 C3 5
## 6 APOB APOB 5
## 7 TF TF 4
## 8 A2M A2M 5
## 9 FGB FGB 5
## 10 FGA FGA 5
## 11 APOA1 APOA1 5
## 12 C4B C4B 5
## 13 SERPINA1 SERPINA1 5
## 14 SPTA1 SPTA1 3
## 15 CP CP 5
## 16 FLNA FLNA 1
## 17 F13A1 F13A1 2
## 18 AHSG AHSG 5
## 19 F2 F2 6
## 20 TLN1 TLN1 1
## 21 MYH9 MYH9 1
## 22 PLG PLG 4
## 23 C5 C5 5
## 24 ITIH4 ITIH4 5
## 25 CFH CFH 4
## 26 ITIH1 ITIH1 5
## 27 ACTB ACTB 1
## 28 FN1 FN1 1
## 29 HPX HPX 4
## 30 HP HP 2
## 31 TTR TTR 4
## 32 SPTB SPTB 3
## 33 CAT CAT 4
## 34 CA1 CA1 5
## 35 CLU CLU 5
## 36 ANK1 ANK1 4
## 37 HPR HPR 5
## 38 SERPINC1 SERPINC1 4
## 39 IGLL5 IGLL5 4
## 40 HRG HRG 6
## 41 SERPINA3 SERPINA3 3
## 42 GC GC 4
Finally, the entire network can be visualised at http://rpubs.com/shaneoconnell96/commonnetworkgenenames and makes for a really interesting network diagram: there is one dense cluster of entirely closed off proteins in the blue group. The data frame below shows the group memberships, followed by the proteins/genes that are interacting with each other in the blue group:
## id label group
## 1 HBA1 HBA1 1
## 2 HBB HBB 1
## 3 HBD HBD 1
## 4 SPTA1 SPTA1 1
## 5 SPTB SPTB 1
## 6 C3 C3 3
## 7 ANK1 ANK1 1
## 8 MYH9 MYH9 2
## 9 CA1 CA1 1
## 10 TLN1 TLN1 2
## 11 CAT CAT 1
## 12 CA2 CA2 1
## 13 A2M A2M 3
## 14 APOA1 APOA1 3
## 15 SERPINA1 SERPINA1 3
## 16 SLC4A1 SLC4A1 1
## 17 PRDX2 PRDX2 1
## 18 ACTB ACTB 2
## 19 TF TF 3
## 20 SELENBP1 SELENBP1 1
## 21 APOB APOB 3
## 22 VCL VCL 2
## 23 GAPDH GAPDH 1
## 24 VCP VCP 1
## 25 BLVRB BLVRB 1
## 26 FGA FGA 3
## 27 PRDX6 PRDX6 1
## 28 F13A1 F13A1 2
## 29 PGK1 PGK1 1
## 30 FGB FGB 3
## 31 HP HP 2
## 32 CAPN1 CAPN1 1
## 33 ACTC1 ACTC1 2
## 34 AHSG AHSG 3
## 35 FERMT3 FERMT3 2
## 36 LDHB LDHB 1
## 37 TTR TTR 3
## 38 CP CP 3
## 39 ITGA2B ITGA2B 2
## 40 UBA1 UBA1 1
## 41 ITIH4 ITIH4 3
## 42 MSN MSN 2
## 43 HRG HRG 2
## 44 AHCY AHCY 1
## 45 YWHAE YWHAE 1
## 46 ENO1 ENO1 2
## [1] "HBA1" "HBB" "HBD" "SPTA1" "SPTB" "ANK1"
## [7] "CA1" "CAT" "CA2" "SLC4A1" "PRDX2" "SELENBP1"
## [13] "GAPDH" "VCP" "BLVRB" "PRDX6" "PGK1" "CAPN1"
## [19] "LDHB" "UBA1" "AHCY" "YWHAE"
This is especially interesting because this entirely closed off group contains elements from both the RBC and FIB networks. This total network may represent a more confident estimate of the coexpression networks, as it used more data points to calculate the correlation estimates.
We can perform some enrichment analysis on this blue group versus the rest of the groups in the network to examine what biological processes they are associated with.
The proteins in this group appear to be significantly enriched for RBC-related biological pathways. We can also examine all of the other protein groups:
This enrichment result suggests the other groups are involved in more fibrin-related biological processes, which is a really interesting network result from using common proteins.
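The enrichment tool used for these results is not named in this report, but the idea behind such an over-representation test can be sketched with a hypergeometric test in base R. All counts below are hypothetical and chosen only for illustration; the choice of background set in particular is an assumption.

```r
# Hypothetical counts (not taken from the report's data):
N <- 125  # proteins in the background set
K <- 20   # background proteins annotated to the pathway
n <- 22   # proteins in the group being tested
k <- 10   # proteins in the group that are annotated to the pathway

# P(X >= k) when drawing n proteins without replacement from the background.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# When many pathways are tested, adjust the p-values for multiple testing.
# The other two p-values are placeholders standing in for further pathways.
p_values   <- c(pathway1 = p_value, pathway2 = 0.04, pathway3 = 0.60)
p_adjusted <- p.adjust(p_values, method = "BH")

print(p_value)
print(p_adjusted)
```

A small p-value indicates that the group contains more annotated proteins than would be expected from a random draw of the same size.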
We can now perform some differential protein expression analysis.
Below is a data frame containing all of the common scores we want to incorporate into our analysis.
##   Accession common_rbc1 common_rbc2 common_rbc3 common_fib1 common_fib2 common_fib3 common_fib4
## 1    P02549      603.26      555.22      589.77      202.52      122.55       66.96      106.59
## 2    P11277      529.80      469.59      451.29       94.44       79.44       49.95       83.54
## 3    P35579      252.14      243.18      109.26      137.74      421.31      391.85      203.19
## 4    P16157      272.12      265.50      253.84       81.45       52.32       50.58       53.31
## 5    P01024      283.21      218.57      341.97      623.36      346.62      405.86      518.68
## 6    Q9Y490      170.79      178.82      107.57      150.36      334.64      362.43      241.09
##    Accession common_rbc1 common_rbc2 common_rbc3 common_fib1 common_fib2 common_fib3 common_fib4
## 17    P69905     2603.91     2561.08     2047.56      835.77      497.98      704.33      575.92
## 12    P68871     2270.40     2191.33     1843.45     1268.70      617.27      848.73      711.13
## 19    P02042     1294.86     1164.09     1073.23      638.26      318.43      354.44      355.07
##  1    P02549      603.26      555.22      589.77      202.52      122.55       66.96      106.59
##  2    P11277      529.80      469.59      451.29       94.44       79.44       49.95       83.54
##  5    P01024      283.21      218.57      341.97      623.36      346.62      405.86      518.68
##  4    P16157      272.12      265.50      253.84       81.45       52.32       50.58       53.31
##  3    P35579      252.14      243.18      109.26      137.74      421.31      391.85      203.19
## 32    P00915      206.18      191.51      257.80       87.85       64.97       61.27       88.26
##  6    Q9Y490      170.79      178.82      107.57      150.36      334.64      362.43      241.09
##  8    P04040      160.11      165.76      187.03       89.24       44.60       54.65       37.48
## 24    P00918      124.98      100.40      109.52       38.32       19.72       22.19       21.00
## 10    P01023      109.50       80.01      197.52      431.36      210.03      261.57      354.39
## 14    P02647      107.60       80.23      139.91      264.21      122.51      158.51      234.28
## 22    P01009      107.58       57.60      128.13      217.49      123.64      126.51      205.27
## 15    P02730      104.66      112.66      155.54       63.40       50.07       19.73       53.45
## 42    P32119      101.89       99.22       86.93       71.05       48.94       58.81       48.41
## 26    P60709      101.21      110.86       81.91      107.79      233.55      251.59      154.39
## 13    P02787       93.97       73.75      157.86      482.88      188.63      284.39      334.06
## 18    Q13228       92.21       73.10       74.43       27.44        3.90        7.68        8.74
##  7    P04114       88.78       57.12      317.58      609.99      142.85      107.29      454.10
##  9    P18206       84.96       82.30       48.70       54.76      163.73      195.58       96.21
## 29    P04406       83.13       66.95       74.40       40.36       46.45       49.27       36.91
## 46    P01857       79.31       50.32      115.95      189.36      157.70      210.14      159.70
## 11    P55072       76.30       81.12       75.02       21.28       26.60       15.03       16.80
## 64    P30043       74.48       49.57      107.18       40.69        7.22        5.46       18.15
## 28    P02671       72.40       67.00       58.89      303.02      240.69      403.82      603.68
## 37    P30041       65.49       66.23       68.40       30.43       18.10       13.99       24.47
## 21    P06727       59.43       52.24       79.93      107.22       83.29       97.78      120.24
## 34    P00488       59.27       52.78       70.77      175.15      166.17      208.39      156.19
## 68    P01859       58.01       39.56       94.20      154.58      118.15      151.99      129.74
## 31    P00558       57.17       52.22       41.40       24.95       18.70       27.86        5.95
## 43    P02675       52.78       31.03       21.55      365.84       84.63      257.63      757.23
## 41    P00738       51.85       45.34       34.12      100.51      110.08      191.65       98.77
## 27    P07384       51.55       42.10       41.09        4.63       27.18       16.14        6.71
## 48    P68032       50.25       54.32       58.93       55.54      138.21      186.96       89.44
## 84    P02765       47.49       52.95       74.15      170.38      154.38      146.58      185.46
## 25    Q86UX7       46.11       44.62       20.83       36.85       56.86       59.39       49.48
## 47    P07195       45.81       36.27       41.51       26.80       32.70       28.50       29.50
## 62    P02766       45.46       46.44       59.90       95.94       24.01       58.88       62.44
## 33    P00450       43.40       38.94      101.25      192.99       52.88       69.34      155.26
## 50    P08514       43.23       31.83       21.50       34.19       71.94       55.56       45.10
## 38    P22314       43.11       39.22       57.53        5.99        5.92        3.47        6.37
## 69    Q14624       41.63       21.87       72.66      126.37       31.10       67.42      121.68
## 85    P01834       41.26       43.86       74.29      223.22       98.68      120.69      172.86
## 20    P26038       40.62       36.37       22.96        6.18       61.93       50.71        6.76
## 36    P04196       40.02       56.42       44.23       78.31       79.66       80.28       64.14
## 51    P23526       38.83       25.50       29.54        8.19        3.65        0.00        2.38
## 39    P62258       38.63       36.54       44.13       13.98       11.74       12.45       11.37
## 40    P06733       38.31       36.42       31.39       24.88       56.34       54.71       35.00
Firstly, we must examine how the samples separate, a formality to test for batch effects. Now we can visualise how different the samples look compared to one another:
## Warning: Removed 12 rows containing non-finite values (stat_boxplot).
Ideally, we would like the samples being analysed to be as similar to each other as possible, so we will apply a normalisation step. First, we can get a visual on the proteins that are most variable across conditions. We will also add gene symbols here instead of UniProt IDs to make the results more readily interpretable.
Here, we can see a few clusters of variable proteins / genes across the samples. We can obtain a list of these after we normalise and perform our statistical test.
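To illustrate what a per-protein test between conditions involves, here is a minimal base R sketch. It swaps the moderated statistics reported in this document for an ordinary Welch t-test so that it has no package dependencies, it skips the normalisation step, and it assumes the common scores are the input; the six rows are the first rows of the common-score table above.

```r
# First six rows of the common-score table (columns: rbc1-3, fib1-4).
scores <- rbind(
  P02549 = c(603.26, 555.22, 589.77, 202.52, 122.55,  66.96, 106.59),
  P11277 = c(529.80, 469.59, 451.29,  94.44,  79.44,  49.95,  83.54),
  P35579 = c(252.14, 243.18, 109.26, 137.74, 421.31, 391.85, 203.19),
  P16157 = c(272.12, 265.50, 253.84,  81.45,  52.32,  50.58,  53.31),
  P01024 = c(283.21, 218.57, 341.97, 623.36, 346.62, 405.86, 518.68),
  Q9Y490 = c(170.79, 178.82, 107.57, 150.36, 334.64, 362.43, 241.09)
)
colnames(scores) <- c("rbc1", "rbc2", "rbc3", "fib1", "fib2", "fib3", "fib4")
group <- factor(c("rbc", "rbc", "rbc", "fib", "fib", "fib", "fib"))

# log2(x + 1) keeps zero scores finite.
log_scores <- log2(scores + 1)

# Per-protein log2 fold change (RBC minus FIB) and Welch t-test p-value.
log_fc <- rowMeans(log_scores[, group == "rbc"]) -
  rowMeans(log_scores[, group == "fib"])
p_val <- apply(log_scores, 1, function(x) {
  t.test(x[group == "rbc"], x[group == "fib"])$p.value
})

# Benjamini-Hochberg adjustment across proteins.
result <- data.frame(
  logFC     = log_fc,
  P.Value   = p_val,
  adj.P.Val = p.adjust(p_val, method = "BH")
)
print(result[order(result$P.Value), ])
```

The printed results that follow in this report come from the original analysis, not from this sketch. With only 3 and 4 samples per group, an ordinary t-test has little power, which is one reason moderated tests are usually preferred for data like this.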
## [1] "coefficients" "stdev.unscaled" "sigma" "df.residual"
## [5] "cov.coefficients" "pivot" "rank" "Amean"
## [9] "method" "design"
## rbc.vs.fibc
## Down 19
## NotSig 70
## Up 36
## logFC AveExpr t P.Value adj.P.Val B
## ADD2 3.523710 10.33885 9.755782 4.322803e-06 0.0003809351 4.814930
## SPTB 3.012327 13.97215 9.201347 7.013079e-06 0.0003809351 4.305464
## UBA1 3.379121 10.41227 8.908779 9.142443e-06 0.0003809351 4.119141
## ANK1 2.514001 13.39419 8.578292 1.244253e-05 0.0003888290 3.728493
## HBA1 2.241807 16.72707 7.984419 2.218940e-05 0.0004675932 3.081652
## CA2 2.518498 12.15356 7.718634 2.906292e-05 0.0004675932 2.887596
## VCP 2.310769 11.74943 7.704539 2.948764e-05 0.0004675932 2.856833
## PLG -2.390628 12.18991 -7.690224 2.992596e-05 0.0004675932 2.873334
## YWHAE 1.992553 10.97619 7.458367 3.811730e-05 0.0005294070 2.646976
## SPTA1 2.663962 14.44386 6.810118 7.732511e-05 0.0009133175 1.814079
## logFC AveExpr t P.Value adj.P.Val B Accession
## ADD2 3.523710 10.33885 9.755782 4.322803e-06 0.0003809351 4.814930 ADD2
## SPTB 3.012327 13.97215 9.201347 7.013079e-06 0.0003809351 4.305464 SPTB
## UBA1 3.379121 10.41227 8.908779 9.142443e-06 0.0003809351 4.119141 UBA1
## ANK1 2.514001 13.39419 8.578292 1.244253e-05 0.0003888290 3.728493 ANK1
## HBA1 2.241807 16.72707 7.984419 2.218940e-05 0.0004675932 3.081652 HBA1
## CA2 2.518498 12.15356 7.718634 2.906292e-05 0.0004675932 2.887596 CA2
The above plots demonstrate the differentially expressed proteins / genes across the 7 samples, comparing RBC clots to FIB clots. The table shows the genes with their respective log2 fold changes: the larger the absolute value, the more dramatic the change in the expression level of the protein between conditions, with positive values indicating higher levels in RBC relative to FIB. We can check whether these results match what we would expect to see in different clots through a gene ontology enrichment analysis below. We can begin by using the top 30 proteins sorted by expression:
##      identifier                                            description                pValue count populationAnnotationCount
## 1 R-HSA-1247673 Erythrocytes take up oxygen and release carbon dioxide 6.7295002869789135E-6     4                         9
## 2 R-MMU-3371511                                        HSF1 activation  7.490093104871675E-6     4                         8
## 3 R-HSA-3371511                                        HSF1 activation 1.7534187128697088E-5     4                        12
## 4 R-HSA-1237044 Erythrocytes take up carbon dioxide and release oxygen 1.8962733317353597E-5     4                        13
## 5 R-HSA-1480926                        O2/CO2 exchange in erythrocytes 1.8962733317353597E-5     4                        13
## 6         04141            Protein processing in endoplasmic reticulum  6.260353449041497E-4     8                       331
##     identifier                                    description                 pValue count populationAnnotationCount
## 1        04610            Complement and coagulation cascades  8.184659933607973E-40    26                       145
## 2 R-HSA-114608                         Platelet degranulation  9.711568423889633E-17    14                       127
## 3  R-HSA-76005   Response to elevated platelet cytosolic Ca2+ 1.1350383573899567E-16    14                       132
## 4 R-HSA-140877    Formation of Fibrin Clot (Clotting Cascade)  1.747126239887467E-15    10                        39
## 5  R-HSA-76002 Platelet activation, signaling and aggregation    2.9098273677178E-14    15                       259
## 6 R-MMU-114608                         Platelet degranulation  7.843940449044366E-14    12                       122
We can see from the resulting plots that some typical pathways are associated with the different enrichments: the differentially expressed proteins in the RBC samples were significantly enriched for erythrocyte and small molecule transport systems, and the differentially expressed proteins in the Fibrin samples were significantly associated with various platelet and coagulation pathways.
STRING results can be visualised here for Fibrin-related proteins - https://string-db.org/cgi/network.pl?taskId=fyhfSIJPuVah and here for RBC-related proteins - https://string-db.org/cgi/network.pl?taskId=NlK4wpf2wxs5
This analysis showed some interesting results with respect to network dynamics between Fibrin-rich and RBC-rich clots. Using a common set found across all samples, most of the expected biological pathways associated with the 2 different clots were found to be enriched, such as platelet pathways for Fibrin and molecule transport for RBC. This result shows, to some extent, that the proteins detected by the MS/MS search were relevant and in line with what we expect. Examination of the STRING results provided above is also interesting: both of the interaction networks had significantly more connections than would be expected from a random list of proteins, which means that the experiment pulled out some biologically relevant information. The analyses presented here will serve as a basis for further hypothesis generation and interrogation.
Further experiments could be improved by using a larger sample size, which would allow different models and techniques to be applied to glean more biological insight. The methods presented here would also benefit greatly from an increased sample size, which would reduce the number of spurious associations picked up by all methods.