This is an R Markdown document containing the data wrangling, data exploration and quantification of SWATH data collected from piglet brains. The tissue samples used to generate this data contains grey and white matter derived from parasagital region of the frontal cortex. Sample preparation can be found in the manuscripts folder of this project. Briefely, sample preparation consisted of cell lysis, alkylation with acrylamide, digestion with Tripsin and desalting using amonium nitrate. Each brain sample was run through an ABSIEX5600 triple TOF machine in duplicate to ensure measurment reprodicubillity. There are 3 experimental groups in this study:
1) 8-day old healthy piglet (CONTROL)
2) Hypoxic-ischameic brain injured piglet treated with hypotherimia and stem cell vehicle (HI+HTH+PBS)
3) Hypoxic-ischeamic brain injured piglet treated with hypothermia and stem cell therapy (HI+HTH+SC)
Firstly, I would like to identify the major sources of variation in my data and identify whether such sources of variation correspond to biological conditions, or experimental bias. I would like to visualize trends or patterns between samples, whether they ‘naturally’ cluster according to known biological conditions.
In this section we will check for the presences of possible confounding variables using the mixomics package.
From the literature we know that there are multiple factors both environmental and inherited that can influence susceptibility of neonates to global HI injury other than our treatment.
To begin, will investigate the effect of: 1) Sex: Males have a higher incidence of negative outcome following HI injury 2) Litter: This combines both genetic and developmental environment in which there are many factors 3) Hypotensive time: This is analogous to ischemia exposure.
# Here we aim to assess the effect of our duplicate runs on the ABSIEX5600 Triple TOF.
# The pca() function requires a matrix argument to compute so we will wrangle the h.protein.levels data frames into matrix class, center and scale them, perform PCA and plot the PCA.
# Creation of a matrix for PCA computation
h.protein.levels.matrix <- as.matrix(h.protein.levels)
h.mean.protein.levels.matrix <- as.matrix(h.mean.protein.levels)
correl <- read_excel("data/Processed_Human_SWATH06-03-2020.xlsx",
sheet = "COR")
correl<- na.omit(correl)
corrplot(cor(correl), method = "number")
#Center and scale the data (cns. is refers to data that has been centered and scaled.).
cns.h.protein.levels.matrix <- scale(h.protein.levels, center = TRUE)
cns.h.mean.protein.levels.matrix <- scale(h.mean.protein.levels, center = TRUE)
## By plotting the PCA we can establish assess reproducibillity between duplicate runs.
CnS.PCA <- pca(cns.h.protein.levels.matrix)
PCA_duplicates <- plotIndiv(CnS.PCA, title = "Duplicate runs on the ABSCIEX Triple TOF are highly correlated", caption = "The replicates are highly correlated and so can be averaged", pch = 19)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
print(PCA_duplicates)
## $df
## x y group
## Brain 2334 (sample 1) -17.748562 -8.210981 No group
## Brain 2335 (sample 1) 32.496428 -8.528131 No group
## Brain 2368 (sample 1) 6.317161 19.981653 No group
## Brain 2343 (sample 1) -11.608157 -6.520845 No group
## Brain 2344 (sample 1) 26.208810 44.104522 No group
## Brain 2350 (sample 1) -5.290846 -6.606496 No group
## Brain 2351 (sample 1) 16.030898 -6.555408 No group
## Brain 2367 (sample 1) -4.530920 18.915783 No group
## Brain 2374 (sample 1) -41.232593 11.170900 No group
## Brain 2380 (sample 1) 12.536025 -7.039252 No group
## Brain 2407 (sample 1) 16.216228 -8.922253 No group
## Brain 2542 (sample 1) 15.252914 -9.277906 No group
## Brain 2543 (sample 1) -71.628534 -6.323911 No group
## Brain 2544 (sample 1) 43.675244 -10.680222 No group
## Brain 2545 (sample 1) -41.737018 -7.220758 No group
## Brain 2546 (sample 1) 43.781040 -10.702571 No group
## Brain 2334 (sample 2) -20.980347 -7.615934 No group
## Brain 2335 (sample 2) 31.922825 -8.333861 No group
## Brain 2368 (sample 2) 4.615286 20.353621 No group
## Brain 2343 (sample 2) -13.346807 -6.365588 No group
## Brain 2344 (sample 2) 23.157342 46.170313 No group
## Brain 2350 (sample 2) -6.495310 -6.696607 No group
## Brain 2351 (sample 2) 13.284188 -6.547719 No group
## Brain 2367 (sample 2) -10.199390 20.421594 No group
## Brain 2374 (sample 2) -45.142138 11.958283 No group
## Brain 2380 (sample 2) 10.348787 -7.488454 No group
## Brain 2407 (sample 2) 12.555333 -8.444258 No group
## Brain 2542 (sample 2) 12.399515 -9.365552 No group
## Brain 2543 (sample 2) -70.929544 -6.777313 No group
## Brain 2544 (sample 2) 43.798913 -10.623168 No group
## Brain 2545 (sample 2) -46.987602 -7.398883 No group
## Brain 2546 (sample 2) 43.260827 -10.830600 No group
## Block
## Brain 2334 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2335 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2368 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2343 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2344 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2350 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2351 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2367 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2374 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2380 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2407 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2542 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2543 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2544 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2545 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2546 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2334 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2335 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2368 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2343 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2344 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2350 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2351 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2367 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2374 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2380 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2407 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2542 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2543 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2544 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2545 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2546 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## pch pch.levels cex col pch.legend
## Brain 2334 (sample 1) 19 19 3 #388ECC 19
## Brain 2335 (sample 1) 19 19 3 #388ECC 19
## Brain 2368 (sample 1) 19 19 3 #388ECC 19
## Brain 2343 (sample 1) 19 19 3 #388ECC 19
## Brain 2344 (sample 1) 19 19 3 #388ECC 19
## Brain 2350 (sample 1) 19 19 3 #388ECC 19
## Brain 2351 (sample 1) 19 19 3 #388ECC 19
## Brain 2367 (sample 1) 19 19 3 #388ECC 19
## Brain 2374 (sample 1) 19 19 3 #388ECC 19
## Brain 2380 (sample 1) 19 19 3 #388ECC 19
## Brain 2407 (sample 1) 19 19 3 #388ECC 19
## Brain 2542 (sample 1) 19 19 3 #388ECC 19
## Brain 2543 (sample 1) 19 19 3 #388ECC 19
## Brain 2544 (sample 1) 19 19 3 #388ECC 19
## Brain 2545 (sample 1) 19 19 3 #388ECC 19
## Brain 2546 (sample 1) 19 19 3 #388ECC 19
## Brain 2334 (sample 2) 19 19 3 #388ECC 19
## Brain 2335 (sample 2) 19 19 3 #388ECC 19
## Brain 2368 (sample 2) 19 19 3 #388ECC 19
## Brain 2343 (sample 2) 19 19 3 #388ECC 19
## Brain 2344 (sample 2) 19 19 3 #388ECC 19
## Brain 2350 (sample 2) 19 19 3 #388ECC 19
## Brain 2351 (sample 2) 19 19 3 #388ECC 19
## Brain 2367 (sample 2) 19 19 3 #388ECC 19
## Brain 2374 (sample 2) 19 19 3 #388ECC 19
## Brain 2380 (sample 2) 19 19 3 #388ECC 19
## Brain 2407 (sample 2) 19 19 3 #388ECC 19
## Brain 2542 (sample 2) 19 19 3 #388ECC 19
## Brain 2543 (sample 2) 19 19 3 #388ECC 19
## Brain 2544 (sample 2) 19 19 3 #388ECC 19
## Brain 2545 (sample 2) 19 19 3 #388ECC 19
## Brain 2546 (sample 2) 19 19 3 #388ECC 19
##
## $df.ellipse
## NULL
##
## $graph
# Save the figure output in the /output folder
png("figs/Global_data_analysis/Figure 1 PCA duplicates.png")
cor_2334 <- cor.test(correl$`Brain 2334 (sample 1)`, correl$`Brain 2334 (sample 2)`)
cor_2335 <-cor.test(correl$`Brain 2335 (sample 1)`, correl$`Brain 2335 (sample 2)`)
cor_2343 <-cor.test(correl$`Brain 2343 (sample 1)`, correl$`Brain 2343 (sample 2)`)
cor_2344 <-cor.test(correl$`Brain 2344 (sample 1)`, correl$`Brain 2344 (sample 2)`)
cor_2350 <-cor.test(correl$`Brain 2350 (sample 1)`, correl$`Brain 2350 (sample 2)`)
cor_2351 <-cor.test(correl$`Brain 2351 (sample 1)`, correl$`Brain 2351 (sample 2)`)
cor_2367 <-cor.test(correl$`Brain 2367 (sample 1)`, correl$`Brain 2367 (sample 2)`)
cor_2368 <-cor.test(correl$`Brain 2368 (sample 1)`, correl$`Brain 2368 (sample 2)`)
#cor_2370 <-cor.test(correl$`Brain 2370 (sample 1)`, correl$`Brain 2370 (sample 2)`)
cor_2374 <-cor.test(correl$`Brain 2374 (sample 1)`, correl$`Brain 2374 (sample 2)`)
cor_2380 <-cor.test(correl$`Brain 2380 (sample 1)`, correl$`Brain 2380 (sample 2)`)
cor_2407 <-cor.test(correl$`Brain 2407 (sample 1)`, correl$`Brain 2407 (sample 2)`)
cor_2542 <-cor.test(correl$`Brain 2542 (sample 1)`, correl$`Brain 2542 (sample 2)`)
cor_2543 <-cor.test(correl$`Brain 2543 (sample 1)`, correl$`Brain 2543 (sample 2)`)
cor_2544 <-cor.test(correl$`Brain 2544 (sample 1)`, correl$`Brain 2544 (sample 2)`)
cor_2545 <-cor.test(correl$`Brain 2545 (sample 1)`, correl$`Brain 2545 (sample 2)`)
cor_2546 <-cor.test(correl$`Brain 2546 (sample 1)`, correl$`Brain 2546 (sample 2)`)
ggplot(correl, aes(x = correl$`Brain 2334 (sample 1)`, y = correl$`Brain 2334 (sample 2)`))+
geom_point()
## Warning: Use of `correl$`Brain 2334 (sample 1)`` is discouraged. Use `Brain 2334
## (sample 1)` instead.
## Warning: Use of `correl$`Brain 2334 (sample 2)`` is discouraged. Use `Brain 2334
## (sample 2)` instead.
The plots demonstrate that each of the duplicate samples are highly correlated thus, for simplicity we will take the mean of the two duplicates. All visualizations from here on will be derived from the mean response intensity.
## Initial data visualization
Ini.pca <- pca(cns.h.mean.protein.levels.matrix)
plotIndiv(Ini.pca, pch = 19, group = clinic.values$Group)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
plotVar(Ini.pca, cex = 2)
# Generate plots to show the effect of the confounding factor
par(mfrow = c(3, 1))
plotIndiv(Ini.pca, group = clinic.values$Group, pch = 19, title = "Group effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
plotIndiv(Ini.pca, group = clinic.values$Litter, pch = 19, title = "Litter effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
plotIndiv(Ini.pca, group = clinic.values$Sex, pch = 19, title = "Sex Effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
plotIndiv(Ini.pca, group = clinic.values$`Neuropathology Grade`, pch = 19, title = "Cluster by Neuropathological Grade", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
plotIndiv(Ini.pca, group = clinic.values$Seizures, pch = 19, title = "Cluster by Seizures", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it
# None of these factors appear to play a part in the clustering. Though Sex is debatable
It appears that the variables that we have selected are not major sources of variation in our dataset. The second question like to ask is what are the key variables that contribute to the most variance in the data set.
data <- cns.h.mean.protein.levels.matrix
# 1 Run the method
MyResult.spca <- spca(data, ncomp = 3, keepX = c(15, 15, 15))
# tune function is used to produce a scree plot to find the optimum number of components.
tune <- tune.pca(data)
## Eigenvalues for the first 10 principal components, see object$sdev^2:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## 1135.30860 287.90395 78.89961 48.94920 38.79034 26.16509 24.12099
## PC8 PC9 PC10
## 17.49989 16.33863 14.85971
##
## Proportion of explained variance for the first 10 principal components, see object$explained_variance:
## PC1 PC2 PC3 PC4 PC5 PC6
## 0.655111711 0.166130379 0.045527760 0.028245353 0.022383345 0.015098146
## PC7 PC8 PC9 PC10
## 0.013918631 0.010098032 0.009427944 0.008574557
##
## Cumulative proportion explained variance for the first 10 principal components, see object$cum.var:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.6551117 0.8212421 0.8667699 0.8950152 0.9173985 0.9324967 0.9464153 0.9565134
## PC9 PC10
## 0.9659413 0.9745159
##
## Other available components:
## --------------------
## loading vectors: see object$rotation
## Scree plot shows elbow at PC2
screeplot(tune)
#plotIndiv(MyResult.spca,
# group = clinic.values$Group, # 2 Plot the samples
#legend = TRUE, title = "Group effects, sPCA comp 1 - 2",
#legend.title = "Group", pch = 19
#)
plotVar(MyResult.spca, cex = 3, , pch = 19)
selectVar(MyResult.spca, comp = 1)$value
## value.var
## sp|P34932|HSP74_HUMAN -0.46042533
## sp|P55786|PSA_HUMAN -0.36544014
## sp|Q13564|ULA1_HUMAN -0.36151546
## sp|P43034|LIS1_HUMAN -0.34682468
## sp|Q13561|DCTN2_HUMAN -0.32693177
## sp|Q8ND76|CCNY_HUMAN -0.24968367
## sp|Q13033|STRN3_HUMAN -0.22901700
## sp|P63208|SKP1_HUMAN -0.20576138
## sp|P11142|HSP7C_HUMAN -0.20356046
## sp|P61088|UBE2N_HUMAN -0.19717086
## sp|P61019|RAB2A_HUMAN -0.16987153
## sp|P62937|PPIA_HUMAN -0.15769942
## sp|P61106|RAB14_HUMAN -0.05031748
## sp|Q15185|TEBP_HUMAN -0.04795931
## sp|Q9UL25|RAB21_HUMAN -0.02444969
plotLoadings(MyResult.spca)
selectVar(MyResult.spca, comp = 2)$value
## value.var
## sp|Q96CX2|KCD12_HUMAN 0.41998767
## sp|P21333|FLNA_HUMAN 0.39061877
## sp|P05107|ITB2_HUMAN 0.38134484
## sp|P52907|CAZA1_HUMAN 0.36895218
## sp|P12004|PCNA_HUMAN 0.31630944
## sp|Q9Y490|TLN1_HUMAN 0.26821621
## sp|Q14019|COTL1_HUMAN 0.23382258
## sp|Q9P2E9|RRBP1_HUMAN 0.22914294
## sp|Q13510|ASAH1_HUMAN 0.19786611
## sp|E9PAV3|NACAM_HUMAN 0.19350547
## sp|P26038|MOES_HUMAN 0.13802526
## sp|O00299|CLIC1_HUMAN 0.08219426
## sp|Q9P1F3|ABRAL_HUMAN 0.06350730
## sp|P11310|ACADM_HUMAN 0.06026694
## sp|Q9H8S9|MOB1A_HUMAN 0.01546435
plotLoadings(MyResult.spca, comp = 2)
# Extract and save the largest predictors from the three principal components
write.csv(MyResult.spca$loadings, "output/Global_data_analysis/Principal component loadings - Human.csv")
Given the number of duplicates we will see the implications of removing the duplicate genes and seeing what other proteins are contributing to the variance.
## Visualization of the global comparisons using heatmaps and dendrograms
# heatmap with dendrogram reordering + expected ordering
color_vector <- c("1", "2", "2", "1", "1", "2", "1", "2", "1", "2", "2", "3", "3", "3", "3", "3")
colMain <- colorRampPalette(brewer.pal(3, "Blues"))(25)
hm <- heatmap(data, RowSideColors = color_vector)
print(hm)
## $rowInd
## [1] 14 16 2 11 6 12 10 4 8 7 5 3 1 9 15 13
##
## $colInd
## [1] 1224 1405 4 236 962 913 1591 394 370 756 1285 705 765 1207
## [15] 1234 197 1573 1507 456 1255 1690 982 861 1715 1431 1240 67 1400
## [29] 154 84 166 419 763 845 888 255 116 1064 176 681 297 1233
## [43] 186 555 443 393 1683 739 789 46 423 1468 1527 482 1568 1013
## [57] 1156 55 1730 1230 957 1718 1557 1501 1223 942 305 902 533 1595
## [71] 1109 472 1684 735 62 1094 1723 327 549 1615 770 221 717 543
## [85] 1728 1374 737 1293 1 564 556 475 1042 35 1545 784 1430 1510
## [99] 1596 1691 243 130 742 728 444 1361 100 1325 752 16 316 513
## [113] 1038 1635 287 786 1328 523 1320 1332 1479 248 308 619 1003 1441
## [127] 761 1713 749 795 1077 249 945 455 1291 483 1574 1554 1727 1175
## [141] 517 1059 401 1250 341 971 1121 437 562 369 506 591 1189 1167
## [155] 883 605 903 1051 733 1434 96 1369 1235 1547 1313 347 227 1061
## [169] 1411 1205 160 139 610 53 262 1609 656 150 1685 996 952 1582
## [183] 1136 1114 397 1300 606 521 1480 993 442 485 1284 1007 1215 837
## [197] 1012 433 862 1165 612 892 420 1287 560 1254 927 935 1389 1316
## [211] 1556 716 804 767 710 662 779 572 291 1170 83 179 1225 219
## [225] 399 621 922 421 977 245 261 558 1041 719 112 128 391 626
## [239] 725 276 518 1465 1010 222 1280 1366 624 1544 1491 660 665 974
## [253] 440 1208 121 1419 195 1119 738 343 1689 452 1317 1412 256 1476
## [267] 476 294 1276 226 1385 632 1377 948 1019 1436 152 1524 1278 408
## [281] 863 827 759 616 1006 99 1004 520 1382 1492 986 852 428 1218
## [295] 1227 1145 1113 457 714 145 1629 1680 751 1049 514 1675 155 469
## [309] 389 814 776 581 689 654 1186 106 435 803 1552 1481 1710 396
## [323] 981 658 149 1669 1105 904 1275 1184 1074 730 449 430 333 1417
## [337] 153 7 191 487 1698 1221 1260 1672 1355 683 900 450 966 939
## [351] 126 1469 1646 10 928 1081 384 280 726 526 847 1266 210 183
## [365] 898 208 123 306 914 71 324 963 368 1399 93 577 1137 1058
## [379] 886 912 1084 267 1226 1394 1263 224 546 328 1104 337 1550 462
## [393] 1214 460 1080 1039 1100 839 1307 897 1386 1658 1178 829 98 1147
## [407] 1393 1616 586 1482 403 909 875 498 193 1258 97 103 313 1211
## [421] 711 335 323 592 964 1657 1478 594 908 1459 1688 1035 1487 230
## [435] 940 1447 979 1052 184 674 602 348 953 559 1575 906 899 1378
## [449] 721 1295 650 671 855 631 809 1239 1373 1073 1126 840 1056 1579
## [463] 1161 497 20 357 698 930 1219 1602 676 82 223 494 275 120
## [477] 846 124 667 1070 1345 1549 1349 1408 1581 813 1306 1432 496 1217
## [491] 1398 1721 1123 1201 1564 1033 1726 1422 1653 646 473 1040 1371 1630
## [505] 1515 824 818 1499 352 879 989 1322 298 65 212 1402 718 119
## [519] 848 1203 45 11 169 172 1249 596 917 1118 122 309 181 1588
## [533] 89 608 1095 288 141 826 140 37 1289 44 388 378 618 871
## [547] 1359 1592 972 857 240 490 1521 648 258 38 1251 1044 627 142
## [561] 102 432 812 284 87 527 1265 331 554 6 127 425 201 568
## [575] 1553 1508 1268 1519 864 136 639 350 182 77 704 79 463 424
## [589] 1120 218 1444 896 322 798 885 1153 1725 830 753 1143 539 615
## [603] 961 758 783 385 1139 220 134 431 1466 70 477 563 778 653
## [617] 1236 553 41 754 503 132 147 1523 771 488 326 1292 439 101
## [631] 26 241 918 1559 1290 180 657 215 589 1305 1449 1256 1451 668
## [645] 164 32 1600 157 28 108 167 1253 817 5 76 263 515 600
## [659] 318 1183 426 33 582 567 148 1693 110 1464 1216 54 118 1506
## [673] 321 1528 552 1509 872 400 685 1699 690 332 1656 1673 1429 1612
## [687] 1338 854 59 1461 447 293 18 329 61 1594 743 1023 372 52
## [701] 295 531 1212 931 464 113 537 342 1060 949 630 923 1252 706
## [715] 688 1053 292 24 672 92 1370 944 1047 244 525 159 387 1540
## [729] 522 353 81 645 1687 1420 774 968 574 175 647 88 701 74
## [743] 1720 1107 593 642 806 792 1246 56 1005 1093 841 625 211 958
## [757] 383 1621 595 822 1097 151 304 242 769 279 873 1407 429 1577
## [771] 225 312 956 823 480 1490 764 68 1314 680 466 1347 64 544
## [785] 884 282 320 1157 107 1330 1213 744 404 1622 47 929 1362 1639
## [799] 530 1002 1640 314 402 1724 673 363 1403 588 623 133 448 51
## [813] 736 613 300 232 117 805 8 1388 946 584 980 228 828 237
## [827] 371 360 785 1368 1530 366 682 880 40 50 48 762 1197 325
## [841] 359 173 542 551 43 29 502 722 816 1611 168 470 1607 1413
## [855] 858 1087 893 1022 666 114 129 315 19 489 21 1155 415 85
## [869] 165 30 550 289 643 170 838 842 31 405 471 376 1642 257
## [883] 686 1437 260 617 1176 379 1146 178 1442 407 1620 1692 441 745
## [897] 791 775 529 1576 1244 988 859 1443 1379 1091 1391 269 534 1269
## [911] 239 1327 1709 1302 1180 604 66 994 969 253 651 1195 1619 781
## [925] 1471 409 1517 27 1096 373 213 13 915 1546 755 1462 438 202
## [939] 204 339 819 143 729 1169 205 1164 508 1695 233 1733 802 1299
## [953] 1624 998 1151 1643 1561 1647 1626 104 955 491 1237 144 264 919
## [967] 652 15 628 90 1229 271 1270 1504 1395 1046 57 1665 583 1088
## [981] 566 889 177 870 115 569 1571 1401 1608 86 990 1193 1277 1714
## [995] 702 203 1259 1696 699 1303 1115 950 484 1102 1729 1335 188 42
## [1009] 1543 34 362 833 603 317 920 1296 265 1560 1106 1637 561 1537
## [1023] 495 1410 684 1450 1520 1396 1732 548 458 644 1283 1448 330 138
## [1037] 997 1425 801 283 158 1570 25 1704 1108 49 1274 1708 1079 493
## [1051] 808 1261 1090 281 1644 1174 850 1463 984 492 1486 1584 740 1454
## [1065] 933 1082 358 1654 1257 882 1162 1676 1678 565 820 832 1243 835
## [1079] 1627 1011 1372 1089 105 382 1210 678 1578 932 1279 516 634 1192
## [1093] 286 1166 1103 1132 1652 1324 1404 1241 1144 1333 1365 1535 1326 1636
## [1107] 1188 1423 1700 1245 664 1298 796 380 200 135 1066 794 1493 1187
## [1121] 1310 1182 1551 1694 411 254 1671 970 231 459 78 578 1020 687
## [1135] 675 1014 1572 777 1050 1438 1134 1380 599 1026 39 375 478 355
## [1149] 235 395 356 427 171 1610 1518 1075 925 1001 274 1651 1662 1580
## [1163] 731 1495 1496 445 1500 1634 23 1674 1273 187 1650 1173 499 1424
## [1177] 1593 1421 1045 1489 1348 1297 303 1353 301 1661 622 1455 1494 1336
## [1191] 1063 1381 1057 700 1356 709 1599 607 1376 1055 268 1148 1514 1360
## [1205] 1294 1122 1177 1232 345 860 992 465 1585 597 1470 162 1539 1614
## [1219] 1154 1498 1337 163 538 1242 270 865 1681 1702 1705 1312 1538 810
## [1233] 296 1352 1185 557 540 1078 285 1558 532 418 246 611 346 1311
## [1247] 1503 36 194 311 1027 334 1318 131 17 185 659 1099 259 633
## [1261] 1308 541 695 1648 1072 536 937 1131 377 519 1127 844 1009 976
## [1275] 95 512 250 1569 1138 1460 868 507 299 692 1426 1638 938 1531
## [1289] 1128 1563 1069 821 655 987 214 620 746 1541 1682 1472 750 598
## [1303] 80 58 1586 1566 1475 1309 1567 1697 361 576 881 229 797 1458
## [1317] 641 936 1409 1660 190 694 414 921 351 1440 156 207 1301 1483
## [1331] 1548 2 3 217 1288 1150 995 1194 1024 1160 1111 696 901 1204
## [1345] 1384 504 1526 1350 1686 669 843 585 757 787 1484 1625 251 349
## [1359] 474 713 1649 811 486 500 1329 234 741 891 364 649 319 381
## [1373] 601 894 69 712 1522 1264 1397 1416 1679 192 272 772 1717 1363
## [1387] 278 1711 1028 1272 1601 1110 635 973 570 451 1220 1067 1152 535
## [1401] 780 174 1159 853 1664 1130 1583 1196 1191 924 510 1117 1085 1456
## [1415] 1435 1670 1209 1719 1712 1529 1171 1036 216 887 1474 1228 991 1062
## [1429] 1032 461 1525 1262 1282 1427 1076 1344 398 1722 1172 198 640 1168
## [1443] 1248 545 1516 1342 422 1029 1357 1617 374 1129 1597 1605 1358 1659
## [1457] 1018 1618 867 1667 703 1706 1701 1457 1645 1562 573 1452 1505 196
## [1471] 691 760 983 575 1351 1321 1542 1281 1703 505 967 266 782 748
## [1485] 876 1037 866 1048 73 587 9 547 12 1112 1632 1267 1346 1418
## [1499] 1631 934 1200 1655 1433 1628 413 834 354 501 1453 75 1125 815
## [1513] 1534 1387 1142 446 1502 734 340 1179 1068 479 453 609 146 1392
## [1527] 1086 926 1071 788 579 849 1641 1666 999 1271 1565 720 1477 454
## [1541] 1231 63 629 1315 851 336 1199 732 1606 916 137 1140 72 715
## [1555] 161 199 1092 1181 1590 91 1286 1202 111 1633 614 869 670 1000
## [1569] 1135 1513 1598 707 1222 1677 985 1439 1343 1331 790 1587 1034 1015
## [1583] 277 1098 856 1319 571 1323 895 878 1065 468 436 1445 1304 1707
## [1597] 1668 1446 1731 727 1512 60 708 766 1025 1340 1163 1663 386 410
## [1611] 1603 416 1555 1341 1414 94 910 1158 252 1339 793 1367 1473 960
## [1625] 907 836 1623 890 302 390 943 1016 1415 874 959 1116 1149 1375
## [1639] 1133 1054 831 1716 1247 1017 511 947 1083 941 1383 954 807 434
## [1653] 238 975 580 1124 1536 481 247 768 392 724 1198 799 1364 1031
## [1667] 911 636 965 528 1101 693 1206 1428 206 747 365 1589 290 417
## [1681] 1467 697 1008 800 307 406 1334 773 273 338 189 677 1390 1488
## [1695] 1043 1141 344 877 679 412 310 663 1604 524 125 1238 637 978
## [1709] 509 1354 1511 467 22 905 1533 1406 951 1497 1021 590 209 638
## [1723] 109 723 1613 1532 1030 1485 825 1190 367 661 14
##
## $Rowv
## NULL
##
## $Colv
## NULL
png("figs/Global_data_analysis/Global differences in protein expression.png")
# Import the data
Control_vs_ALL <- read_excel("data/Control vs ALL.xlsx")
# Separate out the Peak Name column into the UniprotID and drop NAs
Control_vs_ALL <- Control_vs_ALL %>%
separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
drop_na()
# Repeat above
ControlvsHI_PBS <- read_excel("data/ControlvsHI+PBS.xlsx")
ControlvsHI_PBS <- ControlvsHI_PBS %>%
separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
drop_na()
# Repeat above
HI_SCvsHI_PBS <- read_excel("data/HI+SCvsHI+PBS.xlsx")
HI_SCvsHI_PBS <- HI_SCvsHI_PBS %>%
separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
drop_na()
# ----------------------------------------------------------------
# Visualize the Volcano plots using the enhanced volcano package from biconductor
# ----------------------------------------------------------------
EnhancedVolcano(Control_vs_ALL,
lab = "UniProt",
x = "Log (Fold Change)",
y = "p-value",
xlim = c(-2, 2),
ylim = c(0, 2),
title = "Control animals vs HI+HTH"
)
## Warning: Removed 1 rows containing missing values (geom_hline).
EnhancedVolcano(ControlvsHI_PBS,
lab = "UniProt",
x = "Log (Fold Change)",
y = "p-value",
xlim = c(-2, 2),
ylim = c(0, 2),
title = "Control animals vs HI+HTH+PBS"
)
## Warning: Removed 1 rows containing missing values (geom_hline).
EnhancedVolcano(HI_SCvsHI_PBS,
lab = "UniProt",
x = "Log (Fold Change)",
y = "p-value",
xlim = c(-2, 2),
ylim = c(0, 2),
title = "HI+HTH+SC vs HI+HTH+PBS"
)
## Warning: Removed 1 rows containing missing values (geom_hline).
Next we will ask the question: What specific proteins are different between the three sample groups. We will do this with analysis of variance (ANOVA) with Tukey’s Honest significance test to show the between group differences.
#----------------------------------------------------------------------------------------------------------------------------------
Group <- as.factor(t(clinic.values$Group)) # groups you are comparing
Data <- as.matrix(h_data[, 34:49])
rownames(Data) <- h_data$Protein
Data <- na.omit(Data)
# we test one variable at a time, extracting the pvalues
aovmod <- apply(Data, 1, function(x) {
aov(x ~ Group)
})
# extract the p-values for each ANOVA
pvalaov <- sapply(aovmod, function(x) {
summary(x)[[1]][["Pr(>F)"]][1]
})
# Number of DE
sum(pvalaov <= 0.05)
## [1] 51
# Adjust for multiple testing
pvalaov_adj <- p.adjust(pvalaov, method = "BH")
sum(pvalaov_adj <= 0.1)
## [1] 0
# Number of DE
pval.05 <- pvalaov[pvalaov <= 0.05]
pval.1 <- pvalaov[pvalaov <= 0.1]
length(pval.05)
## [1] 51
length(pval.1)
## [1] 130
#--------------------------- If you have more than two Groups ------------------------------------
# perform posthoc Tukey test
postTurSHD <- t(lapply(aovmod, function(x) {
TukeyHSD(x)$"Group"[, "p adj"]
}))
# extract adjust p value for each pair
postTurSHD_Table <- data.frame(matrix(unlist(postTurSHD, use.names = TRUE), nrow = nrow(Data), byrow = T))
# set renames
rownames(postTurSHD_Table) <- names(aovmod)
# set colnames names
colnames(postTurSHD_Table) <- names(postTurSHD[[1]]) # this looks at the first variable and extracts the pairwise comparisons
HI_HTH_SC_HI_HTH_PBS.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS` <= 0.05)]
HI_HTH_SC_HI_HTH_PBS.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS` <= 0.1)]
HI_HTH_PBS_CONTROL.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+PBS-CONTROL` <= 0.1)]
# Print significantly different proteins between the two treatment groups
print(HI_HTH_SC_HI_HTH_PBS.sig)
## [1] "sp|O43707|ACTN4_HUMAN" "sp|P18206|VINC_HUMAN"
## [3] "sp|P62195|PRS8_HUMAN" "sp|Q09666|AHNK_HUMAN"
## [5] "sp|P14550|AK1A1_HUMAN" "sp|Q06830|PRDX1_HUMAN"
## [7] "sp|P48444|COPD_HUMAN" "sp|Q06323|PSME1_HUMAN"
## [9] "sp|P60842|IF4A1_HUMAN" "sp|P37837|TALDO_HUMAN"
## [11] "sp|Q9UJU6|DBNL_HUMAN" "sp|Q9BR76|COR1B_HUMAN"
## [13] "sp|Q09028|RBBP4_HUMAN" "sp|O60256|KPRB_HUMAN"
## [15] "sp|P15311|EZRI_HUMAN" "sp|Q04637-3|IF4G1_HUMAN"
## [17] "sp|Q9UL46|PSME2_HUMAN" "sp|Q13418|ILK_HUMAN"
## [19] "sp|P35637|FUS_HUMAN" "sp|P62136|PP1A_HUMAN"
## [21] "sp|P02649|APOE_HUMAN" "sp|P16403|H12_HUMAN"
## [23] "sp|P43243|MATR3_HUMAN" "sp|P07858|CATB_HUMAN"
## [25] "sp|P49902|5NTC_HUMAN" "sp|P46976|GLYG_HUMAN"
## [27] "sp|P26583|HMGB2_HUMAN" "sp|Q99447-4|PCY2_HUMAN"
## [29] "sp|P15586|GNS_HUMAN" "sp|P27105|STOM_HUMAN"
## [31] "sp|P50440|GATM_HUMAN" "sp|P49321|NASP_HUMAN"
## [33] "sp|Q7L576|CYFP1_HUMAN" "sp|Q9Y6N5|SQOR_HUMAN"
# Save the output
write.csv(HI_HTH_SC_HI_HTH_PBS.sig, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS BH adjusted.csv")
write.csv(pval.05, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p05.csv")
write.csv(pval.1, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p=1.csv")
#============================================================================================
# Effect size
ggplot(postTurSHD_Table, aes(`HI+HTH+PBS-CONTROL`))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text = element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+PBS vs CONTROL")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
ggplot(postTurSHD_Table, aes(`HI+HTH+SC-CONTROL`))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+SC vs CONTROL")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
ggplot(postTurSHD_Table, aes(`HI+HTH+SC-HI+HTH+PBS`))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+PBS vs HI+HTH+SC")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
# Here we aim to assess the effect of our duplicate runs on the ABSIEX5600 Triple TOF.
# Creation of a matrix for PCA computation
h.mean.protein.levels.matrix <- as.matrix(h.mean.protein.levels)
#Center and scale the data (cns. is refers to data that has been centered and scaled.).
cns.h.mean.protein.levels.matrix <- scale(h.mean.protein.levels, center = TRUE)
## By plotting the PCA we can establish assess reproducibillity between duplicate runs.
## Visualization of the global comparisons using heatmaps and dendrograms
data <- cns.h.mean.protein.levels.matrix
proteins <- read_csv("output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p05.csv")
## New names:
## * `` -> ...1
## Rows: 51 Columns: 2
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ...1
## dbl (1): x
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
prot.names <- read_excel("output/Vehicle_vs_Stem cells/Protein namesHI-HTH-SC vs HI-HTH-PBS p05.xlsx")
## New names:
## * `` -> ...4
#prot.name <- as.data.frame(HI_HTH_SC_HI_HTH_PBS.sig)
proteins <- proteins$X1
## Warning: Unknown or uninitialised column: `X1`.
sub_data <- t(subset(data, select = proteins))
# heatmap with dendrogram reordering + expected ordering
color_vector <- c("1", "2", "2", "1", "1", "2", "2", "1", "1", "2", "2", "3", "3", "3", "3", "3")
colMain <- viridis(3)
#hm <- heatmap(data, RowSideColors = color_vector,col=magma(100, direction = 1))
dist.pear <- function(x) as.dist(1-cor(t(x)))
hclust.ave <- function(x) hclust(x, method="mcquitty")
#heatmap.2(sub_data,
# trace="none",
# distfun=dist.pear,
# hclustfun=hclust.ave,
# scale="col",
# ColSideColors = color_vector,
# col=viridis(100, direction = 1),
# margins = c(7,35),
# cexRow = 1.3,
# plot labels
# main = "Expression of significantly altered proteins",
#ylab = "Significant Proteins",
# xlab = "Piglet",
# labRow = prot.names$Protein,
#labCol = c("PBS.1", "SC.1", "SC.2","PBS.2", "PBS.3", "SC.3","SC.4", "PBS.4", "PBS.5", "SC.5", "SC.6", "CON.1", "CON.2", "CON.3","CON.4", "CON.5"))
data <- h.mean.protein.levels[HI_HTH_SC_HI_HTH_PBS.sig]
proteins <- colnames(data)
proteins <- sub("..........|*xxx", "", proteins)
proteins <- sub("\\_.*", "",proteins)
for (i in 1:length(data)){
plot <- ggplot(data, aes(x = Group, y = data[,i], fill = Group))+
geom_boxplot()+
geom_point()+
labs(y = "Response", x = "Treatment Group", title = paste(proteins[i]))+
theme(legend.position = "none")
print(plot)
}
data <- h.mean.protein.levels[HI_HTH_PBS_CONTROL.sig]
proteins <- colnames(data)
proteins <- sub("..........|*xxx", "", proteins)
proteins <- sub("\\_.*", "",proteins)
for (i in 1:length(data)){
plot <- ggplot(data, aes(x = Group, y = data[,i], fill = Group))+
geom_boxplot()+
geom_point()+
labs(y = "Response", x = "Treatment Group", title = paste(proteins[i]))+
theme(legend.position = "none")
print(plot)
}
# Data wrangling into a format suitable for cluster profiler
##transform our data into log2 base.
#calculate the log mean of each gene per group
PBS <- c("2334","2344","2350","2368","2374")
SC <- c("2335","2343","2351","2367","2380","2407")
CON <- c("2542","2543","2544","2545","2546")
PBS.g <- subset(Data, select = PBS)
SC.g <- subset(Data, select = SC)
CON.g <- subset(Data, select = CON)
PBS.mean <- rowMeans(PBS.g)
SC.mean <- rowMeans(SC.g)
CON.mean <- rowMeans(CON.g)
PBS.logmean <- log2(PBS.mean)
SC.logmean <- log2(SC.mean)
CON.logmean <- log2(CON.mean)
#because our data is already log2 transformed, we can take the difference between the means. And this is our log2 Fold Change or log2 Ratio == log2(control / test)
foldchange_CON.vs.HI_HTH_PBS <- as.matrix(SC.logmean-PBS.logmean)
foldchange_CON.vs.HI_HTH_PBS <- as.matrix(CON.logmean-PBS.logmean)
foldchange_CON.vs.HI_HTH_SC <- as.matrix(CON.logmean-SC.logmean)
CON.vs.HI_HTH_PBS <- data.frame(cbind(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS`, foldchange_CON.vs.HI_HTH_PBS[,1]))
CON.vs.HI_HTH_PBS <- data.frame(cbind(postTurSHD_Table$`HI+HTH+PBS-CONTROL`, foldchange_CON.vs.HI_HTH_PBS [,1]))
CON.vs.HI_HTH_SC <- data.frame(cbind(postTurSHD_Table$`HI+HTH+SC-CONTROL`,foldchange_CON.vs.HI_HTH_SC[,1]))
sig <- CON.vs.HI_HTH_SC[CON.vs.HI_HTH_SC$X1 <= 0.01 & CON.vs.HI_HTH_PBS$X2 >= 0.5]
hist(foldchange_CON.vs.HI_HTH_PBS, main = "Distribution of log fold change between HI+HTH+PBS - HI+HTH+SC")
png("figs/Distribution of fold change between Control - HI+HTH+SC")
data <-CON.vs.HI_HTH_PBS
ggplot(data, aes(X2))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text = element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+PBS vs CONTROL")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
## Warning: Removed 3 rows containing missing values (geom_bar).
data <- CON.vs.HI_HTH_PBS
ggplot(data, aes(X2))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+SC vs CONTROL")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
## Warning: Removed 3 rows containing missing values (geom_bar).
data <- CON.vs.HI_HTH_SC
ggplot(data, aes(X2))+
geom_histogram(bins = 50, aes(alpha = 0.05))+
theme_minimal()+
geom_vline(xintercept = 0.05)+
geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+
ggtitle("HI+HTH+PBS vs HI+HTH+SC")+
xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
## Warning: Removed 3 rows containing missing values (geom_bar).
d <- CON.vs.HI_HTH_PBS %>%
mutate(`pvalue` = `X1`, `logFC` = `X2` )
#write.csv(d, file = "output/Vehicle_vs_Stem cells/HI=HTH+PBS vs HI+HTH+SC.csv")
EnhancedVolcano(d,
lab = rownames(d),
x = "logFC",
y = "pvalue",
xlim = c(-3, 3),
ylim = c(0, 3),
title = 'HI+HTH+PBS versus HI+HTH+SC',
subtitle = "Differential protein expression",
pCutoff = 10e-2,
FCcutoff = 0.5,
pointSize = 1.0,
labSize = 3.0,
shape = 2,
colAlpha = 1)
d <- CON.vs.HI_HTH_PBS %>%
mutate(`pvalue` = `X1`, `logFC` = `X2` )
EnhancedVolcano(d,
lab = rownames(d),
x = "logFC",
y = "pvalue",
xlim = c(-3, 3),
ylim = c(0, 3),
title = 'CONTROL versus HI+HTH+PBS ',
subtitle = "Differential protein expression",
pCutoff = 10e-2,
FCcutoff = 0.5,
pointSize = 1.0,
labSize = 3.0,
shape = 2,
colAlpha = 1)
d <- CON.vs.HI_HTH_SC %>%
mutate(`pvalue` = `X1`, `logFC` = `X2` )
EnhancedVolcano(d,
lab = rownames(d),
x = "logFC",
y = "pvalue",
xlim = c(-3, 3),
ylim = c(0, 3),
title = 'CONTROL versus HI+HTH+SC',
subtitle = "Differential protein expression",
pCutoff = 10e-2,
FCcutoff = 0.5,
pointSize = 1.0,
labSize = 3.0,
shape = 2,
colAlpha = 1)
sub_data <- subset(h.mean.protein.levels, select = HI_HTH_SC_HI_HTH_PBS.sig)
sig.proteins.PCA <- scale(sub_data, center = TRUE)
sig.proteins.pca <- pca(sig.proteins.PCA)
plotIndiv(sig.proteins.pca, group = clinic.values$Group, title = "PCA using significant proteins")
# prepare prot.list
## assume that 1st column is ID
## 2nd column is fold change
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)
Prot.list <- CON.vs.HI_HTH_PBS %>%
separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
dplyr::select(kegg, `Fold Change`, `p-value`, UP)%>%
filter(`p-value` <= 0.1)
Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg
## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
# =================================================================================================
## Pathway analysis data preparation
# This "Gene list" contains the KEGG ids
geneList <- d[, 2]
names(geneList) <- as.character(d[, 1])
geneList <- Prot.list.x
gene <- names(geneList)[abs(geneList) > 0.5]
#downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz
# The above file cannot be run from a script but must instead be run from the console
gda <- read.delim("data/all_gene_disease_associations.txt")
disease2gene=gda[, c("diseaseId", "geneId")]
disease2name=gda[, c("diseaseId", "diseaseName")]
#A universal enrichment analyzer
enriched <- enricher(
gene,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe,
minGSSize = 5,
maxGSSize = 500,
qvalueCutoff = 0.2, TERM2GENE=disease2gene, TERM2NAME=disease2name
)
## `universe` is not in character and will be ignored...
enrichedx <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')
dotplot(enrichedx, showCategory = 20)
cnetplot(enrichedx, showCategory = 5, foldChange = geneList)
png("output/cnetplot from universal enrichment analysis from disGenet")
# ==================================================================================================
# KEGG enrichment analysis
KEGG.enrichment <- enrichKEGG(
gene = gene,
keyType = "kegg",
organism = "hsa",
pvalueCutoff = 0.1
)
## Reading KEGG annotation online:
##
## Reading KEGG annotation online:
KEGG.enrichment <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')
barplot(KEGG.enrichment, showCategory = 6)
dotplot(KEGG.enrichment, showCategory = 20)
cnetplot(KEGG.enrichment, showCategory = 6, foldChange = geneList)
png("output/cnetplot from KEGG Enrichment Analysis")
## convert gene ID to Symbol
gse <- gseKEGG(
geneList = geneList,
keyType = "kegg",
organism = "hsa",
minGSSize = 1,
pvalueCutoff = 0.1,
verbose = FALSE
)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
# Reactome Pathway Analysis
#Protein Enrichment analysis
RPA <- enrichPathway(
gene = gene,
pvalueCutoff = 0.05)
RPAx <- setReadable(RPA, "org.Hs.eg.db", 'ENTREZID')
barplot(RPAx, showCategory = 20)
dotplot(RPAx, showCategory = 20)
emapplot(RPA, node_label = "all", showCategory = 5, foldChange = geneList)
cnetplot(RPAx, node_label = "all", showCategory = 20, text = 16, foldChange = geneList)
png("output/cnetplot from Reactome analysis")
#Protein set enrichment analysis
gseRPA <- gsePathway(geneList = geneList,
pvalueCutoff = 0.2)
## preparing geneSet collections...
## GSEA analysis...
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## leading edge analysis...
## done...
gseRPAx <- setReadable(gseRPA, "org.Hs.eg.db", 'ENTREZID')
cnetplot(gseRPAx, node_label = "all")
png("output/Vehicle_vs_Stem cells/cnetplot from Reactome GeneSet analysis")
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)
Prot.list <- CON.vs.HI_HTH_PBS %>%
separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
dplyr::select(kegg, `Fold Change`, `p-value`)%>%
filter(`p-value` <= 0.1)
Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg
## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
#Functional Profile of a gene set at specific GO level. Given a vector of genes, this function will return the GO profile at a specific level.
ggoMF <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "MF",
level = 3,
readable = TRUE)
head(ggoMF)
## ID Description Count GeneRatio
## GO:0004133 GO:0004133 glycogen debranching enzyme activity 0 0/97
## GO:0009975 GO:0009975 cyclase activity 0 0/97
## GO:0016491 GO:0016491 oxidoreductase activity 7 7/97
## GO:0016740 GO:0016740 transferase activity 12 12/97
## GO:0016787 GO:0016787 hydrolase activity 19 19/97
## GO:0016829 GO:0016829 lyase activity 0 0/97
## geneID
## GO:0004133
## GO:0009975
## GO:0016491 SQOR/G6PD/PRDX1/AKR1B1/HADH/ACADM/GPX1
## GO:0016740 PRMT1/COMT/GYG1/PKM/UBA3/TKT/GSTM3/PCYT1A/PRPSAP2/ILK/GATM/PRKCD
## GO:0016787 AHCY/RAB8A/PSMB2/HSPA1A/HSPA1B/CAPN2/PSMD14/EIF4A1/PSAP/NT5C2/ARF3/PEPD/PSMC5/CNDP2/DDX3X/GNS/CTSB/MX1/PTPN6
## GO:0016829
barplot(ggoMF)
cnetplot(ggoMF , foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 1,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0008150 GO:0008150 biological_process 97 97/97
## geneID
## GO:0008150 PICK1/PCP4/AHCY/RAB8A/PRMT1/PSMB2/SMAP2/AP1B1/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/SH3BGRL/EIF3L/SLC3A2/CORO1B/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/ANXA5/PCYT1A/CHL1/SPTBN1/SLC17A6/PRPSAP2/SNRPN/SNURF/XPO5/PSAP/TPM4/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/TAGLN2/AHNAK/FERMT3/MATR3/GATM/APOE/HSPB1/ABRACL/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/SSR1/ARCN1/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/PLEC/MX1/NPM1/OSTF1/PTPN6/CLIC1/COTL1/ARPC1B/STOM
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 2,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0000003 GO:0000003 reproduction 11 11/97
## GO:0002376 GO:0002376 immune system process 39 39/97
## GO:0006791 GO:0006791 sulfur utilization 0 0/97
## GO:0006794 GO:0006794 phosphorus utilization 0 0/97
## GO:0007610 GO:0007610 behavior 9 9/97
## GO:0008152 GO:0008152 metabolic process 75 75/97
## geneID
## GO:0000003 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0002376 AHCY/PRMT1/PSMB2/AP1B1/GYG1/PKM/HSPA1A/HSPA1B/EZR/CYFIP1/SLC3A2/DBNL/PSMD14/G6PD/PSAP/PRDX1/MATR3/APOE/PSMC5/HMGB2/RPS6/MSN/PGM2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/GPX1/MX1/OSTF1/PTPN6/COTL1/ARPC1B/STOM
## GO:0006791
## GO:0006794
## GO:0007610 AHCY/COMT/UBA6/CAPN2/CHL1/PSAP/GATM/APOE/ARCN1
## GO:0008152 PICK1/AHCY/RAB8A/PRMT1/PSMB2/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/EIF3L/SLC3A2/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/PCYT1A/SPTBN1/PRPSAP2/SNRPN/XPO5/PSAP/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/AHNAK/MATR3/GATM/APOE/HSPB1/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/NPM1/PTPN6
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 3,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0019953 GO:0019953 sexual reproduction 4 4/97
## GO:0019954 GO:0019954 asexual reproduction 0 0/97
## GO:0022414 GO:0022414 reproductive process 11 11/97
## GO:0032504 GO:0032504 multicellular organism reproduction 7 7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism 0 0/97
## GO:0061887 GO:0061887 reproduction of symbiont in host 0 0/97
## geneID
## GO:0019953 PICK1/HMGB2/RPS6/DDX3X
## GO:0019954
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504 PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505
## GO:0061887
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 4,
readable = TRUE)
head(ggoBP)
## ID Description
## GO:0000747 GO:0000747 conjugation with cellular fusion
## GO:0000909 GO:0000909 sporocarp development involved in sexual reproduction
## GO:0007276 GO:0007276 gamete generation
## GO:0007618 GO:0007618 mating
## GO:0009566 GO:0009566 fertilization
## GO:0034293 GO:0034293 sexual sporulation
## Count GeneRatio geneID
## GO:0000747 0 0/97
## GO:0000909 0 0/97
## GO:0007276 4 4/97 PICK1/HMGB2/RPS6/DDX3X
## GO:0007618 0 0/97
## GO:0009566 0 0/97
## GO:0034293 0 0/97
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length
ggoCC <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "CC",
level = 3,
readable = TRUE)
head(ggoCC)
## ID Description Count
## GO:0000131 GO:0000131 incipient cellular bud site 0
## GO:0000151 GO:0000151 ubiquitin ligase complex 2
## GO:0000159 GO:0000159 protein phosphatase type 2A complex 0
## GO:0000178 GO:0000178 exosome (RNase complex) 1
## GO:0000307 GO:0000307 cyclin-dependent protein kinase holoenzyme complex 0
## GO:0000408 GO:0000408 EKC/KEOPS complex 0
## GeneRatio geneID
## GO:0000131 0/97
## GO:0000151 2/97 CUL4B/BABAM2
## GO:0000159 0/97
## GO:0000178 1/97 CARHSP1
## GO:0000307 0/97
## GO:0000408 0/97
barplot(ggoCC)
#cnetplot(ggoCC , foldChange = geneList)
ggo <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 3,
readable = TRUE)
head(ggo)
## ID Description Count GeneRatio
## GO:0019953 GO:0019953 sexual reproduction 4 4/97
## GO:0019954 GO:0019954 asexual reproduction 0 0/97
## GO:0022414 GO:0022414 reproductive process 11 11/97
## GO:0032504 GO:0032504 multicellular organism reproduction 7 7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism 0 0/97
## GO:0061887 GO:0061887 reproduction of symbiont in host 0 0/97
## geneID
## GO:0019953 PICK1/HMGB2/RPS6/DDX3X
## GO:0019954
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504 PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505
## GO:0061887
#GO Enrichment Analysis of a gene set. Given a vector of genes, this function will return the enrichment GO categories after FDR control.
ego <- enrichGO(gene = gene,
universe = NULL,
OrgDb = org.Hs.eg.db,
ont = "ALL",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.2,
readable = TRUE)
head(ego) # No GO output
## ONTOLOGY ID
## GO:0043312 BP GO:0043312
## GO:0002283 BP GO:0002283
## GO:0042119 BP GO:0042119
## GO:0002446 BP GO:0002446
## GO:0043254 BP GO:0043254
## GO:0051258 BP GO:0051258
## Description GeneRatio
## GO:0043312 neutrophil degranulation 19/97
## GO:0002283 neutrophil activation involved in immune response 19/97
## GO:0042119 neutrophil activation 19/97
## GO:0002446 neutrophil mediated immunity 19/97
## GO:0043254 regulation of protein-containing complex assembly 17/97
## GO:0051258 protein polymerization 13/97
## BgRatio pvalue p.adjust qvalue
## GO:0043312 485/18670 5.714026e-12 5.492428e-09 4.300393e-09
## GO:0002283 488/18670 6.360722e-12 5.492428e-09 4.300393e-09
## GO:0042119 498/18670 9.044814e-12 5.492428e-09 4.300393e-09
## GO:0002446 499/18670 9.364753e-12 5.492428e-09 4.300393e-09
## GO:0043254 429/18670 7.103647e-11 3.333031e-08 2.609656e-08
## GO:0051258 283/18670 2.535521e-09 9.913887e-07 7.762253e-07
## geneID
## GO:0043312 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002283 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0042119 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002446 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0043254 PICK1/CUL4B/HSPA1A/HSPA1B/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/APOE/PSMC5/MSN/PRKCD/DDX3X/GFAP/COTL1/ARPC1B
## GO:0051258 PICK1/HSPA1A/HSPA1B/ZNF207/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/PRKCD/MX1/COTL1/ARPC1B
## Count
## GO:0043312 19
## GO:0002283 19
## GO:0042119 19
## GO:0002446 19
## GO:0043254 17
## GO:0051258 13
dotplot(ego)
barplot(ego, showCategory = 20)
cnetplot(ego, foldChange = geneList, showCategory = 5)
emapplot(ego, foldChange = geneList, showCategory = 10)
#enrichment analysis by DAVID
#enrDAVID <- enrichDAVID(gene,
# idType = "ENTREZ_GENE_ID",
#universe,
#minGSSize = 5,
#maxGSSize = 500,
#annotation = "GOTERM_BP_FAT",
#pvalueCutoff = 0.05,
#pAdjustMethod = "BH",
#qvalueCutoff = 0.2,
#species = NA)
#given a vector of genes, this function will return the enrichment NCG categories with FDR control
eDGN <- enrichDGN(gene,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe,
minGSSize = 10,
maxGSSize = 500,
qvalueCutoff = 0.2,
readable = TRUE)
## `universe` is not in character and will be ignored...
barplot(eDGN)
cnetplot(eDGN)
#gseDGN <- gseDGN(geneList,
#exponent = 1,
#minGSSize = 10,
#maxGSSize = 500,
#pvalueCutoff = 0.05,
#pAdjustMethod = "BH",
#verbose = TRUE,
#seed = TRUE,
# by = "DOSE", nPerm = 1000)
#cnetplot(gseDGN)
# The output of the following compares between the Control and the HI+HTH+PBS group
DO <- enrichDO(gene = names(geneList),
ont = "DO",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe = NULL,
minGSSize = 5,
maxGSSize = 500,
qvalueCutoff = 0.2,
readable = TRUE)
barplot(DO, showCategory = 20)
dotplot(DO, showCategory = 20)
#cnetplot(DO)
ncg <- enrichNCG(gene)
head(ncg)
## [1] ID Description GeneRatio BgRatio pvalue p.adjust
## [7] qvalue geneID Count
## <0 rows> (or 0-length row.names)
dgn <- enrichDGN(gene)
head(dgn)
## ID Description GeneRatio BgRatio
## umls:C0026848 umls:C0026848 Myopathy 12/90 298/17381
## umls:C0848332 umls:C0848332 Spots on skin 9/90 198/17381
## umls:C0024408 umls:C0024408 Machado-Joseph Disease 6/90 78/17381
## umls:C0036646 umls:C0036646 Age-related cataract 5/90 45/17381
## umls:C0151744 umls:C0151744 Myocardial Ischemia 12/90 476/17381
## umls:C0040822 umls:C0040822 Tremor 6/90 88/17381
## pvalue p.adjust qvalue
## umls:C0026848 4.300625e-08 6.919706e-05 5.405207e-05
## umls:C0848332 8.582970e-07 6.904999e-04 5.393719e-04
## umls:C0024408 3.100934e-06 1.388755e-03 1.084802e-03
## umls:C0036646 3.452468e-06 1.388755e-03 1.084802e-03
## umls:C0151744 6.120241e-06 1.538419e-03 1.201709e-03
## umls:C0040822 6.277333e-06 1.538419e-03 1.201709e-03
## geneID Count
## umls:C0026848 6520/2539/6638/9782/2628/348/5580/1508/3920/34/2670/5777 12
## umls:C0848332 5690/6520/2539/5052/348/3315/1508/34/4869 9
## umls:C0024408 9463/348/3315/2521/5580/2670 6
## umls:C0036646 3303/3304/2947/2539/348 5
## umls:C0151744 2992/3303/308/231/2628/348/3315/4478/372/5580/2171/4599 12
## umls:C0040822 8450/348/2521/5580/3920/2670 6
barplot(dgn, showCategory = 20)
dotplot(dgn, showCategory = 20)
cnetplot(dgn, showCategory = 10)
dgnx <- setReadable(dgn, 'org.Hs.eg.db')
y <- gseDO(geneList,
nPerm = 100,
minGSSize = 120,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(y, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
ncg <- gseNCG(geneList,
nPerm = 1000,
minGSSize = 15,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
dgn <- gseDGN(geneList,
nPerm = 1000,
minGSSize = 5,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
# prepare prot.list
## assume that 1st column is ID
## 2nd column is fold change
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)
Prot.list <- CON.vs.HI_HTH_PBS %>%
separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
dplyr::select(kegg, `Fold Change`, `p-value`, UP)%>%
filter(`p-value` <= 0.1)
Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg
## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
# =================================================================================================
## Pathway analysis data preparation
# This "Gene list" contains the KEGG ids
geneList <- d[, 2]
names(geneList) <- as.character(d[, 1])
geneList <- Prot.list.x
gene <- names(geneList)[abs(geneList) > 0.5]
#downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz
# The above file cannot be run from a script but must instead be run from the console
gda <- read.delim("data/all_gene_disease_associations.txt")
disease2gene=gda[, c("diseaseId", "geneId")]
disease2name=gda[, c("diseaseId", "diseaseName")]
#A universal enrichment analyzer
enriched <- enricher(
gene,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe,
minGSSize = 5,
maxGSSize = 500,
qvalueCutoff = 0.2, TERM2GENE=disease2gene, TERM2NAME=disease2name
)
## `universe` is not in character and will be ignored...
enrichedx <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')
dotplot(enrichedx, showCategory = 20)
cnetplot(enrichedx, showCategory = 5, foldChange = geneList)
png("output/cnetplot from universal enrichment analysis from disGenet")
# ==================================================================================================
# KEGG enrichment analysis
KEGG.enrichment <- enrichKEGG(
gene = gene,
keyType = "kegg",
organism = "hsa",
pvalueCutoff = 0.05
)
#KEGG.enrichment <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')
barplot(KEGG.enrichment, showCategory = 6)
dotplot(KEGG.enrichment, showCategory = 20)
#cnetplot(KEGG.enrichmentx, showCategory = 6, foldChange = geneList)
png("output/cnetplot from KEGG Enrichment Analysis")
## convert gene ID to Symbol
gse <- gseKEGG(
geneList = geneList,
keyType = "kegg",
organism = "hsa",
minGSSize = 1,
pvalueCutoff = 0.1,
verbose = FALSE
)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(gse)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
#cnetplot(gse, foldChange = geneList)
gsex <- setReadable(gse, "org.Hs.eg.db", 'ENTREZID')
#dotplot(gse, showCategory = 20)
#cnetplot(gsex,node_lable = "all")
png("output/cnetplot from KEGG geneset analysis")
#gseaplot(gse, geneSetID = 1)
# Reactome Pathway Analysis
#Protein Enrichment analysis
RPA <- enrichPathway(
gene = gene,
pvalueCutoff = 0.05)
RPAx <- setReadable(RPA, "org.Hs.eg.db", 'ENTREZID')
barplot(RPAx, showCategory = 20)
dotplot(RPAx, showCategory = 20)
emapplot(RPA, node_label = "all", showCategory = 5, foldChange = geneList)
cnetplot(RPAx, node_label = "all", showCategory = 5, text = 16, foldChange = geneList)
png("output/cnetplot from Reactome analysis")
#Protein set enrichment analysis
gseRPA <- gsePathway(geneList = geneList,
pvalueCutoff = 0.1)
## preparing geneSet collections...
## GSEA analysis...
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## leading edge analysis...
## done...
gseRPAx <- setReadable(gseRPA, "org.Hs.eg.db", 'ENTREZID')
#cnetplot(gseRPAx, node_label = "all")
png("output/Vehicle_vs_Stem cells/cnetplot from Reactome GeneSet analysis")
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)
Prot.list <- CON.vs.HI_HTH_PBS %>%
separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
dplyr::select(kegg, `Fold Change`, `p-value`)%>%
filter(`p-value` <= 0.1)
Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg
## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
#Functional Profile of a gene set at specific GO level. Given a vector of genes, this function will return the GO profile at a specific level.
ggoMF <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "MF",
level = 3,
readable = TRUE)
head(ggoMF)
## ID Description Count GeneRatio
## GO:0004133 GO:0004133 glycogen debranching enzyme activity 0 0/97
## GO:0009975 GO:0009975 cyclase activity 0 0/97
## GO:0016491 GO:0016491 oxidoreductase activity 7 7/97
## GO:0016740 GO:0016740 transferase activity 12 12/97
## GO:0016787 GO:0016787 hydrolase activity 19 19/97
## GO:0016829 GO:0016829 lyase activity 0 0/97
## geneID
## GO:0004133
## GO:0009975
## GO:0016491 SQOR/G6PD/PRDX1/AKR1B1/HADH/ACADM/GPX1
## GO:0016740 PRMT1/COMT/GYG1/PKM/UBA3/TKT/GSTM3/PCYT1A/PRPSAP2/ILK/GATM/PRKCD
## GO:0016787 AHCY/RAB8A/PSMB2/HSPA1A/HSPA1B/CAPN2/PSMD14/EIF4A1/PSAP/NT5C2/ARF3/PEPD/PSMC5/CNDP2/DDX3X/GNS/CTSB/MX1/PTPN6
## GO:0016829
barplot(ggoMF)
cnetplot(ggoMF , foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 1,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0008150 GO:0008150 biological_process 97 97/97
## geneID
## GO:0008150 PICK1/PCP4/AHCY/RAB8A/PRMT1/PSMB2/SMAP2/AP1B1/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/SH3BGRL/EIF3L/SLC3A2/CORO1B/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/ANXA5/PCYT1A/CHL1/SPTBN1/SLC17A6/PRPSAP2/SNRPN/SNURF/XPO5/PSAP/TPM4/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/TAGLN2/AHNAK/FERMT3/MATR3/GATM/APOE/HSPB1/ABRACL/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/SSR1/ARCN1/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/PLEC/MX1/NPM1/OSTF1/PTPN6/CLIC1/COTL1/ARPC1B/STOM
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 2,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0000003 GO:0000003 reproduction 11 11/97
## GO:0002376 GO:0002376 immune system process 39 39/97
## GO:0006791 GO:0006791 sulfur utilization 0 0/97
## GO:0006794 GO:0006794 phosphorus utilization 0 0/97
## GO:0007610 GO:0007610 behavior 9 9/97
## GO:0008152 GO:0008152 metabolic process 75 75/97
## geneID
## GO:0000003 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0002376 AHCY/PRMT1/PSMB2/AP1B1/GYG1/PKM/HSPA1A/HSPA1B/EZR/CYFIP1/SLC3A2/DBNL/PSMD14/G6PD/PSAP/PRDX1/MATR3/APOE/PSMC5/HMGB2/RPS6/MSN/PGM2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/GPX1/MX1/OSTF1/PTPN6/COTL1/ARPC1B/STOM
## GO:0006791
## GO:0006794
## GO:0007610 AHCY/COMT/UBA6/CAPN2/CHL1/PSAP/GATM/APOE/ARCN1
## GO:0008152 PICK1/AHCY/RAB8A/PRMT1/PSMB2/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/EIF3L/SLC3A2/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/PCYT1A/SPTBN1/PRPSAP2/SNRPN/XPO5/PSAP/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/AHNAK/MATR3/GATM/APOE/HSPB1/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/NPM1/PTPN6
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 3,
readable = TRUE)
head(ggoBP)
## ID Description Count GeneRatio
## GO:0019953 GO:0019953 sexual reproduction 4 4/97
## GO:0019954 GO:0019954 asexual reproduction 0 0/97
## GO:0022414 GO:0022414 reproductive process 11 11/97
## GO:0032504 GO:0032504 multicellular organism reproduction 7 7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism 0 0/97
## GO:0061887 GO:0061887 reproduction of symbiont in host 0 0/97
## geneID
## GO:0019953 PICK1/HMGB2/RPS6/DDX3X
## GO:0019954
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504 PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505
## GO:0061887
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
ggoBP <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 4,
readable = TRUE)
head(ggoBP)
## ID Description
## GO:0000747 GO:0000747 conjugation with cellular fusion
## GO:0000909 GO:0000909 sporocarp development involved in sexual reproduction
## GO:0007276 GO:0007276 gamete generation
## GO:0007618 GO:0007618 mating
## GO:0009566 GO:0009566 fertilization
## GO:0034293 GO:0034293 sexual sporulation
## Count GeneRatio geneID
## GO:0000747 0 0/97
## GO:0000909 0 0/97
## GO:0007276 4 4/97 PICK1/HMGB2/RPS6/DDX3X
## GO:0007618 0 0/97
## GO:0009566 0 0/97
## GO:0034293 0 0/97
barplot(ggoBP)
cnetplot(ggoBP, foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length
ggoCC <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "CC",
level = 3,
readable = TRUE)
head(ggoCC)
## ID Description Count
## GO:0000131 GO:0000131 incipient cellular bud site 0
## GO:0000151 GO:0000151 ubiquitin ligase complex 2
## GO:0000159 GO:0000159 protein phosphatase type 2A complex 0
## GO:0000178 GO:0000178 exosome (RNase complex) 1
## GO:0000307 GO:0000307 cyclin-dependent protein kinase holoenzyme complex 0
## GO:0000408 GO:0000408 EKC/KEOPS complex 0
## GeneRatio geneID
## GO:0000131 0/97
## GO:0000151 2/97 CUL4B/BABAM2
## GO:0000159 0/97
## GO:0000178 1/97 CARHSP1
## GO:0000307 0/97
## GO:0000408 0/97
barplot(ggoCC)
cnetplot(ggoCC , foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length
ggo <- groupGO(gene = gene,
OrgDb = org.Hs.eg.db,
ont = "BP",
level = 3,
readable = TRUE)
head(ggo)
## ID Description Count GeneRatio
## GO:0019953 GO:0019953 sexual reproduction 4 4/97
## GO:0019954 GO:0019954 asexual reproduction 0 0/97
## GO:0022414 GO:0022414 reproductive process 11 11/97
## GO:0032504 GO:0032504 multicellular organism reproduction 7 7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism 0 0/97
## GO:0061887 GO:0061887 reproduction of symbiont in host 0 0/97
## geneID
## GO:0019953 PICK1/HMGB2/RPS6/DDX3X
## GO:0019954
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504 PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505
## GO:0061887
#GO Enrichment Analysis of a gene set. Given a vector of genes, this function will return the enrichment GO categories after FDR control.
ego <- enrichGO(gene = gene,
universe = NULL,
OrgDb = org.Hs.eg.db,
ont = "ALL",
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.2,
readable = TRUE)
head(ego) # No GO output
## ONTOLOGY ID
## GO:0043312 BP GO:0043312
## GO:0002283 BP GO:0002283
## GO:0042119 BP GO:0042119
## GO:0002446 BP GO:0002446
## GO:0043254 BP GO:0043254
## GO:0051258 BP GO:0051258
## Description GeneRatio
## GO:0043312 neutrophil degranulation 19/97
## GO:0002283 neutrophil activation involved in immune response 19/97
## GO:0042119 neutrophil activation 19/97
## GO:0002446 neutrophil mediated immunity 19/97
## GO:0043254 regulation of protein-containing complex assembly 17/97
## GO:0051258 protein polymerization 13/97
## BgRatio pvalue p.adjust qvalue
## GO:0043312 485/18670 5.714026e-12 5.492428e-09 4.300393e-09
## GO:0002283 488/18670 6.360722e-12 5.492428e-09 4.300393e-09
## GO:0042119 498/18670 9.044814e-12 5.492428e-09 4.300393e-09
## GO:0002446 499/18670 9.364753e-12 5.492428e-09 4.300393e-09
## GO:0043254 429/18670 7.103647e-11 3.333031e-08 2.609656e-08
## GO:0051258 283/18670 2.535521e-09 9.913887e-07 7.762253e-07
## geneID
## GO:0043312 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002283 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0042119 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002446 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0043254 PICK1/CUL4B/HSPA1A/HSPA1B/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/APOE/PSMC5/MSN/PRKCD/DDX3X/GFAP/COTL1/ARPC1B
## GO:0051258 PICK1/HSPA1A/HSPA1B/ZNF207/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/PRKCD/MX1/COTL1/ARPC1B
## Count
## GO:0043312 19
## GO:0002283 19
## GO:0042119 19
## GO:0002446 19
## GO:0043254 17
## GO:0051258 13
dotplot(ego)
barplot(ego, showCategory = 10)
cnetplot(ego, foldChange = geneList, showCategory = 7)
emapplot(ego, foldChange = geneList, showCategory = 10)
#enrichment analysis by DAVID
#enrDAVID <- enrichDAVID(gene,
# idType = "ENTREZ_GENE_ID",
#universe,
#minGSSize = 5,
#maxGSSize = 500,
#annotation = "GOTERM_BP_FAT",
#pvalueCutoff = 0.05,
#pAdjustMethod = "BH",
#qvalueCutoff = 0.2,
#species = NA)
#given a vector of genes, this function will return the enrichment NCG categories with FDR control
eDGN <- enrichDGN(gene,
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe,
minGSSize = 10,
maxGSSize = 500,
qvalueCutoff = 0.2,
readable = TRUE)
## `universe` is not in character and will be ignored...
barplot(eDGN)
cnetplot(eDGN)
#gseDGN <- gseDGN(geneList,
#exponent = 1,
#minGSSize = 10,
#maxGSSize = 500,
#pvalueCutoff = 0.05,
#pAdjustMethod = "BH",
#verbose = TRUE,
#seed = TRUE,
# by = "DOSE", nPerm = 1000)
#cnetplot(gseDGN)
# The output of the following compares between the Control and the HI+HTH+PBS group
DO <- enrichDO(gene = names(geneList),
ont = "DO",
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
universe = NULL,
minGSSize = 5,
maxGSSize = 500,
qvalueCutoff = 0.2,
readable = TRUE)
barplot(DO, showCategory = 20)
dotplot(DO, showCategory = 20)
#cnetplot(DO)
ncg <- enrichNCG(gene)
head(ncg)
## [1] ID Description GeneRatio BgRatio pvalue p.adjust
## [7] qvalue geneID Count
## <0 rows> (or 0-length row.names)
dgn <- enrichDGN(gene)
head(dgn)
## ID Description GeneRatio BgRatio
## umls:C0026848 umls:C0026848 Myopathy 12/90 298/17381
## umls:C0848332 umls:C0848332 Spots on skin 9/90 198/17381
## umls:C0024408 umls:C0024408 Machado-Joseph Disease 6/90 78/17381
## umls:C0036646 umls:C0036646 Age-related cataract 5/90 45/17381
## umls:C0151744 umls:C0151744 Myocardial Ischemia 12/90 476/17381
## umls:C0040822 umls:C0040822 Tremor 6/90 88/17381
## pvalue p.adjust qvalue
## umls:C0026848 4.300625e-08 6.919706e-05 5.405207e-05
## umls:C0848332 8.582970e-07 6.904999e-04 5.393719e-04
## umls:C0024408 3.100934e-06 1.388755e-03 1.084802e-03
## umls:C0036646 3.452468e-06 1.388755e-03 1.084802e-03
## umls:C0151744 6.120241e-06 1.538419e-03 1.201709e-03
## umls:C0040822 6.277333e-06 1.538419e-03 1.201709e-03
## geneID Count
## umls:C0026848 6520/2539/6638/9782/2628/348/5580/1508/3920/34/2670/5777 12
## umls:C0848332 5690/6520/2539/5052/348/3315/1508/34/4869 9
## umls:C0024408 9463/348/3315/2521/5580/2670 6
## umls:C0036646 3303/3304/2947/2539/348 5
## umls:C0151744 2992/3303/308/231/2628/348/3315/4478/372/5580/2171/4599 12
## umls:C0040822 8450/348/2521/5580/3920/2670 6
barplot(dgn, showCategory = 20)
dotplot(dgn, showCategory = 20)
cnetplot(dgn, showCategory = 10)
dgnx <- setReadable(dgn, 'org.Hs.eg.db')
y <- gseDO(geneList,
nPerm = 100,
minGSSize = 120,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(y, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
ncg <- gseNCG(geneList,
nPerm = 1000,
minGSSize = 15,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
dgn <- gseDGN(geneList,
nPerm = 1000,
minGSSize = 5,
pvalueCutoff = 0.2,
pAdjustMethod = "BH",
verbose = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
## [1] ID Description setSize enrichmentScore
## [5] NES pvalue p.adjust qvalues
## <0 rows> (or 0-length row.names)
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] gplots_3.1.1 viridis_0.6.1 viridisLite_0.4.0
## [4] readr_2.0.0 RDAVIDWebService_1.26.0 GOstats_2.54.0
## [7] Category_2.54.0 Matrix_1.2-18 graph_1.66.0
## [10] VIM_6.1.0 colorspace_2.0-2 org.Ss.eg.db_3.11.4
## [13] org.Hs.eg.db_3.11.4 tidyr_1.1.3 SWATH2stats_1.18.0
## [16] rWikiPathways_1.8.5 readxl_1.3.1 ReactomePA_1.32.0
## [19] plyr_1.8.6 mixOmics_6.12.2 lattice_0.20-44
## [22] MASS_7.3-54 mice_3.13.0 meshes_1.14.0
## [25] MeSH.Ssc.eg.db_1.13.0 MeSH.Hsa.eg.db_1.13.0 MeSHDbi_1.24.0
## [28] RColorBrewer_1.1-2 ggfortify_0.4.12 ggcorrplot_0.1.3
## [31] enrichplot_1.8.1 EnhancedVolcano_1.6.0 ggrepel_0.9.1
## [34] ggplot2_3.3.5 DOSE_3.14.0 dplyr_1.0.7
## [37] data.table_1.14.0 corrplot_0.90 clusterProfiler_3.16.1
## [40] broom.mixed_0.2.7 biomaRt_2.44.4 BiocManager_1.30.16
## [43] pathview_1.28.1 AnnotationDbi_1.50.3 IRanges_2.22.2
## [46] S4Vectors_0.26.1 Biobase_2.48.0 BiocGenerics_0.34.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.1 tidyselect_1.1.1 RSQLite_2.2.7
## [4] ranger_0.13.1 BiocParallel_1.22.0 scatterpie_0.1.6
## [7] munsell_0.5.0 withr_2.4.2 GOSemSim_2.14.2
## [10] highr_0.9 knitr_1.33 rstudioapi_0.13
## [13] robustbase_0.93-8 vcd_1.4-8 rJava_1.0-4
## [16] labeling_0.4.2 KEGGgraph_1.48.0 urltools_1.7.3
## [19] polyclip_1.10-0 bit64_4.0.5 farver_2.1.0
## [22] downloader_0.4 vctrs_0.3.8 generics_0.1.0
## [25] xfun_0.24 BiocFileCache_1.12.1 R6_2.5.0
## [28] graphlayouts_0.7.1 bitops_1.0-7 cachem_1.0.5
## [31] fgsea_1.14.0 gridGraphics_0.5-1 assertthat_0.2.1
## [34] vroom_1.5.3 scales_1.1.1 ggraph_2.0.5
## [37] nnet_7.3-14 gtable_0.3.0 tidygraph_1.2.0
## [40] rlang_0.4.11 genefilter_1.70.0 splines_4.0.2
## [43] broom_0.7.9 europepmc_0.4 checkmate_2.0.0
## [46] yaml_2.2.1 reshape2_1.4.4 abind_1.4-5
## [49] backports_1.2.1 qvalue_2.20.0 RBGL_1.64.0
## [52] tools_4.0.2 ggplotify_0.0.8 ellipsis_0.3.2
## [55] jquerylib_0.1.4 proxy_0.4-26 ggridges_0.5.3
## [58] Rcpp_1.0.7 progress_1.2.2 zlibbioc_1.34.0
## [61] purrr_0.3.4 RCurl_1.98-1.3 prettyunits_1.1.1
## [64] openssl_1.4.4 cowplot_1.1.1 zoo_1.8-9
## [67] haven_2.4.1 magrittr_2.0.1 RSpectra_0.16-0
## [70] MeSH.db_1.13.0 DO.db_2.9 openxlsx_4.2.4
## [73] lmtest_0.9-38 triebeard_0.3.0 reactome.db_1.70.0
## [76] matrixStats_0.60.0 xtable_1.8-4 hms_1.1.0
## [79] evaluate_0.14 XML_3.99-0.6 rio_0.5.27
## [82] gridExtra_2.3 compiler_4.0.2 ellipse_0.4.2
## [85] tibble_3.1.3 KernSmooth_2.23-20 crayon_1.4.1
## [88] htmltools_0.5.1.1 tzdb_0.1.2 corpcor_1.6.9
## [91] snow_0.4-3 DBI_1.1.1 tweenr_1.0.2
## [94] dbplyr_2.1.1 rappdirs_0.3.3 boot_1.3-28
## [97] car_3.0-11 cli_3.0.1 igraph_1.2.6
## [100] forcats_0.5.1 pkgconfig_2.0.3 rvcheck_0.1.8
## [103] laeken_0.5.1 foreign_0.8-81 sp_1.4-5
## [106] xml2_1.3.2 annotate_1.66.0 rARPACK_0.11-0
## [109] bslib_0.2.5.1 XVector_0.28.0 AnnotationForge_1.30.1
## [112] stringr_1.4.0 digest_0.6.27 Biostrings_2.56.0
## [115] rmarkdown_2.9 cellranger_1.1.0 fastmatch_1.1-3
## [118] GSEABase_1.50.1 curl_4.3.2 gtools_3.9.2
## [121] graphite_1.34.0 rjson_0.2.20 lifecycle_1.0.0
## [124] nlme_3.1-148 jsonlite_1.7.2 carData_3.0-4
## [127] askpass_1.1 fansi_0.5.0 pillar_1.6.2
## [130] survival_3.1-12 KEGGREST_1.28.0 fastmap_1.1.0
## [133] httr_1.4.2 DEoptimR_1.0-9 GO.db_3.11.4
## [136] glue_1.4.2 zip_2.2.0 png_0.1-7
## [139] bit_4.0.4 Rgraphviz_2.32.0 ggforce_0.3.3
## [142] class_7.3-19 stringi_1.7.3 sass_0.4.0
## [145] blob_1.2.2 caTools_1.18.2 memoise_2.0.0
## [148] e1071_1.7-8
warnings()