SWATH analysis of piglet data

This is an R Markdown document containing the data wrangling, data exploration and quantification of SWATH data collected from piglet brains. The tissue samples used to generate this data contains grey and white matter derived from parasagital region of the frontal cortex. Sample preparation can be found in the manuscripts folder of this project. Briefely, sample preparation consisted of cell lysis, alkylation with acrylamide, digestion with Tripsin and desalting using amonium nitrate. Each brain sample was run through an ABSIEX5600 triple TOF machine in duplicate to ensure measurment reprodicubillity. There are 3 experimental groups in this study:

1) 8-day old healthy piglet (CONTROL)
2) Hypoxic-ischameic brain injured piglet treated with hypotherimia and stem cell vehicle (HI+HTH+PBS)
3) Hypoxic-ischeamic brain injured piglet treated with hypothermia and stem cell therapy (HI+HTH+SC)

Firstly, I would like to identify the major sources of variation in my data and identify whether such sources of variation correspond to biological conditions, or experimental bias. I would like to visualize trends or patterns between samples, whether they ‘naturally’ cluster according to known biological conditions.

Data exploration

In this section we will check for the presences of possible confounding variables using the mixomics package.

From the literature we know that there are multiple factors both environmental and inherited that can influence susceptibility of neonates to global HI injury other than our treatment.

To begin, will investigate the effect of: 1) Sex: Males have a higher incidence of negative outcome following HI injury 2) Litter: This combines both genetic and developmental environment in which there are many factors 3) Hypotensive time: This is analogous to ischemia exposure.

# Here we aim to assess the effect of our duplicate runs on the ABSIEX5600 Triple TOF.
# The pca() function requires a matrix argument to compute so we will wrangle the h.protein.levels data frames into matrix class, center and scale them, perform PCA and plot the PCA.


# Creation of a matrix for PCA computation
h.protein.levels.matrix <- as.matrix(h.protein.levels)
h.mean.protein.levels.matrix <- as.matrix(h.mean.protein.levels)

correl <-  read_excel("data/Processed_Human_SWATH06-03-2020.xlsx", 
    sheet = "COR")
correl<- na.omit(correl)


corrplot(cor(correl), method = "number")

#Center and scale the data (cns. is refers to data that has been centered and scaled.).
cns.h.protein.levels.matrix <- scale(h.protein.levels, center = TRUE)
cns.h.mean.protein.levels.matrix <- scale(h.mean.protein.levels, center = TRUE)

## By plotting the PCA we can establish assess reproducibillity between duplicate runs.
CnS.PCA <- pca(cns.h.protein.levels.matrix)
PCA_duplicates <- plotIndiv(CnS.PCA, title = "Duplicate runs on the ABSCIEX Triple TOF are highly correlated", caption = "The replicates are highly correlated and so can be averaged", pch = 19)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

print(PCA_duplicates)
## $df
##                                x          y    group
## Brain 2334 (sample 1) -17.748562  -8.210981 No group
## Brain 2335 (sample 1)  32.496428  -8.528131 No group
## Brain 2368 (sample 1)   6.317161  19.981653 No group
## Brain 2343 (sample 1) -11.608157  -6.520845 No group
## Brain 2344 (sample 1)  26.208810  44.104522 No group
## Brain 2350 (sample 1)  -5.290846  -6.606496 No group
## Brain 2351 (sample 1)  16.030898  -6.555408 No group
## Brain 2367 (sample 1)  -4.530920  18.915783 No group
## Brain 2374 (sample 1) -41.232593  11.170900 No group
## Brain 2380 (sample 1)  12.536025  -7.039252 No group
## Brain 2407 (sample 1)  16.216228  -8.922253 No group
## Brain 2542 (sample 1)  15.252914  -9.277906 No group
## Brain 2543 (sample 1) -71.628534  -6.323911 No group
## Brain 2544 (sample 1)  43.675244 -10.680222 No group
## Brain 2545 (sample 1) -41.737018  -7.220758 No group
## Brain 2546 (sample 1)  43.781040 -10.702571 No group
## Brain 2334 (sample 2) -20.980347  -7.615934 No group
## Brain 2335 (sample 2)  31.922825  -8.333861 No group
## Brain 2368 (sample 2)   4.615286  20.353621 No group
## Brain 2343 (sample 2) -13.346807  -6.365588 No group
## Brain 2344 (sample 2)  23.157342  46.170313 No group
## Brain 2350 (sample 2)  -6.495310  -6.696607 No group
## Brain 2351 (sample 2)  13.284188  -6.547719 No group
## Brain 2367 (sample 2) -10.199390  20.421594 No group
## Brain 2374 (sample 2) -45.142138  11.958283 No group
## Brain 2380 (sample 2)  10.348787  -7.488454 No group
## Brain 2407 (sample 2)  12.555333  -8.444258 No group
## Brain 2542 (sample 2)  12.399515  -9.365552 No group
## Brain 2543 (sample 2) -70.929544  -6.777313 No group
## Brain 2544 (sample 2)  43.798913 -10.623168 No group
## Brain 2545 (sample 2) -46.987602  -7.398883 No group
## Brain 2546 (sample 2)  43.260827 -10.830600 No group
##                                                                                Block
## Brain 2334 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2335 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2368 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2343 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2344 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2350 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2351 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2367 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2374 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2380 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2407 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2542 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2543 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2544 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2545 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2546 (sample 1) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2334 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2335 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2368 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2343 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2344 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2350 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2351 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2367 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2374 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2380 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2407 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2542 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2543 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2544 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2545 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
## Brain 2546 (sample 2) Duplicate runs on the ABSCIEX Triple TOF are highly correlated
##                       pch pch.levels cex     col pch.legend
## Brain 2334 (sample 1)  19         19   3 #388ECC         19
## Brain 2335 (sample 1)  19         19   3 #388ECC         19
## Brain 2368 (sample 1)  19         19   3 #388ECC         19
## Brain 2343 (sample 1)  19         19   3 #388ECC         19
## Brain 2344 (sample 1)  19         19   3 #388ECC         19
## Brain 2350 (sample 1)  19         19   3 #388ECC         19
## Brain 2351 (sample 1)  19         19   3 #388ECC         19
## Brain 2367 (sample 1)  19         19   3 #388ECC         19
## Brain 2374 (sample 1)  19         19   3 #388ECC         19
## Brain 2380 (sample 1)  19         19   3 #388ECC         19
## Brain 2407 (sample 1)  19         19   3 #388ECC         19
## Brain 2542 (sample 1)  19         19   3 #388ECC         19
## Brain 2543 (sample 1)  19         19   3 #388ECC         19
## Brain 2544 (sample 1)  19         19   3 #388ECC         19
## Brain 2545 (sample 1)  19         19   3 #388ECC         19
## Brain 2546 (sample 1)  19         19   3 #388ECC         19
## Brain 2334 (sample 2)  19         19   3 #388ECC         19
## Brain 2335 (sample 2)  19         19   3 #388ECC         19
## Brain 2368 (sample 2)  19         19   3 #388ECC         19
## Brain 2343 (sample 2)  19         19   3 #388ECC         19
## Brain 2344 (sample 2)  19         19   3 #388ECC         19
## Brain 2350 (sample 2)  19         19   3 #388ECC         19
## Brain 2351 (sample 2)  19         19   3 #388ECC         19
## Brain 2367 (sample 2)  19         19   3 #388ECC         19
## Brain 2374 (sample 2)  19         19   3 #388ECC         19
## Brain 2380 (sample 2)  19         19   3 #388ECC         19
## Brain 2407 (sample 2)  19         19   3 #388ECC         19
## Brain 2542 (sample 2)  19         19   3 #388ECC         19
## Brain 2543 (sample 2)  19         19   3 #388ECC         19
## Brain 2544 (sample 2)  19         19   3 #388ECC         19
## Brain 2545 (sample 2)  19         19   3 #388ECC         19
## Brain 2546 (sample 2)  19         19   3 #388ECC         19
## 
## $df.ellipse
## NULL
## 
## $graph

# Save the figure output in the /output folder
png("figs/Global_data_analysis/Figure 1 PCA duplicates.png")
cor_2334 <- cor.test(correl$`Brain 2334 (sample 1)`, correl$`Brain 2334 (sample 2)`)
cor_2335 <-cor.test(correl$`Brain 2335 (sample 1)`, correl$`Brain 2335 (sample 2)`)
cor_2343 <-cor.test(correl$`Brain 2343 (sample 1)`, correl$`Brain 2343 (sample 2)`)
cor_2344 <-cor.test(correl$`Brain 2344 (sample 1)`, correl$`Brain 2344 (sample 2)`)
cor_2350 <-cor.test(correl$`Brain 2350 (sample 1)`, correl$`Brain 2350 (sample 2)`)
cor_2351 <-cor.test(correl$`Brain 2351 (sample 1)`, correl$`Brain 2351 (sample 2)`)
cor_2367 <-cor.test(correl$`Brain 2367 (sample 1)`, correl$`Brain 2367 (sample 2)`)
cor_2368 <-cor.test(correl$`Brain 2368 (sample 1)`, correl$`Brain 2368 (sample 2)`)
#cor_2370 <-cor.test(correl$`Brain 2370 (sample 1)`, correl$`Brain 2370 (sample 2)`)
cor_2374 <-cor.test(correl$`Brain 2374 (sample 1)`, correl$`Brain 2374 (sample 2)`)
cor_2380 <-cor.test(correl$`Brain 2380 (sample 1)`, correl$`Brain 2380 (sample 2)`)
cor_2407 <-cor.test(correl$`Brain 2407 (sample 1)`, correl$`Brain 2407 (sample 2)`)
cor_2542 <-cor.test(correl$`Brain 2542 (sample 1)`, correl$`Brain 2542 (sample 2)`)
cor_2543 <-cor.test(correl$`Brain 2543 (sample 1)`, correl$`Brain 2543 (sample 2)`)
cor_2544 <-cor.test(correl$`Brain 2544 (sample 1)`, correl$`Brain 2544 (sample 2)`)
cor_2545 <-cor.test(correl$`Brain 2545 (sample 1)`, correl$`Brain 2545 (sample 2)`)
cor_2546 <-cor.test(correl$`Brain 2546 (sample 1)`, correl$`Brain 2546 (sample 2)`)

ggplot(correl, aes(x = correl$`Brain 2334 (sample 1)`, y = correl$`Brain 2334 (sample 2)`))+
  geom_point()
## Warning: Use of `correl$`Brain 2334 (sample 1)`` is discouraged. Use `Brain 2334
## (sample 1)` instead.
## Warning: Use of `correl$`Brain 2334 (sample 2)`` is discouraged. Use `Brain 2334
## (sample 2)` instead.

The plots demonstrate that each of the duplicate samples are highly correlated thus, for simplicity we will take the mean of the two duplicates. All visualizations from here on will be derived from the mean response intensity.

## Initial data visualization
Ini.pca <- pca(cns.h.mean.protein.levels.matrix)
plotIndiv(Ini.pca, pch = 19, group = clinic.values$Group)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

plotVar(Ini.pca, cex = 2)

# Generate plots to show the effect of the confounding factor
par(mfrow = c(3, 1))

plotIndiv(Ini.pca, group = clinic.values$Group, pch = 19, title = "Group effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

plotIndiv(Ini.pca, group = clinic.values$Litter, pch = 19, title = "Litter effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

plotIndiv(Ini.pca, group = clinic.values$Sex, pch = 19, title = "Sex Effect", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

plotIndiv(Ini.pca, group = clinic.values$`Neuropathology Grade`, pch = 19, title = "Cluster by Neuropathological Grade", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

plotIndiv(Ini.pca, group = clinic.values$Seizures, pch = 19, title = "Cluster by Seizures", legend.position = "top", cex = 5)
## Warning in shape.input.plotIndiv(object = object, n = n, blocks = blocks, :
## 'ind.names' is set to FALSE as 'pch' overrides it

# None of these factors appear to play a part in the clustering. Though Sex is debatable

It appears that the variables that we have selected are not major sources of variation in our dataset. The second question like to ask is what are the key variables that contribute to the most variance in the data set.

data <- cns.h.mean.protein.levels.matrix 

# 1 Run the method
MyResult.spca <- spca(data, ncomp = 3, keepX = c(15, 15, 15)) 

# tune function is used to produce a scree plot to find the optimum number of components.
tune <- tune.pca(data)
## Eigenvalues for the first 10 principal components, see object$sdev^2: 
##        PC1        PC2        PC3        PC4        PC5        PC6        PC7 
## 1135.30860  287.90395   78.89961   48.94920   38.79034   26.16509   24.12099 
##        PC8        PC9       PC10 
##   17.49989   16.33863   14.85971 
## 
## Proportion of explained variance for the first 10 principal components, see object$explained_variance: 
##         PC1         PC2         PC3         PC4         PC5         PC6 
## 0.655111711 0.166130379 0.045527760 0.028245353 0.022383345 0.015098146 
##         PC7         PC8         PC9        PC10 
## 0.013918631 0.010098032 0.009427944 0.008574557 
## 
## Cumulative proportion explained variance for the first 10 principal components, see object$cum.var: 
##       PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8 
## 0.6551117 0.8212421 0.8667699 0.8950152 0.9173985 0.9324967 0.9464153 0.9565134 
##       PC9      PC10 
## 0.9659413 0.9745159 
## 
##  Other available components: 
##  -------------------- 
##  loading vectors: see object$rotation

## Scree plot shows elbow at PC2
screeplot(tune) 

#plotIndiv(MyResult.spca,
 # group = clinic.values$Group, # 2 Plot the samples
  #legend = TRUE, title = "Group effects, sPCA comp 1 - 2",
  #legend.title = "Group", pch = 19
#)

plotVar(MyResult.spca, cex = 3, , pch = 19)

selectVar(MyResult.spca, comp = 1)$value
##                         value.var
## sp|P34932|HSP74_HUMAN -0.46042533
## sp|P55786|PSA_HUMAN   -0.36544014
## sp|Q13564|ULA1_HUMAN  -0.36151546
## sp|P43034|LIS1_HUMAN  -0.34682468
## sp|Q13561|DCTN2_HUMAN -0.32693177
## sp|Q8ND76|CCNY_HUMAN  -0.24968367
## sp|Q13033|STRN3_HUMAN -0.22901700
## sp|P63208|SKP1_HUMAN  -0.20576138
## sp|P11142|HSP7C_HUMAN -0.20356046
## sp|P61088|UBE2N_HUMAN -0.19717086
## sp|P61019|RAB2A_HUMAN -0.16987153
## sp|P62937|PPIA_HUMAN  -0.15769942
## sp|P61106|RAB14_HUMAN -0.05031748
## sp|Q15185|TEBP_HUMAN  -0.04795931
## sp|Q9UL25|RAB21_HUMAN -0.02444969
plotLoadings(MyResult.spca)

selectVar(MyResult.spca, comp = 2)$value
##                        value.var
## sp|Q96CX2|KCD12_HUMAN 0.41998767
## sp|P21333|FLNA_HUMAN  0.39061877
## sp|P05107|ITB2_HUMAN  0.38134484
## sp|P52907|CAZA1_HUMAN 0.36895218
## sp|P12004|PCNA_HUMAN  0.31630944
## sp|Q9Y490|TLN1_HUMAN  0.26821621
## sp|Q14019|COTL1_HUMAN 0.23382258
## sp|Q9P2E9|RRBP1_HUMAN 0.22914294
## sp|Q13510|ASAH1_HUMAN 0.19786611
## sp|E9PAV3|NACAM_HUMAN 0.19350547
## sp|P26038|MOES_HUMAN  0.13802526
## sp|O00299|CLIC1_HUMAN 0.08219426
## sp|Q9P1F3|ABRAL_HUMAN 0.06350730
## sp|P11310|ACADM_HUMAN 0.06026694
## sp|Q9H8S9|MOB1A_HUMAN 0.01546435
plotLoadings(MyResult.spca, comp = 2)

# Extract and save the largest predictors from the three principal components
write.csv(MyResult.spca$loadings, "output/Global_data_analysis/Principal component loadings - Human.csv")

Given the number of duplicates we will see the implications of removing the duplicate genes and seeing what other proteins are contributing to the variance.

## Visualization of the global comparisons using heatmaps and dendrograms

# heatmap with dendrogram reordering + expected ordering
color_vector <- c("1", "2", "2", "1", "1", "2", "1", "2", "1", "2", "2", "3", "3", "3", "3", "3")
colMain <- colorRampPalette(brewer.pal(3, "Blues"))(25)
hm <- heatmap(data, RowSideColors = color_vector)

print(hm)
## $rowInd
##  [1] 14 16  2 11  6 12 10  4  8  7  5  3  1  9 15 13
## 
## $colInd
##    [1] 1224 1405    4  236  962  913 1591  394  370  756 1285  705  765 1207
##   [15] 1234  197 1573 1507  456 1255 1690  982  861 1715 1431 1240   67 1400
##   [29]  154   84  166  419  763  845  888  255  116 1064  176  681  297 1233
##   [43]  186  555  443  393 1683  739  789   46  423 1468 1527  482 1568 1013
##   [57] 1156   55 1730 1230  957 1718 1557 1501 1223  942  305  902  533 1595
##   [71] 1109  472 1684  735   62 1094 1723  327  549 1615  770  221  717  543
##   [85] 1728 1374  737 1293    1  564  556  475 1042   35 1545  784 1430 1510
##   [99] 1596 1691  243  130  742  728  444 1361  100 1325  752   16  316  513
##  [113] 1038 1635  287  786 1328  523 1320 1332 1479  248  308  619 1003 1441
##  [127]  761 1713  749  795 1077  249  945  455 1291  483 1574 1554 1727 1175
##  [141]  517 1059  401 1250  341  971 1121  437  562  369  506  591 1189 1167
##  [155]  883  605  903 1051  733 1434   96 1369 1235 1547 1313  347  227 1061
##  [169] 1411 1205  160  139  610   53  262 1609  656  150 1685  996  952 1582
##  [183] 1136 1114  397 1300  606  521 1480  993  442  485 1284 1007 1215  837
##  [197] 1012  433  862 1165  612  892  420 1287  560 1254  927  935 1389 1316
##  [211] 1556  716  804  767  710  662  779  572  291 1170   83  179 1225  219
##  [225]  399  621  922  421  977  245  261  558 1041  719  112  128  391  626
##  [239]  725  276  518 1465 1010  222 1280 1366  624 1544 1491  660  665  974
##  [253]  440 1208  121 1419  195 1119  738  343 1689  452 1317 1412  256 1476
##  [267]  476  294 1276  226 1385  632 1377  948 1019 1436  152 1524 1278  408
##  [281]  863  827  759  616 1006   99 1004  520 1382 1492  986  852  428 1218
##  [295] 1227 1145 1113  457  714  145 1629 1680  751 1049  514 1675  155  469
##  [309]  389  814  776  581  689  654 1186  106  435  803 1552 1481 1710  396
##  [323]  981  658  149 1669 1105  904 1275 1184 1074  730  449  430  333 1417
##  [337]  153    7  191  487 1698 1221 1260 1672 1355  683  900  450  966  939
##  [351]  126 1469 1646   10  928 1081  384  280  726  526  847 1266  210  183
##  [365]  898  208  123  306  914   71  324  963  368 1399   93  577 1137 1058
##  [379]  886  912 1084  267 1226 1394 1263  224  546  328 1104  337 1550  462
##  [393] 1214  460 1080 1039 1100  839 1307  897 1386 1658 1178  829   98 1147
##  [407] 1393 1616  586 1482  403  909  875  498  193 1258   97  103  313 1211
##  [421]  711  335  323  592  964 1657 1478  594  908 1459 1688 1035 1487  230
##  [435]  940 1447  979 1052  184  674  602  348  953  559 1575  906  899 1378
##  [449]  721 1295  650  671  855  631  809 1239 1373 1073 1126  840 1056 1579
##  [463] 1161  497   20  357  698  930 1219 1602  676   82  223  494  275  120
##  [477]  846  124  667 1070 1345 1549 1349 1408 1581  813 1306 1432  496 1217
##  [491] 1398 1721 1123 1201 1564 1033 1726 1422 1653  646  473 1040 1371 1630
##  [505] 1515  824  818 1499  352  879  989 1322  298   65  212 1402  718  119
##  [519]  848 1203   45   11  169  172 1249  596  917 1118  122  309  181 1588
##  [533]   89  608 1095  288  141  826  140   37 1289   44  388  378  618  871
##  [547] 1359 1592  972  857  240  490 1521  648  258   38 1251 1044  627  142
##  [561]  102  432  812  284   87  527 1265  331  554    6  127  425  201  568
##  [575] 1553 1508 1268 1519  864  136  639  350  182   77  704   79  463  424
##  [589] 1120  218 1444  896  322  798  885 1153 1725  830  753 1143  539  615
##  [603]  961  758  783  385 1139  220  134  431 1466   70  477  563  778  653
##  [617] 1236  553   41  754  503  132  147 1523  771  488  326 1292  439  101
##  [631]   26  241  918 1559 1290  180  657  215  589 1305 1449 1256 1451  668
##  [645]  164   32 1600  157   28  108  167 1253  817    5   76  263  515  600
##  [659]  318 1183  426   33  582  567  148 1693  110 1464 1216   54  118 1506
##  [673]  321 1528  552 1509  872  400  685 1699  690  332 1656 1673 1429 1612
##  [687] 1338  854   59 1461  447  293   18  329   61 1594  743 1023  372   52
##  [701]  295  531 1212  931  464  113  537  342 1060  949  630  923 1252  706
##  [715]  688 1053  292   24  672   92 1370  944 1047  244  525  159  387 1540
##  [729]  522  353   81  645 1687 1420  774  968  574  175  647   88  701   74
##  [743] 1720 1107  593  642  806  792 1246   56 1005 1093  841  625  211  958
##  [757]  383 1621  595  822 1097  151  304  242  769  279  873 1407  429 1577
##  [771]  225  312  956  823  480 1490  764   68 1314  680  466 1347   64  544
##  [785]  884  282  320 1157  107 1330 1213  744  404 1622   47  929 1362 1639
##  [799]  530 1002 1640  314  402 1724  673  363 1403  588  623  133  448   51
##  [813]  736  613  300  232  117  805    8 1388  946  584  980  228  828  237
##  [827]  371  360  785 1368 1530  366  682  880   40   50   48  762 1197  325
##  [841]  359  173  542  551   43   29  502  722  816 1611  168  470 1607 1413
##  [855]  858 1087  893 1022  666  114  129  315   19  489   21 1155  415   85
##  [869]  165   30  550  289  643  170  838  842   31  405  471  376 1642  257
##  [883]  686 1437  260  617 1176  379 1146  178 1442  407 1620 1692  441  745
##  [897]  791  775  529 1576 1244  988  859 1443 1379 1091 1391  269  534 1269
##  [911]  239 1327 1709 1302 1180  604   66  994  969  253  651 1195 1619  781
##  [925] 1471  409 1517   27 1096  373  213   13  915 1546  755 1462  438  202
##  [939]  204  339  819  143  729 1169  205 1164  508 1695  233 1733  802 1299
##  [953] 1624  998 1151 1643 1561 1647 1626  104  955  491 1237  144  264  919
##  [967]  652   15  628   90 1229  271 1270 1504 1395 1046   57 1665  583 1088
##  [981]  566  889  177  870  115  569 1571 1401 1608   86  990 1193 1277 1714
##  [995]  702  203 1259 1696  699 1303 1115  950  484 1102 1729 1335  188   42
## [1009] 1543   34  362  833  603  317  920 1296  265 1560 1106 1637  561 1537
## [1023]  495 1410  684 1450 1520 1396 1732  548  458  644 1283 1448  330  138
## [1037]  997 1425  801  283  158 1570   25 1704 1108   49 1274 1708 1079  493
## [1051]  808 1261 1090  281 1644 1174  850 1463  984  492 1486 1584  740 1454
## [1065]  933 1082  358 1654 1257  882 1162 1676 1678  565  820  832 1243  835
## [1079] 1627 1011 1372 1089  105  382 1210  678 1578  932 1279  516  634 1192
## [1093]  286 1166 1103 1132 1652 1324 1404 1241 1144 1333 1365 1535 1326 1636
## [1107] 1188 1423 1700 1245  664 1298  796  380  200  135 1066  794 1493 1187
## [1121] 1310 1182 1551 1694  411  254 1671  970  231  459   78  578 1020  687
## [1135]  675 1014 1572  777 1050 1438 1134 1380  599 1026   39  375  478  355
## [1149]  235  395  356  427  171 1610 1518 1075  925 1001  274 1651 1662 1580
## [1163]  731 1495 1496  445 1500 1634   23 1674 1273  187 1650 1173  499 1424
## [1177] 1593 1421 1045 1489 1348 1297  303 1353  301 1661  622 1455 1494 1336
## [1191] 1063 1381 1057  700 1356  709 1599  607 1376 1055  268 1148 1514 1360
## [1205] 1294 1122 1177 1232  345  860  992  465 1585  597 1470  162 1539 1614
## [1219] 1154 1498 1337  163  538 1242  270  865 1681 1702 1705 1312 1538  810
## [1233]  296 1352 1185  557  540 1078  285 1558  532  418  246  611  346 1311
## [1247] 1503   36  194  311 1027  334 1318  131   17  185  659 1099  259  633
## [1261] 1308  541  695 1648 1072  536  937 1131  377  519 1127  844 1009  976
## [1275]   95  512  250 1569 1138 1460  868  507  299  692 1426 1638  938 1531
## [1289] 1128 1563 1069  821  655  987  214  620  746 1541 1682 1472  750  598
## [1303]   80   58 1586 1566 1475 1309 1567 1697  361  576  881  229  797 1458
## [1317]  641  936 1409 1660  190  694  414  921  351 1440  156  207 1301 1483
## [1331] 1548    2    3  217 1288 1150  995 1194 1024 1160 1111  696  901 1204
## [1345] 1384  504 1526 1350 1686  669  843  585  757  787 1484 1625  251  349
## [1359]  474  713 1649  811  486  500 1329  234  741  891  364  649  319  381
## [1373]  601  894   69  712 1522 1264 1397 1416 1679  192  272  772 1717 1363
## [1387]  278 1711 1028 1272 1601 1110  635  973  570  451 1220 1067 1152  535
## [1401]  780  174 1159  853 1664 1130 1583 1196 1191  924  510 1117 1085 1456
## [1415] 1435 1670 1209 1719 1712 1529 1171 1036  216  887 1474 1228  991 1062
## [1429] 1032  461 1525 1262 1282 1427 1076 1344  398 1722 1172  198  640 1168
## [1443] 1248  545 1516 1342  422 1029 1357 1617  374 1129 1597 1605 1358 1659
## [1457] 1018 1618  867 1667  703 1706 1701 1457 1645 1562  573 1452 1505  196
## [1471]  691  760  983  575 1351 1321 1542 1281 1703  505  967  266  782  748
## [1485]  876 1037  866 1048   73  587    9  547   12 1112 1632 1267 1346 1418
## [1499] 1631  934 1200 1655 1433 1628  413  834  354  501 1453   75 1125  815
## [1513] 1534 1387 1142  446 1502  734  340 1179 1068  479  453  609  146 1392
## [1527] 1086  926 1071  788  579  849 1641 1666  999 1271 1565  720 1477  454
## [1541] 1231   63  629 1315  851  336 1199  732 1606  916  137 1140   72  715
## [1555]  161  199 1092 1181 1590   91 1286 1202  111 1633  614  869  670 1000
## [1569] 1135 1513 1598  707 1222 1677  985 1439 1343 1331  790 1587 1034 1015
## [1583]  277 1098  856 1319  571 1323  895  878 1065  468  436 1445 1304 1707
## [1597] 1668 1446 1731  727 1512   60  708  766 1025 1340 1163 1663  386  410
## [1611] 1603  416 1555 1341 1414   94  910 1158  252 1339  793 1367 1473  960
## [1625]  907  836 1623  890  302  390  943 1016 1415  874  959 1116 1149 1375
## [1639] 1133 1054  831 1716 1247 1017  511  947 1083  941 1383  954  807  434
## [1653]  238  975  580 1124 1536  481  247  768  392  724 1198  799 1364 1031
## [1667]  911  636  965  528 1101  693 1206 1428  206  747  365 1589  290  417
## [1681] 1467  697 1008  800  307  406 1334  773  273  338  189  677 1390 1488
## [1695] 1043 1141  344  877  679  412  310  663 1604  524  125 1238  637  978
## [1709]  509 1354 1511  467   22  905 1533 1406  951 1497 1021  590  209  638
## [1723]  109  723 1613 1532 1030 1485  825 1190  367  661   14
## 
## $Rowv
## NULL
## 
## $Colv
## NULL
png("figs/Global_data_analysis/Global differences in protein expression.png")
# Import the data
Control_vs_ALL <- read_excel("data/Control vs ALL.xlsx")

# Separate out the Peak Name column into the UniprotID and drop NAs
Control_vs_ALL <- Control_vs_ALL %>%
  separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
  drop_na()

# Repeat above
ControlvsHI_PBS <- read_excel("data/ControlvsHI+PBS.xlsx")
ControlvsHI_PBS <- ControlvsHI_PBS %>%
  separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
  drop_na()

# Repeat above
HI_SCvsHI_PBS <- read_excel("data/HI+SCvsHI+PBS.xlsx")
HI_SCvsHI_PBS <- HI_SCvsHI_PBS %>%
  separate(`Peak Name`, into = c("sp", "UniProt", "UP"), sep = "[|]") %>%
  drop_na()

# ----------------------------------------------------------------
# Visualize the Volcano plots using the enhanced volcano package from biconductor
# ----------------------------------------------------------------
EnhancedVolcano(Control_vs_ALL,
  lab = "UniProt",
  x = "Log (Fold Change)",
  y = "p-value",
  xlim = c(-2, 2),
  ylim = c(0, 2),
  title = "Control animals vs HI+HTH"
)
## Warning: Removed 1 rows containing missing values (geom_hline).

EnhancedVolcano(ControlvsHI_PBS,
  lab = "UniProt",
  x = "Log (Fold Change)",
  y = "p-value",
  xlim = c(-2, 2),
  ylim = c(0, 2),
  title = "Control animals vs HI+HTH+PBS"
)
## Warning: Removed 1 rows containing missing values (geom_hline).

EnhancedVolcano(HI_SCvsHI_PBS,
  lab = "UniProt",
  x = "Log (Fold Change)",
  y = "p-value",
  xlim = c(-2, 2),
  ylim = c(0, 2),
  title = "HI+HTH+SC vs HI+HTH+PBS"
)
## Warning: Removed 1 rows containing missing values (geom_hline).

Next we will ask the question: What specific proteins are different between the three sample groups. We will do this with analysis of variance (ANOVA) with Tukey’s Honest significance test to show the between group differences.

#----------------------------------------------------------------------------------------------------------------------------------
Group <- as.factor(t(clinic.values$Group)) # groups you are comparing

Data <- as.matrix(h_data[, 34:49])
rownames(Data) <- h_data$Protein
Data <- na.omit(Data)
# we test one variable at a time, extracting the pvalues
aovmod <- apply(Data, 1, function(x) {
  aov(x ~ Group)
})

# extract the p-values for each ANOVA
pvalaov <- sapply(aovmod, function(x) {
  summary(x)[[1]][["Pr(>F)"]][1]
})

# Number of DE
sum(pvalaov <= 0.05)
## [1] 51
# Adjust for multiple testing
pvalaov_adj <- p.adjust(pvalaov, method = "BH")
sum(pvalaov_adj <= 0.1)
## [1] 0
# Number of DE
pval.05 <- pvalaov[pvalaov <= 0.05]
pval.1 <- pvalaov[pvalaov <= 0.1]

length(pval.05)
## [1] 51
length(pval.1)
## [1] 130
#--------------------------- If you have more than two Groups ------------------------------------
# perform posthoc Tukey test
postTurSHD <- t(lapply(aovmod, function(x) {
  TukeyHSD(x)$"Group"[, "p adj"]
}))

# extract adjust p value for each pair
postTurSHD_Table <- data.frame(matrix(unlist(postTurSHD, use.names = TRUE), nrow = nrow(Data), byrow = T))

# set renames
rownames(postTurSHD_Table) <- names(aovmod)

# set colnames names
colnames(postTurSHD_Table) <- names(postTurSHD[[1]]) # this looks at the first variable and extracts the pairwise comparisons

HI_HTH_SC_HI_HTH_PBS.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS` <= 0.05)]

HI_HTH_SC_HI_HTH_PBS.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS` <= 0.1)]

HI_HTH_PBS_CONTROL.sig <- row.names(postTurSHD_Table)[which(postTurSHD_Table$`HI+HTH+PBS-CONTROL` <= 0.1)]


# Print significantly different proteins between the two treatment groups
print(HI_HTH_SC_HI_HTH_PBS.sig)
##  [1] "sp|O43707|ACTN4_HUMAN"   "sp|P18206|VINC_HUMAN"   
##  [3] "sp|P62195|PRS8_HUMAN"    "sp|Q09666|AHNK_HUMAN"   
##  [5] "sp|P14550|AK1A1_HUMAN"   "sp|Q06830|PRDX1_HUMAN"  
##  [7] "sp|P48444|COPD_HUMAN"    "sp|Q06323|PSME1_HUMAN"  
##  [9] "sp|P60842|IF4A1_HUMAN"   "sp|P37837|TALDO_HUMAN"  
## [11] "sp|Q9UJU6|DBNL_HUMAN"    "sp|Q9BR76|COR1B_HUMAN"  
## [13] "sp|Q09028|RBBP4_HUMAN"   "sp|O60256|KPRB_HUMAN"   
## [15] "sp|P15311|EZRI_HUMAN"    "sp|Q04637-3|IF4G1_HUMAN"
## [17] "sp|Q9UL46|PSME2_HUMAN"   "sp|Q13418|ILK_HUMAN"    
## [19] "sp|P35637|FUS_HUMAN"     "sp|P62136|PP1A_HUMAN"   
## [21] "sp|P02649|APOE_HUMAN"    "sp|P16403|H12_HUMAN"    
## [23] "sp|P43243|MATR3_HUMAN"   "sp|P07858|CATB_HUMAN"   
## [25] "sp|P49902|5NTC_HUMAN"    "sp|P46976|GLYG_HUMAN"   
## [27] "sp|P26583|HMGB2_HUMAN"   "sp|Q99447-4|PCY2_HUMAN" 
## [29] "sp|P15586|GNS_HUMAN"     "sp|P27105|STOM_HUMAN"   
## [31] "sp|P50440|GATM_HUMAN"    "sp|P49321|NASP_HUMAN"   
## [33] "sp|Q7L576|CYFP1_HUMAN"   "sp|Q9Y6N5|SQOR_HUMAN"
# Save the output
write.csv(HI_HTH_SC_HI_HTH_PBS.sig, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS BH adjusted.csv")
write.csv(pval.05, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p05.csv")
write.csv(pval.1, "output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p=1.csv")


#============================================================================================
# Effect size
ggplot(postTurSHD_Table, aes(`HI+HTH+PBS-CONTROL`))+
      geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text = element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+PBS vs CONTROL")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text

ggplot(postTurSHD_Table, aes(`HI+HTH+SC-CONTROL`))+
  geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+SC vs CONTROL")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text

ggplot(postTurSHD_Table, aes(`HI+HTH+SC-HI+HTH+PBS`))+
  geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+PBS vs HI+HTH+SC")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text

# Here we aim to assess the effect of our duplicate runs on the ABSIEX5600 Triple TOF.

# Creation of a matrix for PCA computation
h.mean.protein.levels.matrix <- as.matrix(h.mean.protein.levels)

#Center and scale the data (cns. is refers to data that has been centered and scaled.).
cns.h.mean.protein.levels.matrix <- scale(h.mean.protein.levels, center = TRUE)

## By plotting the PCA we can establish assess reproducibillity between duplicate runs.
## Visualization of the global comparisons using heatmaps and dendrograms
data <- cns.h.mean.protein.levels.matrix

proteins <- read_csv("output/Vehicle_vs_Stem cells/Significantly altered proteins HI-HTH-SC vs HI-HTH-PBS p05.csv")
## New names:
## * `` -> ...1
## Rows: 51 Columns: 2
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ...1
## dbl (1): x
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
prot.names <- read_excel("output/Vehicle_vs_Stem cells/Protein namesHI-HTH-SC vs HI-HTH-PBS p05.xlsx")
## New names:
## * `` -> ...4
#prot.name <- as.data.frame(HI_HTH_SC_HI_HTH_PBS.sig)

proteins <- proteins$X1
## Warning: Unknown or uninitialised column: `X1`.
sub_data <- t(subset(data, select = proteins))



# heatmap with dendrogram reordering + expected ordering
color_vector <- c("1", "2", "2", "1", "1", "2", "2", "1", "1", "2", "2", "3", "3", "3", "3", "3")
colMain <- viridis(3)

#hm <- heatmap(data, RowSideColors = color_vector,col=magma(100, direction = 1))

dist.pear <- function(x) as.dist(1-cor(t(x)))
hclust.ave <- function(x) hclust(x, method="mcquitty")

#heatmap.2(sub_data, 
 #         trace="none", 
  #        distfun=dist.pear, 
   #       hclustfun=hclust.ave,
    #      scale="col", 
     #     ColSideColors = color_vector, 
      #    col=viridis(100, direction = 1),
       #   margins = c(7,35), 
        #  cexRow = 1.3,
           # plot labels
         #  main = "Expression of significantly altered proteins",
           #ylab = "Significant Proteins",
          # xlab = "Piglet",
        #  labRow = prot.names$Protein,
          #labCol = c("PBS.1", "SC.1", "SC.2","PBS.2", "PBS.3", "SC.3","SC.4", "PBS.4", "PBS.5", "SC.5", "SC.6", "CON.1", "CON.2", "CON.3","CON.4", "CON.5"))
data <- h.mean.protein.levels[HI_HTH_SC_HI_HTH_PBS.sig]

proteins <- colnames(data)

proteins <- sub("..........|*xxx", "", proteins)
proteins <- sub("\\_.*", "",proteins)

for (i in 1:length(data)){
 plot <- ggplot(data, aes(x = Group, y = data[,i], fill = Group))+
   geom_boxplot()+
  geom_point()+
   labs(y = "Response", x = "Treatment Group", title = paste(proteins[i]))+
     theme(legend.position = "none")
   
  
  print(plot)
}

data <- h.mean.protein.levels[HI_HTH_PBS_CONTROL.sig]

proteins <- colnames(data)

proteins <- sub("..........|*xxx", "", proteins)
proteins <- sub("\\_.*", "",proteins)

for (i in 1:length(data)){
 plot <- ggplot(data, aes(x = Group, y = data[,i], fill = Group))+
   geom_boxplot()+
  geom_point()+
   labs(y = "Response", x = "Treatment Group", title = paste(proteins[i]))+
     theme(legend.position = "none")
   
  
  print(plot)
}

# Data wrangling into a format suitable for cluster profiler

##transform our data into log2 base.
#calculate the log mean of each gene per group


PBS <- c("2334","2344","2350","2368","2374")
SC <- c("2335","2343","2351","2367","2380","2407")
CON <- c("2542","2543","2544","2545","2546")

PBS.g <- subset(Data, select = PBS)
SC.g <- subset(Data, select = SC)
CON.g <- subset(Data, select = CON)

PBS.mean <- rowMeans(PBS.g)
SC.mean <- rowMeans(SC.g)
CON.mean <- rowMeans(CON.g)

PBS.logmean <- log2(PBS.mean)
SC.logmean <- log2(SC.mean)
CON.logmean <- log2(CON.mean)


#because our data is already log2 transformed, we can take the difference between the means.  And this is our log2 Fold Change or log2 Ratio == log2(control / test)
foldchange_CON.vs.HI_HTH_PBS <- as.matrix(SC.logmean-PBS.logmean)
foldchange_CON.vs.HI_HTH_PBS <- as.matrix(CON.logmean-PBS.logmean)
foldchange_CON.vs.HI_HTH_SC <- as.matrix(CON.logmean-SC.logmean)

CON.vs.HI_HTH_PBS <- data.frame(cbind(postTurSHD_Table$`HI+HTH+SC-HI+HTH+PBS`, foldchange_CON.vs.HI_HTH_PBS[,1]))
CON.vs.HI_HTH_PBS <- data.frame(cbind(postTurSHD_Table$`HI+HTH+PBS-CONTROL`, foldchange_CON.vs.HI_HTH_PBS [,1]))
CON.vs.HI_HTH_SC <- data.frame(cbind(postTurSHD_Table$`HI+HTH+SC-CONTROL`,foldchange_CON.vs.HI_HTH_SC[,1]))



sig <- CON.vs.HI_HTH_SC[CON.vs.HI_HTH_SC$X1 <= 0.01 & CON.vs.HI_HTH_PBS$X2 >= 0.5]
hist(foldchange_CON.vs.HI_HTH_PBS, main = "Distribution of log fold change between HI+HTH+PBS - HI+HTH+SC")

png("figs/Distribution of fold change between Control - HI+HTH+SC")

data <-CON.vs.HI_HTH_PBS

ggplot(data, aes(X2))+
      geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text = element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+PBS vs CONTROL")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text
## Warning: Removed 3 rows containing missing values (geom_bar).
data <- CON.vs.HI_HTH_PBS

ggplot(data, aes(X2))+
  geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+SC vs CONTROL")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text

## Warning: Removed 3 rows containing missing values (geom_bar).
data <- CON.vs.HI_HTH_SC
  
ggplot(data, aes(X2))+
  geom_histogram(bins = 50, aes(alpha = 0.05))+
      theme_minimal()+
      geom_vline(xintercept = 0.05)+
      geom_text(aes(x=0.005, label="p < 0.05", y=150), colour="red", angle=90, vjust = 1.2, text=element_text(size=11))+
      theme(legend.position = "none", plot.title = element_text(hjust = 0.5))+ 
      ggtitle("HI+HTH+PBS vs HI+HTH+SC")+
      xlab("p value")+ ylab("Count")+ ylim(0,240)
## Warning: Ignoring unknown parameters: text

## Warning: Removed 3 rows containing missing values (geom_bar).
d <- CON.vs.HI_HTH_PBS %>%
  mutate(`pvalue` = `X1`, `logFC` = `X2` )

#write.csv(d, file = "output/Vehicle_vs_Stem cells/HI=HTH+PBS vs HI+HTH+SC.csv")


  EnhancedVolcano(d,
    lab = rownames(d),
    x = "logFC",
    y = "pvalue",
    xlim = c(-3, 3),
    ylim = c(0, 3),
    title = 'HI+HTH+PBS versus HI+HTH+SC',
    subtitle = "Differential protein expression",
    pCutoff = 10e-2,
    FCcutoff = 0.5,
    pointSize = 1.0,
    labSize = 3.0,
    shape = 2,
    colAlpha = 1)

  d <- CON.vs.HI_HTH_PBS %>%
  mutate(`pvalue` = `X1`, `logFC` = `X2` )

  
  EnhancedVolcano(d,
    lab = rownames(d),
    x = "logFC",
    y = "pvalue",
    xlim = c(-3, 3),
    ylim = c(0, 3),
    title = 'CONTROL versus HI+HTH+PBS ',
    subtitle = "Differential protein expression",
    pCutoff = 10e-2,
    FCcutoff = 0.5,
    pointSize = 1.0,
    labSize = 3.0,
    shape = 2,
    colAlpha = 1)

  d <- CON.vs.HI_HTH_SC %>%
  mutate(`pvalue` = `X1`, `logFC` = `X2` )

  
  EnhancedVolcano(d,
    lab = rownames(d),
    x = "logFC",
    y = "pvalue",
    xlim = c(-3, 3),
    ylim = c(0, 3),
    title = 'CONTROL versus HI+HTH+SC',
    subtitle = "Differential protein expression",
    pCutoff = 10e-2,
    FCcutoff = 0.5,
    pointSize = 1.0,
    labSize = 3.0,
    shape = 2,
    colAlpha = 1)

sub_data <- subset(h.mean.protein.levels, select = HI_HTH_SC_HI_HTH_PBS.sig)

sig.proteins.PCA <- scale(sub_data, center = TRUE)

sig.proteins.pca <- pca(sig.proteins.PCA)

plotIndiv(sig.proteins.pca, group = clinic.values$Group, title = "PCA using significant proteins")

# prepare prot.list
## assume that 1st column is ID
## 2nd column is fold change

## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)

Prot.list <- CON.vs.HI_HTH_PBS %>%
  separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
  mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
  dplyr::select(kegg, `Fold Change`, `p-value`, UP)%>%
  filter(`p-value` <= 0.1)


Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg



## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
# =================================================================================================
## Pathway analysis data preparation
# This "Gene list" contains the KEGG ids
geneList <- d[, 2]
names(geneList) <- as.character(d[, 1])

geneList <- Prot.list.x


gene <- names(geneList)[abs(geneList) > 0.5]
#downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz
# The above file cannot be run from a script but must instead be run from the console
gda <- read.delim("data/all_gene_disease_associations.txt")

disease2gene=gda[, c("diseaseId", "geneId")]
disease2name=gda[, c("diseaseId", "diseaseName")]





#A universal enrichment analyzer
enriched <- enricher(
  gene,
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe,
  minGSSize = 5,
  maxGSSize = 500,
  qvalueCutoff = 0.2, TERM2GENE=disease2gene, TERM2NAME=disease2name
)
## `universe` is not in character and will be ignored...
enrichedx <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')

dotplot(enrichedx, showCategory = 20)

cnetplot(enrichedx, showCategory = 5, foldChange = geneList)

png("output/cnetplot from universal enrichment analysis from disGenet")
# ==================================================================================================
# KEGG enrichment analysis
KEGG.enrichment <- enrichKEGG(
  gene = gene,
  keyType = "kegg",
  organism = "hsa",
  pvalueCutoff = 0.1
)
## Reading KEGG annotation online:
## 
## Reading KEGG annotation online:
KEGG.enrichment <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')

barplot(KEGG.enrichment, showCategory = 6)

dotplot(KEGG.enrichment, showCategory = 20)

cnetplot(KEGG.enrichment, showCategory = 6, foldChange = geneList)

png("output/cnetplot from KEGG Enrichment Analysis")



## convert gene ID to Symbol
gse <- gseKEGG(
  geneList = geneList,
  keyType = "kegg",
  organism = "hsa",
  minGSSize = 1,
  pvalueCutoff = 0.1,
  verbose = FALSE
)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
# Reactome Pathway Analysis
#Protein Enrichment analysis
RPA <- enrichPathway(
  gene = gene,
  pvalueCutoff = 0.05)

RPAx <- setReadable(RPA, "org.Hs.eg.db", 'ENTREZID')

barplot(RPAx, showCategory = 20)

dotplot(RPAx, showCategory = 20)

emapplot(RPA, node_label  = "all", showCategory = 5, foldChange = geneList)

cnetplot(RPAx, node_label  = "all", showCategory = 20, text = 16, foldChange = geneList)

png("output/cnetplot from Reactome analysis")

#Protein set enrichment analysis
gseRPA <- gsePathway(geneList = geneList,
  pvalueCutoff = 0.2)
## preparing geneSet collections...
## GSEA analysis...
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## leading edge analysis...
## done...
gseRPAx <- setReadable(gseRPA, "org.Hs.eg.db", 'ENTREZID')

cnetplot(gseRPAx, node_label = "all")
png("output/Vehicle_vs_Stem cells/cnetplot from Reactome GeneSet analysis")
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)

Prot.list <- CON.vs.HI_HTH_PBS %>%
  separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
  mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
  dplyr::select(kegg, `Fold Change`, `p-value`)%>%
  filter(`p-value` <= 0.1)


Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg



## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
#Functional Profile of a gene set at specific GO level. Given a vector of genes, this function will return the GO profile at a specific level.
ggoMF <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "MF",
               level    = 3,
               readable = TRUE)

head(ggoMF)
##                    ID                          Description Count GeneRatio
## GO:0004133 GO:0004133 glycogen debranching enzyme activity     0      0/97
## GO:0009975 GO:0009975                     cyclase activity     0      0/97
## GO:0016491 GO:0016491              oxidoreductase activity     7      7/97
## GO:0016740 GO:0016740                 transferase activity    12     12/97
## GO:0016787 GO:0016787                   hydrolase activity    19     19/97
## GO:0016829 GO:0016829                       lyase activity     0      0/97
##                                                                                                                  geneID
## GO:0004133                                                                                                             
## GO:0009975                                                                                                             
## GO:0016491                                                                       SQOR/G6PD/PRDX1/AKR1B1/HADH/ACADM/GPX1
## GO:0016740                                             PRMT1/COMT/GYG1/PKM/UBA3/TKT/GSTM3/PCYT1A/PRPSAP2/ILK/GATM/PRKCD
## GO:0016787 AHCY/RAB8A/PSMB2/HSPA1A/HSPA1B/CAPN2/PSMD14/EIF4A1/PSAP/NT5C2/ARF3/PEPD/PSMC5/CNDP2/DDX3X/GNS/CTSB/MX1/PTPN6
## GO:0016829
barplot(ggoMF)

cnetplot(ggoMF , foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 1,
               readable = TRUE)

head(ggoBP)
##                    ID        Description Count GeneRatio
## GO:0008150 GO:0008150 biological_process    97     97/97
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     geneID
## GO:0008150 PICK1/PCP4/AHCY/RAB8A/PRMT1/PSMB2/SMAP2/AP1B1/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/SH3BGRL/EIF3L/SLC3A2/CORO1B/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/ANXA5/PCYT1A/CHL1/SPTBN1/SLC17A6/PRPSAP2/SNRPN/SNURF/XPO5/PSAP/TPM4/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/TAGLN2/AHNAK/FERMT3/MATR3/GATM/APOE/HSPB1/ABRACL/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/SSR1/ARCN1/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/PLEC/MX1/NPM1/OSTF1/PTPN6/CLIC1/COTL1/ARPC1B/STOM
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 2,
               readable = TRUE)

head(ggoBP)
##                    ID            Description Count GeneRatio
## GO:0000003 GO:0000003           reproduction    11     11/97
## GO:0002376 GO:0002376  immune system process    39     39/97
## GO:0006791 GO:0006791     sulfur utilization     0      0/97
## GO:0006794 GO:0006794 phosphorus utilization     0      0/97
## GO:0007610 GO:0007610               behavior     9      9/97
## GO:0008152 GO:0008152      metabolic process    75     75/97
##                                                                                                                                                                                                                                                                                                                                                                                                                                                geneID
## GO:0000003                                                                                                                                                                                                                                                                                                                                                                              PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0002376                                                                                                                                                                                                                 AHCY/PRMT1/PSMB2/AP1B1/GYG1/PKM/HSPA1A/HSPA1B/EZR/CYFIP1/SLC3A2/DBNL/PSMD14/G6PD/PSAP/PRDX1/MATR3/APOE/PSMC5/HMGB2/RPS6/MSN/PGM2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/GPX1/MX1/OSTF1/PTPN6/COTL1/ARPC1B/STOM
## GO:0006791                                                                                                                                                                                                                                                                                                                                                                                                                                           
## GO:0006794                                                                                                                                                                                                                                                                                                                                                                                                                                           
## GO:0007610                                                                                                                                                                                                                                                                                                                                                                                             AHCY/COMT/UBA6/CAPN2/CHL1/PSAP/GATM/APOE/ARCN1
## GO:0008152 PICK1/AHCY/RAB8A/PRMT1/PSMB2/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/EIF3L/SLC3A2/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/PCYT1A/SPTBN1/PRPSAP2/SNRPN/XPO5/PSAP/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/AHNAK/MATR3/GATM/APOE/HSPB1/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/NPM1/PTPN6
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 3,
               readable = TRUE)

head(ggoBP)
##                    ID                              Description Count GeneRatio
## GO:0019953 GO:0019953                      sexual reproduction     4      4/97
## GO:0019954 GO:0019954                     asexual reproduction     0      0/97
## GO:0022414 GO:0022414                     reproductive process    11     11/97
## GO:0032504 GO:0032504      multicellular organism reproduction     7      7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism     0      0/97
## GO:0061887 GO:0061887         reproduction of symbiont in host     0      0/97
##                                                                   geneID
## GO:0019953                                        PICK1/HMGB2/RPS6/DDX3X
## GO:0019954                                                              
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504                       PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505                                                              
## GO:0061887
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 4,
               readable = TRUE)

head(ggoBP)
##                    ID                                           Description
## GO:0000747 GO:0000747                      conjugation with cellular fusion
## GO:0000909 GO:0000909 sporocarp development involved in sexual reproduction
## GO:0007276 GO:0007276                                     gamete generation
## GO:0007618 GO:0007618                                                mating
## GO:0009566 GO:0009566                                         fertilization
## GO:0034293 GO:0034293                                    sexual sporulation
##            Count GeneRatio                 geneID
## GO:0000747     0      0/97                       
## GO:0000909     0      0/97                       
## GO:0007276     4      4/97 PICK1/HMGB2/RPS6/DDX3X
## GO:0007618     0      0/97                       
## GO:0009566     0      0/97                       
## GO:0034293     0      0/97
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length

ggoCC <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "CC",
               level    = 3,
               readable = TRUE)

head(ggoCC)
##                    ID                                        Description Count
## GO:0000131 GO:0000131                        incipient cellular bud site     0
## GO:0000151 GO:0000151                           ubiquitin ligase complex     2
## GO:0000159 GO:0000159                protein phosphatase type 2A complex     0
## GO:0000178 GO:0000178                            exosome (RNase complex)     1
## GO:0000307 GO:0000307 cyclin-dependent protein kinase holoenzyme complex     0
## GO:0000408 GO:0000408                                  EKC/KEOPS complex     0
##            GeneRatio       geneID
## GO:0000131      0/97             
## GO:0000151      2/97 CUL4B/BABAM2
## GO:0000159      0/97             
## GO:0000178      1/97      CARHSP1
## GO:0000307      0/97             
## GO:0000408      0/97
barplot(ggoCC)

#cnetplot(ggoCC , foldChange = geneList)

ggo <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 3,
               readable = TRUE)

head(ggo)
##                    ID                              Description Count GeneRatio
## GO:0019953 GO:0019953                      sexual reproduction     4      4/97
## GO:0019954 GO:0019954                     asexual reproduction     0      0/97
## GO:0022414 GO:0022414                     reproductive process    11     11/97
## GO:0032504 GO:0032504      multicellular organism reproduction     7      7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism     0      0/97
## GO:0061887 GO:0061887         reproduction of symbiont in host     0      0/97
##                                                                   geneID
## GO:0019953                                        PICK1/HMGB2/RPS6/DDX3X
## GO:0019954                                                              
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504                       PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505                                                              
## GO:0061887
#GO Enrichment Analysis of a gene set. Given a vector of genes, this function will return the enrichment GO categories after FDR control.
ego <- enrichGO(gene          = gene,
                universe      = NULL,
                OrgDb         = org.Hs.eg.db,
                ont           = "ALL",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.05,
                qvalueCutoff  = 0.2,
                readable      = TRUE)
head(ego) # No GO output
##            ONTOLOGY         ID
## GO:0043312       BP GO:0043312
## GO:0002283       BP GO:0002283
## GO:0042119       BP GO:0042119
## GO:0002446       BP GO:0002446
## GO:0043254       BP GO:0043254
## GO:0051258       BP GO:0051258
##                                                  Description GeneRatio
## GO:0043312                          neutrophil degranulation     19/97
## GO:0002283 neutrophil activation involved in immune response     19/97
## GO:0042119                             neutrophil activation     19/97
## GO:0002446                      neutrophil mediated immunity     19/97
## GO:0043254 regulation of protein-containing complex assembly     17/97
## GO:0051258                            protein polymerization     13/97
##              BgRatio       pvalue     p.adjust       qvalue
## GO:0043312 485/18670 5.714026e-12 5.492428e-09 4.300393e-09
## GO:0002283 488/18670 6.360722e-12 5.492428e-09 4.300393e-09
## GO:0042119 498/18670 9.044814e-12 5.492428e-09 4.300393e-09
## GO:0002446 499/18670 9.364753e-12 5.492428e-09 4.300393e-09
## GO:0043254 429/18670 7.103647e-11 3.333031e-08 2.609656e-08
## GO:0051258 283/18670 2.535521e-09 9.913887e-07 7.762253e-07
##                                                                                                                 geneID
## GO:0043312 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002283 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0042119 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002446 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0043254       PICK1/CUL4B/HSPA1A/HSPA1B/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/APOE/PSMC5/MSN/PRKCD/DDX3X/GFAP/COTL1/ARPC1B
## GO:0051258                            PICK1/HSPA1A/HSPA1B/ZNF207/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/PRKCD/MX1/COTL1/ARPC1B
##            Count
## GO:0043312    19
## GO:0002283    19
## GO:0042119    19
## GO:0002446    19
## GO:0043254    17
## GO:0051258    13
dotplot(ego)

barplot(ego, showCategory = 20)

cnetplot(ego, foldChange = geneList, showCategory = 5)

emapplot(ego, foldChange = geneList, showCategory = 10)

#enrichment analysis by DAVID
#enrDAVID <- enrichDAVID(gene,
 # idType = "ENTREZ_GENE_ID",
  #universe,
  #minGSSize = 5,
  #maxGSSize = 500,
  #annotation = "GOTERM_BP_FAT",
  #pvalueCutoff = 0.05,
  #pAdjustMethod = "BH",
  #qvalueCutoff = 0.2,
  #species = NA)

#given a vector of genes, this function will return the enrichment NCG categories with FDR control
eDGN <- enrichDGN(gene,
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe,
  minGSSize = 10,
  maxGSSize = 500,
  qvalueCutoff = 0.2,
  readable = TRUE)
## `universe` is not in character and will be ignored...
barplot(eDGN)

cnetplot(eDGN)

#gseDGN <- gseDGN(geneList,
  #exponent = 1,
  #minGSSize = 10,
  #maxGSSize = 500,
  #pvalueCutoff = 0.05,
  #pAdjustMethod = "BH",
  #verbose = TRUE,
  #seed = TRUE,
 # by = "DOSE", nPerm = 1000)

#cnetplot(gseDGN)
# The output of the following compares between the Control and the HI+HTH+PBS group 
DO <-  enrichDO(gene          = names(geneList),
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = NULL,
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.2,
              readable      = TRUE)


barplot(DO, showCategory = 20)

dotplot(DO, showCategory = 20)

#cnetplot(DO)


ncg <- enrichNCG(gene)

head(ncg)
## [1] ID          Description GeneRatio   BgRatio     pvalue      p.adjust   
## [7] qvalue      geneID      Count      
## <0 rows> (or 0-length row.names)
dgn <- enrichDGN(gene)

head(dgn)
##                          ID            Description GeneRatio   BgRatio
## umls:C0026848 umls:C0026848               Myopathy     12/90 298/17381
## umls:C0848332 umls:C0848332          Spots on skin      9/90 198/17381
## umls:C0024408 umls:C0024408 Machado-Joseph Disease      6/90  78/17381
## umls:C0036646 umls:C0036646   Age-related cataract      5/90  45/17381
## umls:C0151744 umls:C0151744    Myocardial Ischemia     12/90 476/17381
## umls:C0040822 umls:C0040822                 Tremor      6/90  88/17381
##                     pvalue     p.adjust       qvalue
## umls:C0026848 4.300625e-08 6.919706e-05 5.405207e-05
## umls:C0848332 8.582970e-07 6.904999e-04 5.393719e-04
## umls:C0024408 3.100934e-06 1.388755e-03 1.084802e-03
## umls:C0036646 3.452468e-06 1.388755e-03 1.084802e-03
## umls:C0151744 6.120241e-06 1.538419e-03 1.201709e-03
## umls:C0040822 6.277333e-06 1.538419e-03 1.201709e-03
##                                                                 geneID Count
## umls:C0026848 6520/2539/6638/9782/2628/348/5580/1508/3920/34/2670/5777    12
## umls:C0848332                5690/6520/2539/5052/348/3315/1508/34/4869     9
## umls:C0024408                             9463/348/3315/2521/5580/2670     6
## umls:C0036646                                  3303/3304/2947/2539/348     5
## umls:C0151744  2992/3303/308/231/2628/348/3315/4478/372/5580/2171/4599    12
## umls:C0040822                             8450/348/2521/5580/3920/2670     6
barplot(dgn, showCategory = 20)

dotplot(dgn, showCategory = 20)

cnetplot(dgn, showCategory = 10)

dgnx <- setReadable(dgn, 'org.Hs.eg.db')

y <- gseDO(geneList,
           nPerm         = 100,
           minGSSize     = 120,
           pvalueCutoff  = 0.2,
           pAdjustMethod = "BH",
           verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(y, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
ncg <- gseNCG(geneList,
              nPerm         = 1000,
              minGSSize     = 15,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
dgn <- gseDGN(geneList,
              nPerm         = 1000,
              minGSSize     = 5,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
# prepare prot.list
## assume that 1st column is ID
## 2nd column is fold change

## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)

Prot.list <- CON.vs.HI_HTH_PBS %>%
  separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
  mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
  dplyr::select(kegg, `Fold Change`, `p-value`, UP)%>%
  filter(`p-value` <= 0.1)


Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg



## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
# =================================================================================================
## Pathway analysis data preparation
# This "Gene list" contains the KEGG ids
geneList <- d[, 2]
names(geneList) <- as.character(d[, 1])

geneList <- Prot.list.x


gene <- names(geneList)[abs(geneList) > 0.5]
#downloaded from http://www.disgenet.org/ds/DisGeNET/results/all_gene_disease_associations.tar.gz
# The above file cannot be run from a script but must instead be run from the console
gda <- read.delim("data/all_gene_disease_associations.txt")

disease2gene=gda[, c("diseaseId", "geneId")]
disease2name=gda[, c("diseaseId", "diseaseName")]





#A universal enrichment analyzer
enriched <- enricher(
  gene,
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe,
  minGSSize = 5,
  maxGSSize = 500,
  qvalueCutoff = 0.2, TERM2GENE=disease2gene, TERM2NAME=disease2name
)
## `universe` is not in character and will be ignored...
enrichedx <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')

dotplot(enrichedx, showCategory = 20)

cnetplot(enrichedx, showCategory = 5, foldChange = geneList)

png("output/cnetplot from universal enrichment analysis from disGenet")
# ==================================================================================================
# KEGG enrichment analysis
KEGG.enrichment <- enrichKEGG(
  gene = gene,
  keyType = "kegg",
  organism = "hsa",
  pvalueCutoff = 0.05
)

#KEGG.enrichment <- setReadable(enriched, "org.Hs.eg.db", 'ENTREZID')

barplot(KEGG.enrichment, showCategory = 6)

dotplot(KEGG.enrichment, showCategory = 20)

#cnetplot(KEGG.enrichmentx, showCategory = 6, foldChange = geneList)
png("output/cnetplot from KEGG Enrichment Analysis")



## convert gene ID to Symbol
gse <- gseKEGG(
  geneList = geneList,
  keyType = "kegg",
  organism = "hsa",
  minGSSize = 1,
  pvalueCutoff = 0.1,
  verbose = FALSE
)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(gse)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
#cnetplot(gse, foldChange = geneList)

gsex <- setReadable(gse, "org.Hs.eg.db", 'ENTREZID')

#dotplot(gse, showCategory = 20)
#cnetplot(gsex,node_lable = "all")
png("output/cnetplot from KEGG geneset analysis")
#gseaplot(gse, geneSetID = 1)
# Reactome Pathway Analysis
#Protein Enrichment analysis
RPA <- enrichPathway(
  gene = gene,
  pvalueCutoff = 0.05)

RPAx <- setReadable(RPA, "org.Hs.eg.db", 'ENTREZID')

barplot(RPAx, showCategory = 20)

dotplot(RPAx, showCategory = 20)

emapplot(RPA, node_label  = "all", showCategory = 5, foldChange = geneList)

cnetplot(RPAx, node_label  = "all", showCategory = 5, text = 16, foldChange = geneList)

png("output/cnetplot from Reactome analysis")

#Protein set enrichment analysis
gseRPA <- gsePathway(geneList = geneList,
  pvalueCutoff = 0.1)
## preparing geneSet collections...
## GSEA analysis...
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## leading edge analysis...
## done...
gseRPAx <- setReadable(gseRPA, "org.Hs.eg.db", 'ENTREZID')

#cnetplot(gseRPAx, node_label = "all")
png("output/Vehicle_vs_Stem cells/cnetplot from Reactome GeneSet analysis")
## feature 1: numeric vector
CON.vs.HI_HTH_PBS <- setDT(CON.vs.HI_HTH_PBS, keep.rownames = TRUE)

Prot.list <- CON.vs.HI_HTH_PBS %>%
  separate(rn, into = c("sp", "UniProt", "UP"), sep = "[|]")
## Warning: Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [358].
KEGGID <- bitr_kegg(Prot.list$UniProt , fromType = "uniprot", toType = "kegg", organism = "hsa")
## Reading KEGG annotation online:
## Warning in bitr_kegg(Prot.list$UniProt, fromType = "uniprot", toType = "kegg", :
## 8.02% of input gene IDs are fail to map...
entrez = bitr(KEGGID$uniprot, fromType="UNIPROT", toType="ENTREZID", OrgDb="org.Hs.eg.db")
## 'select()' returned 1:many mapping between keys and columns
## Warning in bitr(KEGGID$uniprot, fromType = "UNIPROT", toType = "ENTREZID", :
## 0.13% of input gene IDs are fail to map...
d <- merge(Prot.list, KEGGID, by = "Identifier", by.x = "UniProt", by.y = "uniprot")
## Warning in merge.data.table(Prot.list, KEGGID, by = "Identifier", by.x =
## "UniProt", : Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.
d <- d %>%
  mutate(`Fold Change` = `X2`, `p-value`= `X1`)%>%
  dplyr::select(kegg, `Fold Change`, `p-value`)%>%
  filter(`p-value` <= 0.1)


Prot.list.x <- as.numeric(d$`Fold Change`)
## feature 2: named vector
names(Prot.list.x) <- d$kegg



## feature 3: decreasing order
Prot.list.x <- sort(Prot.list.x, decreasing = TRUE)
#Functional Profile of a gene set at specific GO level. Given a vector of genes, this function will return the GO profile at a specific level.
ggoMF <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "MF",
               level    = 3,
               readable = TRUE)

head(ggoMF)
##                    ID                          Description Count GeneRatio
## GO:0004133 GO:0004133 glycogen debranching enzyme activity     0      0/97
## GO:0009975 GO:0009975                     cyclase activity     0      0/97
## GO:0016491 GO:0016491              oxidoreductase activity     7      7/97
## GO:0016740 GO:0016740                 transferase activity    12     12/97
## GO:0016787 GO:0016787                   hydrolase activity    19     19/97
## GO:0016829 GO:0016829                       lyase activity     0      0/97
##                                                                                                                  geneID
## GO:0004133                                                                                                             
## GO:0009975                                                                                                             
## GO:0016491                                                                       SQOR/G6PD/PRDX1/AKR1B1/HADH/ACADM/GPX1
## GO:0016740                                             PRMT1/COMT/GYG1/PKM/UBA3/TKT/GSTM3/PCYT1A/PRPSAP2/ILK/GATM/PRKCD
## GO:0016787 AHCY/RAB8A/PSMB2/HSPA1A/HSPA1B/CAPN2/PSMD14/EIF4A1/PSAP/NT5C2/ARF3/PEPD/PSMC5/CNDP2/DDX3X/GNS/CTSB/MX1/PTPN6
## GO:0016829
barplot(ggoMF)

cnetplot(ggoMF , foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 1,
               readable = TRUE)

head(ggoBP)
##                    ID        Description Count GeneRatio
## GO:0008150 GO:0008150 biological_process    97     97/97
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     geneID
## GO:0008150 PICK1/PCP4/AHCY/RAB8A/PRMT1/PSMB2/SMAP2/AP1B1/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/SH3BGRL/EIF3L/SLC3A2/CORO1B/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/ANXA5/PCYT1A/CHL1/SPTBN1/SLC17A6/PRPSAP2/SNRPN/SNURF/XPO5/PSAP/TPM4/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/TAGLN2/AHNAK/FERMT3/MATR3/GATM/APOE/HSPB1/ABRACL/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/SSR1/ARCN1/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/PLEC/MX1/NPM1/OSTF1/PTPN6/CLIC1/COTL1/ARPC1B/STOM
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 2,
               readable = TRUE)

head(ggoBP)
##                    ID            Description Count GeneRatio
## GO:0000003 GO:0000003           reproduction    11     11/97
## GO:0002376 GO:0002376  immune system process    39     39/97
## GO:0006791 GO:0006791     sulfur utilization     0      0/97
## GO:0006794 GO:0006794 phosphorus utilization     0      0/97
## GO:0007610 GO:0007610               behavior     9      9/97
## GO:0008152 GO:0008152      metabolic process    75     75/97
##                                                                                                                                                                                                                                                                                                                                                                                                                                                geneID
## GO:0000003                                                                                                                                                                                                                                                                                                                                                                              PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0002376                                                                                                                                                                                                                 AHCY/PRMT1/PSMB2/AP1B1/GYG1/PKM/HSPA1A/HSPA1B/EZR/CYFIP1/SLC3A2/DBNL/PSMD14/G6PD/PSAP/PRDX1/MATR3/APOE/PSMC5/HMGB2/RPS6/MSN/PGM2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/GPX1/MX1/OSTF1/PTPN6/COTL1/ARPC1B/STOM
## GO:0006791                                                                                                                                                                                                                                                                                                                                                                                                                                           
## GO:0006794                                                                                                                                                                                                                                                                                                                                                                                                                                           
## GO:0007610                                                                                                                                                                                                                                                                                                                                                                                             AHCY/COMT/UBA6/CAPN2/CHL1/PSAP/GATM/APOE/ARCN1
## GO:0008152 PICK1/AHCY/RAB8A/PRMT1/PSMB2/COMT/CUL4B/NASP/GYG1/PKM/HSPA1A/HSPA1B/PTMA/EZR/UBA6/CAPN2/UBA3/ZNF207/EPS15/CYFIP1/EIF3L/SLC3A2/DBNL/SQOR/TKT/GSTM3/PSMD14/G6PD/SNRPD1/PFN1/EIF4A1/PCYT1A/SPTBN1/PRPSAP2/SNRPN/XPO5/PSAP/PRDX1/AKR1B1/BABAM2/NT5C2/ARF3/CARHSP1/HADH/ACTN4/PEPD/ILK/AHNAK/MATR3/GATM/APOE/HSPB1/H1-2/PSMC5/HMGB2/RPS6/MSN/FUS/PGM2/CNDP2/STAT1/PRKCD/DDX3X/GNS/PSME1/FABP5/PSME2/CTSB/LAMP2/RPLP0/ACADM/GPX1/GFAP/NPM1/PTPN6
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 3,
               readable = TRUE)

head(ggoBP)
##                    ID                              Description Count GeneRatio
## GO:0019953 GO:0019953                      sexual reproduction     4      4/97
## GO:0019954 GO:0019954                     asexual reproduction     0      0/97
## GO:0022414 GO:0022414                     reproductive process    11     11/97
## GO:0032504 GO:0032504      multicellular organism reproduction     7      7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism     0      0/97
## GO:0061887 GO:0061887         reproduction of symbiont in host     0      0/97
##                                                                   geneID
## GO:0019953                                        PICK1/HMGB2/RPS6/DDX3X
## GO:0019954                                                              
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504                       PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505                                                              
## GO:0061887
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)

ggoBP <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 4,
               readable = TRUE)

head(ggoBP)
##                    ID                                           Description
## GO:0000747 GO:0000747                      conjugation with cellular fusion
## GO:0000909 GO:0000909 sporocarp development involved in sexual reproduction
## GO:0007276 GO:0007276                                     gamete generation
## GO:0007618 GO:0007618                                                mating
## GO:0009566 GO:0009566                                         fertilization
## GO:0034293 GO:0034293                                    sexual sporulation
##            Count GeneRatio                 geneID
## GO:0000747     0      0/97                       
## GO:0000909     0      0/97                       
## GO:0007276     4      4/97 PICK1/HMGB2/RPS6/DDX3X
## GO:0007618     0      0/97                       
## GO:0009566     0      0/97                       
## GO:0034293     0      0/97
barplot(ggoBP)

cnetplot(ggoBP, foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length

ggoCC <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "CC",
               level    = 3,
               readable = TRUE)

head(ggoCC)
##                    ID                                        Description Count
## GO:0000131 GO:0000131                        incipient cellular bud site     0
## GO:0000151 GO:0000151                           ubiquitin ligase complex     2
## GO:0000159 GO:0000159                protein phosphatase type 2A complex     0
## GO:0000178 GO:0000178                            exosome (RNase complex)     1
## GO:0000307 GO:0000307 cyclin-dependent protein kinase holoenzyme complex     0
## GO:0000408 GO:0000408                                  EKC/KEOPS complex     0
##            GeneRatio       geneID
## GO:0000131      0/97             
## GO:0000151      2/97 CUL4B/BABAM2
## GO:0000159      0/97             
## GO:0000178      1/97      CARHSP1
## GO:0000307      0/97             
## GO:0000408      0/97
barplot(ggoCC)

cnetplot(ggoCC , foldChange = geneList)
## Warning in vattrs[[name]][index] <- value: number of items to replace is not a
## multiple of replacement length

ggo <- groupGO(gene     = gene,
               OrgDb    = org.Hs.eg.db,
               ont      = "BP",
               level    = 3,
               readable = TRUE)

head(ggo)
##                    ID                              Description Count GeneRatio
## GO:0019953 GO:0019953                      sexual reproduction     4      4/97
## GO:0019954 GO:0019954                     asexual reproduction     0      0/97
## GO:0022414 GO:0022414                     reproductive process    11     11/97
## GO:0032504 GO:0032504      multicellular organism reproduction     7      7/97
## GO:0032505 GO:0032505 reproduction of a single-celled organism     0      0/97
## GO:0061887 GO:0061887         reproduction of symbiont in host     0      0/97
##                                                                   geneID
## GO:0019953                                        PICK1/HMGB2/RPS6/DDX3X
## GO:0019954                                                              
## GO:0022414 PICK1/COMT/NASP/CAPN2/ANXA5/PSAP/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032504                       PICK1/COMT/AKR1B1/HMGB2/RPS6/DDX3X/CTSB
## GO:0032505                                                              
## GO:0061887
#GO Enrichment Analysis of a gene set. Given a vector of genes, this function will return the enrichment GO categories after FDR control.
ego <- enrichGO(gene          = gene,
                universe      = NULL,
                OrgDb         = org.Hs.eg.db,
                ont           = "ALL",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.05,
                qvalueCutoff  = 0.2,
                readable      = TRUE)
head(ego) # No GO output
##            ONTOLOGY         ID
## GO:0043312       BP GO:0043312
## GO:0002283       BP GO:0002283
## GO:0042119       BP GO:0042119
## GO:0002446       BP GO:0002446
## GO:0043254       BP GO:0043254
## GO:0051258       BP GO:0051258
##                                                  Description GeneRatio
## GO:0043312                          neutrophil degranulation     19/97
## GO:0002283 neutrophil activation involved in immune response     19/97
## GO:0042119                             neutrophil activation     19/97
## GO:0002446                      neutrophil mediated immunity     19/97
## GO:0043254 regulation of protein-containing complex assembly     17/97
## GO:0051258                            protein polymerization     13/97
##              BgRatio       pvalue     p.adjust       qvalue
## GO:0043312 485/18670 5.714026e-12 5.492428e-09 4.300393e-09
## GO:0002283 488/18670 6.360722e-12 5.492428e-09 4.300393e-09
## GO:0042119 498/18670 9.044814e-12 5.492428e-09 4.300393e-09
## GO:0002446 499/18670 9.364753e-12 5.492428e-09 4.300393e-09
## GO:0043254 429/18670 7.103647e-11 3.333031e-08 2.609656e-08
## GO:0051258 283/18670 2.535521e-09 9.913887e-07 7.762253e-07
##                                                                                                                 geneID
## GO:0043312 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002283 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0042119 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0002446 GYG1/PKM/HSPA1A/HSPA1B/CYFIP1/DBNL/PSMD14/PSAP/PGM2/PRKCD/DDX3X/GNS/FABP5/CTSB/LAMP2/OSTF1/PTPN6/COTL1/STOM
## GO:0043254       PICK1/CUL4B/HSPA1A/HSPA1B/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/APOE/PSMC5/MSN/PRKCD/DDX3X/GFAP/COTL1/ARPC1B
## GO:0051258                            PICK1/HSPA1A/HSPA1B/ZNF207/CYFIP1/CORO1B/DBNL/PFN1/SPTBN1/PRKCD/MX1/COTL1/ARPC1B
##            Count
## GO:0043312    19
## GO:0002283    19
## GO:0042119    19
## GO:0002446    19
## GO:0043254    17
## GO:0051258    13
dotplot(ego)

barplot(ego, showCategory = 10)

cnetplot(ego, foldChange = geneList, showCategory = 7)

emapplot(ego, foldChange = geneList, showCategory = 10)

#enrichment analysis by DAVID
#enrDAVID <- enrichDAVID(gene,
 # idType = "ENTREZ_GENE_ID",
  #universe,
  #minGSSize = 5,
  #maxGSSize = 500,
  #annotation = "GOTERM_BP_FAT",
  #pvalueCutoff = 0.05,
  #pAdjustMethod = "BH",
  #qvalueCutoff = 0.2,
  #species = NA)

#given a vector of genes, this function will return the enrichment NCG categories with FDR control
eDGN <- enrichDGN(gene,
  pvalueCutoff = 0.05,
  pAdjustMethod = "BH",
  universe,
  minGSSize = 10,
  maxGSSize = 500,
  qvalueCutoff = 0.2,
  readable = TRUE)
## `universe` is not in character and will be ignored...
barplot(eDGN)

cnetplot(eDGN)

#gseDGN <- gseDGN(geneList,
  #exponent = 1,
  #minGSSize = 10,
  #maxGSSize = 500,
  #pvalueCutoff = 0.05,
  #pAdjustMethod = "BH",
  #verbose = TRUE,
  #seed = TRUE,
 # by = "DOSE", nPerm = 1000)

#cnetplot(gseDGN)
# The output of the following compares between the Control and the HI+HTH+PBS group 
DO <-  enrichDO(gene          = names(geneList),
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = NULL,
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.2,
              readable      = TRUE)


barplot(DO, showCategory = 20)

dotplot(DO, showCategory = 20)

#cnetplot(DO)


ncg <- enrichNCG(gene)

head(ncg)
## [1] ID          Description GeneRatio   BgRatio     pvalue      p.adjust   
## [7] qvalue      geneID      Count      
## <0 rows> (or 0-length row.names)
dgn <- enrichDGN(gene)

head(dgn)
##                          ID            Description GeneRatio   BgRatio
## umls:C0026848 umls:C0026848               Myopathy     12/90 298/17381
## umls:C0848332 umls:C0848332          Spots on skin      9/90 198/17381
## umls:C0024408 umls:C0024408 Machado-Joseph Disease      6/90  78/17381
## umls:C0036646 umls:C0036646   Age-related cataract      5/90  45/17381
## umls:C0151744 umls:C0151744    Myocardial Ischemia     12/90 476/17381
## umls:C0040822 umls:C0040822                 Tremor      6/90  88/17381
##                     pvalue     p.adjust       qvalue
## umls:C0026848 4.300625e-08 6.919706e-05 5.405207e-05
## umls:C0848332 8.582970e-07 6.904999e-04 5.393719e-04
## umls:C0024408 3.100934e-06 1.388755e-03 1.084802e-03
## umls:C0036646 3.452468e-06 1.388755e-03 1.084802e-03
## umls:C0151744 6.120241e-06 1.538419e-03 1.201709e-03
## umls:C0040822 6.277333e-06 1.538419e-03 1.201709e-03
##                                                                 geneID Count
## umls:C0026848 6520/2539/6638/9782/2628/348/5580/1508/3920/34/2670/5777    12
## umls:C0848332                5690/6520/2539/5052/348/3315/1508/34/4869     9
## umls:C0024408                             9463/348/3315/2521/5580/2670     6
## umls:C0036646                                  3303/3304/2947/2539/348     5
## umls:C0151744  2992/3303/308/231/2628/348/3315/4478/372/5580/2171/4599    12
## umls:C0040822                             8450/348/2521/5580/3920/2670     6
barplot(dgn, showCategory = 20)

dotplot(dgn, showCategory = 20)

cnetplot(dgn, showCategory = 10)

dgnx <- setReadable(dgn, 'org.Hs.eg.db')

y <- gseDO(geneList,
           nPerm         = 100,
           minGSSize     = 120,
           pvalueCutoff  = 0.2,
           pAdjustMethod = "BH",
           verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
head(y, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
ncg <- gseNCG(geneList,
              nPerm         = 1000,
              minGSSize     = 15,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
dgn <- gseDGN(geneList,
              nPerm         = 1000,
              minGSSize     = 5,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
## Warning in .GSEA(geneList = geneList, exponent = exponent, minGSSize =
## minGSSize, : We do not recommend using nPerm parameter incurrent and future
## releases
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize
## = minGSSize, : You are trying to run fgseaSimple. It is recommended to use
## fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument
## in the fgsea function call.
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1.96% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
## no term enriched under specific pvalueCutoff...
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
## [1] ID              Description     setSize         enrichmentScore
## [5] NES             pvalue          p.adjust        qvalues        
## <0 rows> (or 0-length row.names)
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] gplots_3.1.1            viridis_0.6.1           viridisLite_0.4.0      
##  [4] readr_2.0.0             RDAVIDWebService_1.26.0 GOstats_2.54.0         
##  [7] Category_2.54.0         Matrix_1.2-18           graph_1.66.0           
## [10] VIM_6.1.0               colorspace_2.0-2        org.Ss.eg.db_3.11.4    
## [13] org.Hs.eg.db_3.11.4     tidyr_1.1.3             SWATH2stats_1.18.0     
## [16] rWikiPathways_1.8.5     readxl_1.3.1            ReactomePA_1.32.0      
## [19] plyr_1.8.6              mixOmics_6.12.2         lattice_0.20-44        
## [22] MASS_7.3-54             mice_3.13.0             meshes_1.14.0          
## [25] MeSH.Ssc.eg.db_1.13.0   MeSH.Hsa.eg.db_1.13.0   MeSHDbi_1.24.0         
## [28] RColorBrewer_1.1-2      ggfortify_0.4.12        ggcorrplot_0.1.3       
## [31] enrichplot_1.8.1        EnhancedVolcano_1.6.0   ggrepel_0.9.1          
## [34] ggplot2_3.3.5           DOSE_3.14.0             dplyr_1.0.7            
## [37] data.table_1.14.0       corrplot_0.90           clusterProfiler_3.16.1 
## [40] broom.mixed_0.2.7       biomaRt_2.44.4          BiocManager_1.30.16    
## [43] pathview_1.28.1         AnnotationDbi_1.50.3    IRanges_2.22.2         
## [46] S4Vectors_0.26.1        Biobase_2.48.0          BiocGenerics_0.34.0    
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.1             tidyselect_1.1.1       RSQLite_2.2.7         
##   [4] ranger_0.13.1          BiocParallel_1.22.0    scatterpie_0.1.6      
##   [7] munsell_0.5.0          withr_2.4.2            GOSemSim_2.14.2       
##  [10] highr_0.9              knitr_1.33             rstudioapi_0.13       
##  [13] robustbase_0.93-8      vcd_1.4-8              rJava_1.0-4           
##  [16] labeling_0.4.2         KEGGgraph_1.48.0       urltools_1.7.3        
##  [19] polyclip_1.10-0        bit64_4.0.5            farver_2.1.0          
##  [22] downloader_0.4         vctrs_0.3.8            generics_0.1.0        
##  [25] xfun_0.24              BiocFileCache_1.12.1   R6_2.5.0              
##  [28] graphlayouts_0.7.1     bitops_1.0-7           cachem_1.0.5          
##  [31] fgsea_1.14.0           gridGraphics_0.5-1     assertthat_0.2.1      
##  [34] vroom_1.5.3            scales_1.1.1           ggraph_2.0.5          
##  [37] nnet_7.3-14            gtable_0.3.0           tidygraph_1.2.0       
##  [40] rlang_0.4.11           genefilter_1.70.0      splines_4.0.2         
##  [43] broom_0.7.9            europepmc_0.4          checkmate_2.0.0       
##  [46] yaml_2.2.1             reshape2_1.4.4         abind_1.4-5           
##  [49] backports_1.2.1        qvalue_2.20.0          RBGL_1.64.0           
##  [52] tools_4.0.2            ggplotify_0.0.8        ellipsis_0.3.2        
##  [55] jquerylib_0.1.4        proxy_0.4-26           ggridges_0.5.3        
##  [58] Rcpp_1.0.7             progress_1.2.2         zlibbioc_1.34.0       
##  [61] purrr_0.3.4            RCurl_1.98-1.3         prettyunits_1.1.1     
##  [64] openssl_1.4.4          cowplot_1.1.1          zoo_1.8-9             
##  [67] haven_2.4.1            magrittr_2.0.1         RSpectra_0.16-0       
##  [70] MeSH.db_1.13.0         DO.db_2.9              openxlsx_4.2.4        
##  [73] lmtest_0.9-38          triebeard_0.3.0        reactome.db_1.70.0    
##  [76] matrixStats_0.60.0     xtable_1.8-4           hms_1.1.0             
##  [79] evaluate_0.14          XML_3.99-0.6           rio_0.5.27            
##  [82] gridExtra_2.3          compiler_4.0.2         ellipse_0.4.2         
##  [85] tibble_3.1.3           KernSmooth_2.23-20     crayon_1.4.1          
##  [88] htmltools_0.5.1.1      tzdb_0.1.2             corpcor_1.6.9         
##  [91] snow_0.4-3             DBI_1.1.1              tweenr_1.0.2          
##  [94] dbplyr_2.1.1           rappdirs_0.3.3         boot_1.3-28           
##  [97] car_3.0-11             cli_3.0.1              igraph_1.2.6          
## [100] forcats_0.5.1          pkgconfig_2.0.3        rvcheck_0.1.8         
## [103] laeken_0.5.1           foreign_0.8-81         sp_1.4-5              
## [106] xml2_1.3.2             annotate_1.66.0        rARPACK_0.11-0        
## [109] bslib_0.2.5.1          XVector_0.28.0         AnnotationForge_1.30.1
## [112] stringr_1.4.0          digest_0.6.27          Biostrings_2.56.0     
## [115] rmarkdown_2.9          cellranger_1.1.0       fastmatch_1.1-3       
## [118] GSEABase_1.50.1        curl_4.3.2             gtools_3.9.2          
## [121] graphite_1.34.0        rjson_0.2.20           lifecycle_1.0.0       
## [124] nlme_3.1-148           jsonlite_1.7.2         carData_3.0-4         
## [127] askpass_1.1            fansi_0.5.0            pillar_1.6.2          
## [130] survival_3.1-12        KEGGREST_1.28.0        fastmap_1.1.0         
## [133] httr_1.4.2             DEoptimR_1.0-9         GO.db_3.11.4          
## [136] glue_1.4.2             zip_2.2.0              png_0.1-7             
## [139] bit_4.0.4              Rgraphviz_2.32.0       ggforce_0.3.3         
## [142] class_7.3-19           stringi_1.7.3          sass_0.4.0            
## [145] blob_1.2.2             caTools_1.18.2         memoise_2.0.0         
## [148] e1071_1.7-8
warnings()