Set up your R environment by downloading the clusterProfiler and the org.Mm.eg.db or org.Hs.eg.db package from BioConductor. Also load the dplyr package for working with dataframes.
Load your dataframe of differentially expressed genes. I have it saved in .csv format, so I’m using read.csv(). Keep columns with your fold change, p value, adj p value, and gene symbol information. Remove genes with NA p value or duplicated gene names.
res_ordered = read.csv('res_ordered.csv') #Modify based on your result file
res_ordered = res_ordered[, grep('FoldChange|pvalue|adj|ENTREZ|SYMBOL', colnames(res_ordered))] #Modify based on the columns in your file
colnames(res_ordered)[grep('SYMBOL', colnames(res_ordered))] = 'symbol' #Optional: you might want to change some column names to a simpler name
res_ordered = res_ordered[!is.na(res_ordered$pvalue) & !is.na(res_ordered$padj) & !duplicated(res_ordered$symbol), ]
head(res_ordered)
## ENTREZID log2FoldChange pvalue padj symbol
## 1 104112 -2.3057311 2.705503e-14 3.417320e-10 Acly
## 2 79235 1.2175064 5.961347e-08 3.764888e-04 Lrat
## 3 68944 -0.3817892 1.286593e-07 4.524772e-04 Tmco1
## 4 66357 -0.4809418 1.562076e-07 4.524772e-04 Ostc
## 5 72433 -3.0421091 1.791138e-07 4.524772e-04 Rab38
## 6 72823 1.4719887 2.662254e-07 5.604489e-04 Pard3b
Separate your dataframe in two based on the direction of fold change (i.e. positive fold change in one dataframe, negative fold change in another dataframe). Then create another dataframe keeping significant genes only.
up = subset(res_ordered, log2FoldChange > 0)
down = subset(res_ordered, log2FoldChange < 0)
up_sig = up[up$padj <= 0.05, ]
down_sig = down[down$padj <= 0.05, ]
message('There are ', nrow(up), ' upregulated genes, ', nrow(up_sig), ' of which are significant at p adj <= 0.05')
## There are 6284 upregulated genes, 46 of which are significant at p adj <= 0.05
message('There are ', nrow(down), ' downregulated genes, ', nrow(down_sig), ' of which are significant at p adj <= 0.05')
## There are 6266 downregulated genes, 69 of which are significant at p adj <= 0.05
We will now perform gene ontology (GO) analysis. GO is an ‘over-representation’ analysis that aims to categorize and annotate genes based on their biological roles. These roles are defined based on curated sets of terms in three major domains (molecular function (MF), biological process (BP), and cellular component (CC)). Each set of terms contain a set of genes. GO analysis uses the Fisher’s exact test to compare the proportion of your significant up and down-regulated genes that match these terms relative to the proportion of all genes in the genome in these terms. For example, if 10 out of your 100 significantly up-regulated genes match to the term ‘cell division’, that would be a proportion of 0.1. This is compared to the number of genes in the entire genome that match to the term ‘cell division’. This might be 1,000 out of 20,000 genes, which returns a proportion of 0.05. A hypergeometric test (such as the Fisher’s exact test) can then test the significance of observing a proportion of 0.1 > the reference proportion of 0.05.
#For each of up and down-regulated gene sets, assess biological process, molecular function, and cellular component
#First compile up_sig and down_sig in a list for efficient processing
sig_list = list(up = up_sig, down = down_sig)
#Then use lapply to loop through both set of genes in your list and then loop through all 3 GO analysis
go_res = lapply(sig_list, function(i) {
res = lapply(c('BP', 'MF' ,'CC'), function(x) {
message(paste0('Analyzing ', x))
enrichGO(i$ENTREZID,
OrgDb = org.Mm.eg.db, #If human, use org.Hs.eg.db
keyType = 'ENTREZID',
ont = x,
readable = F)
}) %>% setNames(c('BP', 'MF', 'CC'))
})
## Analyzing BP
## Analyzing MF
## Analyzing CC
## Analyzing BP
## Analyzing MF
## Analyzing CC
#Check out the result, change the max.level around to efficiently inspect different layers of the output data
str(go_res, max.level = 2)
## List of 2
## $ up :List of 3
## ..$ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## ..$ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## ..$ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## $ down:List of 3
## ..$ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## ..$ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## ..$ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots
str(go_res$up, max.level = 3)
## List of 3
## $ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## .. ..@ result :'data.frame': 1453 obs. of 9 variables:
## .. ..@ pvalueCutoff : num 0.05
## .. ..@ pAdjustMethod: chr "BH"
## .. ..@ qvalueCutoff : num 0.2
## .. ..@ organism : chr "Mus musculus"
## .. ..@ ontology : chr "BP"
## .. ..@ gene : chr [1:46] "79235" "72823" "11622" "229595" ...
## .. ..@ keytype : chr "ENTREZID"
## .. ..@ universe : chr [1:28564] "11545" "12628" "13804" "16882" ...
## .. ..@ gene2Symbol : chr(0)
## .. ..@ geneSets :List of 2061
## .. ..@ readable : logi FALSE
## .. ..@ termsim : num[0 , 0 ]
## .. ..@ method : chr(0)
## .. ..@ dr : list()
## $ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## .. ..@ result :'data.frame': 231 obs. of 9 variables:
## .. ..@ pvalueCutoff : num 0.05
## .. ..@ pAdjustMethod: chr "BH"
## .. ..@ qvalueCutoff : num 0.2
## .. ..@ organism : chr "Mus musculus"
## .. ..@ ontology : chr "MF"
## .. ..@ gene : chr [1:46] "79235" "72823" "11622" "229595" ...
## .. ..@ keytype : chr "ENTREZID"
## .. ..@ universe : chr [1:28171] "223774" "230801" "56075" "71365" ...
## .. ..@ gene2Symbol : chr(0)
## .. ..@ geneSets :List of 349
## .. ..@ readable : logi FALSE
## .. ..@ termsim : num[0 , 0 ]
## .. ..@ method : chr(0)
## .. ..@ dr : list()
## $ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots
## .. ..@ result :'data.frame': 134 obs. of 9 variables:
## .. ..@ pvalueCutoff : num 0.05
## .. ..@ pAdjustMethod: chr "BH"
## .. ..@ qvalueCutoff : num 0.2
## .. ..@ organism : chr "Mus musculus"
## .. ..@ ontology : chr "CC"
## .. ..@ gene : chr [1:46] "79235" "72823" "11622" "229595" ...
## .. ..@ keytype : chr "ENTREZID"
## .. ..@ universe : chr [1:28585] "13806" "13807" "13808" "226265" ...
## .. ..@ gene2Symbol : chr(0)
## .. ..@ geneSets :List of 241
## .. ..@ readable : logi FALSE
## .. ..@ termsim : num[0 , 0 ]
## .. ..@ method : chr(0)
## .. ..@ dr : list()
str(go_res$up$BP, max.level = 2)
## Formal class 'enrichResult' [package "DOSE"] with 15 slots
## ..@ result :'data.frame': 1453 obs. of 9 variables:
## ..@ pvalueCutoff : num 0.05
## ..@ pAdjustMethod: chr "BH"
## ..@ qvalueCutoff : num 0.2
## ..@ organism : chr "Mus musculus"
## ..@ ontology : chr "BP"
## ..@ gene : chr [1:46] "79235" "72823" "11622" "229595" ...
## ..@ keytype : chr "ENTREZID"
## ..@ universe : chr [1:28564] "11545" "12628" "13804" "16882" ...
## ..@ gene2Symbol : chr(0)
## ..@ geneSets :List of 2061
## ..@ readable : logi FALSE
## ..@ termsim : num[0 , 0 ]
## ..@ method : chr(0)
## ..@ dr : list()
str(go_res$up$BP@result, max.level = 2)
## 'data.frame': 1453 obs. of 9 variables:
## $ ID : chr "GO:0001776" "GO:0048872" "GO:0050670" "GO:0032944" ...
## $ Description: chr "leukocyte homeostasis" "homeostasis of number of cells" "regulation of lymphocyte proliferation" "regulation of mononuclear cell proliferation" ...
## $ GeneRatio : chr "6/45" "8/45" "7/45" "7/45" ...
## $ BgRatio : chr "136/28564" "391/28564" "260/28564" "264/28564" ...
## $ pvalue : num 7.29e-08 1.59e-07 1.61e-07 1.79e-07 3.15e-07 ...
## $ p.adjust : num 6.50e-05 6.50e-05 6.50e-05 6.50e-05 9.14e-05 ...
## $ qvalue : num 4.33e-05 4.33e-05 4.33e-05 4.33e-05 6.09e-05 ...
## $ geneID : chr "11622/210293/17311/14605/17096/12978" "11622/210293/17311/14605/17096/16331/12978/16658" "11622/17311/17096/12494/16331/13136/12978" "11622/17311/17096/12494/16331/13136/12978" ...
## $ Count : int 6 8 7 7 7 7 7 7 5 6 ...
#Create a subset of your analysis with significant results only and save results
go_sig = lapply(c('up', 'down'), function(y) {
i = go_res[[y]]
sig = lapply(c('BP', 'MF', 'CC'), function(x) {
df = subset(i[[x]]@result, p.adjust <= 0.05)
#Save only if there are significant results
if (nrow(df) > 1) write.table(df, paste0(y, '_GO', x, '_sig.txt'), col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')
return(df)
}) %>% setNames(c('BP', 'MF', 'CC'))
}) %>% setNames(c('up', 'down'))
#How many significant GO terms are reported for each of BP, MF, CC? Compile this summary level data into a table where rows correspond to 'up', and 'down' regulated genes, and columns correspond to the 3 GO analysis. Here I will use a more traditional for loop method instead of lapply. Challenge yourself by translating this using lapply.
go_summary = data.frame(matrix(0, ncol = 3, nrow = 2))
rownames(go_summary) = c('up', 'down')
colnames(go_summary) = c('BP', 'MF', 'CC')
for (y in c('up', 'down')) {
i = go_sig[[y]]
for (n in c('BP', 'MF', 'CC')) {
x = i[[n]]
message(paste0(nrow(x),' GO BP terms are over-represented among ', y, '-regulated genes'))
go_summary[y, n] = nrow(x)
}
}
## 189 GO BP terms are over-represented among up-regulated genes
## 0 GO BP terms are over-represented among up-regulated genes
## 0 GO BP terms are over-represented among up-regulated genes
## 54 GO BP terms are over-represented among down-regulated genes
## 0 GO BP terms are over-represented among down-regulated genes
## 32 GO BP terms are over-represented among down-regulated genes
go_summary
## BP MF CC
## up 189 0 0
## down 54 0 32
write.table(go_summary, 'go_summary.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t') #Save results
Next, we will perform GSEA analysis. GSEA is similar to GO in the general sense that it seeks to determine enrichment of curated gene sets. However, GSEA uses all genes instead of just ones that are significant. This increases statistical power, and reduce bias of detection. The curated gene sets used by GSEA is conveniently stored in MsigDB which is an online database. We can download all the gene sets collections from the MSigDB website: https://www.gsea-msigdb.org/gsea/msigdb/mouse/collections.jsp. On the webpage you will notice for each row of the collection, there is an option to ‘Download GMT Files’. Download the Gene Symbols file by right clicking ‘Gene Symbols’ underneath Download GMT Files. Note that you will need to first register for a free account on MSigDB and login.
#Go to MSigDb and download GMT file, make sure to know where your download is!
#Load .gmt file
hallmark = read.gmt('./mh.all.v2023.2.Mm.symbols.gmt') #Change this if using other gene set collections
#Inspect .gmt file
str(hallmark)
## 'data.frame': 7189 obs. of 2 variables:
## $ term: Factor w/ 50 levels "HALLMARK_ADIPOGENESIS",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ gene: chr "Abca1" "Abcb8" "Acaa2" "Acadl" ...
table(hallmark$term) %>% head()
##
## HALLMARK_ADIPOGENESIS HALLMARK_ALLOGRAFT_REJECTION
## 200 192
## HALLMARK_ANDROGEN_RESPONSE HALLMARK_ANGIOGENESIS
## 96 36
## HALLMARK_APICAL_JUNCTION HALLMARK_APICAL_SURFACE
## 199 44
Our input data for GSEA is a named vector where names are the gene identifiers matching with that in your .gmt file and the vector is the rank metric. A common rank metric for GSEA applies log() transformation to p-value such that genes with small p-value return higher log(p) than genes with larger p-values. We will annotate this rank with + and - signs based on the direction of fold change.
res_ordered = res_ordered %>%
mutate(rank = ifelse(.$log2FoldChange > 0, -log(.$pvalue), log(.$pvalue)))
rank = res_ordered$rank
names(rank) = res_ordered$symbol
rank = rank[order(rank, decreasing = T)]
head(rank)
## Lrat Pard3b Ahr Adamtsl4 Kcnk6 Angptl4
## 16.63538 15.13892 14.24331 13.09387 11.48079 10.50572
Now we are ready to run GSEA
h = GSEA(rank, TERM2GENE = hallmark)
## preparing geneSet collections...
## GSEA analysis...
## leading edge analysis...
## done...
str(h, max.level = 2) #inspect the output
## Formal class 'gseaResult' [package "DOSE"] with 13 slots
## ..@ result :'data.frame': 30 obs. of 11 variables:
## ..@ organism : chr "UNKNOWN"
## ..@ setType : chr "UNKNOWN"
## ..@ geneSets :List of 50
## ..@ geneList : Named num [1:12550] 16.6 15.1 14.2 13.1 11.5 ...
## .. ..- attr(*, "names")= chr [1:12550] "Lrat" "Pard3b" "Ahr" "Adamtsl4" ...
## ..@ keytype : chr "UNKNOWN"
## ..@ permScores : num[0 , 0 ]
## ..@ params :List of 6
## ..@ gene2Symbol: chr(0)
## ..@ readable : logi FALSE
## ..@ termsim : num[0 , 0 ]
## ..@ method : chr(0)
## ..@ dr : list()
head(h@result)
## ID
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION
## HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION
## HALLMARK_G2M_CHECKPOINT HALLMARK_G2M_CHECKPOINT
## HALLMARK_MTORC1_SIGNALING HALLMARK_MTORC1_SIGNALING
## HALLMARK_E2F_TARGETS HALLMARK_E2F_TARGETS
## HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE
## Description setSize
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION 193
## HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION 148
## HALLMARK_G2M_CHECKPOINT HALLMARK_G2M_CHECKPOINT 187
## HALLMARK_MTORC1_SIGNALING HALLMARK_MTORC1_SIGNALING 194
## HALLMARK_E2F_TARGETS HALLMARK_E2F_TARGETS 196
## HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE 146
## enrichmentScore NES pvalue
## HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.5634015 -2.234819 1.000000e-10
## HALLMARK_ALLOGRAFT_REJECTION 0.5622219 2.141500 4.206472e-10
## HALLMARK_G2M_CHECKPOINT -0.5398816 -2.137143 3.662952e-10
## HALLMARK_MTORC1_SIGNALING -0.5215038 -2.065670 1.925807e-09
## HALLMARK_E2F_TARGETS -0.5222110 -2.067004 3.473268e-09
## HALLMARK_INFLAMMATORY_RESPONSE 0.5126148 1.944698 1.556340e-07
## p.adjust qvalue rank
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 5.000000e-09 2.105263e-09 3816
## HALLMARK_ALLOGRAFT_REJECTION 7.010786e-09 2.951910e-09 1699
## HALLMARK_G2M_CHECKPOINT 7.010786e-09 2.951910e-09 1847
## HALLMARK_MTORC1_SIGNALING 2.407259e-08 1.013583e-08 2030
## HALLMARK_E2F_TARGETS 3.473268e-08 1.462429e-08 1890
## HALLMARK_INFLAMMATORY_RESPONSE 1.296950e-06 5.460841e-07 1884
## leading_edge
## HALLMARK_OXIDATIVE_PHOSPHORYLATION tags=65%, list=30%, signal=46%
## HALLMARK_ALLOGRAFT_REJECTION tags=34%, list=14%, signal=30%
## HALLMARK_G2M_CHECKPOINT tags=33%, list=15%, signal=28%
## HALLMARK_MTORC1_SIGNALING tags=36%, list=16%, signal=30%
## HALLMARK_E2F_TARGETS tags=28%, list=15%, signal=24%
## HALLMARK_INFLAMMATORY_RESPONSE tags=34%, list=15%, signal=29%
## core_enrichment
## HALLMARK_OXIDATIVE_PHOSPHORYLATION Ndufs8/Ldha/Atp5b/Ndufs7/Slc25a11/Gpx4/Sdha/Atp6v1e1/Ndufs3/Mdh2/Aco2/Ndufs4/Supv3l1/Ndufv2/Cox8a/Acadm/Phb2/Cox6a1/Ndufa3/Atp5e/Atp5a1/Uqcrc2/Ndufc2/Cox11/Acadvl/Abcb7/Ndufb2/Opa1/Etfb/Cox7a2/Ogdh/Ndufb8/Cox7a2l/Hsd17b10/Cyc1/Slc25a12/Tomm22/Cox10/Atp5pb/Timm13/Atp5d/Idh1/Acadsb/Mrpl35/Cox15/Htra2/Ndufv1/Hccs/Cox5b/Cox6c/Cox4i1/Uqcrb/Etfdh/Atp6v0e/Grpel1/Uqcr11/Mfn2/Mrpl34/Rhot2/Uqcrh/Atp6v0b/Atp6v1h/Atp5g3/Immt/Timm50/Cox5a/Mrpl15/Atp5c1/Ndufa5/Mrps30/Idh3b/Ndufa6/Ndufab1/Timm8b/Dlst/Ndufa9/Cox7b/Acaa2/Sdhd/Ndufb6/Got2/Ndufb7/Ndufa2/Ndufa8/Ndufb3/Mrps12/Cox7c/Nqo2/Cox6b1/Mgst3/Pdhb/Dld/Ndufb5/Uqcrc1/Vdac1/Ndufc1/Eci1/Cyb5a/Ndufa1/Slc25a3/Sdhc/Atp5g1/Pdha1/Sucla2/Ndufs6/Atp5j2/Atp5h/Atp5k/Mrps22/Bdh2/Ndufs1/Lrpprc/Cyb5r3/Mrps15/Suclg1/Atp5j/Atp5o/Uqcrfs1/Ndufs2/Aifm1/Pdhx/Prdx3/Afg3l2/Sdhb/Iscu/Gpi1
## HALLMARK_ALLOGRAFT_REJECTION Lyn/Itgb2/Ctss/Cxcl13/Elf4/Itgal/H2-Aa/Igsf6/Il16/Lcp2/Fyb/Ly86/Irf8/Ptpn6/Ccr2/Ptprc/Gcnt1/Spi1/Ets1/Ccr1/Cfp/Prkcb/Il2rg/Fcgr2b/Cd74/Ccr5/Icam1/Tgfb1/Jak2/Il2rb/H2-DMa/Icosl/Cd86/Il27ra/Srgn/Hdac9/Nlrp3/Abi1/Fgr/Was/Npm1/Cxcr3/Ccl22/Gpr65/Flna/Akt1/Map4k1/Gzma/St8sia4/Il1b
## HALLMARK_G2M_CHECKPOINT Jpt1/Hmgb3/Chaf1a/Cul5/Rad23b/Prmt5/Incenp/Ccnf/E2f3/Ube2s/Pttg1/E2f4/Ddx39a/Bcl3/Kif23/Cdkn2c/Prim2/Dtymk/Kif2c/Smc2/Aurkb/Birc5/Bub1/Ndc80/Racgap1/Top2a/Nusap1/Kif20b/Ttk/Kif22/Troap/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Cks2/Espl1/Gins2/E2f2/Cdkn3/Dbf4/Cenpa/Pbk/Kif11/Mad2l1/Cenpf/Mki67/Mybl2/Plk1/Aurka/Tacc3/Prc1/Tpx2/Ccnb2/Ccna2/Nek2/Hmmr/Ube2c/Cdc20
## HALLMARK_MTORC1_SIGNALING Etf1/Psmc6/Psme3/Immt/Psmd13/Nfkbib/Cops5/Nufip1/Map2k3/Prdx1/Pfkl/Lta4h/Pdap1/Ctsc/Eprs/Ccnf/Fdxr/Adipor2/Nup205/Tes/Ykt6/M6pr/Ddx39a/Psmb5/Cd9/Pdk1/Txnrd1/Ak4/Slc1a4/Psmd12/Atp5g1/Qdpr/Xbp1/Bub1/Slc7a11/Sdf2l1/Gclc/Canx/Calr/Psmc4/Gsr/Pgm1/Mllt11/Slc2a1/Ero1a/Gbe1/Rab1a/Shmt2/Rrm2/Dhfr/Hspa5/Hsp90b1/Serp1/Psph/Pgk1/Plk1/Edem1/Aurka/Tpi1/Psmd14/Stc1/Ufm1/Egln3/Gpi1/Tuba4a/Uso1/Ssr1/Ppa1/Acly
## HALLMARK_E2F_TARGETS Spc24/Ak2/Mlh1/Jpt1/Hmgb3/Gins3/Anp32e/Ranbp1/Ube2s/Nup205/Spc25/Pttg1/Hells/Diaph3/Mxd3/Ddx39a/Cdca8/Ncapd2/Mms22l/Melk/Cdkn2c/Prim2/Bub1b/Kif2c/Aurkb/Birc5/Spag5/Racgap1/Top2a/Trip13/Kif22/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Ran/Cks2/Espl1/Rad51ap1/Dlgap5/Cdkn3/Kif18b/Rrm2/Mad2l1/Mki67/Mybl2/Plk1/Aurka/Tacc3/Ccnb2/Hmmr/Cdca3/Cdc20
## HALLMARK_INFLAMMATORY_RESPONSE Ahr/Pik3r5/Lyn/Adgre1/Il1a/Rgs1/Axl/Cd48/Klf6/Msr1/Tnfrsf1b/F3/Cmklr1/Sell/C3ar1/Rasgrp1/Lcp2/Gpr183/Icam1/Il7r/Marco/Nampt/Il2rb/Il10ra/Slc7a1/Calcrl/Slc28a2/Icosl/Emp3/Dcbld2/Nlrp3/Abi1/Ptger2/Rhog/Aqp9/Ccl22/Csf3r/Ccrl2/Cybb/Slc1a2/Pcdh7/Fpr1/Myc/Cdkn1a/Sema4d/Il1b/Sri/Ebi3/Itgb3
#Split to positive and negative enrichment
h_up = subset(h, h$NES > 0)
head(h_up)
## ID
## HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION
## HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE
## HALLMARK_KRAS_SIGNALING_UP HALLMARK_KRAS_SIGNALING_UP
## HALLMARK_IL2_STAT5_SIGNALING HALLMARK_IL2_STAT5_SIGNALING
## HALLMARK_APICAL_JUNCTION HALLMARK_APICAL_JUNCTION
## HALLMARK_INTERFERON_GAMMA_RESPONSE HALLMARK_INTERFERON_GAMMA_RESPONSE
## Description setSize
## HALLMARK_ALLOGRAFT_REJECTION HALLMARK_ALLOGRAFT_REJECTION 148
## HALLMARK_INFLAMMATORY_RESPONSE HALLMARK_INFLAMMATORY_RESPONSE 146
## HALLMARK_KRAS_SIGNALING_UP HALLMARK_KRAS_SIGNALING_UP 164
## HALLMARK_IL2_STAT5_SIGNALING HALLMARK_IL2_STAT5_SIGNALING 168
## HALLMARK_APICAL_JUNCTION HALLMARK_APICAL_JUNCTION 150
## HALLMARK_INTERFERON_GAMMA_RESPONSE HALLMARK_INTERFERON_GAMMA_RESPONSE 174
## enrichmentScore NES pvalue
## HALLMARK_ALLOGRAFT_REJECTION 0.5622219 2.141500 4.206472e-10
## HALLMARK_INFLAMMATORY_RESPONSE 0.5126148 1.944698 1.556340e-07
## HALLMARK_KRAS_SIGNALING_UP 0.4926077 1.910326 4.147305e-07
## HALLMARK_IL2_STAT5_SIGNALING 0.4886145 1.890264 4.677662e-07
## HALLMARK_APICAL_JUNCTION 0.4912933 1.878847 1.426230e-06
## HALLMARK_INTERFERON_GAMMA_RESPONSE 0.4371705 1.702478 3.744235e-05
## p.adjust qvalue rank
## HALLMARK_ALLOGRAFT_REJECTION 7.010786e-09 2.951910e-09 1699
## HALLMARK_INFLAMMATORY_RESPONSE 1.296950e-06 5.460841e-07 1884
## HALLMARK_KRAS_SIGNALING_UP 2.914670e-06 1.227229e-06 1731
## HALLMARK_IL2_STAT5_SIGNALING 2.914670e-06 1.227229e-06 1858
## HALLMARK_APICAL_JUNCTION 7.131148e-06 3.002589e-06 994
## HALLMARK_INTERFERON_GAMMA_RESPONSE 1.440090e-04 6.063538e-05 2022
## leading_edge
## HALLMARK_ALLOGRAFT_REJECTION tags=34%, list=14%, signal=30%
## HALLMARK_INFLAMMATORY_RESPONSE tags=34%, list=15%, signal=29%
## HALLMARK_KRAS_SIGNALING_UP tags=32%, list=14%, signal=28%
## HALLMARK_IL2_STAT5_SIGNALING tags=27%, list=15%, signal=23%
## HALLMARK_APICAL_JUNCTION tags=20%, list=8%, signal=19%
## HALLMARK_INTERFERON_GAMMA_RESPONSE tags=30%, list=16%, signal=25%
## core_enrichment
## HALLMARK_ALLOGRAFT_REJECTION Lyn/Itgb2/Ctss/Cxcl13/Elf4/Itgal/H2-Aa/Igsf6/Il16/Lcp2/Fyb/Ly86/Irf8/Ptpn6/Ccr2/Ptprc/Gcnt1/Spi1/Ets1/Ccr1/Cfp/Prkcb/Il2rg/Fcgr2b/Cd74/Ccr5/Icam1/Tgfb1/Jak2/Il2rb/H2-DMa/Icosl/Cd86/Il27ra/Srgn/Hdac9/Nlrp3/Abi1/Fgr/Was/Npm1/Cxcr3/Ccl22/Gpr65/Flna/Akt1/Map4k1/Gzma/St8sia4/Il1b
## HALLMARK_INFLAMMATORY_RESPONSE Ahr/Pik3r5/Lyn/Adgre1/Il1a/Rgs1/Axl/Cd48/Klf6/Msr1/Tnfrsf1b/F3/Cmklr1/Sell/C3ar1/Rasgrp1/Lcp2/Gpr183/Icam1/Il7r/Marco/Nampt/Il2rb/Il10ra/Slc7a1/Calcrl/Slc28a2/Icosl/Emp3/Dcbld2/Nlrp3/Abi1/Ptger2/Rhog/Aqp9/Ccl22/Csf3r/Ccrl2/Cybb/Slc1a2/Pcdh7/Fpr1/Myc/Cdkn1a/Sema4d/Il1b/Sri/Ebi3/Itgb3
## HALLMARK_KRAS_SIGNALING_UP Angptl4/Gabra3/Tlr8/Mafb/Itgb2/Ctss/Ephb2/Bmp2/Tfpi/Gadd45g/Dock2/Tnfrsf1b/Cmklr1/C3ar1/Fcer1g/Irf8/Nin/Lat2/Laptm5/Epb41l3/Ets1/Adam8/Csf2ra/Eng/Il2rg/Gpnmb/Il7r/Plvap/Galnt3/Peg3/Ppp1r15a/Il10ra/Adamdec1/Tspan7/Vwa5a/Ikzf1/Dcbld2/Nrp1/Hdac9/Ano1/Spon1/Lcp1/Clec4a3/Adgra2/Sema3b/Map4k1/Crot/Adam17/Dnmbp/Cbr4/Cidea/Il1b/Tnfaip3
## HALLMARK_IL2_STAT5_SIGNALING Ahr/Tlr7/Bcl2/Bmp2/Map3k8/Swap70/Cd48/Klf6/Rhob/Tnfrsf1b/Ecm1/Sell/Selp/Irf8/Batf3/Cd44/Cdc42se2/Gabarapl1/Lrrc8c/Il2rb/Cd83/Ahnak/Muc1/Gadd45b/Il10ra/Prkch/Hk2/Plagl1/Ncoa3/Cd86/Nrp1/Slc1a5/Igf1r/Ptger2/Gpr65/Tnfrsf21/Pim1/Cd79b/Myc/Gucy1b1/Smpdl3a/She/Aplp1/Traf1/Ptch1
## HALLMARK_APICAL_JUNCTION Syk/Itga9/Msn/Vcam1/Epb41l2/Amigo2/Evl/Sdc3/Rac2/Cldn15/Tgfbi/Cldn7/Arhgef6/Fyb/Ptprc/Atp1a3/Zyx/Sirpa/Lamc2/Icam1/Cadm3/Pik3cb/Sorbs3/Skap2/Adam23/Cd274/Cd86/Gnai1/Nectin4/Cldn4
## HALLMARK_INTERFERON_GAMMA_RESPONSE Cd38/Vcam1/Epsti1/Ifit2/Cmklr1/H2-Aa/Slamf7/Lcp2/Selp/Socs3/Irf8/Ptpn6/Txnip/Cd74/Icam1/Samhd1/Nampt/Jak2/Il2rb/Rnf31/Ifi30/P2ry14/Fcgr1/Il10ra/H2-DMa/Cd274/Ncoa3/Cd86/Lats2/Ciita/Il18bp/Itgb7/Bank1/Ptpn1/Irf5/Arid5b/Sp110/Pim1/Fpr1/Gzma/Cdkn1a/St8sia4/Pla2g4a/Peli1/Marchf1/Tnfaip3/Sri/Slc25a28/Mthfd2/Znfx1/Trim21/Nod1
write.table(h_up, 'up_GSEA_hallmark.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')
h_down = subset(h, h$NES < 0)
head(h_down)
## ID
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION
## HALLMARK_G2M_CHECKPOINT HALLMARK_G2M_CHECKPOINT
## HALLMARK_MTORC1_SIGNALING HALLMARK_MTORC1_SIGNALING
## HALLMARK_E2F_TARGETS HALLMARK_E2F_TARGETS
## HALLMARK_PROTEIN_SECRETION HALLMARK_PROTEIN_SECRETION
## HALLMARK_ADIPOGENESIS HALLMARK_ADIPOGENESIS
## Description setSize
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION 193
## HALLMARK_G2M_CHECKPOINT HALLMARK_G2M_CHECKPOINT 187
## HALLMARK_MTORC1_SIGNALING HALLMARK_MTORC1_SIGNALING 194
## HALLMARK_E2F_TARGETS HALLMARK_E2F_TARGETS 196
## HALLMARK_PROTEIN_SECRETION HALLMARK_PROTEIN_SECRETION 91
## HALLMARK_ADIPOGENESIS HALLMARK_ADIPOGENESIS 192
## enrichmentScore NES pvalue
## HALLMARK_OXIDATIVE_PHOSPHORYLATION -0.5634015 -2.234819 1.000000e-10
## HALLMARK_G2M_CHECKPOINT -0.5398816 -2.137143 3.662952e-10
## HALLMARK_MTORC1_SIGNALING -0.5215038 -2.065670 1.925807e-09
## HALLMARK_E2F_TARGETS -0.5222110 -2.067004 3.473268e-09
## HALLMARK_PROTEIN_SECRETION -0.5763271 -2.042758 5.246406e-07
## HALLMARK_ADIPOGENESIS -0.4615917 -1.827758 2.042540e-06
## p.adjust qvalue rank
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 5.000000e-09 2.105263e-09 3816
## HALLMARK_G2M_CHECKPOINT 7.010786e-09 2.951910e-09 1847
## HALLMARK_MTORC1_SIGNALING 2.407259e-08 1.013583e-08 2030
## HALLMARK_E2F_TARGETS 3.473268e-08 1.462429e-08 1890
## HALLMARK_PROTEIN_SECRETION 2.914670e-06 1.227229e-06 1787
## HALLMARK_ADIPOGENESIS 9.284274e-06 3.909168e-06 2683
## leading_edge
## HALLMARK_OXIDATIVE_PHOSPHORYLATION tags=65%, list=30%, signal=46%
## HALLMARK_G2M_CHECKPOINT tags=33%, list=15%, signal=28%
## HALLMARK_MTORC1_SIGNALING tags=36%, list=16%, signal=30%
## HALLMARK_E2F_TARGETS tags=28%, list=15%, signal=24%
## HALLMARK_PROTEIN_SECRETION tags=38%, list=14%, signal=33%
## HALLMARK_ADIPOGENESIS tags=38%, list=21%, signal=30%
## core_enrichment
## HALLMARK_OXIDATIVE_PHOSPHORYLATION Ndufs8/Ldha/Atp5b/Ndufs7/Slc25a11/Gpx4/Sdha/Atp6v1e1/Ndufs3/Mdh2/Aco2/Ndufs4/Supv3l1/Ndufv2/Cox8a/Acadm/Phb2/Cox6a1/Ndufa3/Atp5e/Atp5a1/Uqcrc2/Ndufc2/Cox11/Acadvl/Abcb7/Ndufb2/Opa1/Etfb/Cox7a2/Ogdh/Ndufb8/Cox7a2l/Hsd17b10/Cyc1/Slc25a12/Tomm22/Cox10/Atp5pb/Timm13/Atp5d/Idh1/Acadsb/Mrpl35/Cox15/Htra2/Ndufv1/Hccs/Cox5b/Cox6c/Cox4i1/Uqcrb/Etfdh/Atp6v0e/Grpel1/Uqcr11/Mfn2/Mrpl34/Rhot2/Uqcrh/Atp6v0b/Atp6v1h/Atp5g3/Immt/Timm50/Cox5a/Mrpl15/Atp5c1/Ndufa5/Mrps30/Idh3b/Ndufa6/Ndufab1/Timm8b/Dlst/Ndufa9/Cox7b/Acaa2/Sdhd/Ndufb6/Got2/Ndufb7/Ndufa2/Ndufa8/Ndufb3/Mrps12/Cox7c/Nqo2/Cox6b1/Mgst3/Pdhb/Dld/Ndufb5/Uqcrc1/Vdac1/Ndufc1/Eci1/Cyb5a/Ndufa1/Slc25a3/Sdhc/Atp5g1/Pdha1/Sucla2/Ndufs6/Atp5j2/Atp5h/Atp5k/Mrps22/Bdh2/Ndufs1/Lrpprc/Cyb5r3/Mrps15/Suclg1/Atp5j/Atp5o/Uqcrfs1/Ndufs2/Aifm1/Pdhx/Prdx3/Afg3l2/Sdhb/Iscu/Gpi1
## HALLMARK_G2M_CHECKPOINT Jpt1/Hmgb3/Chaf1a/Cul5/Rad23b/Prmt5/Incenp/Ccnf/E2f3/Ube2s/Pttg1/E2f4/Ddx39a/Bcl3/Kif23/Cdkn2c/Prim2/Dtymk/Kif2c/Smc2/Aurkb/Birc5/Bub1/Ndc80/Racgap1/Top2a/Nusap1/Kif20b/Ttk/Kif22/Troap/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Cks2/Espl1/Gins2/E2f2/Cdkn3/Dbf4/Cenpa/Pbk/Kif11/Mad2l1/Cenpf/Mki67/Mybl2/Plk1/Aurka/Tacc3/Prc1/Tpx2/Ccnb2/Ccna2/Nek2/Hmmr/Ube2c/Cdc20
## HALLMARK_MTORC1_SIGNALING Etf1/Psmc6/Psme3/Immt/Psmd13/Nfkbib/Cops5/Nufip1/Map2k3/Prdx1/Pfkl/Lta4h/Pdap1/Ctsc/Eprs/Ccnf/Fdxr/Adipor2/Nup205/Tes/Ykt6/M6pr/Ddx39a/Psmb5/Cd9/Pdk1/Txnrd1/Ak4/Slc1a4/Psmd12/Atp5g1/Qdpr/Xbp1/Bub1/Slc7a11/Sdf2l1/Gclc/Canx/Calr/Psmc4/Gsr/Pgm1/Mllt11/Slc2a1/Ero1a/Gbe1/Rab1a/Shmt2/Rrm2/Dhfr/Hspa5/Hsp90b1/Serp1/Psph/Pgk1/Plk1/Edem1/Aurka/Tpi1/Psmd14/Stc1/Ufm1/Egln3/Gpi1/Tuba4a/Uso1/Ssr1/Ppa1/Acly
## HALLMARK_E2F_TARGETS Spc24/Ak2/Mlh1/Jpt1/Hmgb3/Gins3/Anp32e/Ranbp1/Ube2s/Nup205/Spc25/Pttg1/Hells/Diaph3/Mxd3/Ddx39a/Cdca8/Ncapd2/Mms22l/Melk/Cdkn2c/Prim2/Bub1b/Kif2c/Aurkb/Birc5/Spag5/Racgap1/Top2a/Trip13/Kif22/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Ran/Cks2/Espl1/Rad51ap1/Dlgap5/Cdkn3/Kif18b/Rrm2/Mad2l1/Mki67/Mybl2/Plk1/Aurka/Tacc3/Ccnb2/Hmmr/Cdca3/Cdc20
## HALLMARK_PROTEIN_SECRETION Clcn3/Tsg101/Rab2a/Abca1/Cltc/Anp32e/Ctsc/Gosr2/Ap2m1/Ykt6/M6pr/Tmx1/Cog2/Tspan8/Tmed10/Cd63/Scamp3/Arcn1/Vps45/Gbf1/Yipf6/Sec31a/Copb2/Arfgap3/Atp1a1/Sec24d/Sec22b/Gnas/Ap3s1/Rer1/Cope/Bet1/Uso1/Ergic3/Lman1
## HALLMARK_ADIPOGENESIS Aldoa/Idh1/Pex14/Tst/Fabp4/Plin2/Ptcd3/Dbt/Fah/Dgat1/Dhrs7b/Tank/G3bp2/Sult1a1/Samm50/Grpel1/Uqcr11/Slc25a1/Araf/C3/Acads/Immt/Ak2/Ywhag/Mrpl15/Ndufa5/Ndufab1/Reep6/Cox7b/Acaa2/Abca1/Pfkl/Aldh2/Enpp2/Reep5/Ndufb7/Gpam/Rmdn3/Scarb1/Adipor2/Lpl/Mgst3/Dld/Stom/Uqcrc1/Cdkn2c/Cd151/Aplp2/Pdcd4/Preb/Sdhc/Qdpr/Riok3/Ddt/Jagn1/Mccc1/Pgm1/Ubqln1/Agpat3/Suclg1/Gbe1/Ghitm/Dnajb9/Atp5o/Aifm1/Prdx3/Sdhb/Orm1/Sqor/Mrap/Cmpk1/Acly
write.table(h_down, 'down_GSEA_hallmark.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')
That’s a wrap on the basics of GO and GSEA. Please have the saved result files ready for class next Monday. MSigDb provides an abundance of other gene sets ready for tests on their website. Please prepare additional results using gene sets relevant to your study.
Additionally, I encourage you to explore this website: https://yulab-smu.top/biomedical-knowledge-mining-book/ for more tutorials on using the clusterprofiler package. You will find many different variations of gene set analyses and importantly, visualization of your enrichment results (https://yulab-smu.top/biomedical-knowledge-mining-book/enrichplot.html).