GSEA_GO

Set up your R environment by downloading the clusterProfiler and the org.Mm.eg.db or org.Hs.eg.db package from BioConductor. Also load the dplyr package for working with dataframes.

Load your dataframe of differentially expressed genes. I have it saved in .csv format, so I’m using read.csv(). Keep columns with your fold change, p value, adj p value, and gene symbol information. Remove genes with NA p value or duplicated gene names.

res_ordered = read.csv('res_ordered.csv') #Modify based on your result file
res_ordered = res_ordered[, grep('FoldChange|pvalue|adj|ENTREZ|SYMBOL', colnames(res_ordered))] #Modify based on the columns in your file
colnames(res_ordered)[grep('SYMBOL', colnames(res_ordered))] = 'symbol' #Optional: you might want to change some column names to a simpler name
res_ordered = res_ordered[!is.na(res_ordered$pvalue) & !is.na(res_ordered$padj) & !duplicated(res_ordered$symbol), ]
head(res_ordered)

##   ENTREZID log2FoldChange       pvalue         padj symbol
## 1   104112     -2.3057311 2.705503e-14 3.417320e-10   Acly
## 2    79235      1.2175064 5.961347e-08 3.764888e-04   Lrat
## 3    68944     -0.3817892 1.286593e-07 4.524772e-04  Tmco1
## 4    66357     -0.4809418 1.562076e-07 4.524772e-04   Ostc
## 5    72433     -3.0421091 1.791138e-07 4.524772e-04  Rab38
## 6    72823      1.4719887 2.662254e-07 5.604489e-04 Pard3b

Separate your dataframe in two based on the direction of fold change (i.e. positive fold change in one dataframe, negative fold change in another dataframe). Then create another dataframe keeping significant genes only.

up = subset(res_ordered, log2FoldChange > 0)
down = subset(res_ordered, log2FoldChange < 0)

up_sig = up[up$padj <= 0.05, ]
down_sig = down[down$padj <= 0.05, ]

message('There are ', nrow(up), ' upregulated genes, ', nrow(up_sig), ' of which are significant at p adj <= 0.05')

## There are 6284 upregulated genes, 46 of which are significant at p adj <= 0.05

message('There are ', nrow(down), ' downregulated genes, ', nrow(down_sig), ' of which are significant at p adj <= 0.05')

## There are 6266 downregulated genes, 69 of which are significant at p adj <= 0.05

We will now perform gene ontology (GO) analysis. GO is an ‘over-representation’ analysis that aims to categorize and annotate genes based on their biological roles. These roles are defined based on curated sets of terms in three major domains (molecular function (MF), biological process (BP), and cellular component (CC)). Each set of terms contain a set of genes. GO analysis uses the Fisher’s exact test to compare the proportion of your significant up and down-regulated genes that match these terms relative to the proportion of all genes in the genome in these terms. For example, if 10 out of your 100 significantly up-regulated genes match to the term ‘cell division’, that would be a proportion of 0.1. This is compared to the number of genes in the entire genome that match to the term ‘cell division’. This might be 1,000 out of 20,000 genes, which returns a proportion of 0.05. A hypergeometric test (such as the Fisher’s exact test) can then test the significance of observing a proportion of 0.1 > the reference proportion of 0.05.

#For each of up and down-regulated gene sets, assess biological process, molecular function, and cellular component
#First compile up_sig and down_sig in a list for efficient processing
sig_list = list(up = up_sig, down = down_sig)

#Then use lapply to loop through both set of genes in your list and then loop through all 3 GO analysis
go_res = lapply(sig_list, function(i) {
  
  res = lapply(c('BP', 'MF' ,'CC'), function(x) {
    message(paste0('Analyzing ', x))
    enrichGO(i$ENTREZID,
             OrgDb = org.Mm.eg.db, #If human, use org.Hs.eg.db
             keyType = 'ENTREZID',
             ont = x,
             readable = F) 
  }) %>% setNames(c('BP', 'MF', 'CC'))
  
})

## Analyzing BP

## Analyzing MF

## Analyzing CC

## Analyzing BP

## Analyzing MF

## Analyzing CC

#Check out the result, change the max.level around to efficiently inspect different layers of the output data
str(go_res, max.level = 2)

## List of 2
##  $ up  :List of 3
##   ..$ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   ..$ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   ..$ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##  $ down:List of 3
##   ..$ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   ..$ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   ..$ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots

str(go_res$up, max.level = 3)

## List of 3
##  $ BP:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   .. ..@ result       :'data.frame': 1453 obs. of  9 variables:
##   .. ..@ pvalueCutoff : num 0.05
##   .. ..@ pAdjustMethod: chr "BH"
##   .. ..@ qvalueCutoff : num 0.2
##   .. ..@ organism     : chr "Mus musculus"
##   .. ..@ ontology     : chr "BP"
##   .. ..@ gene         : chr [1:46] "79235" "72823" "11622" "229595" ...
##   .. ..@ keytype      : chr "ENTREZID"
##   .. ..@ universe     : chr [1:28564] "11545" "12628" "13804" "16882" ...
##   .. ..@ gene2Symbol  : chr(0) 
##   .. ..@ geneSets     :List of 2061
##   .. ..@ readable     : logi FALSE
##   .. ..@ termsim      : num[0 , 0 ] 
##   .. ..@ method       : chr(0) 
##   .. ..@ dr           : list()
##  $ MF:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   .. ..@ result       :'data.frame': 231 obs. of  9 variables:
##   .. ..@ pvalueCutoff : num 0.05
##   .. ..@ pAdjustMethod: chr "BH"
##   .. ..@ qvalueCutoff : num 0.2
##   .. ..@ organism     : chr "Mus musculus"
##   .. ..@ ontology     : chr "MF"
##   .. ..@ gene         : chr [1:46] "79235" "72823" "11622" "229595" ...
##   .. ..@ keytype      : chr "ENTREZID"
##   .. ..@ universe     : chr [1:28171] "223774" "230801" "56075" "71365" ...
##   .. ..@ gene2Symbol  : chr(0) 
##   .. ..@ geneSets     :List of 349
##   .. ..@ readable     : logi FALSE
##   .. ..@ termsim      : num[0 , 0 ] 
##   .. ..@ method       : chr(0) 
##   .. ..@ dr           : list()
##  $ CC:Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   .. ..@ result       :'data.frame': 134 obs. of  9 variables:
##   .. ..@ pvalueCutoff : num 0.05
##   .. ..@ pAdjustMethod: chr "BH"
##   .. ..@ qvalueCutoff : num 0.2
##   .. ..@ organism     : chr "Mus musculus"
##   .. ..@ ontology     : chr "CC"
##   .. ..@ gene         : chr [1:46] "79235" "72823" "11622" "229595" ...
##   .. ..@ keytype      : chr "ENTREZID"
##   .. ..@ universe     : chr [1:28585] "13806" "13807" "13808" "226265" ...
##   .. ..@ gene2Symbol  : chr(0) 
##   .. ..@ geneSets     :List of 241
##   .. ..@ readable     : logi FALSE
##   .. ..@ termsim      : num[0 , 0 ] 
##   .. ..@ method       : chr(0) 
##   .. ..@ dr           : list()

str(go_res$up$BP, max.level = 2)

## Formal class 'enrichResult' [package "DOSE"] with 15 slots
##   ..@ result       :'data.frame':    1453 obs. of  9 variables:
##   ..@ pvalueCutoff : num 0.05
##   ..@ pAdjustMethod: chr "BH"
##   ..@ qvalueCutoff : num 0.2
##   ..@ organism     : chr "Mus musculus"
##   ..@ ontology     : chr "BP"
##   ..@ gene         : chr [1:46] "79235" "72823" "11622" "229595" ...
##   ..@ keytype      : chr "ENTREZID"
##   ..@ universe     : chr [1:28564] "11545" "12628" "13804" "16882" ...
##   ..@ gene2Symbol  : chr(0) 
##   ..@ geneSets     :List of 2061
##   ..@ readable     : logi FALSE
##   ..@ termsim      : num[0 , 0 ] 
##   ..@ method       : chr(0) 
##   ..@ dr           : list()

str(go_res$up$BP@result, max.level = 2)

## 'data.frame':    1453 obs. of  9 variables:
##  $ ID         : chr  "GO:0001776" "GO:0048872" "GO:0050670" "GO:0032944" ...
##  $ Description: chr  "leukocyte homeostasis" "homeostasis of number of cells" "regulation of lymphocyte proliferation" "regulation of mononuclear cell proliferation" ...
##  $ GeneRatio  : chr  "6/45" "8/45" "7/45" "7/45" ...
##  $ BgRatio    : chr  "136/28564" "391/28564" "260/28564" "264/28564" ...
##  $ pvalue     : num  7.29e-08 1.59e-07 1.61e-07 1.79e-07 3.15e-07 ...
##  $ p.adjust   : num  6.50e-05 6.50e-05 6.50e-05 6.50e-05 9.14e-05 ...
##  $ qvalue     : num  4.33e-05 4.33e-05 4.33e-05 4.33e-05 6.09e-05 ...
##  $ geneID     : chr  "11622/210293/17311/14605/17096/12978" "11622/210293/17311/14605/17096/16331/12978/16658" "11622/17311/17096/12494/16331/13136/12978" "11622/17311/17096/12494/16331/13136/12978" ...
##  $ Count      : int  6 8 7 7 7 7 7 7 5 6 ...

#Create a subset of your analysis with significant results only and save results
go_sig = lapply(c('up', 'down'), function(y) {
  
  i = go_res[[y]]
  
  sig = lapply(c('BP', 'MF', 'CC'), function(x) {
    
    df = subset(i[[x]]@result, p.adjust <= 0.05) 
    #Save only if there are significant results
    if (nrow(df) > 1) write.table(df, paste0(y, '_GO', x, '_sig.txt'), col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t') 
    return(df)
    
  }) %>% setNames(c('BP', 'MF', 'CC'))
  
}) %>% setNames(c('up', 'down'))

#How many significant GO terms are reported for each of BP, MF, CC? Compile this summary level data into a table where rows correspond to 'up', and 'down' regulated genes, and columns correspond to the 3 GO analysis. Here I will use a more traditional for loop method instead of lapply. Challenge yourself by translating this using lapply.
go_summary = data.frame(matrix(0, ncol = 3, nrow = 2))
rownames(go_summary) = c('up', 'down')
colnames(go_summary) = c('BP', 'MF', 'CC')
for (y in c('up', 'down')) {
  i = go_sig[[y]]
  for (n in c('BP', 'MF', 'CC')) {
    x = i[[n]]
    message(paste0(nrow(x),' GO BP terms are over-represented among ', y, '-regulated genes'))
    go_summary[y, n] = nrow(x)
  }
}

## 189 GO BP terms are over-represented among up-regulated genes

## 0 GO BP terms are over-represented among up-regulated genes
## 0 GO BP terms are over-represented among up-regulated genes

## 54 GO BP terms are over-represented among down-regulated genes

## 0 GO BP terms are over-represented among down-regulated genes

## 32 GO BP terms are over-represented among down-regulated genes

go_summary

##       BP MF CC
## up   189  0  0
## down  54  0 32

write.table(go_summary, 'go_summary.txt', col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t') #Save results

Next, we will perform GSEA analysis. GSEA is similar to GO in the general sense that it seeks to determine enrichment of curated gene sets. However, GSEA uses all genes instead of just ones that are significant. This increases statistical power, and reduce bias of detection. The curated gene sets used by GSEA is conveniently stored in MsigDB which is an online database. We can download all the gene sets collections from the MSigDB website: https://www.gsea-msigdb.org/gsea/msigdb/mouse/collections.jsp. On the webpage you will notice for each row of the collection, there is an option to ‘Download GMT Files’. Download the Gene Symbols file by right clicking ‘Gene Symbols’ underneath Download GMT Files. Note that you will need to first register for a free account on MSigDB and login.

#Go to MSigDb and download GMT file, make sure to know where your download is!
#Load .gmt file
hallmark = read.gmt('./mh.all.v2023.2.Mm.symbols.gmt') #Change this if using other gene set collections

#Inspect .gmt file
str(hallmark)

## 'data.frame':    7189 obs. of  2 variables:
##  $ term: Factor w/ 50 levels "HALLMARK_ADIPOGENESIS",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gene: chr  "Abca1" "Abcb8" "Acaa2" "Acadl" ...

table(hallmark$term) %>% head()

## 
##        HALLMARK_ADIPOGENESIS HALLMARK_ALLOGRAFT_REJECTION 
##                          200                          192 
##   HALLMARK_ANDROGEN_RESPONSE        HALLMARK_ANGIOGENESIS 
##                           96                           36 
##     HALLMARK_APICAL_JUNCTION      HALLMARK_APICAL_SURFACE 
##                          199                           44

Our input data for GSEA is a named vector where names are the gene identifiers matching with that in your .gmt file and the vector is the rank metric. A common rank metric for GSEA applies log() transformation to p-value such that genes with small p-value return higher log(p) than genes with larger p-values. We will annotate this rank with + and - signs based on the direction of fold change.

res_ordered = res_ordered %>% 
  mutate(rank = ifelse(.$log2FoldChange > 0, -log(.$pvalue), log(.$pvalue)))

rank = res_ordered$rank 
names(rank) = res_ordered$symbol
rank = rank[order(rank, decreasing = T)]
head(rank)

##     Lrat   Pard3b      Ahr Adamtsl4    Kcnk6  Angptl4 
## 16.63538 15.13892 14.24331 13.09387 11.48079 10.50572

Now we are ready to run GSEA

h = GSEA(rank, TERM2GENE = hallmark)

## preparing geneSet collections...

## GSEA analysis...

## leading edge analysis...

## done...

str(h, max.level = 2) #inspect the output

## Formal class 'gseaResult' [package "DOSE"] with 13 slots
##   ..@ result     :'data.frame':  30 obs. of  11 variables:
##   ..@ organism   : chr "UNKNOWN"
##   ..@ setType    : chr "UNKNOWN"
##   ..@ geneSets   :List of 50
##   ..@ geneList   : Named num [1:12550] 16.6 15.1 14.2 13.1 11.5 ...
##   .. ..- attr(*, "names")= chr [1:12550] "Lrat" "Pard3b" "Ahr" "Adamtsl4" ...
##   ..@ keytype    : chr "UNKNOWN"
##   ..@ permScores : num[0 , 0 ] 
##   ..@ params     :List of 6
##   ..@ gene2Symbol: chr(0) 
##   ..@ readable   : logi FALSE
##   ..@ termsim    : num[0 , 0 ] 
##   ..@ method     : chr(0) 
##   ..@ dr         : list()

head(h@result)

##                                                                    ID
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION
## HALLMARK_ALLOGRAFT_REJECTION             HALLMARK_ALLOGRAFT_REJECTION
## HALLMARK_G2M_CHECKPOINT                       HALLMARK_G2M_CHECKPOINT
## HALLMARK_MTORC1_SIGNALING                   HALLMARK_MTORC1_SIGNALING
## HALLMARK_E2F_TARGETS                             HALLMARK_E2F_TARGETS
## HALLMARK_INFLAMMATORY_RESPONSE         HALLMARK_INFLAMMATORY_RESPONSE
##                                                           Description setSize
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION     193
## HALLMARK_ALLOGRAFT_REJECTION             HALLMARK_ALLOGRAFT_REJECTION     148
## HALLMARK_G2M_CHECKPOINT                       HALLMARK_G2M_CHECKPOINT     187
## HALLMARK_MTORC1_SIGNALING                   HALLMARK_MTORC1_SIGNALING     194
## HALLMARK_E2F_TARGETS                             HALLMARK_E2F_TARGETS     196
## HALLMARK_INFLAMMATORY_RESPONSE         HALLMARK_INFLAMMATORY_RESPONSE     146
##                                    enrichmentScore       NES       pvalue
## HALLMARK_OXIDATIVE_PHOSPHORYLATION      -0.5634015 -2.234819 1.000000e-10
## HALLMARK_ALLOGRAFT_REJECTION             0.5622219  2.141500 4.206472e-10
## HALLMARK_G2M_CHECKPOINT                 -0.5398816 -2.137143 3.662952e-10
## HALLMARK_MTORC1_SIGNALING               -0.5215038 -2.065670 1.925807e-09
## HALLMARK_E2F_TARGETS                    -0.5222110 -2.067004 3.473268e-09
## HALLMARK_INFLAMMATORY_RESPONSE           0.5126148  1.944698 1.556340e-07
##                                        p.adjust       qvalue rank
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 5.000000e-09 2.105263e-09 3816
## HALLMARK_ALLOGRAFT_REJECTION       7.010786e-09 2.951910e-09 1699
## HALLMARK_G2M_CHECKPOINT            7.010786e-09 2.951910e-09 1847
## HALLMARK_MTORC1_SIGNALING          2.407259e-08 1.013583e-08 2030
## HALLMARK_E2F_TARGETS               3.473268e-08 1.462429e-08 1890
## HALLMARK_INFLAMMATORY_RESPONSE     1.296950e-06 5.460841e-07 1884
##                                                      leading_edge
## HALLMARK_OXIDATIVE_PHOSPHORYLATION tags=65%, list=30%, signal=46%
## HALLMARK_ALLOGRAFT_REJECTION       tags=34%, list=14%, signal=30%
## HALLMARK_G2M_CHECKPOINT            tags=33%, list=15%, signal=28%
## HALLMARK_MTORC1_SIGNALING          tags=36%, list=16%, signal=30%
## HALLMARK_E2F_TARGETS               tags=28%, list=15%, signal=24%
## HALLMARK_INFLAMMATORY_RESPONSE     tags=34%, list=15%, signal=29%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                core_enrichment
## HALLMARK_OXIDATIVE_PHOSPHORYLATION Ndufs8/Ldha/Atp5b/Ndufs7/Slc25a11/Gpx4/Sdha/Atp6v1e1/Ndufs3/Mdh2/Aco2/Ndufs4/Supv3l1/Ndufv2/Cox8a/Acadm/Phb2/Cox6a1/Ndufa3/Atp5e/Atp5a1/Uqcrc2/Ndufc2/Cox11/Acadvl/Abcb7/Ndufb2/Opa1/Etfb/Cox7a2/Ogdh/Ndufb8/Cox7a2l/Hsd17b10/Cyc1/Slc25a12/Tomm22/Cox10/Atp5pb/Timm13/Atp5d/Idh1/Acadsb/Mrpl35/Cox15/Htra2/Ndufv1/Hccs/Cox5b/Cox6c/Cox4i1/Uqcrb/Etfdh/Atp6v0e/Grpel1/Uqcr11/Mfn2/Mrpl34/Rhot2/Uqcrh/Atp6v0b/Atp6v1h/Atp5g3/Immt/Timm50/Cox5a/Mrpl15/Atp5c1/Ndufa5/Mrps30/Idh3b/Ndufa6/Ndufab1/Timm8b/Dlst/Ndufa9/Cox7b/Acaa2/Sdhd/Ndufb6/Got2/Ndufb7/Ndufa2/Ndufa8/Ndufb3/Mrps12/Cox7c/Nqo2/Cox6b1/Mgst3/Pdhb/Dld/Ndufb5/Uqcrc1/Vdac1/Ndufc1/Eci1/Cyb5a/Ndufa1/Slc25a3/Sdhc/Atp5g1/Pdha1/Sucla2/Ndufs6/Atp5j2/Atp5h/Atp5k/Mrps22/Bdh2/Ndufs1/Lrpprc/Cyb5r3/Mrps15/Suclg1/Atp5j/Atp5o/Uqcrfs1/Ndufs2/Aifm1/Pdhx/Prdx3/Afg3l2/Sdhb/Iscu/Gpi1
## HALLMARK_ALLOGRAFT_REJECTION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Lyn/Itgb2/Ctss/Cxcl13/Elf4/Itgal/H2-Aa/Igsf6/Il16/Lcp2/Fyb/Ly86/Irf8/Ptpn6/Ccr2/Ptprc/Gcnt1/Spi1/Ets1/Ccr1/Cfp/Prkcb/Il2rg/Fcgr2b/Cd74/Ccr5/Icam1/Tgfb1/Jak2/Il2rb/H2-DMa/Icosl/Cd86/Il27ra/Srgn/Hdac9/Nlrp3/Abi1/Fgr/Was/Npm1/Cxcr3/Ccl22/Gpr65/Flna/Akt1/Map4k1/Gzma/St8sia4/Il1b
## HALLMARK_G2M_CHECKPOINT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Jpt1/Hmgb3/Chaf1a/Cul5/Rad23b/Prmt5/Incenp/Ccnf/E2f3/Ube2s/Pttg1/E2f4/Ddx39a/Bcl3/Kif23/Cdkn2c/Prim2/Dtymk/Kif2c/Smc2/Aurkb/Birc5/Bub1/Ndc80/Racgap1/Top2a/Nusap1/Kif20b/Ttk/Kif22/Troap/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Cks2/Espl1/Gins2/E2f2/Cdkn3/Dbf4/Cenpa/Pbk/Kif11/Mad2l1/Cenpf/Mki67/Mybl2/Plk1/Aurka/Tacc3/Prc1/Tpx2/Ccnb2/Ccna2/Nek2/Hmmr/Ube2c/Cdc20
## HALLMARK_MTORC1_SIGNALING                                                                                                                                                                                                                                                                                                                                                                                                                                          Etf1/Psmc6/Psme3/Immt/Psmd13/Nfkbib/Cops5/Nufip1/Map2k3/Prdx1/Pfkl/Lta4h/Pdap1/Ctsc/Eprs/Ccnf/Fdxr/Adipor2/Nup205/Tes/Ykt6/M6pr/Ddx39a/Psmb5/Cd9/Pdk1/Txnrd1/Ak4/Slc1a4/Psmd12/Atp5g1/Qdpr/Xbp1/Bub1/Slc7a11/Sdf2l1/Gclc/Canx/Calr/Psmc4/Gsr/Pgm1/Mllt11/Slc2a1/Ero1a/Gbe1/Rab1a/Shmt2/Rrm2/Dhfr/Hspa5/Hsp90b1/Serp1/Psph/Pgk1/Plk1/Edem1/Aurka/Tpi1/Psmd14/Stc1/Ufm1/Egln3/Gpi1/Tuba4a/Uso1/Ssr1/Ppa1/Acly
## HALLMARK_E2F_TARGETS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Spc24/Ak2/Mlh1/Jpt1/Hmgb3/Gins3/Anp32e/Ranbp1/Ube2s/Nup205/Spc25/Pttg1/Hells/Diaph3/Mxd3/Ddx39a/Cdca8/Ncapd2/Mms22l/Melk/Cdkn2c/Prim2/Bub1b/Kif2c/Aurkb/Birc5/Spag5/Racgap1/Top2a/Trip13/Kif22/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Ran/Cks2/Espl1/Rad51ap1/Dlgap5/Cdkn3/Kif18b/Rrm2/Mad2l1/Mki67/Mybl2/Plk1/Aurka/Tacc3/Ccnb2/Hmmr/Cdca3/Cdc20
## HALLMARK_INFLAMMATORY_RESPONSE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Ahr/Pik3r5/Lyn/Adgre1/Il1a/Rgs1/Axl/Cd48/Klf6/Msr1/Tnfrsf1b/F3/Cmklr1/Sell/C3ar1/Rasgrp1/Lcp2/Gpr183/Icam1/Il7r/Marco/Nampt/Il2rb/Il10ra/Slc7a1/Calcrl/Slc28a2/Icosl/Emp3/Dcbld2/Nlrp3/Abi1/Ptger2/Rhog/Aqp9/Ccl22/Csf3r/Ccrl2/Cybb/Slc1a2/Pcdh7/Fpr1/Myc/Cdkn1a/Sema4d/Il1b/Sri/Ebi3/Itgb3

#Split to positive and negative enrichment
h_up = subset(h, h$NES > 0)
head(h_up)

##                                                                    ID
## HALLMARK_ALLOGRAFT_REJECTION             HALLMARK_ALLOGRAFT_REJECTION
## HALLMARK_INFLAMMATORY_RESPONSE         HALLMARK_INFLAMMATORY_RESPONSE
## HALLMARK_KRAS_SIGNALING_UP                 HALLMARK_KRAS_SIGNALING_UP
## HALLMARK_IL2_STAT5_SIGNALING             HALLMARK_IL2_STAT5_SIGNALING
## HALLMARK_APICAL_JUNCTION                     HALLMARK_APICAL_JUNCTION
## HALLMARK_INTERFERON_GAMMA_RESPONSE HALLMARK_INTERFERON_GAMMA_RESPONSE
##                                                           Description setSize
## HALLMARK_ALLOGRAFT_REJECTION             HALLMARK_ALLOGRAFT_REJECTION     148
## HALLMARK_INFLAMMATORY_RESPONSE         HALLMARK_INFLAMMATORY_RESPONSE     146
## HALLMARK_KRAS_SIGNALING_UP                 HALLMARK_KRAS_SIGNALING_UP     164
## HALLMARK_IL2_STAT5_SIGNALING             HALLMARK_IL2_STAT5_SIGNALING     168
## HALLMARK_APICAL_JUNCTION                     HALLMARK_APICAL_JUNCTION     150
## HALLMARK_INTERFERON_GAMMA_RESPONSE HALLMARK_INTERFERON_GAMMA_RESPONSE     174
##                                    enrichmentScore      NES       pvalue
## HALLMARK_ALLOGRAFT_REJECTION             0.5622219 2.141500 4.206472e-10
## HALLMARK_INFLAMMATORY_RESPONSE           0.5126148 1.944698 1.556340e-07
## HALLMARK_KRAS_SIGNALING_UP               0.4926077 1.910326 4.147305e-07
## HALLMARK_IL2_STAT5_SIGNALING             0.4886145 1.890264 4.677662e-07
## HALLMARK_APICAL_JUNCTION                 0.4912933 1.878847 1.426230e-06
## HALLMARK_INTERFERON_GAMMA_RESPONSE       0.4371705 1.702478 3.744235e-05
##                                        p.adjust       qvalue rank
## HALLMARK_ALLOGRAFT_REJECTION       7.010786e-09 2.951910e-09 1699
## HALLMARK_INFLAMMATORY_RESPONSE     1.296950e-06 5.460841e-07 1884
## HALLMARK_KRAS_SIGNALING_UP         2.914670e-06 1.227229e-06 1731
## HALLMARK_IL2_STAT5_SIGNALING       2.914670e-06 1.227229e-06 1858
## HALLMARK_APICAL_JUNCTION           7.131148e-06 3.002589e-06  994
## HALLMARK_INTERFERON_GAMMA_RESPONSE 1.440090e-04 6.063538e-05 2022
##                                                      leading_edge
## HALLMARK_ALLOGRAFT_REJECTION       tags=34%, list=14%, signal=30%
## HALLMARK_INFLAMMATORY_RESPONSE     tags=34%, list=15%, signal=29%
## HALLMARK_KRAS_SIGNALING_UP         tags=32%, list=14%, signal=28%
## HALLMARK_IL2_STAT5_SIGNALING       tags=27%, list=15%, signal=23%
## HALLMARK_APICAL_JUNCTION            tags=20%, list=8%, signal=19%
## HALLMARK_INTERFERON_GAMMA_RESPONSE tags=30%, list=16%, signal=25%
##                                                                                                                                                                                                                                                                                                                                                              core_enrichment
## HALLMARK_ALLOGRAFT_REJECTION                                                             Lyn/Itgb2/Ctss/Cxcl13/Elf4/Itgal/H2-Aa/Igsf6/Il16/Lcp2/Fyb/Ly86/Irf8/Ptpn6/Ccr2/Ptprc/Gcnt1/Spi1/Ets1/Ccr1/Cfp/Prkcb/Il2rg/Fcgr2b/Cd74/Ccr5/Icam1/Tgfb1/Jak2/Il2rb/H2-DMa/Icosl/Cd86/Il27ra/Srgn/Hdac9/Nlrp3/Abi1/Fgr/Was/Npm1/Cxcr3/Ccl22/Gpr65/Flna/Akt1/Map4k1/Gzma/St8sia4/Il1b
## HALLMARK_INFLAMMATORY_RESPONSE                                                   Ahr/Pik3r5/Lyn/Adgre1/Il1a/Rgs1/Axl/Cd48/Klf6/Msr1/Tnfrsf1b/F3/Cmklr1/Sell/C3ar1/Rasgrp1/Lcp2/Gpr183/Icam1/Il7r/Marco/Nampt/Il2rb/Il10ra/Slc7a1/Calcrl/Slc28a2/Icosl/Emp3/Dcbld2/Nlrp3/Abi1/Ptger2/Rhog/Aqp9/Ccl22/Csf3r/Ccrl2/Cybb/Slc1a2/Pcdh7/Fpr1/Myc/Cdkn1a/Sema4d/Il1b/Sri/Ebi3/Itgb3
## HALLMARK_KRAS_SIGNALING_UP         Angptl4/Gabra3/Tlr8/Mafb/Itgb2/Ctss/Ephb2/Bmp2/Tfpi/Gadd45g/Dock2/Tnfrsf1b/Cmklr1/C3ar1/Fcer1g/Irf8/Nin/Lat2/Laptm5/Epb41l3/Ets1/Adam8/Csf2ra/Eng/Il2rg/Gpnmb/Il7r/Plvap/Galnt3/Peg3/Ppp1r15a/Il10ra/Adamdec1/Tspan7/Vwa5a/Ikzf1/Dcbld2/Nrp1/Hdac9/Ano1/Spon1/Lcp1/Clec4a3/Adgra2/Sema3b/Map4k1/Crot/Adam17/Dnmbp/Cbr4/Cidea/Il1b/Tnfaip3
## HALLMARK_IL2_STAT5_SIGNALING                                                                 Ahr/Tlr7/Bcl2/Bmp2/Map3k8/Swap70/Cd48/Klf6/Rhob/Tnfrsf1b/Ecm1/Sell/Selp/Irf8/Batf3/Cd44/Cdc42se2/Gabarapl1/Lrrc8c/Il2rb/Cd83/Ahnak/Muc1/Gadd45b/Il10ra/Prkch/Hk2/Plagl1/Ncoa3/Cd86/Nrp1/Slc1a5/Igf1r/Ptger2/Gpr65/Tnfrsf21/Pim1/Cd79b/Myc/Gucy1b1/Smpdl3a/She/Aplp1/Traf1/Ptch1
## HALLMARK_APICAL_JUNCTION                                                                                                                                                                  Syk/Itga9/Msn/Vcam1/Epb41l2/Amigo2/Evl/Sdc3/Rac2/Cldn15/Tgfbi/Cldn7/Arhgef6/Fyb/Ptprc/Atp1a3/Zyx/Sirpa/Lamc2/Icam1/Cadm3/Pik3cb/Sorbs3/Skap2/Adam23/Cd274/Cd86/Gnai1/Nectin4/Cldn4
## HALLMARK_INTERFERON_GAMMA_RESPONSE          Cd38/Vcam1/Epsti1/Ifit2/Cmklr1/H2-Aa/Slamf7/Lcp2/Selp/Socs3/Irf8/Ptpn6/Txnip/Cd74/Icam1/Samhd1/Nampt/Jak2/Il2rb/Rnf31/Ifi30/P2ry14/Fcgr1/Il10ra/H2-DMa/Cd274/Ncoa3/Cd86/Lats2/Ciita/Il18bp/Itgb7/Bank1/Ptpn1/Irf5/Arid5b/Sp110/Pim1/Fpr1/Gzma/Cdkn1a/St8sia4/Pla2g4a/Peli1/Marchf1/Tnfaip3/Sri/Slc25a28/Mthfd2/Znfx1/Trim21/Nod1

write.table(h_up, 'up_GSEA_hallmark.txt',  col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')
h_down = subset(h, h$NES < 0)
head(h_down)

##                                                                    ID
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION
## HALLMARK_G2M_CHECKPOINT                       HALLMARK_G2M_CHECKPOINT
## HALLMARK_MTORC1_SIGNALING                   HALLMARK_MTORC1_SIGNALING
## HALLMARK_E2F_TARGETS                             HALLMARK_E2F_TARGETS
## HALLMARK_PROTEIN_SECRETION                 HALLMARK_PROTEIN_SECRETION
## HALLMARK_ADIPOGENESIS                           HALLMARK_ADIPOGENESIS
##                                                           Description setSize
## HALLMARK_OXIDATIVE_PHOSPHORYLATION HALLMARK_OXIDATIVE_PHOSPHORYLATION     193
## HALLMARK_G2M_CHECKPOINT                       HALLMARK_G2M_CHECKPOINT     187
## HALLMARK_MTORC1_SIGNALING                   HALLMARK_MTORC1_SIGNALING     194
## HALLMARK_E2F_TARGETS                             HALLMARK_E2F_TARGETS     196
## HALLMARK_PROTEIN_SECRETION                 HALLMARK_PROTEIN_SECRETION      91
## HALLMARK_ADIPOGENESIS                           HALLMARK_ADIPOGENESIS     192
##                                    enrichmentScore       NES       pvalue
## HALLMARK_OXIDATIVE_PHOSPHORYLATION      -0.5634015 -2.234819 1.000000e-10
## HALLMARK_G2M_CHECKPOINT                 -0.5398816 -2.137143 3.662952e-10
## HALLMARK_MTORC1_SIGNALING               -0.5215038 -2.065670 1.925807e-09
## HALLMARK_E2F_TARGETS                    -0.5222110 -2.067004 3.473268e-09
## HALLMARK_PROTEIN_SECRETION              -0.5763271 -2.042758 5.246406e-07
## HALLMARK_ADIPOGENESIS                   -0.4615917 -1.827758 2.042540e-06
##                                        p.adjust       qvalue rank
## HALLMARK_OXIDATIVE_PHOSPHORYLATION 5.000000e-09 2.105263e-09 3816
## HALLMARK_G2M_CHECKPOINT            7.010786e-09 2.951910e-09 1847
## HALLMARK_MTORC1_SIGNALING          2.407259e-08 1.013583e-08 2030
## HALLMARK_E2F_TARGETS               3.473268e-08 1.462429e-08 1890
## HALLMARK_PROTEIN_SECRETION         2.914670e-06 1.227229e-06 1787
## HALLMARK_ADIPOGENESIS              9.284274e-06 3.909168e-06 2683
##                                                      leading_edge
## HALLMARK_OXIDATIVE_PHOSPHORYLATION tags=65%, list=30%, signal=46%
## HALLMARK_G2M_CHECKPOINT            tags=33%, list=15%, signal=28%
## HALLMARK_MTORC1_SIGNALING          tags=36%, list=16%, signal=30%
## HALLMARK_E2F_TARGETS               tags=28%, list=15%, signal=24%
## HALLMARK_PROTEIN_SECRETION         tags=38%, list=14%, signal=33%
## HALLMARK_ADIPOGENESIS              tags=38%, list=21%, signal=30%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                core_enrichment
## HALLMARK_OXIDATIVE_PHOSPHORYLATION Ndufs8/Ldha/Atp5b/Ndufs7/Slc25a11/Gpx4/Sdha/Atp6v1e1/Ndufs3/Mdh2/Aco2/Ndufs4/Supv3l1/Ndufv2/Cox8a/Acadm/Phb2/Cox6a1/Ndufa3/Atp5e/Atp5a1/Uqcrc2/Ndufc2/Cox11/Acadvl/Abcb7/Ndufb2/Opa1/Etfb/Cox7a2/Ogdh/Ndufb8/Cox7a2l/Hsd17b10/Cyc1/Slc25a12/Tomm22/Cox10/Atp5pb/Timm13/Atp5d/Idh1/Acadsb/Mrpl35/Cox15/Htra2/Ndufv1/Hccs/Cox5b/Cox6c/Cox4i1/Uqcrb/Etfdh/Atp6v0e/Grpel1/Uqcr11/Mfn2/Mrpl34/Rhot2/Uqcrh/Atp6v0b/Atp6v1h/Atp5g3/Immt/Timm50/Cox5a/Mrpl15/Atp5c1/Ndufa5/Mrps30/Idh3b/Ndufa6/Ndufab1/Timm8b/Dlst/Ndufa9/Cox7b/Acaa2/Sdhd/Ndufb6/Got2/Ndufb7/Ndufa2/Ndufa8/Ndufb3/Mrps12/Cox7c/Nqo2/Cox6b1/Mgst3/Pdhb/Dld/Ndufb5/Uqcrc1/Vdac1/Ndufc1/Eci1/Cyb5a/Ndufa1/Slc25a3/Sdhc/Atp5g1/Pdha1/Sucla2/Ndufs6/Atp5j2/Atp5h/Atp5k/Mrps22/Bdh2/Ndufs1/Lrpprc/Cyb5r3/Mrps15/Suclg1/Atp5j/Atp5o/Uqcrfs1/Ndufs2/Aifm1/Pdhx/Prdx3/Afg3l2/Sdhb/Iscu/Gpi1
## HALLMARK_G2M_CHECKPOINT                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       Jpt1/Hmgb3/Chaf1a/Cul5/Rad23b/Prmt5/Incenp/Ccnf/E2f3/Ube2s/Pttg1/E2f4/Ddx39a/Bcl3/Kif23/Cdkn2c/Prim2/Dtymk/Kif2c/Smc2/Aurkb/Birc5/Bub1/Ndc80/Racgap1/Top2a/Nusap1/Kif20b/Ttk/Kif22/Troap/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Cks2/Espl1/Gins2/E2f2/Cdkn3/Dbf4/Cenpa/Pbk/Kif11/Mad2l1/Cenpf/Mki67/Mybl2/Plk1/Aurka/Tacc3/Prc1/Tpx2/Ccnb2/Ccna2/Nek2/Hmmr/Ube2c/Cdc20
## HALLMARK_MTORC1_SIGNALING                                                                                                                                                                                                                                                                                                                                                                                                                                          Etf1/Psmc6/Psme3/Immt/Psmd13/Nfkbib/Cops5/Nufip1/Map2k3/Prdx1/Pfkl/Lta4h/Pdap1/Ctsc/Eprs/Ccnf/Fdxr/Adipor2/Nup205/Tes/Ykt6/M6pr/Ddx39a/Psmb5/Cd9/Pdk1/Txnrd1/Ak4/Slc1a4/Psmd12/Atp5g1/Qdpr/Xbp1/Bub1/Slc7a11/Sdf2l1/Gclc/Canx/Calr/Psmc4/Gsr/Pgm1/Mllt11/Slc2a1/Ero1a/Gbe1/Rab1a/Shmt2/Rrm2/Dhfr/Hspa5/Hsp90b1/Serp1/Psph/Pgk1/Plk1/Edem1/Aurka/Tpi1/Psmd14/Stc1/Ufm1/Egln3/Gpi1/Tuba4a/Uso1/Ssr1/Ppa1/Acly
## HALLMARK_E2F_TARGETS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Spc24/Ak2/Mlh1/Jpt1/Hmgb3/Gins3/Anp32e/Ranbp1/Ube2s/Nup205/Spc25/Pttg1/Hells/Diaph3/Mxd3/Ddx39a/Cdca8/Ncapd2/Mms22l/Melk/Cdkn2c/Prim2/Bub1b/Kif2c/Aurkb/Birc5/Spag5/Racgap1/Top2a/Trip13/Kif22/Cks1b/Kif4/Cenpe/Plk4/Cdk1/Gspt1/Ran/Cks2/Espl1/Rad51ap1/Dlgap5/Cdkn3/Kif18b/Rrm2/Mad2l1/Mki67/Mybl2/Plk1/Aurka/Tacc3/Ccnb2/Hmmr/Cdca3/Cdc20
## HALLMARK_PROTEIN_SECRETION                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Clcn3/Tsg101/Rab2a/Abca1/Cltc/Anp32e/Ctsc/Gosr2/Ap2m1/Ykt6/M6pr/Tmx1/Cog2/Tspan8/Tmed10/Cd63/Scamp3/Arcn1/Vps45/Gbf1/Yipf6/Sec31a/Copb2/Arfgap3/Atp1a1/Sec24d/Sec22b/Gnas/Ap3s1/Rer1/Cope/Bet1/Uso1/Ergic3/Lman1
## HALLMARK_ADIPOGENESIS                                                                                                                                                                                                                                                                                                                                                                                                                      Aldoa/Idh1/Pex14/Tst/Fabp4/Plin2/Ptcd3/Dbt/Fah/Dgat1/Dhrs7b/Tank/G3bp2/Sult1a1/Samm50/Grpel1/Uqcr11/Slc25a1/Araf/C3/Acads/Immt/Ak2/Ywhag/Mrpl15/Ndufa5/Ndufab1/Reep6/Cox7b/Acaa2/Abca1/Pfkl/Aldh2/Enpp2/Reep5/Ndufb7/Gpam/Rmdn3/Scarb1/Adipor2/Lpl/Mgst3/Dld/Stom/Uqcrc1/Cdkn2c/Cd151/Aplp2/Pdcd4/Preb/Sdhc/Qdpr/Riok3/Ddt/Jagn1/Mccc1/Pgm1/Ubqln1/Agpat3/Suclg1/Gbe1/Ghitm/Dnajb9/Atp5o/Aifm1/Prdx3/Sdhb/Orm1/Sqor/Mrap/Cmpk1/Acly

write.table(h_down, 'down_GSEA_hallmark.txt',  col.names = TRUE, row.names = TRUE, quote = FALSE, sep = '\t')

That’s a wrap on the basics of GO and GSEA. Please have the saved result files ready for class next Monday. MSigDb provides an abundance of other gene sets ready for tests on their website. Please prepare additional results using gene sets relevant to your study.

Additionally, I encourage you to explore this website: https://yulab-smu.top/biomedical-knowledge-mining-book/ for more tutorials on using the clusterprofiler package. You will find many different variations of gene set analyses and importantly, visualization of your enrichment results (https://yulab-smu.top/biomedical-knowledge-mining-book/enrichplot.html).

GSEA_GO

Jianhan

2023-11-01