output: html_document: title: ‘MA5111 RNA-Seq’ toc: true toc_float: true code_folding: hide highlight: zenburn theme: flatly —r::opts_chunk$set(echo = TRUE)
```r
library(dplyr)
library(biomaRt)
library(tximport)
library(rhdf5)
library(gplots)
library(org.Hs.eg.db)
library(DESeq2)
library(DT)
library(apeglm)
library(RColorBrewer)
library(IHW)
library(PCAtools)
library(pheatmap)
library(clusterProfiler)
library(EnhancedVolcano)
library(ComplexHeatmap)
library(circlize)
library(fgsea)
library(tidyverse)
library(ggpubr)
This R markdown document walks through an RNA-Seq differential
expression analysis using DESeq2 for MA5111. Quality
control plots, differential expression results, plots and pathway
analysis are covered in the document.
The tutorial assumes you have 9 Kallisto quantification directories
and a samples.csv file under /home/rnaseq_data/quant/ as
well as a .gmt file that we will use for the gene set enrichment
analysis.
quant_dir <- "/home/rstudio/rnaseq_data/quant"
list.files(quant_dir)
## [1] "A375_1" "A375_2" "A375_3"
## [4] "A549_1" "A549_2" "A549_3"
## [7] "c5.bp.v7.0.symbols.gmt" "ctrl_1" "ctrl_2"
## [10] "ctrl_3" "samples.csv"
Read in the file samples.csv which contains the
experimental metadata. Please make sure that the rownames of this
dataframe match the kallisto quantification directory names at
/home/rnaseq/quant/.
samples <- read.csv(paste0(quant_dir, "/samples.csv"), header=T, row.names = "sample", stringsAsFactors = T)
samples
## condition replicate
## ctrl_1 control 1
## ctrl_2 control 2
## ctrl_3 control 3
## A549_1 lung 1
## A549_2 lung 2
## A549_3 lung 3
## A375_1 melanoma 1
## A375_2 melanoma 2
## A375_3 melanoma 3
When performing differential expression analysis using
DESeq2, if a column in samples.csv is numeric
and provided to the design formula, DESeq2 will produce the
following message:
the design formula contains a numeric variable with integer values, specifying a model with increasing fold change for higher values. did you mean for this to be a factor? if so, first convert this variable to a factor using the factor() function
What does this mean? From Mike Love (DESeq2 author):
There is a constant fold change for every unit of change in replicates. So if the estimated fold change is 2, this implies that replicates 2 = 2x replicates 1, replicates 3 = 2x replicates 2, etc. Or in other words, the relationship is linear on the log counts scale.
This is not what we want! Be really careful with your metadata file as it defines our statistical model design. To fix this, change the replicate column to a categorical factor using factor().
The code block below checks the input metadata file, tells the user if any columns are numeric and converts them to a factor.
samples$replicate <- factor(samples$replicate)
# check its ok:
factor_cols <- sapply(samples, is.factor)
factor_cols
## condition replicate
## TRUE TRUE
We need to create a file handle object (a named character list)
containing the sample IDs and the paths to the kallisto quantification
.h5 files. If the rownames of your metadata object
(samples) do not match the quantification directory names,
you will get an error during the TXI object step.
files <- file.path(quant_dir, rownames(samples), "abundance.h5")
names(files) <- paste0(rownames(samples))
files
## ctrl_1
## "/home/rstudio/rnaseq_data/quant/ctrl_1/abundance.h5"
## ctrl_2
## "/home/rstudio/rnaseq_data/quant/ctrl_2/abundance.h5"
## ctrl_3
## "/home/rstudio/rnaseq_data/quant/ctrl_3/abundance.h5"
## A549_1
## "/home/rstudio/rnaseq_data/quant/A549_1/abundance.h5"
## A549_2
## "/home/rstudio/rnaseq_data/quant/A549_2/abundance.h5"
## A549_3
## "/home/rstudio/rnaseq_data/quant/A549_3/abundance.h5"
## A375_1
## "/home/rstudio/rnaseq_data/quant/A375_1/abundance.h5"
## A375_2
## "/home/rstudio/rnaseq_data/quant/A375_2/abundance.h5"
## A375_3
## "/home/rstudio/rnaseq_data/quant/A375_3/abundance.h5"
tximport imports transcript-level abundances from
quantification tools (kallisto in our case) and converts
them to gene counts for downstream analyses using a differential
expression analysis package.
We will use biomaRt to connect to the
ENSEMBL databases and map transcript IDs to gene IDs.
Firstly, we will create the mart object specifying which
database to use:
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
Recall we used the ENSEMBL reference cDNA file
Homo_sapiens.GRCh38.cdna.all.fa.gz for
Kallisto quant on LUGH. This means that our quantification
files have ENSEMBL transcript ID’s. We can map them to gene
symbols running the code below:
tx2gene <- getBM(attributes = c("ensembl_transcript_id_version", "hgnc_symbol"), mart = mart, useCache = FALSE)
head(tx2gene)
## ensembl_transcript_id_version hgnc_symbol
## 1 ENST00000387314.1 MT-TF
## 2 ENST00000389680.2 MT-RNR1
## 3 ENST00000387342.1 MT-TV
## 4 ENST00000387347.2 MT-RNR2
## 5 ENST00000386347.1 MT-TL1
## 6 ENST00000361390.2 MT-ND1
Create a txi object summarising kallisto
transcipt quantification to the gene-level.
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
head(txi$abundance)
## ctrl_1 ctrl_2 ctrl_3 A549_1 A549_2
## 7.565669e+03 8.212640e+03 7.877847e+03 7745.3681011 7.850105e+03
## A1BG 5.901603e+00 4.889110e+00 5.094564e+00 4.8513322 5.471458e+00
## A1CF 3.783157e-02 3.832046e-03 8.864742e-02 0.0453066 6.511605e-02
## A2M 4.130882e+01 2.953310e+01 3.708851e+01 37.1903231 4.827051e+01
## A2ML1 1.886978e-01 2.740595e-01 7.846124e-02 0.3793876 4.937987e-01
## A2MP1 6.811755e-01 6.825399e-01 5.020745e-01 0.9794183 1.189639e+00
## A549_3 A375_1 A375_2 A375_3
## 7.888903e+03 7.705010e+03 8.004009e+03 8.018222e+03
## A1BG 4.309155e+00 4.564884e+00 5.934494e+00 3.989345e+00
## A1CF 1.938563e-02 1.034112e-02 3.690397e-02 1.865727e-02
## A2M 4.759494e+01 3.576941e+01 3.101334e+01 4.386300e+01
## A2ML1 2.521947e-01 4.777695e-01 3.322747e-01 3.713090e-01
## A2MP1 1.041572e+00 8.024568e-01 6.518761e-01 8.664537e-01
Now we are ready to create the DDS object for
DESeq2 analysis that contains metadata (colData), counts
further information that can be obtained via dds@.
The model design is one of the most important steps for an RNA-Seq
analysis. Here, we are going to specify the design
~ replicate + condition (columns of the metadata file). The
factor of interest goes last in the model terms, we want to compare
control vs. lung vs. melanoma in the experiment. Replicate has been
included as we want to control for the effect of sample replicates.
We are going to create the DDS object using
DESeqDataSetFromTximport() as we used tximport
to convert transcript abundances to gene-level counts.
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ replicate + condition )
By default, R will choose a reference level for factors based on alphabetical order. We will explicitly set control as the reference level so you can see how it’s done using relevel().
N.B this step must be done prior to creating the dds object using DESeq()! If you want to change the reference level, you must re-run DESeq().
Check that relevel() worked correctly by running resultsNames(dds) - we want control to be the reference level so we can compare lung vs control and melanoma vs. control.
dds$condition <- relevel(dds$condition, ref = "control")
dds <- DESeq(dds)
resultsNames(dds)
## [1] "Intercept" "replicate_2_vs_1"
## [3] "replicate_3_vs_1" "condition_lung_vs_control"
## [5] "condition_melanoma_vs_control"
We will use the gene-level counts to perform some quality control checks on the samples in the experiment. We will choose a suitable transformation for the gene counts to reduce impact of lowly-expressed genes on our analysis.
Extract gene-level counts from the DDS object.
counts <- counts(dds, normalized=TRUE)
###Transform Counts We’ll look at two transformations: log2(), and rlog().
log2_counts <- assay(normTransform(dds))
rld_counts <- assay(rlog(dds))
library(vsn)
library(hexbin)
## x-axis is the transformed mean not the raw mean..
log2_plt <- meanSdPlot(log2_counts, ranks=FALSE, plot=FALSE)
log2_plt$gg + ggtitle("Log2 + PC Transformation") + xlim(0,20)
rld_plt <- meanSdPlot(rld_counts, ranks=FALSE, plot=FALSE)
rld_plt$gg + ggtitle("Rlog Transformation") + xlim(0,20)
Write the counts files
#dir.create("/home/rstudio/rnaseq_data/counts")
write.table(counts, "normalised_counts.txt", sep="\t", quote = F)
write.table(log2_counts, "log2_counts.txt", sep="\t", quote = F)
A heatmap of sample distances matrix (dist(t(log2)))
gives us an overview over similarities and dissimilarities between
samples.
## Calculate distance between samples
sampleDists <- dist(t(rld_counts))
## Place distances in matrix
sampleDistMatrix <- as.matrix(sampleDists)
## Optional, remove colnames
colnames(sampleDistMatrix) <- NULL
## create annotation dataframe
ann <- data.frame(Condition = samples$condition)
col <- c("blue", "forestgreen", "red1")
names(col) <- c("melanoma", "lung", "control")
ann_col <- list(Condition = col)
## match annotation rownames to distance mat
rownames(ann) <- rownames(sampleDistMatrix)
pheatmap(mat=sampleDistMatrix,
## pass distance metric calculated to heatmap
clustering_distance_rows=sampleDists,
clustering_distance_cols=sampleDists,
## pass annotation dataframe
annotation_col = ann,
## add colors
annotation_colors = ann_col,
## heatmap colours
col=hcl.colors(100,"GnBu",rev=T))
PCA plots the variance explained by samples in each principal component. Typically PC1 & PC2 explain the most variation in the dataset, you should look at these first and foremost.
In the plot below we can see that samples from the lung and control groups cluster tightly together and that the melanoma samples overlap with the control samples in PC1/PC2. Make the effort to explore other principal components yourdelf - for example, in PC2 vs PC4 there is excellent separation between the melanoma and control samples.
PCA can give you a good idea of how succesful the Differential Expression analysis will be - samples that are very close together in PC feature space will not produce as many DE genes as those that are separated by a large distance.
p <- pca(rld_counts, metadata = samples)
biplot(p,
colby = 'condition',
colkey = c('melanoma'='royalblue', 'control'='red1', 'lung'='forestgreen'),
hline = 0,
vline = 0,
legendPosition = 'right',
legendLabSize = 12,
legendIconSize = 8.0,
title = 'PCA bi-plot',
subtitle = 'PC1 versus PC2')
We can extract differentially expressed genes between phenotypes of
interest by using results() on the DDS object.
We will apply the apeglm shrinkage estimator on our
results. apeglm shrinks low confidence (high inter-sample variation)
differentially expressed genes towards 0, producing a robust set of
differentially expressed genes.
The argument coef in lfcShrink refers to
the contrast of interest returned by resultsNames(dds). For
lung vs. control, it is the 4th character string returned thus
coef=4 for lung vs. control.
Set up lung vs control, melanoma vs control:
# make lung vs control object
lung_v_ctrl <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "lung", "control"))
res1 <- lfcShrink(dds=dds, res=lung_v_ctrl, coef=4, type="apeglm")
summary(res1)
##
## out of 22664 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 2661, 12%
## LFC < 0 (down) : 3253, 14%
## outliers [1] : 0, 0%
## [1] see 'cooksCutoff' argument of ?results
## see metadata(res)$ihwResult on hypothesis weighting
# make melanoma vs control object
melanoma_v_ctrl <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "melanoma", "control"))
res2 <- lfcShrink(dds=dds, res=melanoma_v_ctrl, coef=5, type="apeglm")
summary(res2)
##
## out of 22664 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 439, 1.9%
## LFC < 0 (down) : 394, 1.7%
## outliers [1] : 0, 0%
## [1] see 'cooksCutoff' argument of ?results
## see metadata(res)$ihwResult on hypothesis weighting
Set up lung vs melanoma. We need to reconfigure the DDS
object so ‘melanoma’ is the reference level in ‘condition’, and re-run
DESeq().
Check resultsNames(dds) to ensure the correct contrasts
are set:
# to make lung vs melanoma, relevel the dds object reference level and redo the DESeq call
dds$condition <- relevel(dds$condition, ref = "melanoma")
dds <- DESeq(dds)
# double check it worked
resultsNames(dds)
## [1] "Intercept" "replicate_2_vs_1"
## [3] "replicate_3_vs_1" "condition_control_vs_melanoma"
## [5] "condition_lung_vs_melanoma"
Now extract the contrast results and perform apeglm.
# make lung vs melanoma
lung_v_melanoma <- results(dds, filterFun=ihw, alpha=0.05, c("condition", "lung", "melanoma"))
res3 <- lfcShrink(dds=dds, res=lung_v_melanoma, coef=5, type="apeglm")
summary(res3)
##
## out of 22664 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up) : 2646, 12%
## LFC < 0 (down) : 3192, 14%
## outliers [1] : 0, 0%
## [1] see 'cooksCutoff' argument of ?results
## see metadata(res)$ihwResult on hypothesis weighting
Below are some handy functions for extracting differentially expressed genes and performing annotation of DE genes.
get_upregulated <- function(df){
key <- intersect(rownames(df)[which(df$log2FoldChange>=1)], rownames(df)[which(df$pvalue<=0.05)])
results <- as.data.frame((df)[which(rownames(df) %in% key),])
return(results)
}
get_downregulated <- function(df){
key <- intersect(rownames(df)[which(df$log2FoldChange<=-1)], rownames(df)[which(df$pvalue<=0.05)])
results <- as.data.frame((df)[which(rownames(df) %in% key),])
return(results)
}
annotate_de_genes <- function(df){
df$hgnc_symbol <- rownames(df)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
info <- getBM(attributes=c("hgnc_symbol",
"ensembl_gene_id_version",
"chromosome_name",
"start_position",
"end_position",
"strand",
"entrezgene_description"),
filters = c("hgnc_symbol"),
values = df$hgnc_symbol,
mart = mart,
useCache=FALSE)
tmp <- merge(df, info, by="hgnc_symbol")
tmp$strand <- gsub("-1", "-", tmp$strand)
tmp$strand <- gsub("1", "+", tmp$strand)
tmp$hgnc_symbol <- make.names(tmp$hgnc_symbol, unique = T)
tmp <- tmp[!grepl("CHR", tmp$chromosome_name),]
output_col <- c("Gene", "Ensembl ID", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")
tmp <- subset(tmp, select=c(hgnc_symbol, ensembl_gene_id_version, chromosome_name, start_position, end_position, strand, entrezgene_description, log2FoldChange, pvalue, padj))
colnames(tmp) <- output_col
if(min(tmp$Log2FC) > 0){
tmp <- tmp[order(-tmp$Log2FC),]
}else{
tmp <- tmp[order(tmp$Log2FC),]
}
return(tmp)
}
Use the above functions to extract up/down regulated genes and annotate them using biomaRt, and write to a file. Please note that the get_(up/down)regulated function requires a dataframe as input.
Note: we will focus on lung vs control for the remainder of the practical
de_up <- get_upregulated(as.data.frame(res1))
de_down <- get_downregulated(as.data.frame(res1))
upregulated_genes <- annotate_de_genes(de_up)
downregulated_genes <- annotate_de_genes(de_down)
#confirm these worked
head(upregulated_genes)
## Gene Ensembl ID Chromosome Start Stop Strand
## 467 INSRR ENSG00000027644.5 1 156840063 156859117 -
## 282 FAM187B2P ENSG00000262497.1 19 35232291 35233003 -
## 290 FDCSP ENSG00000181617.6 4 70226124 70235252 +
## 781 SAA1 ENSG00000288411.1 HG2111_PATCH 15086 18803 +
## 782 SAA1.1 ENSG00000173432.13 11 18266260 18269977 +
## 601 NDP ENSG00000124479.11 X 43948776 43973395 -
## Description Log2FC P-value
## 467 insulin receptor related receptor 8.728400 5.039956e-09
## 282 7.402744 7.541358e-06
## 290 follicular dendritic cell secreted protein 7.393437 6.828074e-193
## 781 serum amyloid A1 7.088209 6.108702e-18
## 782 serum amyloid A1 7.088209 6.108702e-18
## 601 norrin cystine knot growth factor NDP 6.907855 3.721412e-05
## Adj P-value
## 467 1.448311e-07
## 282 1.322405e-04
## 290 5.489574e-190
## 781 1.045581e-16
## 782 1.045581e-16
## 601 3.637315e-02
#dir.create("/home/rstudio/rnaseq_data/DESeq_results/")
write.table(upregulated_genes, "lung_vs_control_upregulated.txt", sep="\t", row.names=F, quote=F)
write.table(downregulated_genes, "lung_vs_control_downregulated.txt", sep="\t", row.names=F, quote=F)
Volcano plots are useful to show how many genes are differentially expressed in the experimental contrast of interest. Labels are optional, I have included them so you know how to use them.
N.B Volcano plots use -log10 on the Y-axis.
## remove NA values from results
res1 <- na.omit(res1)
## calculate min/max axis values for plot (optional)
min_width <- min(res1$log2FoldChange)
max_width <- max(res1$log2FoldChange)
max_height <- -log10(min(res1[res1$pvalue>0, 5]))
## Grab top 10 up-reg genes for plot
up <- subset(res1, res1$log2FoldChange > 1 & res1$pvalue <= 0.05)
up <- up[order(-up$log2FoldChange),]
up_list <- head(rownames(up), n=10L)
## Grab top 10 down-reg genes for plot
down <- subset(res1, res1$log2FoldChange < -1 & res1$pvalue <= 0.05)
down <- down[order(down$log2FoldChange),]
down_list <- head(rownames(down), n=10L)
## place top 20 DE genes in vector (optinal...)
plot_top_20 <- c(up_list, down_list)
EnhancedVolcano(res1,
lab=rownames(res1),
x="log2FoldChange",
y="pvalue",
selectLab=plot_top_20,
drawConnectors=TRUE,
legendPosition = "none",
FCcutoff=1.0,
pCutoff=0.05,
title="Volcano Plot",
subtitle="Lung vs. Control",
caption = paste0('Total Genes = ', nrow(res1)),
xlim=c(min_width, max_width),
ylim=c(0, max_height))
Make a volcano plot of lung_v_ctrl, the
DESeq2 results object that has not been filtered by
apeglm.
You can see that there are genes with very large log2 FC values. As a rule of thumb, be skeptical of genes that have LFC values over +/- 10.
res1 <- na.omit(lung_v_ctrl)
min_width <- min(res1$log2FoldChange)
max_width <- max(res1$log2FoldChange)
max_height <- -log10(min(res1[res1$pvalue>0, 5]))
up <- subset(res1, res1$log2FoldChange > 1 & res1$pvalue <= 0.05)
up <- up[order(-up$log2FoldChange),]
up_list <- head(rownames(up), n=10L)
down <- subset(res1, res1$log2FoldChange < -1 & res1$pvalue <= 0.05)
down <- down[order(down$log2FoldChange),]
down_list <- head(rownames(down), n=10L)
plot_top_20 <- c(up_list, down_list)
EnhancedVolcano(res1,
lab=rownames(res1),
x="log2FoldChange",
y="pvalue",
selectLab=plot_top_20,
drawConnectors=TRUE,
FCcutoff=1.0,
pCutoff=0.05,
title="Volcano Plot",
subtitle="Lung vs. Control",
legendLabSize=8,
caption = paste0('Total Genes = ', nrow(res1)))
##reset res1 for heatmaps
res1 <- na.omit(res1)
Heatmaps are another way to show the differentially expressed genes in the experimental contrast of interest.
# subset the counts matrix to get the lung and control samples
subset <- rld_counts[, 1:6]
# now select de_up, de_down, i.e DE genes that passed the filtering our function produced
up <- rownames(de_up)
down <- rownames(de_down)
# subset matrix to include only DE genes
key <- c(up, down)
subset <- subset[which(rownames(subset) %in% key),]
# scale and center the values
mat <- as.matrix(scale(t(subset), center = T))
# basic plot to check we're plotting something sensible
#pheatmap(t(mat))
# spruce it up a bit..
ann <- data.frame(Condition = c(rep("Control", 3), rep("Lung", 3)))
rownames(ann) <- rownames(mat)
col <- c("blue", "forestgreen")
names(col) <- c("Control", "Lung")
ann_col <- list(Condition = col)
pheatmap(t(mat),
show_rownames = FALSE,
annotation_col = ann,
annotation_colors = ann_col,
color = hcl.colors(100, "PRGn",rev=F))
We will use the GO Biological processes GMT file c5.bp.v7.0.symbols.gmt which should be in the quant directory.
Extract the gene names and associated log2FoldChanges from our lung vs control study to generate a ranked gene list.
## convert result object to dataframe
res <- as.data.frame(res1) # lung vs control
res$hgnc_symbol <- rownames(res)
# compute summary stat
fgsea_rank <- res %>%
dplyr::select(hgnc_symbol, log2FoldChange) %>%
na.omit() %>%
distinct() %>%
group_by(hgnc_symbol) %>%
summarize(stat=mean(log2FoldChange))
fgsea_rank
## # A tibble: 22,664 × 2
## hgnc_symbol stat
## <chr> <dbl>
## 1 "" -0.0969
## 2 "A1BG" -0.215
## 3 "A1CF" -0.0345
## 4 "A2M" 0.215
## 5 "A2ML1" 1.03
## 6 "A2MP1" 0.692
## 7 "A3GALT2" 1.72
## 8 "A4GALT" 1.71
## 9 "A4GNT" -0.119
## 10 "AAAS" -0.444
## # ℹ 22,654 more rows
# create named list
rank <- deframe(fgsea_rank)
head(rank, 20)
## A1BG A1CF A2M A2ML1 A2MP1
## -0.09692611 -0.21512443 -0.03447851 0.21531794 1.03183917 0.69165261
## A3GALT2 A4GALT A4GNT AAAS AACS AACSP1
## 1.72434705 1.71202195 -0.11870411 -0.44423038 -0.68357989 -1.66781509
## AADAC AADACL3 AADACP1 AADAT AAGAB AAK1
## -0.50864719 0.60883057 1.20853625 -0.18175691 0.12732625 0.16300913
## AAMDC AAMP
## -0.59608215 0.20323132
# read in gmt file
pathway <- gmtPathways("/home/rstudio/rnaseq_data/quant/c5.bp.v7.0.symbols.gmt")
head(pathway, 1)
## $GO_POSITIVE_REGULATION_OF_VIRAL_TRANSCRIPTION
## [1] "CDK9" "CHD1" "DHX9" "EP300" "SNW1" "RRP1B" "NELFB"
## [8] "GTF2B" "GTF2F1" "GTF2F2" "MDFIC" "HPN" "JUN" "LEF1"
## [15] "ZNF639" "NELFCD" "RSF1" "POLR2A" "POLR2B" "POLR2C" "POLR2D"
## [22] "POLR2E" "POLR2F" "POLR2G" "POLR2H" "POLR2I" "POLR2J" "POLR2K"
## [29] "POLR2L" "NUCKS1" "SMARCA4" "SMARCB1" "SP1" "SUPT4H1" "SUPT5H"
## [36] "TAF11" "TFAP4" "NELFA" "NELFE" "CCNT1" "CTDP1"
# run fgsea
fgsea <- fgsea(pathways=pathway, stats=rank, nperm=1000)
fgseaResTidy <- fgsea %>%
as_tibble() %>%
arrange(desc(NES))
# Show in a nice table:
fgseaResTidy %>%
dplyr::select(-leadingEdge, -ES, -nMoreExtreme) %>%
arrange(padj) %>%
DT::datatable()
Here, you can see examples of pathways enriched in our lung samples, and pathways enriched in Control (i.e negative NES score)
filtered_pathway <- subset(fgsea, NES > 2.0)
filtered_pathway
## pathway pval padj
## 1: GO_POSITIVE_REGULATION_OF_LYMPHOCYTE_MIGRATION 0.001360544 0.03671469
## 2: GO_HUMORAL_IMMUNE_RESPONSE 0.001084599 0.03632369
## 3: GO_GRANULOCYTE_MIGRATION 0.001150748 0.03632369
## 4: GO_NEUTROPHIL_MIGRATION 0.001175088 0.03632369
## 5: GO_RESPONSE_TO_CHEMOKINE 0.001200480 0.03632369
## ES NES nMoreExtreme size
## 1: 0.7766969 2.025782 0 32
## 2: 0.6240522 2.010755 0 194
## 3: 0.6800119 2.098605 0 118
## 4: 0.6867646 2.082366 0 98
## 5: 0.6974278 2.072428 0 84
## leadingEdge
## 1: CCL20,CCL7,CXCL10,CXCL12,TNFSF14,CCL4,...
## 2: PI3,BPI,LCN2,C3,CXCL2,CXCL3,...
## 3: SAA1,CCL20,CCL8,IL36G,CCL7,CXCL2,...
## 4: SAA1,CCL20,CCL8,IL36G,CCL7,CXCL2,...
## 5: CCL20,CCL8,CXCR5,CCL7,CCRL2,CXCL2,...
filt_up <- as.vector(filtered_pathway$pathway)
for (i in filt_up){
plt <- plotEnrichment(pathway = pathway[[i]],
gseaParam = 1, ticksSize = 0.5, stats= rank) +
labs(title=i) + theme(plot.title = element_text(hjust = 0.5, face="bold"))
print(plt)
}
filtered_pathway <- subset(fgsea, NES < -2.6)
filtered_pathway
## pathway pval padj ES
## 1: GO_MITOTIC_SISTER_CHROMATID_SEGREGATION 0.008474576 0.09297966 -0.683861
## NES nMoreExtreme size leadingEdge
## 1: -2.601437 0 148 TRIP13,NUSAP1,AURKB,TTK,NUF2,BUB1B,...
filt_down <- as.vector(filtered_pathway$pathway)
for (i in filt_down){
plt <- plotEnrichment(pathway = pathway[[i]],
gseaParam = 1, ticksSize = 0.5, stats= rank) +
labs(title=i) + theme(plot.title = element_text(hjust = 0.5, face="bold"))
print(plt)
}