library(tidyverse)
library(RColorBrewer)
library(ggrepel)
Volcano Plot for differentially expressing genes
Creating a volcano plot using only
ggplot2
andtidyverse
packages
About the data
The dataset is a processed transcriptomics data file in .csv format, comapring a control and a treated sample. It shows the fold change and the p-value of significance for each gene in the list (treated compared to the control). The dataset is available at this link DEG_data
Installing the required packages
Packages can be installed using the the command install.packages("package_name")
Following packages need to be installed:
install.packages("tidyverse")
install.packages("RColorBrewer")
install.packages("ggrepel")
Calling the package to the working environment
Importing the data
Setting working directory
Before you can import the data, the working directory needs to be set. You can set the working directory by using the following command:
setwd("complete_path_of_folder")
This should be the folder/directory in which the CSV file is stored
Reading the .csv file
<- read_csv("DEG_data.csv") deg
Data cleaning
|>
is the native pipe operator which when used transfers whatever is on the left of the pipe as the first argument in the function that is provided immediately after the pipe
<- deg |>
deg_clean select(2:5) |>
drop_na() |>
#adding a new column calculating log10(pvalue)
mutate(negLog10pval = -log10(pval)) |>
#adding a new column where the default value will be NS, but "UPREGULATED" and "DOWNREGULATED" will be assigned if it meets the desired conditions
mutate(diff_exp = case_when(log2fc > 0.6 & pval < 0.05 ~ "UPREGULATED",
< -0.6 & pval < 0.05 ~ "DOWNREGULATED",
log2fc .default = "NS")) |>
#changing the factor levels according to what is needed in the legend of the plot
mutate(diff_exp = fct_relevel(diff_exp, "UPREGULATED", "DOWNREGULATED", "NS"))
#display the first 6 rows
head(deg_clean)
# A tibble: 6 × 6
gene_symbol pval padj log2fc negLog10pval diff_exp
<chr> <dbl> <dbl> <dbl> <dbl> <fct>
1 MIR1302-2HG 1.00 1.00 1.59e+ 1 0.0000614 NS
2 FAM138A 1.00 1.00 -2.84e-12 0.00000516 NS
3 OR4F5 1.00 1.00 -2.84e-12 0.00000516 NS
4 AL627309.1 0.701 1.00 -6.36e- 1 0.154 NS
5 AL627309.3 1.00 1.00 -2.84e-12 0.00000516 NS
6 AL627309.2 1.00 1.00 -2.84e-12 0.00000516 NS
Declaring a list of genes to be annotated on the plot
This list can be prepared according to what genes need to be displayed on the plot.
<- c("CXCR5", "LINC00115", "FAM41C", "FBXO41", "TRIM54") prot
Plotting the data using ggplot2
package
ggplot works in layers. So, after the ggplot() function, all the other functions following will be added using a ‘+’ symbol instead of the pipe operator |>
|>
deg_clean drop_na() |>
ggplot(aes(log2fc, -log10(pval), color = diff_exp))+
#This layer shows all the points in the plot in the same colour
geom_point(size = 2, alpha = 0.7, shape = 16)+
#Thsi layer colours only the points specified in the 'prot' variable in the previous step
geom_point(data = deg_clean |>
filter(gene_symbol %in% prot)
aes(log2fc, -log10(pval)),
, color = "black", size = 3, shape = 21, fill = "#998ec3")+
#drawing two vertical lines on the plot to denote upregulated and downregulated thresholds
geom_vline(xintercept = c(0.6, -0.6), linetype = "dashed")+
#drawing a horizontal line to denote the P-value cutoff
geom_hline(yintercept = c(-log10(0.05)), linetype = "dashed")+
scale_color_manual(values = c("#fc8d59", "#91bfdb", "grey70"),
labels = c("UPREGULATED", "DOWNREGULATED", "NS"),
name = "",
+
)
#marking the genes from the 'prot' variable
geom_text_repel( data = deg_clean |>
filter(gene_symbol %in% prot),
aes(label = gene_symbol, color = label),
box.padding = 0.5,
point.padding = 0.3,
max.overlaps = Inf,
color = "black",
label.colour = "grey30",
fill = "white")+
#adjusting the overall look
scale_y_continuous(expand = c(0,0))+
scale_x_continuous(limits = c(-10,10), breaks = seq(-10, 10, 2))+
theme_classic()+
labs(
x = "Log2 Fold Change",
y = "-log10 (P-value)"
+
)theme(axis.title = element_text(size = 20),
axis.text.x = element_text(size = 15),
axis.text.y = element_text(size = 15),
legend.text = element_text(size = 10),
legend.position = "top")+
#overrides the default size of the legends
guides(color = guide_legend(override.aes = list(size = 5)))