If you face any problem regarding the libraries like “DESeq2” and “apeglm”, just remember that these packages belong to the package called “BiocManager”. If loading fails, then just do the followings:
library("DESeq2")
library("dplyr")
library("apeglm")
Lets download the GLDS-102_rna_seq_Normalized_Counts.csv and GLDS-102_rna_seq_SampleTable.csv files.
After downloading, load the csv files. [remember to put the files in the same directory]
countData <- read.csv("GLDS-102_rna_seq_Normalized_Counts.csv")
colData <- read.csv("GLDS-102_rna_seq_SampleTable.csv", sep = ",", row.names = 1)
First we need to round up the counts, as DESeq2 only allows integers as an input not fractional numbers. This depends on the method of alignment that was used upstream.
countData %>%
mutate_if(is.numeric, ceiling)
countData[, 2:13] <- sapply(countData[, 2:13], as.integer)
row.names(countData) <- countData[,1]
countData <- countData[2:13]
row.names(colData) <- c(stringr::str_replace_all(rownames(colData), "-", "."))
rownames(colData) == colnames(countData)
Matching these two csv files is a big challenge, so be with me ^_^.
Mutate if the data is numeric then round up(ceiling).
Go over each column and make the values an integer.
Take the row names of countData and make it the first column.
Next we are going to remove the copied first column and modify the countData by slicing.
If you see the files, then you will notice that row names of colData and column names of countData are not exactly matching. So we are replacing the “-” with “.” in colData and rename the rows of colData.
Lastly we are checking that are the individuals matching?
dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~condition)
dds <- dds[rowSums(counts(dds)>2) >= 4]
dds <- DESeq(dds)
res <- results(dds)
First we convert our countData to a readable format for DESeq and then design our experiment by condition.
Next we are going to read out all the locals of our genes observed no count changes between the two experiments.
Now we are calling DESeq function to do differential gene expression.
And then we are going to create the results.
resLFC <- lfcShrink(dds, coef = 2)
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
## Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
## sequence count data: removing the noise and preserving large differences.
## Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
(resOrdered <- res[order(res$padj), ])
## log2 fold change (MLE): condition Space.Flight vs Ground.Control
## Wald test p-value: condition Space.Flight vs Ground.Control
## DataFrame with 21653 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000026077 450.811 1.439070 0.1538657 9.35276 8.53866e-21
## ENSMUSG00000002250 1454.342 0.898121 0.1113041 8.06908 7.08315e-16
## ENSMUSG00000007872 1720.636 -0.825132 0.1137404 -7.25452 4.03078e-13
## ENSMUSG00000021775 2818.351 -0.891902 0.1232374 -7.23726 4.57832e-13
## ENSMUSG00000040998 14572.307 0.494630 0.0719056 6.87888 6.03234e-12
## ... ... ... ... ... ...
## ENSMUSG00000118345 3.90266 0.061642112 0.672231 0.091697798 0.926938
## ENSMUSG00000118353 6.89344 -0.526691728 0.466534 -1.128945007 0.258921
## ENSMUSG00000118358 2.98946 -0.000118756 0.840203 -0.000141342 0.999887
## ENSMUSG00000118383 41.68686 0.242245947 0.244118 0.992330339 0.321036
## ENSMUSG00000118384 6.97645 -0.275097077 0.533254 -0.515883833 0.605936
## padj
## <numeric>
## ENSMUSG00000026077 1.23879e-16
## ENSMUSG00000002250 5.13811e-12
## ENSMUSG00000007872 1.66056e-09
## ENSMUSG00000021775 1.66056e-09
## ENSMUSG00000040998 1.75034e-08
## ... ...
## ENSMUSG00000118345 NA
## ENSMUSG00000118353 NA
## ENSMUSG00000118358 NA
## ENSMUSG00000118383 NA
## ENSMUSG00000118384 NA
summary(res)
##
## out of 21653 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 267, 1.2%
## LFC < 0 (down) : 234, 1.1%
## outliers [1] : 23, 0.11%
## low counts [2] : 7122, 33%
## (mean count < 43)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Now to make our results more consistence, we are going to use log2 fold change(LFC).
Now we are going to put the significantly different genes at the top.
Lastly, lets summarize the results.
FLT_Vs_GC <- as.data.frame(res$log2FoldChange)
head(FLT_Vs_GC)
## res$log2FoldChange
## 1 -0.04815999
## 2 0.18946851
## 3 -0.14704150
## 4 0.55857800
## 5 0.20784257
## 6 -0.08172080
Now lets plot our data from -2 to +2 in y axis and make a pdf of the “Space Flight vs Ground Control”.
plotMA(resLFC, ylim = c(-2, 2))
pdf(file = "MA_Plot_FLT_Vs_GC.pdf")
plotMA(resLFC, ylim = c(-2, 2))
dev.off()
Now lets save our differential expression results to a file.
write.csv(as.data.frame(resOrdered), file = "Mouse_DESeq.csv")
Lets perform a custom transformation.
dds <- estimateSizeFactors(dds)
se <- SummarizedExperiment(log2(counts(dds, normalize = TRUE) + 1), colData = colData(dds))
pdf(file = "PCA_PLOT_FLT_Vs_GC.pdf")
plotPCA(DESeqTransform(se), intgroup = "condition")
dev.off()
plotPCA(DESeqTransform(se), intgroup = "condition")