Loading the libraries

If you face any problem regarding the libraries like “DESeq2” and “apeglm”, just remember that these packages belong to the package called “BiocManager”. If loading fails, then just do the followings:

  • install.package(“BiocManager”)
  • BiocManager::install(“apeglm”)
  • BiocManager::install(“DESeq2”)
library("DESeq2")
library("dplyr")
library("apeglm")

Loading csv files

Downloading …

Lets download the GLDS-102_rna_seq_Normalized_Counts.csv and GLDS-102_rna_seq_SampleTable.csv files.

Loading …

After downloading, load the csv files. [remember to put the files in the same directory]

countData <- read.csv("GLDS-102_rna_seq_Normalized_Counts.csv")

colData <- read.csv("GLDS-102_rna_seq_SampleTable.csv", sep = ",", row.names = 1)

Rounding up and checking

First we need to round up the counts, as DESeq2 only allows integers as an input not fractional numbers. This depends on the method of alignment that was used upstream.

countData %>%
  mutate_if(is.numeric, ceiling)

countData[, 2:13] <- sapply(countData[, 2:13], as.integer)

row.names(countData) <- countData[,1]

countData <- countData[2:13]

row.names(colData) <- c(stringr::str_replace_all(rownames(colData), "-", "."))

rownames(colData) == colnames(countData)

Matching these two csv files is a big challenge, so be with me ^_^.

  1. Mutate if the data is numeric then round up(ceiling).

  2. Go over each column and make the values an integer.

  3. Take the row names of countData and make it the first column.

  4. Next we are going to remove the copied first column and modify the countData by slicing.

  5. If you see the files, then you will notice that row names of colData and column names of countData are not exactly matching. So we are replacing the “-” with “.” in colData and rename the rows of colData.

  6. Lastly we are checking that are the individuals matching?

DESeqDataSet object

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~condition)

dds <- dds[rowSums(counts(dds)>2) >= 4]

dds <- DESeq(dds)

res <- results(dds)
  1. First we convert our countData to a readable format for DESeq and then design our experiment by condition.

  2. Next we are going to read out all the locals of our genes observed no count changes between the two experiments.

  3. Now we are calling DESeq function to do differential gene expression.

  4. And then we are going to create the results.

Applying LFC

resLFC <- lfcShrink(dds, coef = 2)
## using 'apeglm' for LFC shrinkage. If used in published research, please cite:
##     Zhu, A., Ibrahim, J.G., Love, M.I. (2018) Heavy-tailed prior distributions for
##     sequence count data: removing the noise and preserving large differences.
##     Bioinformatics. https://doi.org/10.1093/bioinformatics/bty895
(resOrdered <- res[order(res$padj), ])
## log2 fold change (MLE): condition Space.Flight vs Ground.Control 
## Wald test p-value: condition Space.Flight vs Ground.Control 
## DataFrame with 21653 rows and 6 columns
##                     baseMean log2FoldChange     lfcSE         stat      pvalue
##                    <numeric>      <numeric> <numeric>    <numeric>   <numeric>
## ENSMUSG00000026077   450.811       1.439070 0.1538657      9.35276 8.53866e-21
## ENSMUSG00000002250  1454.342       0.898121 0.1113041      8.06908 7.08315e-16
## ENSMUSG00000007872  1720.636      -0.825132 0.1137404     -7.25452 4.03078e-13
## ENSMUSG00000021775  2818.351      -0.891902 0.1232374     -7.23726 4.57832e-13
## ENSMUSG00000040998 14572.307       0.494630 0.0719056      6.87888 6.03234e-12
## ...                      ...            ...       ...          ...         ...
## ENSMUSG00000118345   3.90266    0.061642112  0.672231  0.091697798    0.926938
## ENSMUSG00000118353   6.89344   -0.526691728  0.466534 -1.128945007    0.258921
## ENSMUSG00000118358   2.98946   -0.000118756  0.840203 -0.000141342    0.999887
## ENSMUSG00000118383  41.68686    0.242245947  0.244118  0.992330339    0.321036
## ENSMUSG00000118384   6.97645   -0.275097077  0.533254 -0.515883833    0.605936
##                           padj
##                      <numeric>
## ENSMUSG00000026077 1.23879e-16
## ENSMUSG00000002250 5.13811e-12
## ENSMUSG00000007872 1.66056e-09
## ENSMUSG00000021775 1.66056e-09
## ENSMUSG00000040998 1.75034e-08
## ...                        ...
## ENSMUSG00000118345          NA
## ENSMUSG00000118353          NA
## ENSMUSG00000118358          NA
## ENSMUSG00000118383          NA
## ENSMUSG00000118384          NA
summary(res)
## 
## out of 21653 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 267, 1.2%
## LFC < 0 (down)     : 234, 1.1%
## outliers [1]       : 23, 0.11%
## low counts [2]     : 7122, 33%
## (mean count < 43)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
  1. Now to make our results more consistence, we are going to use log2 fold change(LFC).

  2. Now we are going to put the significantly different genes at the top.

  3. Lastly, lets summarize the results.

FLT_Vs_GC <- as.data.frame(res$log2FoldChange)

head(FLT_Vs_GC)
##   res$log2FoldChange
## 1        -0.04815999
## 2         0.18946851
## 3        -0.14704150
## 4         0.55857800
## 5         0.20784257
## 6        -0.08172080

MA-plot

Now lets plot our data from -2 to +2 in y axis and make a pdf of the “Space Flight vs Ground Control”.

PlotMA

plotMA(resLFC, ylim = c(-2, 2))

Saving the plotMA…

pdf(file = "MA_Plot_FLT_Vs_GC.pdf")

plotMA(resLFC, ylim = c(-2, 2))

dev.off()

Saving the expression…

Now lets save our differential expression results to a file.

write.csv(as.data.frame(resOrdered), file = "Mouse_DESeq.csv")

Customzation

Lets perform a custom transformation.

dds <- estimateSizeFactors(dds)

se <- SummarizedExperiment(log2(counts(dds, normalize = TRUE) + 1), colData = colData(dds))

pdf(file = "PCA_PLOT_FLT_Vs_GC.pdf")

plotPCA(DESeqTransform(se), intgroup = "condition")

dev.off()

PlotPCA

plotPCA(DESeqTransform(se), intgroup = "condition")