pipeLIMMA Run analyses of differnetial expression in LIMMA.makeBinarySig Find and count the number of significantly differentially expressed genespqHists Plot the distribution of P and FDR-corrected P (Q) valuesvoom2PCA Conduct principal component analyses on voom (or otherwise) normalized expression matricesvolcanoPlot Generate log2 foldchange vs. P-value volcano plotsvolcanoPair Compare two sets of log2 fold changes, colored by p-valuespipeDESeq2 Runs an analysis of differential expression similar to that of pipeLIMMA, except through the DESeq2 package.The limmaDE2 packages is not on CRAN and can only be installed from github. Make sure the devtools package is installed on your system. The limmaDE2 package also requires several other R packages to be installed. These packages have functions upon which limmaDE2 depends:
limma contains all of the basic statistical elements that are called here.qvalue fdr-corrected p-valuesedgeR Differential expression package used for some transformationsSimSeq permits simulation of RNA-seq dataplyr summarize and manipulate lists and dataframesvegan some distance-based methodsHeatplus heatmap plottingtopGO gene ontology (GO) analysisggplot2 routines for complex plotshexbin methods to simplify data in plotslibrary(devtools)
install_github("jtlovell/limmaDE2")
library("limmaDE2")
library("ggplot2")
library("reshape2")
library("SimSeq")
library("DESeq2")
counts: Raw transcript abundance countsinfo: Experimental design information These two matrices must match exactly, where each row in the info matrix corresponds to each column in counts. For best performance, the name of each gene should be stored in the rownames of the counts matrix.data(kidney) # from simseq
counts<-kidney$counts
counts<-counts[sample(1:nrow(counts),1000),]
info<-with(kidney,
data.frame(id = paste(replic, treatment, sep = "_"),
rep=replic,
Treatment=ifelse(treatment == "Tumor","tumor","cntr"),
stringsAsFactors=F))
colnames(counts)<-info$id
group.a<-c("4619", "4712", "4863", "4865", "5452", "5453", "5454", "5455",
"5456", "5457","5458", "5461", "5462", "5463", "5465", "5467",
"5468", "5469", "5470", "5549","5552", "5580", "5641", "5672",
"5689", "5701", "5703", "5706", "5989", "6088")
info$group<-ifelse(info$rep %in% group.a, "a","b")
with(info, table(group, Treatment))
## Treatment
## group cntr tumor
## a 30 30
## b 42 42
info$trt.grp<-with(info, paste(Treatment, group, sep="_"))
head(info)
## id rep Treatment group trt.grp
## 1 3358_Non-Tumor 3358 cntr b cntr_b
## 2 3358_Tumor 3358 tumor b tumor_b
## 3 3387_Non-Tumor 3387 cntr b cntr_b
## 4 3387_Tumor 3387 tumor b tumor_b
## 5 4619_Non-Tumor 4619 cntr a cntr_a
## 6 4619_Tumor 4619 tumor a tumor_a
counts[1:10,1:3]
## 3358_Non-Tumor 3358_Tumor 3387_Non-Tumor
## ZEB1|6935 1248 3517 1388
## ZNF841|284371 274 951 194
## OSBPL2|9885 2182 3161 2107
## DAOA|267012 0 0 0
## RAB5B|5869 8937 13844 9327
## PFDN4|5203 515 925 375
## TMEM223|79064 1094 1267 962
## CDC42EP3|10602 1418 2801 1799
## MRPL50|54534 1062 1519 1120
## IL11RA|3590 384 656 291
In all statistical analyses, it is important to set the levels of the experimental factors. This is esspecially true in linear modelling, such as limma, where levels are tested against the base level. Here, we set the “Non-Tumor” treatment and “a” group as the base.
info$Treatment <- factor(info$Treatment,
levels = c("cntr", "tumor"))
info$group <- factor(info$group,
levels = c("a", "b"))
stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
lmStats<-stats$stats
voom<-stats$voom$E
pipeLIMMA returns three elements: stats, voom and fstats. These are the statistical output of the limma functions: eBayes, voom and topTableF. Inspect the top few rows and columns of each.
stats: eBayes linear model statistics| gene | sigma | s2.post | Amean | Fstat | Fpvalue | |
|---|---|---|---|---|---|---|
| ZEB1|6935 | ZEB1|6935 | 1.2591387 | 1.5671488 | 9.929414 | 64.363972 | 0.0000000 |
| ZNF841|284371 | ZNF841|284371 | 0.8035096 | 0.6461980 | 7.196959 | 40.413552 | 0.0000000 |
| OSBPL2|9885 | OSBPL2|9885 | 0.6142314 | 0.3832333 | 10.090835 | 195.217234 | 0.0000000 |
| DAOA|267012 | DAOA|267012 | 0.3963376 | 0.1674536 | -2.379793 | 2.425542 | 0.1215533 |
| RAB5B|5869 | RAB5B|5869 | 0.3284865 | 0.1192599 | 12.201714 | 78.891302 | 0.0000000 |
| PFDN4|5203 | PFDN4|5203 | 0.7327044 | 0.5396079 | 8.042116 | 38.676736 | 0.0000000 |
voom: normalized expression matrix| 3358_Non-Tumor | 3358_Tumor | 3387_Non-Tumor | 3387_Tumor | 4619_Non-Tumor | 4619_Tumor | |
|---|---|---|---|---|---|---|
| ZEB1|6935 | 9.377552 | 10.108863 | 9.495664 | 10.461569 | 9.648007 | 10.657851 |
| ZNF841|284371 | 7.192234 | 8.222588 | 6.659979 | 7.756724 | 6.922638 | 7.720599 |
| OSBPL2|9885 | 10.183338 | 9.954922 | 10.097669 | 9.677473 | 9.827493 | 10.834705 |
| DAOA|267012 | -1.908428 | -2.671471 | -1.943648 | -2.623737 | -1.448049 | -2.142038 |
| RAB5B|5869 | 12.217228 | 12.085554 | 12.243627 | 11.880647 | 12.106659 | 12.274430 |
| PFDN4|5203 | 8.101401 | 8.182618 | 7.609021 | 7.937551 | 8.439171 | 8.270532 |
fstats: F-statistics for each factor in the model, in this case, just treatment| gene | Treatment_F | Treatment_P.Value | Treatment_Q.Value |
|---|---|---|---|
| AAGAB|79719 | 337.7770122 | 0.0000000 | 0.0000000 |
| ABCB10|23456 | 49.1438369 | 0.0000000 | 0.0000000 |
| ABCB5|340273 | 5.8306633 | 0.0169960 | 0.0239718 |
| ABCC1|4363 | 153.8390560 | 0.0000000 | 0.0000000 |
| ABCC6|368 | 0.0717106 | 0.7892431 | 0.8119785 |
| ABHD3|171586 | 150.6163678 | 0.0000000 | 0.0000000 |
duplicateCorrelation among blocks, then uses the blocking variable in the linear model fit.info$block <- rep(1:2,each=nrow(info)/2)
stats.block <- pipeLIMMA(counts = counts,
info = info,
block = info$block,
formula = "~ Treatment")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## calculating duplicate correlation among replicates ...
## fitting linear model ...
## processing statistics and calculating q-values ...
stats.factorial <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
formula = "~ Treatment + group + Treatment*group")
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting linear model ...
## processing statistics and calculating q-values ...
Sometimes, it may make more sense to use specific contrasts to test for differential expression. For example, let’s say we are interested in how the tumors and non-tumor tissues differ for each of the two groups.
design <- with(info, model.matrix(~ 0 + trt.grp))
colnames(design)<-gsub("trt.grp","",colnames(design))
head(design)
## cntr_a cntr_b tumor_a tumor_b
## 1 0 1 0 0
## 2 0 0 0 1
## 3 0 1 0 0
## 4 0 0 0 1
## 5 1 0 0 0
## 6 0 0 1 0
contrast.matrix <- makeContrasts(
tumor_a - cntr_a ,
tumor_b - cntr_b,
levels = design)
head(contrast.matrix)
## Contrasts
## Levels tumor_a - cntr_a tumor_b - cntr_b
## cntr_a -1 0
## cntr_b 0 -1
## tumor_a 1 0
## tumor_b 0 1
stats <- pipeLIMMA(counts = counts,
info = info,
block = NULL,
design = design,
contrast.matrix = contrast.matrix)
## calculating normalization factors ...
## running voom normalization correcting for quality weights ...
## fitting model to contrast matrix ...
## processing statistics and calculating q-values ...
stats.contrasts <- stats$stats
The function makeBinarySig looks for a provided string (e.g. “Q.Value”) and outputs a matrix with whether or not those columns have values <= the provided alpha
sigs <- makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05)
## ebayesQvalue_tumor_a...cntr_a 705
## ebayesQvalue_tumor_b...cntr_b 679
counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="limma", legx=-3.3,legy=-3)
counts2Venn(x=sigs, cols=c(1,2), names=c("in.grpA","in.grpB"),
colors=c("blue","red"),type="Euler", legx=-3.3,legy=-3)
Volcano plots are a good way to look at the differences between two experimental levels. Here, we compare the extent of differential expression between the “high” treatment to the “low” treatment.
with(lmStats, volcanoPlot(pval = ebayesPvalue_Treatmenttumor,
lfc = Treatmenttumor_logFC,
sig = ebayesQvalue_Treatmenttumor,
main = "no tumor vs. tumor Volcano Plot",
xlab = "tumor - no tumor Log2 Fold Change",
bty = "n", legpos = "top", leginset = c(0,-.1)))
## sig
## updown FALSE TRUE
## down 90 355
## up 178 377
sigs <- data.frame(makeBinarySig(stats.contrasts, what = "Qvalue", alpha = 0.05))
## ebayesQvalue_tumor_a...cntr_a 705
## ebayesQvalue_tumor_b...cntr_b 679
names(sigs)<-c("sig.a","sig.b")
sigs$sign.a<-sign(stats.contrasts$tumor_a...cntr_a_logFC)
sigs$sign.b<-sign(stats.contrasts$tumor_b...cntr_b_logFC)
cols<-with(sigs, ifelse(sig.a + sig.b == 0,
"grey",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == 1,
"pink",
ifelse(sig.a + sig.b == 2 & sign.a*sign.b == -1,
"cornflowerblue",
ifelse(sig.a == 1, "darkblue", "darkred")))))
with(stats.contrasts, volcanoPair(lfc1 = tumor_a...cntr_a_logFC,
lfc2 = tumor_b...cntr_b_logFC,
pt.col = cols, pt.pch = 19, pt.cex=.5,
xlab = "Tumor - control LFC (group A)",
ylab = "Tumor - control LFC (group B)"))
## pt.col
## cornflowerblue darkblue darkred grey pink
## 32 167 141 154 506
It is often a good idea to get an idea of how the individuals (libraries) are structured - how similar are they to eachother. Principal component analyses allow for this kind of inference.
pca12 <- counts2PCA(counts=voom, info = info, ids = info$id, pcas2return = 6)
pca12.var <- pca12[[2]]
pca12 <- pca12[[1]]
gcols <- c("darkblue", "blue", "gold", "green", "pink", "red")
ggplot(pca12, aes(x = PC1, y = PC2, col = group, shape = Treatment)) +
geom_vline(xintercept = 0, lty = 2)+
geom_hline(yintercept = 0, lty = 2)+
geom_point() +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background = element_blank()) +
labs(x = paste("PCA1 (", pca12.var[1],"%)", sep = ""),
y = paste("PCA2 (", pca12.var[2],"%)", sep = ""),
title = "PCA analysis of voom-normalized expression")
For more information, visit lovelleeb.weebly.com/analytics