The t-test for CpG islands and volcano plot
In DNA methylation data analysis, the t-test is used to identify differences in DNA methylation at single CpG sites.
Now I’m going to use the moderated t-test from the R package limma to identify differentially methylated CpGs between colon cancer samples and normal tissue samples.
library(limma)
# dna1.rda supplies the objects used below: meth (methylation values) and pd (sample annotation)
load("dna1.rda")
The design matrix indicates which arrays are from cancer tissues.
design <- model.matrix(~pd$Status)
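To make the design matrix less abstract, here is a minimal sketch on a made-up status factor (the four toy samples and the level order are assumptions, not the real pd$Status). With a two-level factor, model.matrix returns an intercept column plus a 0/1 indicator column for the non-baseline level, and which level is the baseline depends on the factor's level order.

```r
# Toy stand-in for pd$Status: four hypothetical samples, "normal" as the baseline level
status <- factor(c("normal", "normal", "cancer", "cancer"),
                 levels = c("normal", "cancer"))

design_toy <- model.matrix(~status)

# Column 1 is the intercept, column 2 is 1 for cancer samples and 0 otherwise
print(colnames(design_toy))
print(design_toy[, 2])
```

This is why the second coefficient and the second p-value column are the ones of interest later on: they belong to the cancer indicator.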
Fit a linear model for each CpG (each row of meth) to estimate the difference in methylation between the two groups and its standard error.
fit <- lmFit(meth, design)
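For a two-group design like this one, the second coefficient of each row-wise fit is simply the difference in group means. The sketch below checks that with base R on simulated numbers; meth_toy, the group sizes, and the level order are made up for illustration, and ordinary least squares is solved directly rather than through lmFit.

```r
set.seed(1)
# Toy data: 3 hypothetical CpGs (rows) x 6 samples (columns)
meth_toy <- matrix(runif(18), nrow = 3)
status <- factor(rep(c("normal", "cancer"), each = 3),
                 levels = c("normal", "cancer"))
design_toy <- model.matrix(~status)

# Ordinary least squares for every row at once: (X'X)^-1 X'Y
coefs <- t(solve(crossprod(design_toy), crossprod(design_toy, t(meth_toy))))

# Difference in group means, cancer minus normal
diff_means <- rowMeans(meth_toy[, status == "cancer"]) -
  rowMeans(meth_toy[, status == "normal"])

print(cbind(coefficient = coefs[, 2], diff_means))
```

The two columns printed at the end should agree, which is a handy sanity check on the sign of the effect.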
Apply empirical Bayes smoothing to the standard errors.
eb <- ebayes(fit)
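The smoothing step can be pictured as shrinking each CpG's variance toward a common prior value, weighted by degrees of freedom. Below is a minimal sketch of that weighted average. All the numbers, including the prior degrees of freedom d0 and the prior variance s0sq, are invented here, whereas limma estimates the prior from the data across all rows.

```r
s2   <- c(0.001, 0.010, 0.100)  # made-up residual variances for three CpGs
d    <- 4                       # made-up residual degrees of freedom
d0   <- 3                       # assumed prior degrees of freedom (limma estimates this)
s0sq <- 0.01                    # assumed prior variance (limma estimates this)

# Posterior variance: degrees-of-freedom weighted average of prior and observed variance
s2_post <- (d0 * s0sq + d * s2) / (d0 + d)

print(cbind(s2, s2_post))
```

Very small variances are pulled up and very large ones are pulled down, so a CpG is less likely to look significant just because its variance happened to be tiny.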
A volcano plot shows effect size on the x-axis and statistical significance on the y-axis, so CpGs with large methylation differences appear far to the left or right, while highly significant changes appear higher on the plot.
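library(ggplot2)
fc <- fit$coefficients[, 2]      # effect size: the Status coefficient
sig <- -log10(eb$p.value[, 2])   # significance on the -log10 scale
df <- data.frame(fc, sig)
# TRUE for CpGs with a small effect (|fc| < 0.4) that are also not significant at p = 0.05
df$thre <- as.factor(abs(fc) < 0.4 & sig < -log10(0.05))
ggplot(data = df, aes(x = fc, y = sig, color = thre)) +
  geom_point(alpha = 0.6, size = 1.2) +
  theme(legend.position = "none") +
  xlab("Effect size") +
  ylab("-log10 p value")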
Why do these two colors match so well? I like it!
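To see which points end up in which color, here is a small sketch that applies the same thre rule to four made-up CpGs (the effect sizes and p-values are invented). Only a point that is both small in effect and not significant gets TRUE, and everything else, large effect or significant p-value, gets FALSE.

```r
# Four hypothetical CpGs: effect sizes and raw p-values
fc_toy <- c(0.1, 0.5, -0.6, 0.2)
p_toy  <- c(0.5, 0.5, 0.001, 0.001)

sig_toy <- -log10(p_toy)  # y-axis of the volcano plot

# Same rule as in the plotting code above
thre_toy <- abs(fc_toy) < 0.4 & sig_toy < -log10(0.05)

print(data.frame(fc_toy, p_toy, sig_toy, thre_toy))
```

Only the first toy CpG is flagged TRUE; the other three land in the second color because they pass at least one of the two cutoffs.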