Lattice vs ggplot2 and aggregate vs plyr

Goal: to get a little more familiar with the awesome ggplot2 and plyr packages by reproducing results that were originally obtained with the older lattice package and the aggregate() function

Link to original tutorial

Data preparation

Load the required libraries and data

library(lattice)
library(plyr)
library(ggplot2)
prDat <- read.table("GSE4051_data.tsv")
prDes <- readRDS("GSE4051_design.rds")

Single gene exploration

Extract data for one gene

set.seed(987)
theGene <- sample(1:nrow(prDat), 1)
pDat <- data.frame(prDes, gExp = unlist(prDat[theGene, ]))
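
To see what this step does without the GSE4051 files, here is a minimal sketch on made-up data. The objects toyDat and toyDes are hypothetical stand-ins for prDat (genes in rows, samples in columns) and prDes (one row per sample); the column names and values are invented for illustration.

```r
# Hypothetical stand-ins for prDat and prDes; all values are made up
toyDat <- data.frame(s1 = c(7.1, 9.8), s2 = c(7.3, 9.6), s3 = c(6.9, 9.5),
                     row.names = c("geneA", "geneB"))
toyDes <- data.frame(sample = c("s1", "s2", "s3"),
                     gType = factor(c("wt", "wt", "NrlKO"),
                                    levels = c("wt", "NrlKO")))

# One row of the expression table is a one-row data frame;
# unlist() flattens it into a numeric vector with one value per sample
oneGene <- unlist(toyDat["geneB", ])

# Binding that vector to the design gives one row per sample
toyP <- data.frame(toyDes, gExp = oneGene)
toyP
```

This only lines up correctly when the columns of the expression table are in the same order as the rows of the design, which the sketch assumes.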

Explore!
What are sample means in wildtype/knockout? First using aggregate, then using plyr

aggregate(gExp ~ gType, pDat, FUN = mean)
##   gType gExp
##      wt 9.76
##   NrlKO 9.55

ddply(pDat, ~ gType, summarize, gExp = mean(gExp))
##   gType gExp
##      wt 9.76
##   NrlKO 9.55

Make sure the two actually returned identical results

identical(aggregate(gExp ~ gType, pDat, FUN = mean),
          ddply(pDat, ~ gType, summarize, gExp = mean(gExp)))
## [1] TRUE
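
identical() is a strict test: it also compares attributes and exact floating-point values. When two summaries might differ only in such details, all.equal() is a more forgiving check. A minimal sketch on made-up data (the toy data frame is hypothetical, not part of GSE4051):

```r
library(plyr)

# Hypothetical toy data standing in for pDat; values are made up
toy <- data.frame(
  gType = factor(rep(c("wt", "NrlKO"), each = 3), levels = c("wt", "NrlKO")),
  gExp  = c(9.7, 9.9, 9.8, 9.4, 9.6, 9.5)
)

viaAggregate <- aggregate(gExp ~ gType, toy, FUN = mean)
viaPlyr      <- ddply(toy, ~ gType, summarize, gExp = mean(gExp))

# all.equal() tolerates tiny numeric differences; wrap it in isTRUE()
# because it returns a description of the differences rather than FALSE
isTRUE(all.equal(viaAggregate, viaPlyr))
```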

Plot!
Strip plot of just the one gene, knockout vs wildtype. First using lattice, then using ggplot2

stripplot(gType ~ gExp, pDat)

ggplot(pDat, aes(x = gExp, y = gType)) + geom_point()

Multiple genes

Load in the dataset

kDat <- readRDS("GSE4051_MINI.rds")

Explore!
Average expression of eggBomb over developmental stages, first using aggregate, then using plyr

aggregate(eggBomb ~ devStage, kDat, FUN = mean)
##   devStage eggBomb
##        E16    6.88
##         P2    6.41
##         P6    6.46
##        P10    7.14
##    4_weeks    7.06

ddply(kDat, ~ devStage, summarize, exp = mean(eggBomb))
##   devStage  exp
##        E16 6.88
##         P2 6.41
##         P6 6.46
##        P10 7.14
##    4_weeks 7.06

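Same thing, but now aggregate by developmental stage AND genotype. Both calls return the same means; only the row order differs, because aggregate cycles through gType within each devStage while ddply groups all rows for one gType together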

aggregate(eggBomb ~ gType * devStage, kDat, FUN = mean)
##   gType devStage eggBomb
##      wt      E16    6.90
##   NrlKO      E16    6.85
##      wt       P2    6.61
##   NrlKO       P2    6.21
##      wt       P6    6.65
##   NrlKO       P6    6.27
##      wt      P10    7.04
##   NrlKO      P10    7.24
##      wt  4_weeks    7.12
##   NrlKO  4_weeks    7.01

ddply(kDat, .(gType, devStage), summarize, exp = mean(eggBomb))
##   gType devStage  exp
##      wt      E16 6.90
##      wt       P2 6.61
##      wt       P6 6.65
##      wt      P10 7.04
##      wt  4_weeks 7.12
##   NrlKO      E16 6.85
##   NrlKO       P2 6.21
##   NrlKO       P6 6.27
##   NrlKO      P10 7.24
##   NrlKO  4_weeks 7.01
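
Because aggregate and ddply list the genotype-by-stage rows in different orders, a direct comparison needs both results sorted the same way first. A minimal sketch on made-up data (toy is a hypothetical stand-in for kDat), naming the summary column eggBomb in both calls so the columns match:

```r
library(plyr)

# Hypothetical toy data standing in for kDat; values are made up
toy <- data.frame(
  gType    = factor(rep(c("wt", "NrlKO"), times = 4), levels = c("wt", "NrlKO")),
  devStage = factor(rep(c("E16", "P2"), each = 4), levels = c("E16", "P2")),
  eggBomb  = c(6.9, 6.8, 7.0, 6.9, 6.6, 6.2, 6.7, 6.3)
)

agg <- aggregate(eggBomb ~ gType * devStage, toy, FUN = mean)
ply <- ddply(toy, .(gType, devStage), summarize, eggBomb = mean(eggBomb))

# Sort both results by the same keys, then reset the row names
# so that only the contents are compared
agg <- agg[order(agg$gType, agg$devStage), ]
ply <- ply[order(ply$gType, ply$devStage), ]
rownames(agg) <- NULL
rownames(ply) <- NULL

isTRUE(all.equal(agg, ply))
```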

Multigene plotting

Grab 6 genes: 3 interesting, 3 boring

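# Probe IDs of the six genes to keep
keepGenes <- c("1431708_a_at", "1424336_at", "1454696_at",
               "1416119_at", "1432141_x_at", "1429226_at")
miniDat <- subset(prDat, rownames(prDat) %in% keepGenes)
# Reshape to long format: one row per gene-sample pair
miniDat <- data.frame(gExp = as.vector(t(as.matrix(miniDat))),
                      gene = factor(rep(rownames(miniDat), each = ncol(miniDat)),
                                    levels = keepGenes))
# prDes is recycled once per gene; suppressWarnings() silences the warning this triggers
miniDat <- suppressWarnings(data.frame(prDes, miniDat))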
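
The reshaping above relies on R storing matrices column by column: transposing puts samples in rows and genes in columns, so as.vector() emits all samples for the first gene, then all samples for the second, which is the order that rep(..., each = ncol(...)) assumes. A minimal sketch on made-up data (toyWide is hypothetical) that makes the alignment visible:

```r
# Hypothetical wide table: genes in rows, samples in columns; values made up
toyWide <- data.frame(s1 = c(1, 10), s2 = c(2, 20), s3 = c(3, 30),
                      row.names = c("geneA", "geneB"))

toyLong <- data.frame(
  gExp = as.vector(t(as.matrix(toyWide))),
  gene = factor(rep(rownames(toyWide), each = ncol(toyWide)),
                levels = rownames(toyWide))
)
toyLong

# Sanity check: geneA values are all below 10, geneB values are 10 or more
with(toyLong, tapply(gExp, gene, range))
```

Dropping the t() would interleave the genes instead, so the gene labels would no longer match the values.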

Plot!
Strip plot of the expression vs genotype, one plot per gene. First using lattice, then using ggplot2

stripplot(gType ~ gExp | gene, miniDat,
          scales = list(x = list(relation = "free")),
          groups = gType, auto.key = TRUE)

ggplot(miniDat, aes(x = gExp, y = gType, color = gType)) +
  facet_wrap(~ gene, scales="free_x") +
  geom_point(alpha = 0.7) +
  theme(panel.grid.major.x = element_blank())

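For readers without the GSE4051 files, the faceted ggplot2 plot can be tried on simulated data. This is only a sketch: toyMini, its gene names, and its values are all made up, and the layers simply mirror the ggplot2 call used for miniDat.

```r
library(ggplot2)

# Hypothetical stand-in for miniDat; gene names and values are simulated
set.seed(540)
toyMini <- data.frame(
  gene  = factor(rep(c("geneA", "geneB"), each = 10)),
  gType = factor(rep(rep(c("wt", "NrlKO"), each = 5), times = 2),
                 levels = c("wt", "NrlKO")),
  gExp  = c(rnorm(10, mean = 7), rnorm(10, mean = 10))
)

# One panel per gene, each with its own x-axis range
p <- ggplot(toyMini, aes(x = gExp, y = gType, color = gType)) +
  facet_wrap(~ gene, scales = "free_x") +
  geom_point(alpha = 0.7)
print(p)
```

With scales = "free_x" each panel zooms to its own expression range, which plays the same role as relation = "free" in the lattice call.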