Multivariable multiple regression with limma()

Need to install limma:

source(“http://www.bioconductor.org/biocLite.R”) biocLite(“limma”) biocLite(“statmod”)

Load packages:

library(limma)
library(lattice)
library(ggplot2)

Load photoRec data:

prDat <- read.delim("../photoRecDataset/dataClean/GSE4051_data.tsv")
str(prDat, max.level = 0)

## 'data.frame':    29949 obs. of  39 variables:


prDes <- readRDS("../photoRecDataset/dataClean/GSE4051_design.rds")
str(prDes)

## 'data.frame':    39 obs. of  4 variables:
##  $ sidChar : chr  "Sample_11" "Sample_12" "Sample_2" "Sample_9" ...
##  $ sidNum  : num  11 12 2 9 36 37 38 39 16 17 ...
##  $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 5 5 5 5 1 1 ...
##  $ gType   : Factor w/ 2 levels "wt","NrlKO": 2 2 2 2 1 1 1 1 2 2 ...

You might want to use the functions you wrote last week to extract and stripplot excerpts from the photoRec dataset. If you stored the code defining those functions cleanly in a script, you could make them available now by using the source() function.

source("../photoRecDataset/codes/prepDatScript.r")
source("../photoRecDataset/codes/makeStripplot.R")

The difficulty in estimating gene wise variance

The lmFit() from limma is arguably your main workhorse function for fitting a common linear model to the data for a very large number of genes. It has at least two strengths to recommend it:

1.  It does this in a computationally efficient way, i.e. better than you writing a top-level for() loop and probably even better than pursuing an apply()-type strategy.
2.  It borrows strength across the large number of genes (= datasets) to moderate the gene-wise estimate of error variance.

Before we dive in and start using limma with the photoRec dataset, let's do a small simulation to illustrate how lousy variance estimates can be when the number of samples is small.

Let's simulate data for 1000 genes. For each gene, we get 3 observations from a normal distribution with mean 0 and variance 1. We generate the data for each gene independent of the others.

m <- 1000
n <- 3
x <- matrix(rnorm(m * n), nrow = m)

Let's take the observed gene-wise variances. Yes, folks, we are estimating variance with samples of size 3.

obsVar <- apply(x, 1, var)
summary(obsVar)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.001   0.296   0.740   0.972   1.360   6.640

mean(obsVar < 1/3)

## [1] 0.279

densityplot(~obsVar, n = 200)

plot of chunk unnamed-chunk-5

Notice how many of the observed variances are freakishly small (and freakishly large!), even though they are indeed equal to 1 “on average”. For example, we see that at least a quarter of the genes appear to exhibit a sample variance that is less than one-third the true variance. This can wreak havoc with statistical inference, such as t-statistics. This is what limma – or the statistical methods it embodies, actually – is designed to combat.

Fit a linear model: explain gene expression in the wt mice as a function of developmental stage (one-way ANOVA)

Extract the data for wt mice only

wtDes <- subset(prDes, gType == "wt")
str(wtDes)

## 'data.frame':    20 obs. of  4 variables:
##  $ sidChar : chr  "Sample_36" "Sample_37" "Sample_38" "Sample_39" ...
##  $ sidNum  : num  36 37 38 39 20 21 22 23 32 33 ...
##  $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 1 1 1 1 4 4 ...
##  $ gType   : Factor w/ 2 levels "wt","NrlKO": 1 1 1 1 1 1 1 1 1 1 ...


wtDat <- subset(prDat, select = prDes$gType == "wt")
str(wtDat, max.level = 0)

## 'data.frame':    29949 obs. of  20 variables:

Before we can use limma we must make our design matrix.

Let's accept the default “ref + treatment effects” scheme for handling the devStage factor. I encourage you to inspect the design matrix and confirm it's what you expect.

(wtDesMat <- model.matrix(~devStage, wtDes))

##    (Intercept) devStageP2 devStageP6 devStageP10 devStage4_weeks
## 5            1          0          0           0               1
## 6            1          0          0           0               1
## 7            1          0          0           0               1
## 8            1          0          0           0               1
## 12           1          0          0           0               0
## 13           1          0          0           0               0
## 14           1          0          0           0               0
## 15           1          0          0           0               0
## 20           1          0          0           1               0
## 21           1          0          0           1               0
## 22           1          0          0           1               0
## 23           1          0          0           1               0
## 28           1          1          0           0               0
## 29           1          1          0           0               0
## 30           1          1          0           0               0
## 31           1          1          0           0               0
## 36           1          0          1           0               0
## 37           1          0          1           0               0
## 38           1          0          1           0               0
## 39           1          0          1           0               0
## attr(,"assign")
## [1] 0 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$devStage
## [1] "contr.treatment"

Now we will fit the model, for all probes at once, and use eBayes() to moderate the estimated error variances:

wtFit <- lmFit(wtDat, wtDesMat)
wtEbFit <- eBayes(wtFit)

The first thing we might ask is “which genes show differential expression over the course of development”? This can be addressed with an overall F test for the model. In the language used in lecture, we will compare a “big” model to a “small” model, where the “big” model includes a mean parameter (or effect) for each level of devStage and the “small” model includes a single mean parameter, e.g. an intercept. You might expect this to be the F test performed by topTable() by default, i.e. when no specific coefficients or contrasts are given to the coef argument …

topTable(wtEbFit)

##              X.Intercept. devStageP2 devStageP6 devStageP10
## 1423641_s_at        12.18    -0.0175     0.0750      0.0675
## 1438940_x_at        12.86     0.0850     0.1325      0.3425
## 1438657_x_at        12.78     0.1400     0.1250     -0.1850
## 1456736_x_at        12.32     0.1625     0.3050      0.2075
## 1436884_x_at        12.93     0.1775     0.3225      0.0300
## 1419700_a_at        12.32     0.1650     0.6475      0.8175
## 1435800_a_at        12.29     0.0450     0.6825      0.9000
## 1454613_at          12.47    -0.1075    -0.0500     -0.1025
## 1451240_a_at        13.00     0.3100     0.2800      0.2800
## 1450084_s_at        12.63     0.0825     0.0525      0.1725
##              devStage4_weeks AveExpr     F   P.Value adj.P.Val
## 1423641_s_at          0.1800   12.24 45350 3.574e-36 5.201e-32
## 1438940_x_at          0.3500   13.04 44957 3.865e-36 5.201e-32
## 1438657_x_at         -0.4500   12.71 43486 5.210e-36 5.201e-32
## 1456736_x_at          0.0725   12.47 39509 1.233e-35 6.725e-32
## 1436884_x_at          0.0250   13.04 39269 1.302e-35 6.725e-32
## 1419700_a_at          0.6825   12.78 39121 1.347e-35 6.725e-32
## 1435800_a_at          1.0200   12.81 36668 2.410e-35 1.031e-31
## 1454613_at           -0.3825   12.34 35835 2.962e-35 1.078e-31
## 1451240_a_at         -0.3700   13.10 35481 3.239e-35 1.078e-31
## 1450084_s_at          0.2600   12.75 34411 4.265e-35 1.234e-31

by default, topTable() reports the top 10 hits. But let's take more care and specify explicitly the coefficients we want to test for equality with zero. Recall that one can specify these by number but I recommend doing this by name.

# this will work but is not recommended
topTable(wtEbFit, coef = 2:5)

##              devStageP2 devStageP6 devStageP10 devStage4_weeks AveExpr
## 1440645_at       0.3990     0.1953      0.9200           3.961   6.528
## 1416041_at       0.1580     0.4798      0.3327           5.114   9.383
## 1425222_x_at     0.8820     0.7995      1.5488           5.532   7.028
## 1451635_at       1.3025     1.1900      2.0160           6.188   8.319
## 1429028_at      -2.4432    -3.4072     -4.3105          -4.602   8.045
## 1422929_s_at    -2.9117    -3.6182     -3.5472          -3.661   7.278
## 1424852_at       0.4575     0.2297      0.5740           3.979   7.454
## 1425171_at       0.9980     3.0530      5.2787           6.079   9.620
## 1451617_at       0.7255     2.5127      4.9837           6.685   8.817
## 1451618_at       0.6027     2.8902      5.0507           6.288   9.431
##                  F   P.Value adj.P.Val
## 1440645_at   425.4 1.588e-17 4.755e-13
## 1416041_at   195.5 1.522e-14 2.280e-10
## 1425222_x_at 173.4 4.348e-14 4.341e-10
## 1451635_at   157.3 1.013e-13 7.585e-10
## 1429028_at   148.8 1.646e-13 9.203e-10
## 1422929_s_at 146.9 1.844e-13 9.203e-10
## 1424852_at   143.2 2.290e-13 9.799e-10
## 1425171_at   138.8 3.002e-13 1.124e-09
## 1451617_at   136.5 3.485e-13 1.160e-09
## 1451618_at   134.2 4.032e-13 1.207e-09


# remind yoursefl the coefficients names

colnames(coef(wtEbFit))

## [1] "(Intercept)"     "devStageP2"      "devStageP6"      "devStageP10"    
## [5] "devStage4_weeks"


(dsHits <- topTable(wtEbFit, coef = grep("devStage", colnames(coef(wtEbFit)))))

##              devStageP2 devStageP6 devStageP10 devStage4_weeks AveExpr
## 1440645_at       0.3990     0.1953      0.9200           3.961   6.528
## 1416041_at       0.1580     0.4798      0.3327           5.114   9.383
## 1425222_x_at     0.8820     0.7995      1.5488           5.532   7.028
## 1451635_at       1.3025     1.1900      2.0160           6.188   8.319
## 1429028_at      -2.4432    -3.4072     -4.3105          -4.602   8.045
## 1422929_s_at    -2.9117    -3.6182     -3.5472          -3.661   7.278
## 1424852_at       0.4575     0.2297      0.5740           3.979   7.454
## 1425171_at       0.9980     3.0530      5.2787           6.079   9.620
## 1451617_at       0.7255     2.5127      4.9837           6.685   8.817
## 1451618_at       0.6027     2.8902      5.0507           6.288   9.431
##                  F   P.Value adj.P.Val
## 1440645_at   425.4 1.588e-17 4.755e-13
## 1416041_at   195.5 1.522e-14 2.280e-10
## 1425222_x_at 173.4 4.348e-14 4.341e-10
## 1451635_at   157.3 1.013e-13 7.585e-10
## 1429028_at   148.8 1.646e-13 9.203e-10
## 1422929_s_at 146.9 1.844e-13 9.203e-10
## 1424852_at   143.2 2.290e-13 9.799e-10
## 1425171_at   138.8 3.002e-13 1.124e-09
## 1451617_at   136.5 3.485e-13 1.160e-09
## 1451618_at   134.2 4.032e-13 1.207e-09

You will notice that these are not the same hits we got with our first call to topTable(). Compare, e.g., the Affy IDs for the top hits and/or look at the typical F statistic magnitudes. And so we learn that you really must use the coef argument (or a contrasts workflow in more complicated settings) to explicitly define what you regard as a hit.

Use the hit list you stored above and your functions for extracting and plotting data to produce this plot for hits 3, 6, and 9 on the list.

# select the genes
theGenes <- rownames(dsHits[c(3, 6, 9), ])

# use the makeStripplot and prepDat functions to plot the 3 hit genes
makeStripplot(prepDat(wtDat, wtDes, theGenes))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-11

Does it look plausible to you that – using only wild type data – these probes show the most compelling evidence for expression change over development?

Be the boss of topTable()

How many probes have Benjamini-Hochberg (“BH”) adjusted p-values for the F test conducted above that are less than 1e-05?

nrow(topTable(wtEbFit, number = Inf, p.value = 1e-05, coef = grep("devStage", 
    colnames(coef(wtEbFit)))))

## [1] 350


# or

topHits <- topTable(wtEbFit, number = Inf, p.value = 1e-05, coef = grep("devStage", 
    colnames(coef(wtEbFit))))
nrow(topHits)

## [1] 350

Provide Affy ID, F statistic, BH adjusted p-value, and the estimated effect for developmental stage “P6” (in that order) for the 63rd hit on this list?.

topHits <- topTable(wtEbFit, number = Inf, p.value = 1e-05, coef = grep("devStage", 
    colnames(coef(wtEbFit))))
names(topHits)

## [1] "devStageP2"      "devStageP6"      "devStageP10"     "devStage4_weeks"
## [5] "AveExpr"         "F"               "P.Value"         "adj.P.Val"

topHits[63, c("F", "adj.P.Val", "devStageP6")]

##                  F adj.P.Val devStageP6
## 1451633_a_at 64.01 1.049e-07      2.069


# alternatively can do it like this:
topTable(wtEbFit, number = Inf, p.value = 1e-05, coef = grep("devStage", colnames(coef(wtEbFit))))[63, 
    c("F", "adj.P.Val", "devStageP6")]

##                  F adj.P.Val devStageP6
## 1451633_a_at 64.01 1.049e-07      2.069

Consider the effects associated with developmental stages P2 and P10.

p2hits <- topTable(wtEbFit, number = Inf, sort.by = "none", coef = grep("devStageP2", 
    colnames(coef(wtEbFit))))
p10hits <- topTable(wtEbFit, number = Inf, sort.by = "none", coef = grep("devStageP10", 
    colnames(coef(wtEbFit))))
names(p2hits)

## [1] "logFC"     "AveExpr"   "t"         "P.Value"   "adj.P.Val" "B"

Scatterplot the t statistics for the test that the P2 effect is zero against that for P10.

smoothScatter(p2hits$t ~ p10hits$t, xlim = c(-20, 20), ylim = c(-20, 20), xlab = "t-statistic for effect of P10", 
    ylab = "t-statistic for effect of P2")

## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009

abline(a = 0, b = 1, col = "orange")

plot of chunk unnamed-chunk-15

Create a densityplot of the associated adjusted p-values, so you can get a sense of which developmental stage, P2 or P10, is more clearly distinguished from baseline E16.

densityplot(~p2hits$adj.P.Val + p10hits$adj.P.Val, auto.key = T, plot.points = F, 
    n = 200)

plot of chunk unnamed-chunk-16

P10 is more clearly distinquished from E16 (the reference).

If you require a BH adjusted p-value less than 1e-03, how many hits do you get for P2? How many for P10? How much overlap is there?

addmargins(table(p2hits$adj.P.Val < 0.001, p10hits$adj.P.Val < 0.001, dnn = c("P2", 
    "P10")))

##        P10
## P2      FALSE  TRUE   Sum
##   FALSE 29201   695 29896
##   TRUE      1    52    53
##   Sum   29202   747 29949

With the p value threshold at 0.001 there are 53 hits for P2, 747 hits for P10, and 52 overlapping hits for both.

Now just focus on the P10 effect. Create a scatterplot matrix of raw p-values, BH adjusted p-values, and BY p-values.

p10hitsBY <- topTable(wtEbFit, number = Inf, sort.by = "none", adjust.method = "BY", 
    coef = grep("devStageP10", colnames(coef(wtEbFit))))
pValsP10 <- data.frame(raw = p10hits$P.Value, BHadj = p10hits$adj.P.Val, BYadj = p10hitsBY$adj.P.Val)
splom(pValsP10)

plot of chunk unnamed-chunk-18

Perform inference for some contrasts

Let's try to distinguish genes that have stable expression at the last three developmental stages (P6, P10, and 4_weeks) from those that do not.

In other words find differentially expressed genes at P6, P10, and 4weeks. If expression doesn't change from P6 to P10 to 4weeks, then the effects for all 3 of those developmental stages should be the same. That means that the difference between the P10 and P6 effects is zero and ditto for the difference between 4_weeks effect and P10 (or P6, for that matter). Let's form these contrasts.

generate and fit the contrast matrices

colnames(wtDesMat)

## [1] "(Intercept)"     "devStageP2"      "devStageP6"      "devStageP10"    
## [5] "devStage4_weeks"

(contMat <- makeContrasts(P10vsP6 = devStageP10 - devStageP6, fourWeeksVsP10 = devStage4_weeks - 
    devStageP10, levels = wtDesMat))

## Warning: Renaming (Intercept) to Intercept

##                  Contrasts
## Levels            P10vsP6 fourWeeksVsP10
##   Intercept             0              0
##   devStageP2            0              0
##   devStageP6           -1              0
##   devStageP10           1             -1
##   devStage4_weeks       0              1

wtFitCont <- contrasts.fit(wtFit, contMat)

## Warning: row names of contrasts don't match col names of coefficients

wtEbFitCont <- eBayes(wtFitCont)
topTable(wtEbFitCont)

##              P10vsP6 fourWeeksVsP10 AveExpr     F   P.Value adj.P.Val
## 1440645_at    0.7247          3.041   6.528 632.7 2.224e-17 6.662e-13
## 1416041_at   -0.1470          4.782   9.383 302.4 1.473e-14 2.206e-10
## 1425222_x_at  0.7493          3.983   7.028 235.4 1.300e-13 1.297e-09
## 1424852_at    0.3442          3.405   7.454 225.1 1.910e-13 1.430e-09
## 1420726_x_at  0.1732          3.551   7.190 203.5 4.555e-13 2.640e-09
## 1451635_at    0.8260          4.172   8.319 200.0 5.289e-13 2.640e-09
## 1429394_at   -0.0980          2.410   7.848 167.5 2.416e-12 1.034e-08
## 1455447_at   -0.9765         -1.800   9.973 153.5 5.063e-12 1.896e-08
## 1429791_at    0.2480          1.658   8.026 145.7 7.877e-12 2.621e-08
## 1422612_at    0.4837          3.426   8.833 142.2 9.676e-12 2.840e-08

The top hits are probes where there is big change from P6 to P10, from P10 to 4_weeks, or both. Let's check that by plotting the data from the top 4 hits.

(topGenes <- rownames(topTable(wtEbFitCont)[1:4, ]))

## [1] "1440645_at"   "1416041_at"   "1425222_x_at" "1424852_at"

makeStripplot(prepDat(wtDat, wtDes, topGenes))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-20

These 4 probes show little expression change from P6 to P10 and a strong increase from P10 to 4_weeks.

Let's use decideTests() to adjust the p-values for both contrasts globally, i.e. all together and then threshhold them at a cutoff of 1e-04 (to find probes where there's a change in each case but perhaps in opposite direction - to find differentially expressed genes).

cutoff <- 1e-04
wtResCont <- decideTests(wtEbFitCont, p.value = cutoff, method = "global")
summary(wtResCont)

##    P10vsP6 fourWeeksVsP10
## -1       4              8
## 0    29945          29895
## 1        0             46

There are 4 probes that go down from P6 to P10 and no hits going the other way. There are 8 probes that go down from P10 to 4_weeks and 46 going the other way.

Let's try to pull out various hits and plot their data.

Here are the 4 probes that decline from P6 to P10.

(hits1 <- rownames(wtDat)[which(wtResCont[, "P10vsP6"] < 0)])

## [1] "1416635_at" "1437781_at" "1454752_at" "1455260_at"

makeStripplot(prepDat(wtDat, wtDes, hits1))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-22

here are the 4 out of 8 that decline from P10 to 4 weeks

(hits2 <- rownames(wtDat)[which(wtResCont[, "fourWeeksVsP10"] < 0)])

## [1] "1416021_a_at" "1423851_a_at" "1434500_at"   "1435415_x_at"
## [5] "1437502_x_at" "1448182_a_at" "1452679_at"   "1455447_at"

makeStripplot(prepDat(wtDat, wtDes, hits2[1:4]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-23

is there any overlap between the two sets of probes?

intersect(hits1, hits2)

## character(0)

No, there is not.

Here are 4 of the 46 that increase from P10 to 4_weeks.

(hits3 <- rownames(wtDat)[which(wtResCont[, "fourWeeksVsP10"] > 0)])

##  [1] "1416041_at"   "1417280_at"   "1418406_at"   "1418710_at"  
##  [5] "1418789_at"   "1419069_at"   "1420725_at"   "1420726_x_at"
##  [9] "1421061_at"   "1421818_at"   "1422612_at"   "1424852_at"  
## [13] "1424895_at"   "1425222_x_at" "1426059_at"   "1426223_at"  
## [17] "1427388_at"   "1428763_at"   "1429394_at"   "1429791_at"  
## [21] "1430580_at"   "1431174_at"   "1433699_at"   "1434297_at"  
## [25] "1434573_at"   "1435436_at"   "1435679_at"   "1435727_s_at"
## [29] "1436265_at"   "1436287_at"   "1440402_at"   "1440645_at"  
## [33] "1441518_at"   "1442243_at"   "1443252_at"   "1446484_at"  
## [37] "1449170_at"   "1449393_at"   "1451042_a_at" "1451635_at"  
## [41] "1452243_at"   "1453385_at"   "1455493_at"   "1457878_at"  
## [45] "1458418_at"   "1459904_at"

makeStripplot(prepDat(wtDat, wtDes, hits3[1:4]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-25

Is there any overlap between these probes and the previous “down” hits?

intersect(hits1, hits3)

## character(0)

intersect(hits2, hits3)

## character(0)

That's disappointing.

If I revisit this workflow but make the p-value cutoff less stringent, maybe I can find the gene expression profile I'm looking for.

cutoff <- 0.01
nHits <- 8
wtResCont <- decideTests(wtEbFitCont, p.value = cutoff, method = "global")
summary(wtResCont)

##    P10vsP6 fourWeeksVsP10
## -1      40             49
## 0    29897          29636
## 1       12            264

Probes going down from P6 to P10 (40 in total)

hits1 <- rownames(wtDat)[which(wtResCont[, "P10vsP6"] < 0)]
makeStripplot(prepDat(wtDat, wtDes, hits1[1:nHits]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-28

Probes going down from P10 to 4weeks (49 in total)

hits2 <- rownames(wtDat)[which(wtResCont[, "fourWeeksVsP10"] < 0)]
makeStripplot(prepDat(wtDat, wtDes, hits2[1:nHits]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-29

Probes going up from P6 to P10 (12 total)

hits3 <- rownames(wtDat)[which(wtResCont[, "P10vsP6"] > 0)]
makeStripplot(prepDat(wtDat, wtDes, hits3[1:nHits]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-30

Probes going up from P10 to 4 weeks(264 in total)

hits4 <- rownames(wtDat)[which(wtResCont[, "fourWeeksVsP10"] > 0)]
makeStripplot(prepDat(wtDat, wtDes, hits4[1:nHits]))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-31

Venn diagram of the differentially expressed probes for P6 vs P10 vs 4weeks

vennDiagram(wtResCont)

plot of chunk unnamed-chunk-32

Probes differentially expressed for both P10vsP6 and 4weeksVsP10 (ie. differentially expressed from stage P6 till 4weeks) - 10 in total

hits5 <- rownames(wtDat)[which(wtResCont[, "P10vsP6"] != 0 & wtResCont[, "fourWeeksVsP10"] != 
    0)]
makeStripplot(prepDat(wtDat, wtDes, hits5))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-33

Probes going up from P6 to P10 and down from P10 to 4weeks

(hits6 <- rownames(wtDat)[which(wtResCont[, "P10vsP6"] > 0 & wtResCont[, "fourWeeksVsP10"] < 
    0)])

## [1] "1441815_at"

makeStripplot(prepDat(wtDat, wtDes, hits6))

plot of chunk unnamed-chunk-34

Q: are there any probes that have expression change up till P6 and then hold steady.

make contrast matrix comparing gene expression between the different developmental stages

colnames(wtDesMat)

## [1] "(Intercept)"     "devStageP2"      "devStageP6"      "devStageP10"    
## [5] "devStage4_weeks"

(contMat2 <- makeContrasts(P2vsE16 = devStageP2 - (Intercept), P6vsP2 = devStageP6 - 
    devStageP2, P10vsP6 = devStageP10 - devStageP6, fourWeeksVsP10 = devStage4_weeks - 
    devStageP10, levels = wtDesMat))

## Warning: Renaming (Intercept) to Intercept

##                  Contrasts
## Levels            P2vsE16 P6vsP2 P10vsP6 fourWeeksVsP10
##   Intercept            -1      0       0              0
##   devStageP2            1     -1       0              0
##   devStageP6            0      1      -1              0
##   devStageP10           0      0       1             -1
##   devStage4_weeks       0      0       0              1

Fit the contrast

wtFitCont2 <- contrasts.fit(wtFit, contMat2)

## Warning: row names of contrasts don't match col names of coefficients

wtEbFitCont2 <- eBayes(wtFitCont2)

look at top differentially expressed genes

topTable(wtEbFitCont2)

##              P2vsE16  P6vsP2    P10vsP6 fourWeeksVsP10 AveExpr    F
## 1438657_x_at  -12.64 -0.0150 -3.100e-01        -0.2650   12.71 2641
## 1423641_s_at  -12.20  0.0925 -7.500e-03         0.1125   12.24 2610
## 1438940_x_at  -12.78  0.0475  2.100e-01         0.0075   13.04 2485
## 1436884_x_at  -12.75  0.1450 -2.925e-01        -0.0050   13.04 2226
## 1454613_at    -12.57  0.0575 -5.250e-02        -0.2800   12.34 2212
## 1456736_x_at  -12.16  0.1425 -9.750e-02        -0.1350   12.47 2202
## 1456349_x_at  -11.58 -0.1200 -1.250e-02        -0.3375   11.53 2077
## 1451240_a_at  -12.69 -0.0300 -2.776e-16        -0.6500   13.10 2032
## 1438859_x_at  -13.00 -0.2425 -9.750e-02        -0.3125   12.63 1978
## 1416187_s_at  -12.08 -0.1300 -5.500e-02         0.0825   12.18 1964
##                P.Value adj.P.Val
## 1438657_x_at 1.309e-24 2.179e-20
## 1423641_s_at 1.455e-24 2.179e-20
## 1438940_x_at 2.259e-24 2.256e-20
## 1436884_x_at 6.061e-24 3.329e-20
## 1454613_at   6.413e-24 3.329e-20
## 1456736_x_at 6.669e-24 3.329e-20
## 1456349_x_at 1.125e-23 4.811e-20
## 1451240_a_at 1.369e-23 5.057e-20
## 1438859_x_at 1.744e-23 5.057e-20
## 1416187_s_at 1.858e-23 5.057e-20

(topGenes <- rownames(topTable(wtEbFitCont2[1:4, ])))

## [1] "1415673_at" "1415670_at" "1415672_at" "1415671_at"

makeStripplot(prepDat(wtDat, wtDes, topGenes))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-37

identify the up and down probes at each stage

cutoff = 1e-04
wtResCont2 <- decideTests(wtEbFitCont2, p.value = cutoff, method = "global")
summary(wtResCont2)

##    P2vsE16 P6vsP2 P10vsP6 fourWeeksVsP10
## -1   29903      0      32             37
## 0       46  29937   29907          29712
## 1        0     12      10            200

which probes have differential expression going up to P6 and then hold steady?

hits7 <- rownames(wtDat)[which(wtResCont2[, "P2vsE16"] != 0 & wtResCont2[, "P6vsP2"] != 
    0 & wtResCont2[, "P10vsP6"] == 0 & wtResCont2[, "fourWeeksVsP10"] == 0)]

makeStripplot(prepDat(wtDat, wtDes, hits7))

## Warning: row names were found from a short variable and have been
## discarded

plot of chunk unnamed-chunk-39

this does not look right?!