1 Motivation

Independent filtering (IF) or hypothesis weighting (IHW) can greatly increase the power of multiple testing analyses. In microarray or RNA-Seq gene expression analysis, the overall variance of a gene (or probe) across the samples can be an attractive covariate for filtering or weighting. However, both IF and IHW require that the distribution of p-values under the null is independent of the covariate. This is trivially the case if the p-value for each individual hypothesis is uniform (as e.g. for the ordinary t-test). The situation is less trivial for “empirical Bayes” methods that use information sharing across genes (i.e. hypotheses) in the calculation of their p-values. A prominent representative of such methods is the moderated t-test of limma (more precisely, limma’s eBayes function). Similar concerns also exist for edgeR, DESeq2, …

2 Analysis with 8 replicate samples from an experiment

Load required packages.

library("Biobase")
library("limma")
library("ggplot2")
library("beyonce")
library("magrittr")
library("reshape2")

Load an example microarray gene expression dataset, the ApoAI data from the limma user’s guide.

if (!exists("MA")) {
  zipfile <- "apoai.zip"
  download.file("https://www.bioconductor.org/help/course-materials/2005/BioC2005/labs/lab01/Data/apoai.zip",
                destfile = zipfile)
  unzip(zipfile)
  load("ApoAI.RData")
  MA <- normalizeWithinArrays(RG)
}

The dataset MA comprises 16 samples, which come in two groups:

table(MA$targets$Cy5)

## 
## ApoAI-/-  C57BL/6 
##        8        8

Let’s focus on the wildtype samples (controls).

MAwt <- MA[, MA$targets$Cy5=="C57BL/6"]

We use the function combn to generate all possible subdivisions of these 8 samples into two groups of equal size.

splits <- combn(ncol(MAwt), ncol(MAwt) %/% 2)
dim(splits)

## [1]  4 70

splits[, 1:12]

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]    1    1    1    1    1    1    1    1    1     1     1     1
## [2,]    2    2    2    2    2    2    2    2    2     2     2     2
## [3,]    3    3    3    3    3    4    4    4    4     5     5     5
## [4,]    4    5    6    7    8    5    6    7    8     6     7     8

The following code loops over the columns of splits and calls limma’s lmFit/eBayes differential expression analysis on each of the 70 subdivisions. Since the samples are all replicates, all null hypotheses are true, i.e. there is no true true differential expression. The resulting p-values are obtained in the matrix pvalues.

pvalues <- apply(splits, 2, function(i) {
  mock  <- seq_len(ncol(MAwt)) %in% i 
  fit   <- lmFit(MAwt$M, design = model.matrix(~ mock)) %>% eBayes
  fit$p.value[, "mockTRUE"]
})
str(pvalues)

##  num [1:6384, 1:70] 0.4365 0.2706 0.0474 0.1847 0.4188 ...

Let’s look at the histogram of the p-values. For ggplot2 it is convenient to stick the data into a data.frame df. We also define 4 equal-sized groups of genes, according to their overall variance, in the column strat_var, and similarly, 4 equal-sized groups according to quartiles of the overall mean.

df <- melt(pvalues, varnames = c("probe", "replicate"), value.name = "p")

means <- base::rowMeans(MAwt$M)      %>% rank %>% cut(4, labels = paste("mean quartile",     1:4))
vars  <- genefilter::rowVars(MAwt$M) %>% rank %>% cut(4, labels = paste("variance quartile", 1:4))
df$strat_mean <- means[df$probe]
df$strat_var  <- vars[df$probe]

h <- ggplot(df , aes(x = p, ..density..)) + geom_histogram(binwidth = 0.025, fill = beyonce_palette(9)[3]) +
  geom_hline(yintercept = 1)
h

If all null hypotheses are true, the distribution of each p-value should be uniform. Indeed, we see that the mixture distribution of the p-values for all genes is pretty much uniform. Now let’s stratify by overall variance, i.e. the factor strat_var that we defined above.

h + facet_wrap( ~ strat_var, ncol = 2)

The histograms -which correspond to the conditional distributions of the p-values, given the stratification variable- are no longer uniform. We see that limma tends to be conservative for genes with small overall variance, and anticonservative for genes with large overall variance. This is, of course, as intended by design of the method, due to the shrinkage estimation of the variance: genes that have a small overall (empirical) variance also tend to have a small empirical within-groups variance, which then gets shrunk upward for the purpose of the moderated t-test. And vice versa for genes with a large within-groups variance.

In contrast, stratifying (or conditioning) by mean leads to only weak (and probably negligible) deviations from uniformity.

h + facet_wrap( ~ strat_mean, ncol = 2)

3 Analysis with samples from an observational study

For good measure, we repeat this analysis for another dataset.

library("ALL")

ALL contains 128 samples of acute lymphoblastic leukemia (ALL), which were profiled on an Affymetrix array.

data("ALL")

In the code below, we compare 4 randomly drawn samples against 4 other randomly drawn samples. To average out chance alignments with true covariates in the data (e.g. sex), we do this a large number (1000) of times.

mock <- (1:8 <= 4)
pvalues <- replicate(1000, {
  ss    <- sample(ncol(ALL), length(mock))
  mat   <- exprs(ALL)[, ss]
  fit   <- lmFit(mat, design = model.matrix(~ mock)) %>% eBayes
  fit$p.value[, "mockTRUE"]
})

And we proceed as above.

df <- melt(pvalues, varnames = c("probe", "replicate"), value.name = "p")

means <- base::rowMeans(exprs(ALL))      %>% rank %>% cut(4, labels = paste("mean quartile",     1:4))
vars  <- genefilter::rowVars(exprs(ALL)) %>% rank %>% cut(4, labels = paste("variance quartile", 1:4)) 
names(means) <- names(vars) <- row.names(ALL)

df$strat_mean <- means[as.character(df$probe)]
df$strat_var  <- vars[ as.character(df$probe)]

h <- ggplot(df , aes(x = p, ..density..)) + geom_histogram(binwidth = 0.025, fill = beyonce_palette(9)[4]) +
  geom_hline(yintercept = 1)
h

h + facet_wrap( ~ strat_var, ncol = 2)

h + facet_wrap( ~ strat_mean, ncol = 2)

The deviations from uniformity when stratifying by mean are a bit more pronounced here than for the ApoAI data; this could be related to the rma normalisation, which tends to reduce the variance for low-intensity probes:

ggplot(
  data.frame(mean = rowMeans(exprs(ALL)), 
             sd   = genefilter::rowSds(exprs(ALL))), 
  aes(x = mean, y = sd)) + 
  stat_binhex( bins = 100) + ylim(c(0, 1.5))

sessionInfo()

## R Under development (unstable) (2015-12-28 r69816)
## Platform: x86_64-apple-darwin15.2.0 (64-bit)
## Running under: OS X 10.11.2 (El Capitan)
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ALL_1.13.0          reshape2_1.4.1      dplyr_0.4.3        
## [4] beyonce_0.1         ggplot2_2.0.0       limma_3.27.6       
## [7] Biobase_2.31.3      BiocGenerics_0.17.2 BiocStyle_1.9.2    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2          formatR_1.2.1        plyr_1.8.3          
##  [4] tools_3.3.0          digest_0.6.8         lattice_0.20-33     
##  [7] annotate_1.49.0      evaluate_0.8         RSQLite_1.0.0       
## [10] gtable_0.1.2         DBI_0.3.1            yaml_2.1.13         
## [13] hexbin_1.27.1        genefilter_1.53.0    stringr_1.0.0       
## [16] knitr_1.11           S4Vectors_0.9.15     IRanges_2.5.18      
## [19] stats4_3.3.0         grid_3.3.0           R6_2.1.1            
## [22] AnnotationDbi_1.33.4 XML_3.98-1.3         survival_2.38-3     
## [25] rmarkdown_0.9        magrittr_1.5         scales_0.3.0        
## [28] codetools_0.2-14     htmltools_0.3        splines_3.3.0       
## [31] assertthat_0.1       colorspace_1.2-6     xtable_1.8-0        
## [34] labeling_0.3         stringi_1.0-1        munsell_0.4.2

Notes on variance filtering and type I error control in limma

Contents

1 Motivation

2 Analysis with 8 replicate samples from an experiment

3 Analysis with samples from an observational study