The Experiment

The goal of this experiment is to compute gene expressions for inbred lines, which were assembled in hybridization pairs on a set of arrays. A two-color interwoven loop-design was used for this purpose. Hence, there is no common reference as typically used throughout the examples in the limma user manual.

Please note that you can recreate the design matrix yourself by using the necessary objects from the appendix at the bottom of this page!

Read data

library("limma")
# Read target files.
targets <- readTargets{"targets.txt"}
# Extract unique RNA sources (i.e. inbred lines)
lines <- uniqueTargets(targets)
# Import hybridization data.
RG <- read.maimages(targets$FileName, source = "genepix",
                    wt.fun <- wtflags(weight = 0, cutoff = -50))
# Read GAL files and SpotType files
RG$genes <- readGAL("Microarray_layout.gal")
spottypes <- readSpotTypes()
RG$genes$Status <- controlStatus(spottypes, RG)

Data Normalization

MA <- normalizeWithinArrays(RG)
MA <- normalizeBetweenArrays(MA, method = "scale")

Fitting the linear model

Limma generates a linear model (generalized least squares) for each gene with the following model: \[y_{i} = \textbf{X}\beta_{i} + \epsilon_{i}\]

isGene <- MA$genes$Status == "Gene"
fit <- lmFit(MA[isGene, ], design)

\[ \begin{array}{cc} \text{log-ratio} \\ \begin{pmatrix} P001.F002 \\ \vdots \\ F001.F003 \end{pmatrix}_{n \times 1} \end{array} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}_{n \times (m - 1)} \times \begin{pmatrix} \beta_{P001.P003} \\ \vdots \\ \beta_{F001.F003} \end{pmatrix}_{(m - 1) \times 1} + \epsilon_{i} \]

  • \(n\): number of arrays
  • \(m\): number of unique inbred lines
  • \(y_i\): vector with log-ratios of each array
  • \(\textbf{X}\): design matrix
  • \(\beta_i\): parameters of all coefficients \(1,...,n - 1\)
  • \(\epsilon_i\): vector of residuals

\(\beta_i\) is calculated as \(\beta_{i} = (\textbf{X}^{\intercal}\textbf{X})^{-1}\textbf{X}^{\intercal}y_{i}\)

Creation of parameters for limma.

Below you can see the design of the entire microarray experiment, where the coefficient denotes the dye which was used for an inbred line in a particular array. In particular, the cell elements indicate whether

  • the inbred line (column) was not part of the array (row): 0
  • the inbred line was part of the array and labeled with Cy3: -1
  • the inbred line was part of the array and labeled with Cy5: 1
full_mat
##                 F001 F002 F003 F004 F005 P001 P002 P003
## F003Cy3xP003Cy5    0    0   -1    0    0    0    0    1
## F002Cy3xF001Cy5    1   -1    0    0    0    0    0    0
## F003Cy3xF002Cy5    0    1   -1    0    0    0    0    0
## P003Cy3xP002Cy5    0    0    0    0    0    0    1   -1
## F001Cy3xP003Cy5   -1    0    0    0    0    0    0    1
## F004Cy3xF002Cy5    0    1    0   -1    0    0    0    0
## F002Cy3xF004Cy5    0   -1    0    1    0    0    0    0
## P002Cy3xF005Cy5    0    0    0    0    1    0   -1    0
## P001Cy3xP003Cy5    0    0    0    0    0   -1    0    1
## P003Cy3xF002Cy5    0    1    0    0    0    0    0   -1
## F003Cy3xF004Cy5    0    0   -1    1    0    0    0    0
## P003Cy3xF001Cy5    1    0    0    0    0    0    0   -1
## P001Cy3xF005Cy5    0    0    0    0    1   -1    0    0
## F005Cy3xF002Cy5    0    1    0    0   -1    0    0    0
## F001Cy3xP001Cy5   -1    0    0    0    0    1    0    0
## P002Cy3xF003Cy5    0    0    1    0    0    0   -1    0
## F004Cy3xP003Cy5    0    0    0   -1    0    0    0    1

The reduced design, which still describes the entire experiment, but where the number of selected arrays is one less than the number of unique inbred lines:

parameters
##      F003Cy3xF004Cy5 P003Cy3xF001Cy5 P001Cy3xF005Cy5 F005Cy3xF002Cy5
## F001               0               1               0               0
## F002               0               0               0               1
## F003              -1               0               0               0
## F004               1               0               0               0
## F005               0               0               1              -1
## P001               0               0              -1               0
## P002               0               0               0               0
## P003               0              -1               0               0
##      F001Cy3xP001Cy5 P002Cy3xF003Cy5 F004Cy3xP003Cy5
## F001              -1               0               0
## F002               0               0               0
## F003               0               1               0
## F004               0               0              -1
## F005               0               0               0
## P001               1               0               0
## P002               0              -1               0
## P003               0               0               1

Creation of the design matrix

I cannot find any examples in the limma user guide that describe the setup of design matrices for a case where no common reference is used on every array and dye swaps are applied at the same time.

if (missing(targets)) 
    stop("targets is required argument")
targets <- as.matrix(targets)
if (!all(c("Cy3", "Cy5") %in% colnames(targets))) 
    stop("targets should contain columns: Cy3 and Cy5")
target.names <- sort(unique(as.vector(t(as.matrix(targets[, 
                                                          c("Cy3", "Cy5")])))))
parameters <- as.matrix(parameters)
if (length(target.names) != nrow(parameters)) 
    stop("rows of parameters don't match unique target names")
if (any(sort(target.names) != sort(rownames(parameters)))) 
    stop("rownames of parameters don't match unique target names")
target.names <- rownames(parameters)
ntargets <- nrow(parameters)
if (ncol(parameters) != ntargets - 1) 
warning("number of parameters should be one less than number of
        targets")
narrays <- nrow(targets)
J <- matrix(rep(target.names, narrays), ntargets, narrays)
J <- t((t(J) == targets[, "Cy5"]) - (t(J) == targets[, "Cy3"]))
rownames(J) <- target.names
colnames(J) <- rownames(targets)
zapsmall(t(solve(crossprod(parameters), crossprod(parameters, 
                                                  J))), 14)
##                 F003Cy3xF004Cy5 P003Cy3xF001Cy5 P001Cy3xF005Cy5
## F003Cy3xP003Cy5               1               0               0
## F002Cy3xF001Cy5               0               0              -1
## F003Cy3xF002Cy5               1               1               1
## P003Cy3xP002Cy5              -1               0               0
## F001Cy3xP003Cy5               0              -1               0
## F004Cy3xF002Cy5               0               1               1
## F002Cy3xF004Cy5               0              -1              -1
## P002Cy3xF005Cy5               1               1               1
## P001Cy3xP003Cy5               0              -1               0
## P003Cy3xF002Cy5               0               1               1
## F003Cy3xF004Cy5               1               0               0
## P003Cy3xF001Cy5               0               1               0
## P001Cy3xF005Cy5               0               0               1
## F005Cy3xF002Cy5               0               0               0
## F001Cy3xP001Cy5               0               0               0
## P002Cy3xF003Cy5               0               0               0
## F004Cy3xP003Cy5               0               0               0
##                 F005Cy3xF002Cy5 F001Cy3xP001Cy5 P002Cy3xF003Cy5
## F003Cy3xP003Cy5               0               0               0
## F002Cy3xF001Cy5              -1              -1               0
## F003Cy3xF002Cy5               1               1               0
## P003Cy3xP002Cy5               0               0              -1
## F001Cy3xP003Cy5               0               0               0
## F004Cy3xF002Cy5               1               1               0
## F002Cy3xF004Cy5              -1              -1               0
## P002Cy3xF005Cy5               0               1               1
## P001Cy3xP003Cy5               0              -1               0
## P003Cy3xF002Cy5               1               1               0
## F003Cy3xF004Cy5               0               0               0
## P003Cy3xF001Cy5               0               0               0
## P001Cy3xF005Cy5               0               0               0
## F005Cy3xF002Cy5               1               0               0
## F001Cy3xP001Cy5               0               1               0
## P002Cy3xF003Cy5               0               0               1
## F004Cy3xP003Cy5               0               0               0
##                 F004Cy3xP003Cy5
## F003Cy3xP003Cy5               1
## F002Cy3xF001Cy5               0
## F003Cy3xF002Cy5               1
## P003Cy3xP002Cy5              -1
## F001Cy3xP003Cy5               0
## F004Cy3xF002Cy5               1
## F002Cy3xF004Cy5              -1
## P002Cy3xF005Cy5               1
## P001Cy3xP003Cy5               0
## P003Cy3xF002Cy5               0
## F003Cy3xF004Cy5               0
## P003Cy3xF001Cy5               0
## P001Cy3xF005Cy5               0
## F005Cy3xF002Cy5               0
## F001Cy3xP001Cy5               0
## P002Cy3xF003Cy5               0
## F004Cy3xP003Cy5               1

To me this design matrix looks spurious. Generally I think of design matrices as planes of zeroes and ones that either link an experimental factor to the response (1) or not (0). However, in the design matrix above I do not see an obvious relationship between the columns (coefficients) and the rows (arrays). For example a value of 1 was assigned to row 1 of column 2 ([“F003Cy3xP001Cy5”, “F001Cy3xP002Cy5”]) although there is no obvious connection between the two. Digging into the modelMatrix() function from the limma package, I got stuck with the following line of code:

zapsmall(t(solve(crossprod(parameters), crossprod(parameters, 
                                                  J))), 14)

My understanding is that parameters represents the connection between all unique RNA sources (inbred lines) and the selected arrays that represent the whole experiment. Moreover, it looks as if J is an extension of parameters in a way that it represents the connection between all RNA sources and all arrays of the experiment. When comparing parameters, J and the output of modelMatrix() to the examples for design matrices in the limma user manual, it seems to me as if the design matrix should be rather J and not the output of modelMatrix().

It would be appreciated if you could fill this gap in my understanding!

Appendix

R-objects for the generation of the design matrix.

# parameters
dput(parameters)
structure(c(0, 0, -1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1, 
0, 0, 0, 0, 1, -1, 0, 0, 0, 1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 1), .Dim = c(8L, 
7L), .Dimnames = list(c("F001", "F002", "F003", "F004", "F005", 
"P001", "P002", "P003"), c("F003Cy3xF004Cy5", "P003Cy3xF001Cy5", 
"P001Cy3xF005Cy5", "F005Cy3xF002Cy5", "F001Cy3xP001Cy5", "P002Cy3xF003Cy5", 
"F004Cy3xP003Cy5")))
# targets
dput(targets)
structure(c("F003Cy3xP003Cy5", "F002Cy3xF001Cy5", "F003Cy3xF002Cy5", 
"P003Cy3xP002Cy5", "F001Cy3xP003Cy5", "F004Cy3xF002Cy5", "F002Cy3xF004Cy5", 
"P002Cy3xF005Cy5", "P001Cy3xP003Cy5", "P003Cy3xF002Cy5", "F003Cy3xF004Cy5", 
"P003Cy3xF001Cy5", "P001Cy3xF005Cy5", "F005Cy3xF002Cy5", "F001Cy3xP001Cy5", 
"P002Cy3xF003Cy5", "F004Cy3xP003Cy5", "F003", "F002", "F003", 
"P003", "F001", "F004", "F002", "P002", "P001", "P003", "F003", 
"P003", "P001", "F005", "F001", "P002", "F004", "P003", "F001", 
"F002", "P002", "P003", "F002", "F004", "F005", "P003", "F002", 
"F004", "F001", "F005", "F002", "P001", "F003", "P003"), .Dim = c(17L, 
3L), .Dimnames = list(c("F003Cy3xP003Cy5", "F002Cy3xF001Cy5", 
"F003Cy3xF002Cy5", "P003Cy3xP002Cy5", "F001Cy3xP003Cy5", "F004Cy3xF002Cy5", 
"F002Cy3xF004Cy5", "P002Cy3xF005Cy5", "P001Cy3xP003Cy5", "P003Cy3xF002Cy5", 
"F003Cy3xF004Cy5", "P003Cy3xF001Cy5", "P001Cy3xF005Cy5", "F005Cy3xF002Cy5", 
"F001Cy3xP001Cy5", "P002Cy3xF003Cy5", "F004Cy3xP003Cy5"), c("FileName", 
"Cy3", "Cy5")))
# full_mat
dput(full_mat)
structure(c(0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 0, 
0, 0, -1, 1, 0, 0, 1, -1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, -1, 0, 
-1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
-1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 
0, -1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 
0, -1, 0, 1, 0, 0, -1, 1, 0, 0, 0, 1, -1, 0, -1, 0, 0, 0, 0, 
1), .Dim = c(17L, 8L), .Dimnames = list(c("F003Cy3xP003Cy5", 
"F002Cy3xF001Cy5", "F003Cy3xF002Cy5", "P003Cy3xP002Cy5", "F001Cy3xP003Cy5", 
"F004Cy3xF002Cy5", "F002Cy3xF004Cy5", "P002Cy3xF005Cy5", "P001Cy3xP003Cy5", 
"P003Cy3xF002Cy5", "F003Cy3xF004Cy5", "P003Cy3xF001Cy5", "P001Cy3xF005Cy5", 
"F005Cy3xF002Cy5", "F001Cy3xP001Cy5", "P002Cy3xF003Cy5", "F004Cy3xP003Cy5"
), c("F001", "F002", "F003", "F004", "F005", "P001", "P002", 
"P003")))

Further Data Processing

  1. Correct the standard deviations and apply a moderated F-Test to determine whether a gene was differentially expressed in any comparison (as determined by the used coefficients) using eBayes().
  2. Correct p-values for multiple testing using the approach.
fit <- eBayes(fit)
fit[["F.p.value"]] <- p.adjust(fit[["F.p.value"]], method = "fdr",
                               n = length(fit[["F.p.value"]]))

Write the results to a file:

write.fit(fit, results = NULL, file = "exp-001.dta", digits = 5,
          adjust = "BH", sep = "\t")

Read the results from a file:

exp.data <- read.table("exp-001.dta", header = TRUE, sep = "\t", 
                       stringsAsFactors = FALSE)
exp.data <- exp.data[!is.na(exp.data$F.p.value), ]
attr(exp.data, "row.names") <- exp.data[, "Genes.ID"]

Select genes, which were differentially expressed on at least one array:

exp.dta <- exp.data[exp.data$F.p.value < 0.01, ]

Extract coefficients and Amean:

exp.ratio <- exp.dta[, seq_along(lines)]

Calculate gene expressions for inbred lines:

Create another design matrix

\[ A_{m,m} = \begin{pmatrix} n_{1} / n & n_{2} / n & \cdots & n_{m} / n\\ 0 & 0 & \cdots & 1 \\ 1 & 0 & \cdots & -1 \\ 0 & 1 & \cdots & 0 \\ -1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & -1 & \cdots & 0 \end{pmatrix} \], where …

  1. the columns represent m unique RNA sources (i.e. inbred lines).
  2. the first row contains the frequency of inbred lines 1,...,m within the entire experiment.
  3. all following rows represent the unique set of coefficients of length m - 1 that connect the entire experiment.
  4. the cell elements indicate whether
    • the inbred line (column) was not part of the array (row): 0
    • the inbred line was part of the array and labeled with Cy3: -1
    • the inbred line was part of the array and labeled with Cy5: 1

Calculate gene expressions

B <- t(exp.ratio)
# Obtain gene expression values for each inbred line and each mRNA by solving 
# the equation a %*% b = b for x
x <- solve(A, B)

\[ Ax = B\] \[ A_{coef \times lines} * x_{lines \times mRNAs} = B_{coef \times mRNAs}\]