The goal of this experiment is to compute gene expressions for inbred lines, which were assembled in hybridization pairs on a set of arrays. A two-color interwoven loop-design was used for this purpose. Hence, there is no common reference as typically used throughout the examples in the limma user manual.
Please note that you can recreate the design matrix yourself by using the necessary objects from the appendix at the bottom of this page!
library("limma")
# Read target files.
targets <- readTargets{"targets.txt"}
# Extract unique RNA sources (i.e. inbred lines)
lines <- uniqueTargets(targets)
# Import hybridization data.
RG <- read.maimages(targets$FileName, source = "genepix",
wt.fun <- wtflags(weight = 0, cutoff = -50))
# Read GAL files and SpotType files
RG$genes <- readGAL("Microarray_layout.gal")
spottypes <- readSpotTypes()
RG$genes$Status <- controlStatus(spottypes, RG)
MA <- normalizeWithinArrays(RG)
MA <- normalizeBetweenArrays(MA, method = "scale")
Limma generates a linear model (generalized least squares) for each gene with the following model: \[y_{i} = \textbf{X}\beta_{i} + \epsilon_{i}\]
isGene <- MA$genes$Status == "Gene"
fit <- lmFit(MA[isGene, ], design)
\[ \begin{array}{cc} \text{log-ratio} \\ \begin{pmatrix} P001.F002 \\ \vdots \\ F001.F003 \end{pmatrix}_{n \times 1} \end{array} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix}_{n \times (m - 1)} \times \begin{pmatrix} \beta_{P001.P003} \\ \vdots \\ \beta_{F001.F003} \end{pmatrix}_{(m - 1) \times 1} + \epsilon_{i} \]
\(\beta_i\) is calculated as \(\beta_{i} = (\textbf{X}^{\intercal}\textbf{X})^{-1}\textbf{X}^{\intercal}y_{i}\)
limma.Below you can see the design of the entire microarray experiment, where the coefficient denotes the dye which was used for an inbred line in a particular array. In particular, the cell elements indicate whether
full_mat
## F001 F002 F003 F004 F005 P001 P002 P003
## F003Cy3xP003Cy5 0 0 -1 0 0 0 0 1
## F002Cy3xF001Cy5 1 -1 0 0 0 0 0 0
## F003Cy3xF002Cy5 0 1 -1 0 0 0 0 0
## P003Cy3xP002Cy5 0 0 0 0 0 0 1 -1
## F001Cy3xP003Cy5 -1 0 0 0 0 0 0 1
## F004Cy3xF002Cy5 0 1 0 -1 0 0 0 0
## F002Cy3xF004Cy5 0 -1 0 1 0 0 0 0
## P002Cy3xF005Cy5 0 0 0 0 1 0 -1 0
## P001Cy3xP003Cy5 0 0 0 0 0 -1 0 1
## P003Cy3xF002Cy5 0 1 0 0 0 0 0 -1
## F003Cy3xF004Cy5 0 0 -1 1 0 0 0 0
## P003Cy3xF001Cy5 1 0 0 0 0 0 0 -1
## P001Cy3xF005Cy5 0 0 0 0 1 -1 0 0
## F005Cy3xF002Cy5 0 1 0 0 -1 0 0 0
## F001Cy3xP001Cy5 -1 0 0 0 0 1 0 0
## P002Cy3xF003Cy5 0 0 1 0 0 0 -1 0
## F004Cy3xP003Cy5 0 0 0 -1 0 0 0 1
The reduced design, which still describes the entire experiment, but where the number of selected arrays is one less than the number of unique inbred lines:
parameters
## F003Cy3xF004Cy5 P003Cy3xF001Cy5 P001Cy3xF005Cy5 F005Cy3xF002Cy5
## F001 0 1 0 0
## F002 0 0 0 1
## F003 -1 0 0 0
## F004 1 0 0 0
## F005 0 0 1 -1
## P001 0 0 -1 0
## P002 0 0 0 0
## P003 0 -1 0 0
## F001Cy3xP001Cy5 P002Cy3xF003Cy5 F004Cy3xP003Cy5
## F001 -1 0 0
## F002 0 0 0
## F003 0 1 0
## F004 0 0 -1
## F005 0 0 0
## P001 1 0 0
## P002 0 -1 0
## P003 0 0 1
I cannot find any examples in the limma user guide that describe the setup of design matrices for a case where no common reference is used on every array and dye swaps are applied at the same time.
if (missing(targets))
stop("targets is required argument")
targets <- as.matrix(targets)
if (!all(c("Cy3", "Cy5") %in% colnames(targets)))
stop("targets should contain columns: Cy3 and Cy5")
target.names <- sort(unique(as.vector(t(as.matrix(targets[,
c("Cy3", "Cy5")])))))
parameters <- as.matrix(parameters)
if (length(target.names) != nrow(parameters))
stop("rows of parameters don't match unique target names")
if (any(sort(target.names) != sort(rownames(parameters))))
stop("rownames of parameters don't match unique target names")
target.names <- rownames(parameters)
ntargets <- nrow(parameters)
if (ncol(parameters) != ntargets - 1)
warning("number of parameters should be one less than number of
targets")
narrays <- nrow(targets)
J <- matrix(rep(target.names, narrays), ntargets, narrays)
J <- t((t(J) == targets[, "Cy5"]) - (t(J) == targets[, "Cy3"]))
rownames(J) <- target.names
colnames(J) <- rownames(targets)
zapsmall(t(solve(crossprod(parameters), crossprod(parameters,
J))), 14)
## F003Cy3xF004Cy5 P003Cy3xF001Cy5 P001Cy3xF005Cy5
## F003Cy3xP003Cy5 1 0 0
## F002Cy3xF001Cy5 0 0 -1
## F003Cy3xF002Cy5 1 1 1
## P003Cy3xP002Cy5 -1 0 0
## F001Cy3xP003Cy5 0 -1 0
## F004Cy3xF002Cy5 0 1 1
## F002Cy3xF004Cy5 0 -1 -1
## P002Cy3xF005Cy5 1 1 1
## P001Cy3xP003Cy5 0 -1 0
## P003Cy3xF002Cy5 0 1 1
## F003Cy3xF004Cy5 1 0 0
## P003Cy3xF001Cy5 0 1 0
## P001Cy3xF005Cy5 0 0 1
## F005Cy3xF002Cy5 0 0 0
## F001Cy3xP001Cy5 0 0 0
## P002Cy3xF003Cy5 0 0 0
## F004Cy3xP003Cy5 0 0 0
## F005Cy3xF002Cy5 F001Cy3xP001Cy5 P002Cy3xF003Cy5
## F003Cy3xP003Cy5 0 0 0
## F002Cy3xF001Cy5 -1 -1 0
## F003Cy3xF002Cy5 1 1 0
## P003Cy3xP002Cy5 0 0 -1
## F001Cy3xP003Cy5 0 0 0
## F004Cy3xF002Cy5 1 1 0
## F002Cy3xF004Cy5 -1 -1 0
## P002Cy3xF005Cy5 0 1 1
## P001Cy3xP003Cy5 0 -1 0
## P003Cy3xF002Cy5 1 1 0
## F003Cy3xF004Cy5 0 0 0
## P003Cy3xF001Cy5 0 0 0
## P001Cy3xF005Cy5 0 0 0
## F005Cy3xF002Cy5 1 0 0
## F001Cy3xP001Cy5 0 1 0
## P002Cy3xF003Cy5 0 0 1
## F004Cy3xP003Cy5 0 0 0
## F004Cy3xP003Cy5
## F003Cy3xP003Cy5 1
## F002Cy3xF001Cy5 0
## F003Cy3xF002Cy5 1
## P003Cy3xP002Cy5 -1
## F001Cy3xP003Cy5 0
## F004Cy3xF002Cy5 1
## F002Cy3xF004Cy5 -1
## P002Cy3xF005Cy5 1
## P001Cy3xP003Cy5 0
## P003Cy3xF002Cy5 0
## F003Cy3xF004Cy5 0
## P003Cy3xF001Cy5 0
## P001Cy3xF005Cy5 0
## F005Cy3xF002Cy5 0
## F001Cy3xP001Cy5 0
## P002Cy3xF003Cy5 0
## F004Cy3xP003Cy5 1
To me this design matrix looks spurious. Generally I think of design matrices as planes of zeroes and ones that either link an experimental factor to the response (1) or not (0). However, in the design matrix above I do not see an obvious relationship between the columns (coefficients) and the rows (arrays). For example a value of 1 was assigned to row 1 of column 2 ([“F003Cy3xP001Cy5”, “F001Cy3xP002Cy5”]) although there is no obvious connection between the two. Digging into the modelMatrix() function from the limma package, I got stuck with the following line of code:
zapsmall(t(solve(crossprod(parameters), crossprod(parameters,
J))), 14)
My understanding is that parameters represents the connection between all unique RNA sources (inbred lines) and the selected arrays that represent the whole experiment. Moreover, it looks as if J is an extension of parameters in a way that it represents the connection between all RNA sources and all arrays of the experiment. When comparing parameters, J and the output of modelMatrix() to the examples for design matrices in the limma user manual, it seems to me as if the design matrix should be rather J and not the output of modelMatrix().
It would be appreciated if you could fill this gap in my understanding!
R-objects for the generation of the design matrix.
# parameters
dput(parameters)
structure(c(0, 0, -1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -1,
0, 0, 0, 0, 1, -1, 0, 0, 0, 1, 0, 0, -1, 0, 0, 0, -1, 0, 0, 0,
0, 1, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 1), .Dim = c(8L,
7L), .Dimnames = list(c("F001", "F002", "F003", "F004", "F005",
"P001", "P002", "P003"), c("F003Cy3xF004Cy5", "P003Cy3xF001Cy5",
"P001Cy3xF005Cy5", "F005Cy3xF002Cy5", "F001Cy3xP001Cy5", "P002Cy3xF003Cy5",
"F004Cy3xP003Cy5")))
# targets
dput(targets)
structure(c("F003Cy3xP003Cy5", "F002Cy3xF001Cy5", "F003Cy3xF002Cy5",
"P003Cy3xP002Cy5", "F001Cy3xP003Cy5", "F004Cy3xF002Cy5", "F002Cy3xF004Cy5",
"P002Cy3xF005Cy5", "P001Cy3xP003Cy5", "P003Cy3xF002Cy5", "F003Cy3xF004Cy5",
"P003Cy3xF001Cy5", "P001Cy3xF005Cy5", "F005Cy3xF002Cy5", "F001Cy3xP001Cy5",
"P002Cy3xF003Cy5", "F004Cy3xP003Cy5", "F003", "F002", "F003",
"P003", "F001", "F004", "F002", "P002", "P001", "P003", "F003",
"P003", "P001", "F005", "F001", "P002", "F004", "P003", "F001",
"F002", "P002", "P003", "F002", "F004", "F005", "P003", "F002",
"F004", "F001", "F005", "F002", "P001", "F003", "P003"), .Dim = c(17L,
3L), .Dimnames = list(c("F003Cy3xP003Cy5", "F002Cy3xF001Cy5",
"F003Cy3xF002Cy5", "P003Cy3xP002Cy5", "F001Cy3xP003Cy5", "F004Cy3xF002Cy5",
"F002Cy3xF004Cy5", "P002Cy3xF005Cy5", "P001Cy3xP003Cy5", "P003Cy3xF002Cy5",
"F003Cy3xF004Cy5", "P003Cy3xF001Cy5", "P001Cy3xF005Cy5", "F005Cy3xF002Cy5",
"F001Cy3xP001Cy5", "P002Cy3xF003Cy5", "F004Cy3xP003Cy5"), c("FileName",
"Cy3", "Cy5")))
# full_mat
dput(full_mat)
structure(c(0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 0,
0, 0, -1, 1, 0, 0, 1, -1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, -1, 0,
-1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
-1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0,
0, -1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0,
0, -1, 0, 1, 0, 0, -1, 1, 0, 0, 0, 1, -1, 0, -1, 0, 0, 0, 0,
1), .Dim = c(17L, 8L), .Dimnames = list(c("F003Cy3xP003Cy5",
"F002Cy3xF001Cy5", "F003Cy3xF002Cy5", "P003Cy3xP002Cy5", "F001Cy3xP003Cy5",
"F004Cy3xF002Cy5", "F002Cy3xF004Cy5", "P002Cy3xF005Cy5", "P001Cy3xP003Cy5",
"P003Cy3xF002Cy5", "F003Cy3xF004Cy5", "P003Cy3xF001Cy5", "P001Cy3xF005Cy5",
"F005Cy3xF002Cy5", "F001Cy3xP001Cy5", "P002Cy3xF003Cy5", "F004Cy3xP003Cy5"
), c("F001", "F002", "F003", "F004", "F005", "P001", "P002",
"P003")))
eBayes().fit <- eBayes(fit)
fit[["F.p.value"]] <- p.adjust(fit[["F.p.value"]], method = "fdr",
n = length(fit[["F.p.value"]]))
Write the results to a file:
write.fit(fit, results = NULL, file = "exp-001.dta", digits = 5,
adjust = "BH", sep = "\t")
Read the results from a file:
exp.data <- read.table("exp-001.dta", header = TRUE, sep = "\t",
stringsAsFactors = FALSE)
exp.data <- exp.data[!is.na(exp.data$F.p.value), ]
attr(exp.data, "row.names") <- exp.data[, "Genes.ID"]
Select genes, which were differentially expressed on at least one array:
exp.dta <- exp.data[exp.data$F.p.value < 0.01, ]
Extract coefficients and Amean:
exp.ratio <- exp.dta[, seq_along(lines)]
\[ A_{m,m} = \begin{pmatrix} n_{1} / n & n_{2} / n & \cdots & n_{m} / n\\ 0 & 0 & \cdots & 1 \\ 1 & 0 & \cdots & -1 \\ 0 & 1 & \cdots & 0 \\ -1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & -1 & \cdots & 0 \end{pmatrix} \], where …
m unique RNA sources (i.e. inbred lines).1,...,m within the entire experiment.m - 1 that connect the entire experiment.B <- t(exp.ratio)
# Obtain gene expression values for each inbred line and each mRNA by solving
# the equation a %*% b = b for x
x <- solve(A, B)
\[ Ax = B\] \[ A_{coef \times lines} * x_{lines \times mRNAs} = B_{coef \times mRNAs}\]