When testing iDEP, I and other users noticed that sometimes when using limma to identify differentially expressed genes, up-regulated genes are labelled as down-regulated, and vice versa. I digged into this a bit more and found an issue related to design matrices for limma. The limma documentation is not very clear on this, so I will document it here.

Reading the data

We will use part of the demo data for iDEP, available on GitHub.

library(edgeR)
## Loading required package: limma
library(limma)
library(knitr)
fileURL <- "https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/BcellGSE71176_p53.csv" 

x <- read.csv(fileURL, row.names = 1)
x <- x[ , 1:8]
colnames(x) <- gsub("p53_","",colnames(x))
kable( head( x ) )
mock_1 mock_2 mock_3 mock_4 IR_1 IR_2 IR_3 IR_4
ENSMUSG00000028995 3210 2553 2096 3111 1054 663 1012 1027
ENSMUSG00000045007 122 97 53 78 22 10 15 22
ENSMUSG00000062785 183 182 143 232 530 384 590 655
ENSMUSG00000038268 1117 843 669 783 526 369 654 765
ENSMUSG00000054589 85 46 36 45 693 394 865 1120
ENSMUSG00000102752 52 59 45 75 21 16 20 11
# Correct orientation
groups = gsub("_.*", "", colnames( x ))
groups <- factor(groups, levels = c("mock","IR") )
design <- model.matrix( ~ 0 + groups )
colnames(design) <- c("mock","IR")
design
##   mock IR
## 1    1  0
## 2    1  0
## 3    1  0
## 4    1  0
## 5    0  1
## 6    0  1
## 7    0  1
## 8    0  1
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"
dge <- DGEList(counts=x);
dge <- calcNormFactors(dge, method = "TMM")  # normalization
v <- voom(dge, design)
fit <- lmFit(v, design) 

cont.matrix <- makeContrasts(contrasts = "IR-mock", levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)

# calls differential gene expression 1 for up, -1 for down
results <- decideTests(fit2, p.value = .1, lfc= log2(2) )
topGenes =topTable(fit2, number = 1e12,sort.by="M" )
head(topGenes)
##                       logFC    AveExpr         t      P.Value    adj.P.Val
## ENSMUSG00000026831 9.892108 0.68261871  6.389315 9.581983e-05 5.409615e-04
## ENSMUSG00000000308 8.584652 0.03622513  8.150802 1.296283e-05 1.148136e-04
## ENSMUSG00000041801 8.556836 4.89825205 14.668745 6.909906e-08 3.507048e-06
## ENSMUSG00000026700 8.436601 3.83021718 11.872409 4.763704e-07 1.127824e-05
## ENSMUSG00000037188 7.932190 1.99753040 17.198558 1.581832e-08 1.522821e-06
## ENSMUSG00000009628 7.877015 4.17154790 11.394278 6.898578e-07 1.447440e-05
##                           B
## ENSMUSG00000026831 1.239444
## ENSMUSG00000000308 2.533939
## ENSMUSG00000041801 8.619459
## ENSMUSG00000026700 6.654007
## ENSMUSG00000037188 7.715722
## ENSMUSG00000009628 6.501137

This is correct, as we can see from the most upregulated genes.

ix <- match(row.names(topGenes)[1], row.names(x) )
top1 <- ( as.vector( t( x[ix,]) ) )
names( top1 ) = colnames(x)
barplot( top1 , las=3)

Wrong orientation

If the levels in the factor does not align correctly with the column names of the design matrix, then we will get a incorrect design matrix which could lead to up-regulated genes are makred as down-regulated ones.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups, levels = c("IR","mock") )
    design <- model.matrix( ~ 0 + groups )
    colnames(design) <- c("mock","IR")
  design
##   mock IR
## 1    0  1
## 2    0  1
## 3    0  1
## 4    0  1
## 5    1  0
## 6    1  0
## 7    1  0
## 8    1  0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"
dge <- DGEList(counts=x);
dge <- calcNormFactors(dge, method = "TMM")  # normalization
v <- voom(dge, design)
fit <- lmFit(v, design) 

cont.matrix <- makeContrasts(contrasts = "IR-mock", levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)

# calls differential gene expression 1 for up, -1 for down
results <- decideTests(fit2, p.value = .1, lfc= log2(2) )
topGenes =topTable(fit2, number = 1e12,sort.by="M" )
head(topGenes)
##                        logFC    AveExpr          t      P.Value
## ENSMUSG00000026831 -9.892108 0.68261871  -6.389315 9.581983e-05
## ENSMUSG00000000308 -8.584652 0.03622513  -8.150802 1.296283e-05
## ENSMUSG00000041801 -8.556836 4.89825205 -14.668745 6.909906e-08
## ENSMUSG00000026700 -8.436601 3.83021718 -11.872409 4.763704e-07
## ENSMUSG00000037188 -7.932190 1.99753040 -17.198558 1.581832e-08
## ENSMUSG00000009628 -7.877015 4.17154790 -11.394278 6.898578e-07
##                       adj.P.Val        B
## ENSMUSG00000026831 5.409615e-04 1.239444
## ENSMUSG00000000308 1.148136e-04 2.533939
## ENSMUSG00000041801 3.507048e-06 8.619459
## ENSMUSG00000026700 1.127824e-05 6.654007
## ENSMUSG00000037188 1.522821e-06 7.715722
## ENSMUSG00000009628 1.447440e-05 6.501137

This also works

The “levels =” parameters sets the reference levels and the order of the levels. This is used in creating the design matrix. So as long as the orders are the same, we will get correct answers. Notice now that it is “IR” is the reference level for groups, and the column names are reversed accordingly. This will give us correct results.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups, levels = c("IR","mock") )
    design <- model.matrix( ~ 0 + groups )
    colnames(design) <- c("IR","mock")
  design
##   IR mock
## 1  0    1
## 2  0    1
## 3  0    1
## 4  0    1
## 5  1    0
## 6  1    0
## 7  1    0
## 8  1    0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

A more straightforward method

Perhaps a more straightforward way is not setting the levels and column names directly, but use whatever was generated.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups )
    design <- model.matrix( ~ 0 + groups )
    design
##   groupsIR groupsmock
## 1        0          1
## 2        0          1
## 3        0          1
## 4        0          1
## 5        1          0
## 6        1          0
## 7        1          0
## 8        1          0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

Then, the only thing we need to do is to remove the “groups” from the column names.

    colnames(design) <- gsub("^groups","",colnames(design))
    design
##   IR mock
## 1  0    1
## 2  0    1
## 3  0    1
## 4  0    1
## 5  1    0
## 6  1    0
## 7  1    0
## 8  1    0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

This works because both the levels and the colnames are auto-generated. We just have to remove the name of the factor.

The new iDEP v 0.82, this issue has been resolved. If you used iDEP, please check your results carefully if you used limma to analyze two groups (control vs. experiment). As the factor levels are assigned as reference level based on alphbetical order, there is a 50% chance that the up- and down-regulated genes are reversed. If you have multiple groups, it is does not affect your results. If you used DESeq2, it should be fine.